CSV Ingest

For some clients, the most efficient way to ingest large content libraries into DSP is CSV Ingest. This method uses multiple CSV spreadsheets to populate a dashboard with videos, channels, and associated taxonomy. The spreadsheets must follow dotstudioPRO’s specifications and strict formatting rules, but the system is an efficient way to ingest multiple delivery components and automatically associate title taxonomy on a regular schedule.

This article outlines the onboarding requirements, general function, and user experience of CSV ingestion for dotstudioPRO clients.

Overview

Who is it best for?

CSV ingest is best for clients who will be receiving content from multiple sources/content owners and are not capable of generating their own MRSS feeds. Clients who are developing their own content platform may wish to use CSV ingest. The infrastructure requirements for CSV ingest are somewhat steep, as content validation must be managed by the client, and the client must have access to their own Amazon S3 storage bucket.

Advantages

  • Volume: CSV ingest allows multiple files to be delivered simultaneously.
  • Automation: Once connected, ingest can be regularly scheduled to pull all newly added content.
  • Management: CSV ingest works very well with Amazon Transfer Family. Users can have their partner studios deliver to individual subfolders in an S3 bucket, but ingest into the same dashboard.
  • Time: Metadata, images, timed-text, and taxonomy can all be ingested, so videos arrive in the dotstudioPRO dashboard as complete packages.

Disadvantages

  • Infrastructure: Clients are responsible for setting up their S3 infrastructure and managing how content partners deliver to their S3 network. 
  • Setup: Some development may be necessary from the client’s end to generate CSV documents to dotstudioPRO’s spec. 
  • Maintenance: Content and document validation, S3 network and permissions, and S3 workflows are all managed by the client and may require development to streamline.

How does it work?

 

image5.png

 

DSP clients provide access to an S3 environment containing media files (videos, captions, images) and CSV metadata manifests that conform to dotstudioPRO’s specifications. DSP authenticates into the client’s bucket and automatically ingests each new video and channel at a regular interval. Videos, playlists, and channels are automatically created for each new line item in the CSVs.

 

Onboarding Requirements

To onboard for CSV ingest, clients must give DSP access to an Amazon S3 bucket. The DSP user credentials must be able to create and delete files and folders, including temporary URLs. Clients must provide the following information:

  • Bucket Name
  • Region
  • Key
  • Secret
  • The ingest path OR the filename & path to the Map Document

The ingest path (Storage Path) and the Map Document are defined later in this article.
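The onboarding details above can be collected and sanity-checked before they are sent to DSP. The sketch below is a hypothetical client-side helper: the class and field names are illustrative, not part of dotstudioPRO, and it assumes the ingest path and the Map Document path are mutually exclusive. It does not contact S3 or test the credentials' permissions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CsvIngestConfig:
    """Hypothetical holder for the onboarding details DSP asks for."""
    bucket_name: str
    region: str
    key: str
    secret: str
    ingest_path: Optional[str] = None   # single-location ingest
    map_document: Optional[str] = None  # filename & path of the Map Document

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing obvious is missing."""
        problems = []
        for field_name in ("bucket_name", "region", "key", "secret"):
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} is empty")
        # Assumption: supply either an ingest path or a Map Document, not both.
        if bool(self.ingest_path) == bool(self.map_document):
            problems.append("provide exactly one of ingest_path or map_document")
        return problems
```

For example, `CsvIngestConfig("dsp-csv-ingest", "us-east-1", "EXAMPLEKEY", "EXAMPLESECRET", ingest_path="Disney/CSVs/").validate()` returns an empty list.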

Definitions

This section outlines important definitions for understanding the CSV Ingest:

The Storage Path OR Map Document

The location that files will be ingested from. If all CSVs will be stored in a single location, only the path to that location must be provided. This path can be entered in the CSV Ingest options in the user's dashboard.
If the user intends to run ingest from multiple locations, a Map Document must be provided that points to each of the ingest locations. For example, a client who works with multiple content owners might have each content owner deliver to a different location in the client's S3 bucket.

The Map Document

A CSV document containing URL elements, which directs DSP to the subfolders containing "CSV Manifests". The map document allows DSP to read from multiple separate folders and is ideal for clients who receive deliveries from multiple disparate third parties.

Each client should provide only one Map Document, stored within their S3 bucket.

The Map Document explicitly lists the location of a client’s “Manifest Subfolder(s)”. It is possible to use multiple Manifest Subfolders to correspond to different content providers. Most clients will only need to provide a Map Document which points to a single Manifest Subfolder.

Download Sample Map Document

NOTE: This document is optional. If the client will only ingest CSVs from one location, they can provide the path directly in the dashboard.
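A Map Document can be read with Python's standard csv module. This is a minimal sketch that assumes the Map Document has a header row with an "S3 Relative Path" column (the column name is taken from the Ingest Archive section of this article) and that each row lists one Manifest Subfolder; check the Sample Map Document for the actual layout.

```python
import csv
import io
from typing import List


def read_map_document(text: str, column: str = "S3 Relative Path") -> List[str]:
    """Return the Manifest Subfolder paths listed in a Map Document."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"Map Document has no {column!r} column")
    paths = []
    for row in reader:
        path = (row.get(column) or "").strip()
        if path:  # ignore rows with an empty path
            paths.append(path.rstrip("/"))
    return paths
```

Each returned path is a location that would be probed for CSV Manifests.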

 

Manifest Subfolder(s)

Manifest subfolders are subfolders within a client’s S3 bucket where all ingestible CSV Manifests will be stored. When DSP runs an ingest, each Manifest Subfolder will be probed for any CSV documents. Each CSV will then be parsed and (if the document is to spec) the metadata and media files will be ingested into the dashboard.

Screen_Shot_2022-07-07_at_2.08.15_PM.png

Example: In the image above, the folder path Disney/CSVs/ represents a Manifest Subfolder.

  • To ingest from multiple locations, each Manifest Subfolder must be listed in the Map Document. A Manifest Subfolder may use any name, provided its correct name and location are listed in the Map Document.
  • To ingest from a single location, the path can be provided directly in the dotstudioPRO dashboard.

Manifest Subfolder names should not use spaces or special characters; they must correspond exactly to the path/names specified in the map document.
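Subfolder names can be checked before they are added to the Map Document. The sketch below interprets "no spaces or special characters" as letters, digits, hyphens, and underscores, with "/" separating path segments; that character set is an assumption, not a published dotstudioPRO rule. The exact-match check is case-sensitive because S3 keys are case-sensitive.

```python
import re
from typing import Iterable

# Assumed safe character set for each path segment.
SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_subfolder(path: str) -> bool:
    """Check a Manifest Subfolder path such as 'Disney/CSVs/'."""
    segments = path.strip("/").split("/")
    return all(SEGMENT.fullmatch(segment) for segment in segments)


def listed_in_map_document(path: str, map_paths: Iterable[str]) -> bool:
    """True if the path matches a Map Document entry exactly (ignoring outer slashes)."""
    return path.strip("/") in {p.strip("/") for p in map_paths}
```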

 

CSV Manifests

CSV Manifest files are metadata spreadsheets saved as .csv files (UTF-8), and can easily be created using a program like Microsoft Excel or Google Sheets. They contain the information that DSP will use to ingest the content, and must be stored in a Manifest Subfolder listed in the user's dashboard or the Map Document.

These documents list the titles that will be ingested into DSP and all of the metadata for those titles. They also tell the system where to find all of the videos, images, and timed-text files that will be ingested.

The dotstudioPRO user is responsible for creating these documents and uploading them to their Amazon S3 bucket. Clients will need to create a new CSV Manifest for each ingest, and some ingests may require more than one CSV Manifest.
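Manifests are usually built in a spreadsheet, but they can also be generated by a script. This minimal sketch writes a UTF-8 manifest with Python's csv module. The column headers and ID values are hypothetical placeholders chosen to mirror fields this article mentions (ID, Parent ID, taxonomy, series name, season number, episode number); the Sample CSV Manifest is the authoritative list of headers.

```python
import csv
from typing import Dict, List

# Hypothetical headers for illustration; copy the real ones from the Sample CSV Manifest.
COLUMNS = ["ID", "Parent ID", "Taxonomy", "Title", "Series Name",
           "Season Number", "Episode Number", "Video"]

ROWS = [
    {"ID": "series-001", "Taxonomy": "Series", "Title": "My Show"},
    {"ID": "season-001", "Parent ID": "series-001", "Taxonomy": "Season",
     "Title": "My Show Season 1", "Series Name": "My Show", "Season Number": "1"},
    {"ID": "ep-001", "Parent ID": "season-001", "Taxonomy": "Video",
     "Title": "Pilot", "Series Name": "My Show", "Season Number": "1",
     "Episode Number": "1", "Video": "MyShow_S01E01.mp4"},
]


def write_manifest(path: str, rows: List[Dict[str, str]]) -> None:
    """Write rows to a UTF-8 CSV Manifest; missing fields are left blank."""
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_manifest("MyMetadata.csv", ROWS)` produces a three-row manifest in which the series row has no Parent ID, matching the Parent ID rules in this article.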

CSV Manifest Specifications

Download Sample CSV Manifest

 

CSV Manifest Best Practices

CSV Manifest Format:

Each row in the document corresponds to a video, season (child channel), or series (parent channel). The values in each field follow the same validation rules as the dotstudioPRO MRSS ingest spec. If information is not included in the CSV Manifest, it will not be pulled into the DSP dashboard. All CSV Manifests must follow specific formatting rules.

 

Episodic Content:

For serial/episodic content, all series, season, and video data for a given series should be included in the same CSV Manifest. Breaking up a series into multiple documents can cause taxonomy issues (see Dashboard Automation below for more info).

 

Number of Unique Items:

Each CSV Manifest should be limited to no more than 50 unique rows. Documents containing more than this risk timing out, and can result in prolonged ingest times or failed ingest.
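When a library is larger than 50 rows, it has to be split across several manifests without breaking a series apart (see Episodic Content above). The sketch below is one hypothetical way to do that: it assumes every row belonging to a series carries the same "Series Name" value (including the series and season rows) and packs whole series greedily into manifests of at most 50 rows. A single series with more than 50 rows cannot satisfy both recommendations, so the sketch raises an error rather than silently splitting it.

```python
from typing import Dict, List

MAX_ROWS = 50  # recommended per-manifest limit


def split_manifest(rows: List[Dict[str, str]],
                   max_rows: int = MAX_ROWS) -> List[List[Dict[str, str]]]:
    """Pack rows into chunks of at most max_rows without splitting a series."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        # Hypothetical convention: standalone titles have no "Series Name".
        key = row.get("Series Name") or f"standalone:{row['ID']}"
        groups.setdefault(key, []).append(row)

    chunks: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for key, group in groups.items():
        if len(group) > max_rows:
            raise ValueError(f"{key!r} has {len(group)} rows, more than {max_rows}")
        if len(current) + len(group) > max_rows:
            chunks.append(current)  # start a new manifest
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks
```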

 

Media Asset Paths:

Full paths for media assets are only required if the files are stored in a location different from the corresponding CSV Manifest. Filenames can be provided without a full path if the media and CSV are stored in the exact same subfolder.

If providing a full path, the full HTTPS URL is required.

Paths and filenames must not contain literal spaces. Encode each space as “%20”.

  • Incorrect: Star Wars folder/Star Wars.jpg
  • Correct: Star%20Wars%20folder/Star%20Wars.jpg
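Python's urllib.parse.quote performs this encoding: it turns spaces into %20 and leaves "/" alone by default. The asset_url helper is an illustrative assumption that builds a standard S3 virtual-hosted-style HTTPS URL; confirm with dotstudioPRO which URL form your bucket setup requires. Encode each path only once, since quoting an already-encoded path turns "%20" into "%2520".

```python
from urllib.parse import quote


def encode_asset_path(path: str) -> str:
    """Percent-encode a media path; spaces become %20 and '/' is preserved."""
    return quote(path)


def asset_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 HTTPS URL for a (not yet encoded) key."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encode_asset_path(key)}"
```

For example, `encode_asset_path("Star Wars folder/Star Wars.jpg")` returns `Star%20Wars%20folder/Star%20Wars.jpg`, matching the correct form above.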

 

ID:

Each item (video, season, series) must be assigned a unique ID. This is used for the ingest process only and will not appear on the client’s dashboard. The ID is used to determine which assets have been ingested, and which ones are new. The same ID value must never be reused, even across different documents. Without an ID, an asset will not be ingested.
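Because a missing or reused ID prevents correct ingest, it can help to check IDs before uploading a manifest. This sketch assumes a hypothetical "ID" column and a client-maintained set of IDs delivered in earlier manifests; DSP keeps its own record of ingested IDs, which this code does not access.

```python
from typing import AbstractSet, Dict, List


def check_ids(rows: List[Dict[str, str]],
              previously_delivered: AbstractSet[str] = frozenset()) -> dict:
    """Report missing, duplicated, and reused IDs in one manifest."""
    report = {"missing": [], "duplicate": set(), "reused": set()}
    seen = set()
    for index, row in enumerate(rows):
        row_id = (row.get("ID") or "").strip()
        if not row_id:
            report["missing"].append(index)  # this row would not be ingested
            continue
        if row_id in seen:
            report["duplicate"].add(row_id)
        if row_id in previously_delivered:
            report["reused"].add(row_id)
        seen.add(row_id)
    return report
```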

 

Parent ID (AKA Series ID):

The Parent ID value is only used for episodic content. It refers to the parent asset of a given video or season. Parent IDs should not be used for standalone content or series-level data. To ingest TV series properly:

  • episodic videos must list the ID of their parent season
  • seasons must list the ID of their parent series
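The Parent ID rules above can be checked mechanically. This is a minimal sketch assuming hypothetical "ID", "Parent ID", and "Taxonomy" columns with the taxonomy values Video, Season, and Series. It treats a video without a Parent ID as standalone, requires every season to name a series, and flags parents that are not in the same manifest, in line with the Episodic Content guidance.

```python
from typing import Dict, List

# Which taxonomy a row's parent is expected to have.
EXPECTED_PARENT = {"Video": "Season", "Season": "Series"}


def check_parents(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable Parent ID problems for one manifest."""
    kinds = {row["ID"]: row.get("Taxonomy", "") for row in rows if row.get("ID")}
    errors = []
    for row in rows:
        row_id, kind = row.get("ID", "?"), row.get("Taxonomy", "")
        parent = (row.get("Parent ID") or "").strip()
        if kind == "Series":
            if parent:
                errors.append(f"{row_id}: series rows should not have a Parent ID")
            continue
        if not parent:
            if kind == "Season":
                errors.append(f"{row_id}: season has no Parent ID for its series")
            continue  # a video without a parent is treated as standalone
        parent_kind = kinds.get(parent)
        if parent_kind is None:
            errors.append(f"{row_id}: Parent ID {parent!r} is not in this manifest")
        elif parent_kind != EXPECTED_PARENT.get(kind):
            errors.append(f"{row_id}: parent {parent!r} is a {parent_kind}, "
                          f"expected {EXPECTED_PARENT.get(kind)}")
    return errors
```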

 

Upload Order:

Clients should ALWAYS fully upload all media files before adding their CSV; otherwise, it could trigger a partial ingest. If a document is ingested before its media files have been uploaded, DSP treats the document as though it contains errors.
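One way to respect this ordering is to confirm that every referenced media file is already in the bucket before uploading the manifest. The sketch below assumes hypothetical media columns ("Video", "Thumbnail", "Captions"), resolves bare filenames against the manifest's own subfolder, and takes the bucket's current object keys as input (for example, collected from an S3 listing). It assumes S3 keys contain literal spaces where the manifest uses %20, and it skips full HTTPS URLs rather than checking them.

```python
import posixpath
from typing import AbstractSet, Dict, List
from urllib.parse import unquote

MEDIA_COLUMNS = ("Video", "Thumbnail", "Captions")  # hypothetical header names


def missing_media(rows: List[Dict[str, str]], manifest_folder: str,
                  existing_keys: AbstractSet[str]) -> List[str]:
    """List S3 keys referenced by the manifest that are not uploaded yet."""
    missing = []
    for row in rows:
        for column in MEDIA_COLUMNS:
            value = (row.get(column) or "").strip()
            if not value or value.startswith("https://"):
                continue  # empty cell, or a full URL this sketch does not verify
            # Decode %20 back to a space to compare against the stored key.
            key = posixpath.join(manifest_folder, unquote(value))
            if key not in existing_keys:
                missing.append(key)
    return missing
```

An empty result suggests the manifest can be uploaded; anything listed should be uploaded first.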

 

Dashboard Automation 

When dotstudioPRO reads from a CSV Manifest, it uses the "taxonomy" field to determine what objects to create in the user's dashboard:

  • Video: The data in this row will create a video. If it contains no "series name", "episode number", or "season number", a single channel will also be created and associated with the video; the video and channel use the same metadata and images.
  • Season: The data in this row will create a child channel. Episodes are associated with the season using the "Parent ID" field on each video.
  • Series: The data in this row will create a parent channel. Seasons are associated with the series using the "Parent ID" field on each season.
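The taxonomy rules above can be summarized as a small function. It is an illustrative sketch, not dotstudioPRO's implementation, and assumes hypothetical column names and the exact taxonomy values Video, Season, and Series.

```python
from typing import Dict, List

EPISODIC_FIELDS = ("Series Name", "Season Number", "Episode Number")  # hypothetical headers


def objects_created(row: Dict[str, str]) -> List[str]:
    """Return the dashboard objects a manifest row is described as creating."""
    kind = (row.get("Taxonomy") or "").strip()
    if kind == "Season":
        return ["child channel"]
    if kind == "Series":
        return ["parent channel"]
    if kind == "Video":
        episodic = any((row.get(field) or "").strip() for field in EPISODIC_FIELDS)
        # A standalone video also gets its own single channel.
        return ["video"] if episodic else ["video", "single channel"]
    raise ValueError(f"unknown taxonomy value {kind!r}")
```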

 

Channel Settings:

New channels are created with the following settings:

  • Searchability disabled
  • Show in Recommendations disabled
  • No assigned categories
  • All channels are set to “unpublished”
  • If no wallpaper image has been supplied, one is automatically generated:
    • For Movies (single channel): the video thumbnail is used
    • For TV Seasons (child channel): the video thumbnail from the first episode of the season is used
    • For TV Series (parent channel): the video thumbnail from the first episode of the series is used

 

Playlist Generation:

When a CSV Manifest contains two or more videos that share a “series name” and “season number” value, a playlist is automatically created from these videos. The videos are programmed into the playlist in ascending order by episode number. The playlist's name uses the format “[Shared Series Name] - [Shared Season Number]”.
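The grouping rule can be sketched as follows, assuming hypothetical "ID", "Series Name", "Season Number", and "Episode Number" columns with whole-number episode numbers. Sorting on the integer value matters: a plain text sort would place episode 10 before episode 2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_playlists(videos: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map playlist names to video IDs in ascending episode order."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for video in videos:
        series = (video.get("Series Name") or "").strip()
        season = (video.get("Season Number") or "").strip()
        if series and season:
            groups[(series, season)].append(video)

    playlists = {}
    for (series, season), items in groups.items():
        if len(items) < 2:
            continue  # a playlist needs more than one video sharing the pair
        items.sort(key=lambda v: int(v["Episode Number"]))
        playlists[f"{series} - {season}"] = [v["ID"] for v in items]
    return playlists
```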

 

Missing Series Data:

If episodic videos are provided but no "Season" or "Series" row is listed in the CSV Manifest, channels will be created using the series title and the images from the first episode of the series.

 

New Episodes of an Existing Series:

When new episodes or seasons of an existing series are ingested, they are not automatically connected to playlists or channels. These assets will need to be manually associated in the dashboard. This is a safeguard against deliveries that could cause an asset to be published prematurely.

  • New episodes of an existing season will not be added to any playlist.
  • New seasons (child channel) of an existing series will not be associated with the Series (parent channel).

 

Ingest Archive (S3 Automation)

After ingestion, CSV Manifests are renamed and moved to one of two subfolders in the client's S3 bucket. The CSV Ingest System creates these folders automatically in the client’s S3 as needed.

 

Archive Folders

The subfolders denote whether the document contained any errors that would prevent a row from being ingested.

_IngestArchive 

Files in this subfolder were ingested successfully without errors. All titles were either already present on the dashboard or new records were created. Videos that were ingested to DSP, but failed the encode process are considered successful by the ingest system.

_IngestErrors

Files in this subfolder contained errors and records were unable to be generated for at least one title. If the document contains 100 titles, and 99 of them ingest successfully, but one fails, the whole document is moved to this subfolder.

 

image2.png

If no map document was provided, the folders will be generated in the path value provided in the dashboard.

If a map document is provided, DSP will read the client’s Map Document to determine where to create the new subfolders. Unique subfolders will be created for each row in the client’s Map Document CSV, and will be stored in the “S3 Relative Path” location. Whenever a CSV is ingested, it will be moved to the archive folders that share the S3 relative path with the document.

image4.png

The document pictured contains three rows, so DSP will create three sets of Archive and Errors folders in the following locations:

  • s3://dsp-csv-ingest/Studio1
  • s3://dsp-csv-ingest/Studio2
  • s3://dsp-csv-ingest/Studio3
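The routing described in this section can be sketched as a function that picks the destination folder for a processed manifest. The function name and signature are hypothetical; the folder names and the rule that a single error sends the whole document to _IngestErrors come from this article, and the timeout case follows the Timeout Failure status, where the document is not moved.

```python
from typing import Optional


def archive_folder(relative_path: str, error_count: int,
                   timed_out: bool = False) -> Optional[str]:
    """Return the folder a processed manifest is moved to, or None if it stays put."""
    if timed_out:
        return None  # timed-out documents are left in place and reread later
    folder = "_IngestErrors" if error_count > 0 else "_IngestArchive"
    return f"{relative_path.rstrip('/')}/{folder}/"
```

For the three-row Map Document above, a clean manifest from Studio1 would go to `Studio1/_IngestArchive/`.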

 

File Renaming

Renaming only appends a timestamp to the end of the filename, using the convention [client's csv file name]-IngestedAt[yymmddhhmmssmss].csv.

  • Client’s File Name: MyMetadata.csv
  • Renamed: MyMetadata-IngestedAt220616092617000.csv
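The renaming convention can be reproduced with datetime.strftime plus a three-digit field, assuming the final three digits are milliseconds; this yields the 15-digit stamp shown in the example. The helper name is hypothetical, and the time zone DSP uses for the stamp is not stated here, so treat the clock source as an assumption.

```python
import posixpath
from datetime import datetime


def archived_name(filename: str, ingested_at: datetime) -> str:
    """Append -IngestedAt[yymmddhhmmss + milliseconds] before the extension."""
    stem, extension = posixpath.splitext(filename)
    stamp = ingested_at.strftime("%y%m%d%H%M%S") + f"{ingested_at.microsecond // 1000:03d}"
    return f"{stem}-IngestedAt{stamp}{extension}"
```

For example, `archived_name("MyMetadata.csv", datetime(2022, 6, 16, 9, 26, 17))` returns `MyMetadata-IngestedAt220616092617000.csv`, matching the example above.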

 

CSV Audit Tool (Error Reporting)

DSP users can review and manage all recent ingests through the Ingest Audit Tool. This view can be accessed directly by URL ([your dashboard]/admin/csv-ingest-audit) or through the User Settings.

This view displays a history of CSV Ingest logs for the user’s dashboard. Logs that contain 0 new records and 0 errors are purged after 90 days. All logs have a limited lifespan and are purged after 6 months.

 

image1.png

 

  1. Update List Button: When clicked, this button triggers the “store-csv-record” lambda, then refreshes the page. While the lambda is running, the loading animation plays to prevent users from clicking around. On refresh, the list is updated with incomplete CSV Manifests.
  2. CSV Audit Table: This table lists all available CSV Ingest logs for the active dashboard. The table lists the location of each CSV Manifest, the status of each log, the number of total videos, the number of videos that were ingested, and the number of videos that contained errors. If a document is pending ingestion, its status value will be replaced with an “Ingest” button.
  3. Ingest Button: When clicked, this button triggers the “ingest-csv-record” lambda for the selected document, then refreshes the page. The loading animation will play while the document is being read.
  4. View Details Button: When clicked, the user is shown a diagnostic report of the ingest, including a list of all successful and all errored row items.

 

CSV Ingest Audit Status Legend: 

  • Success: DSP was able to read the entire document and no errors were detected. All new videos were ingested, and any videos already present in the client’s dashboard were skipped. After ingest, the document was moved to the “_IngestArchive” folder.
  • Success with Errors: DSP was able to read the entire document, but some errors were detected. Row items with errors were not ingested. After ingest, the document was moved to the “_IngestErrors” folder. Details about the errors can be found by clicking the “View Details” button.
  • Ingest Button (image6.png): dotstudioPRO has not attempted to ingest this document. Clicking the button triggers ingestion of the document.
  • In Progress: dotstudioPRO is currently ingesting from this document.
  • Timeout Failure: The document contains too much data and dotstudioPRO was unable to read it entirely. Some items were not ingested. The document has not been moved and will be reread on subsequent ingests.
  • Failure: dotstudioPRO was unable to locate or read the document, or every item in the document contained errors. No items were ingested. After ingest, the document was moved to the “_IngestErrors” folder.

 

The View Details Page:

By clicking the “VIEW DETAILS” button beside any completed log, users can review a list of assets that were ingested to dotstudioPRO. The report lists the name of the document and the date it was accessed, and breaks down ingest data by taxonomy level (video, parent channel, child channel, single channel).

image8.png

 

Management and Workflow

CSV Ingest is a flexible ingest tool, so the workflow below should be considered a recommended best practice. It assumes all user credentials have already been set up, and represents the delivery of one example CSV document.

 

  1. Media files are uploaded to the S3 bucket. (Dashboard Owner)
  2. A corresponding CSV Manifest is uploaded to a location specified in the client’s Dashboard/Map Document. (Dashboard Owner)
  3. dotstudioPRO checks the client’s Dashboard/Map Document and notes all new CSV Manifests. A list of new documents can be found in the CSV Ingest Audit Tool. (Automated by DSP)
  4. dotstudioPRO begins reading the new CSV Manifest. (Automated by DSP)
  5. For each new item, dotstudioPRO creates a corresponding video/channel record in the client’s dashboard. All new videos begin transcoding in the client’s dashboard. (Automated by DSP)
  6. The CSV Manifest is moved and renamed. (Automated by DSP)
    • Documents without errors are sent to “_IngestArchive”
    • Documents with errors are sent to “_IngestErrors”
  7. The ingest is verified for accuracy in the user’s dashboard. (Dashboard Owner)
  8. If there are errors, the user fixes the failed items and redelivers media/CSV Manifests for the affected titles. (Dashboard Owner)

 

Updating Titles & Redeliveries:

CSV ingest can only be used to create new records in DSP, NOT update old ones. When DSP reads from a CSV Manifest, it will only ingest new titles into the dashboard.  Because of this, titles can be completely modified/edited in the dashboard without any fear of the data reverting to its original state.

If a title needs to be updated, it is usually faster and easier to make changes directly in the dotstudioPRO dashboard. 

Occasionally, users may wish to run full/partial reingestion of one or more titles using CSV Ingest. To do so they will need to make changes in both the dashboard and S3.

  1. Delete the video from the dashboard. This removes its ID from the system, so it is considered a new video on the next ingest cycle. (Dashboard)
  2. Upload a CSV with the changes to the Manifest Subfolder, either by creating a new document or by editing a document from the Archive/Errors folders and moving it back into the Manifest Subfolder. (Amazon S3)

 

Troubleshooting and Limitations

CSV ingest is designed as a starting point for curation and bulk delivery into a DSP dashboard. It cannot be used to update or modify existing metadata or media. If a title or group of titles partially ingests, corrections must be made directly in the dashboard; alternatively, the title can be deleted from the dashboard and re-ingested from an updated CSV, in which case the video will incur additional encoding costs.

 

Encoding Errors:

Occasionally, a video will ingest properly but fail to encode once it has arrived in the dashboard. This could be a problem with the source video or an internal issue with the encoder. When a CSV Ingest video fails to encode, it must be re-ingested by following the steps in Updating Titles & Redeliveries above.

 

Metadata and Ingest Errors:

Metadata and media assets are pulled directly from the spreadsheets, so if any metadata or media are missing, that information was likely missing or formatted incorrectly in the source CSV file.  DSP does not check for spelling, formatting, or other syntax errors. If metadata is inaccurate, DSP will inherit the bad data. It is important for users to fully understand the specifications for CSV Manifests.

 

When a video or channel is not generated at all in DSP, it is likely because its row in the CSV Manifest contained an error. If a row in the CSV Manifest is missing a mandatory field, that row is skipped on ingest and the document is considered to contain an error.

 

Errored CSVs are noted in the CSV Ingest Audit Tool. On ingest, they are moved to the client’s _IngestErrors folder.

 

Video Failure Conditions:

From time to time, a user may deliver a non-functional video (failed video). This can have several causes:

  • A URL was not provided in the CSV Manifest (the title will not be ingested)
  • An invalid URL was provided in the CSV Manifest (protected video, or wrong location)
  • The video fails to encode properly

 

If a video file fails to encode, the dashboard will still ingest the video metadata, ancillary deliverables, and established taxonomy, with the following exceptions:

  • The failed video will have its status set as though it had failed to encode.
  • The CMS row for that video will change to clearly identify it as failed, so that clients can troubleshoot.

 

Outages and Downtime:

CSV ingest functionality is tied to the DSP dashboard. If for any reason the dashboard goes down, CSV ingest will be inaccessible until the dashboard is restored.

 

Downloads and Templates







