Overview

In this guide, we'll walk through the steps to upload your 2D video data to Nucleus. Nucleus supports VideoScenes, which are either mp4 videos or sequences of DatasetItems where each DatasetItem represents a video frame.

Create a Dataset for VideoScenes
Grant Scale access to your data
Create VideoScenes from video files
Create video DatasetItems if you are uploading a video as an array of images
Create VideoScenes from DatasetItems
Upload VideoScenes to Dataset

Creating a `Dataset`

Get started by creating a new Dataset to hold your video data. To create a Dataset that accepts VideoScenes, set the is_scene flag to True during creation.

from nucleus import NucleusClient

client = NucleusClient(YOUR_API_KEY)

dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=True)

Granting Scale Access to Your Data

If you are using non-public cloud storage for your images or videos, then you need to make sure your remote data is accessible to Scale by following this guide.

Creating `VideoScenes` from video files

We can create a VideoScene directly from an mp4 video file by passing the video URL as the video_location parameter. To avoid uploading video data to Scale, use privacy mode by following this guide. Note: privacy mode is only available to enterprise customers.

from nucleus import VideoScene

video_url = "s3://your-bucket-name/001/00.mp4"

scene = VideoScene(
    reference_id="video-1",
    video_location=video_url,
    metadata={"example_boolean_metadata_field": True, "example_scalar": 4},
)

Creating Video `DatasetItems` if you are uploading a video as an array of images

Alternatively, we can create a VideoScene from an array of DatasetItems, each representing a frame of a video, via API. This step should be skipped if you are uploading videos directly as mp4 files.

When uploading items via API, you'll first need to construct DatasetItem payloads. The best way to do so is using the Python SDK's DatasetItem constructor, which takes in a few parameters:

Property	Type	Description
image_location	string (required)	The remote URL to the video frame image. For large uploads we require the data to be stored within AWS S3, Google Cloud Storage, or Azure Blob Storage for faster concurrent & asynchronous processing. See here for info on how to grant Scale access to your remote data.
reference_id	string (required)	A user-specified identifier for the video frame. Typically this is an internal filename or any unique, easily identifiable moniker.
metadata	dict	Optional metadata pertaining to the frame, e.g. time of day, weather. These attributes will be queryable in the Nucleus platform. Metadata can be updated after uploading (via reference ID).

from nucleus import DatasetItem

frame_urls = [
    "s3://your-bucket-name/001/00.jpeg",
    "s3://your-bucket-name/001/01.jpeg",
    "s3://your-bucket-name/001/02.jpeg",
    "s3://your-bucket-name/001/03.jpeg",
]
reference_ids = ["video-1-frame-0", "video-1-frame-1", "video-1-frame-2", "video-1-frame-3"]

metadata_dicts = [
    {"timestamp": "1645144073030", "is_raining": True},
    {"timestamp": "1645144073060", "is_raining": True},
    {"timestamp": "1645144073090", "is_raining": True},
    {"timestamp": "1645144073120", "is_raining": True},
]

dataset_items = []
for url, ref_id, metadata in zip(frame_urls, reference_ids, metadata_dicts):
    item = DatasetItem(image_location=url, reference_id=ref_id, metadata=metadata)
    dataset_items.append(item)
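
Because zip stops at the shortest input, a missing comma or a forgotten entry in one of the per-frame lists would silently drop frames rather than raise an error. The following is a minimal sketch of a sanity check you could run before building the items. The helper name validate_frame_inputs is hypothetical and not part of the Nucleus SDK; it only uses the Python standard library.

```python
from collections import Counter


def validate_frame_inputs(frame_urls, reference_ids, metadata_dicts):
    """Raise ValueError if the per-frame lists are misaligned or IDs repeat.

    Hypothetical helper for illustration; not part of the Nucleus SDK.
    """
    lengths = {
        "frame_urls": len(frame_urls),
        "reference_ids": len(reference_ids),
        "metadata_dicts": len(metadata_dicts),
    }
    # zip() would silently truncate to the shortest list, so fail loudly instead.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Per-frame lists differ in length: {lengths}")

    # Reference IDs are meant to identify each frame uniquely.
    duplicates = sorted(
        ref_id for ref_id, count in Counter(reference_ids).items() if count > 1
    )
    if duplicates:
        raise ValueError(f"Duplicate reference IDs: {duplicates}")
```

Calling validate_frame_inputs(frame_urls, reference_ids, metadata_dicts) just before the loop above would surface a mismatch before any DatasetItem is created.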

Creating `VideoScenes` from `DatasetItems`

After creating DatasetItems for each frame, you can string up to 3000 frames together into a VideoScene. For longer sequences, consider reducing the frame rate or splitting videos into shorter sequences.

from nucleus import VideoScene

scene = VideoScene(
    reference_id="video-1",
    frame_rate=30,
    items=dataset_items,
    metadata={"example_boolean_metadata_field": True, "example_scalar": 4},
)
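
For sequences longer than the 3000-frame limit mentioned above, the two suggested workarounds (reducing the frame rate or splitting the video) can be sketched with plain list operations. The helper names subsample and split_into_chunks are hypothetical and not part of the Nucleus SDK; they work on any list, including a list of DatasetItems.

```python
MAX_FRAMES_PER_SCENE = 3000  # limit stated in this guide


def subsample(frames, step):
    """Keep every `step`-th frame, dividing the effective frame rate by `step`."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]


def split_into_chunks(frames, max_frames=MAX_FRAMES_PER_SCENE):
    """Split a frame list into consecutive chunks of at most `max_frames` frames."""
    if max_frames < 1:
        raise ValueError("max_frames must be >= 1")
    return [frames[i:i + max_frames] for i in range(0, len(frames), max_frames)]
```

Each chunk could then become its own VideoScene with a distinct reference_id. If you subsample instead, you would presumably also want to divide the frame_rate you pass to the scene by the same step so that playback timing stays consistent.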

You can also make changes to an existing VideoScene. The add_item method lets you add or update items in the sequence by index. This method can only be used if you created your VideoScene from an array of images.

# scene.items: [item0, item1, item2, item3]

# add a new item
scene.add_item(
    item=item4,
    index=4 # add to end of sequence
)

# overwrite existing item
scene.add_item(
    item=item0_new,
    index=0,
    update=True # default is False, which will ignore updates on collisions
)

# scene.items: [item0_new, item1, item2, item3, item4]

Uploading `VideoScenes` to Nucleus

We'll upload to the Dataset created earlier in this guide. You can always retrieve a Dataset by its dataset ID. You can list all of your datasets' IDs using NucleusClient.datasets, or extract one from the Nucleus dashboard's URL upon clicking into the Dataset.

from nucleus import NucleusClient

client = NucleusClient(YOUR_API_KEY)

dataset = client.get_dataset(YOUR_DATASET_ID)

With your video scenes and dataset ready, you can now upload to Nucleus using Dataset.append.

# after creating or retrieving a Dataset
job = dataset.append(
    items=[scene0, scene1, scene2, ...],
    update=True,
    asynchronous=True,  # required for video uploads
)

# async jobs will run in the background, poll using:
job.status()

# or block until job completion using:
job.sleep_until_complete()

Setting the update flag to True makes your upload overwrite any existing scene-level or item-level metadata when reference_id values collide.
