Datasets in Nucleus

Datasets

In Nucleus, a Dataset represents a collection of your data. Datasets enable you to explore and interact with your data in the Nucleus dashboard in a number of ways:

  • Adding new data
  • Updating or removing old data
  • Grouping and tracking items of interest
  • Querying, filtering, and sorting
  • Visualizing images, pointclouds, videos, etc.
  • Viewing high-level charts
  • ...and much more!

You can also perform various actions on your Datasets via the API (we recommend using the Python SDK) to create an automated end-to-end pipeline that fits your workflows. Check out the rest of the guides in the sidebar to learn more.

DatasetItems

Datasets are composed of DatasetItems. A DatasetItem represents a single image, pointcloud, or video frame. DatasetItems can also be grouped together into more complex sequences such as LidarScenes or VideoScenes.

Each DatasetItem houses three main components:

  • The remote URL or local path pointing to the data file
  • A "reference ID", which serves as a human-readable identifier for the item
  • Optional metadata (stored as arbitrary key-value pairs)
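
To make the three components concrete, here is a minimal sketch of an item as a plain Python record. It is a stand-in for illustration only, not the Nucleus SDK class: `ItemSketch`, its field names, and the example path are all hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ItemSketch:
    """Hypothetical stand-in for a DatasetItem (not the SDK class)."""

    location: str  # remote URL or local path pointing to the data file
    reference_id: str  # human-readable identifier for the item
    metadata: Dict[str, Any] = field(default_factory=dict)  # optional key-value pairs

    def __post_init__(self) -> None:
        # Assumption for this sketch: the first two components are required.
        if not self.location:
            raise ValueError("location must not be empty")
        if not self.reference_id:
            raise ValueError("reference_id must not be empty")


item = ItemSketch(
    location="s3://example-bucket/images/0001.jpg",  # made-up path
    reference_id="image_0001",
    metadata={"camera": "front", "weather": "sunny"},
)
```

Metadata is the only component with a default here, which reflects that it is optional; the other two are assumed to be required.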

Uploading Your Data

To upload your data to Nucleus, you simply need to:

  1. Create a Dataset
  2. Construct DatasetItems with the aforementioned parameters
  3. Optionally organize them into LidarScenes or VideoScenes if applicable
  4. Upload the items to your Dataset
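
The steps above can be sketched with an in-memory stand-in. This is not the Nucleus SDK: `DatasetSketch`, `append`, the tuple layout, and the file paths are hypothetical, and the rule that a reference ID may appear only once per Dataset is an assumption made for this sketch. Refer to the Python SDK for the real calls.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical item layout: (location, reference_id, metadata).
Item = Tuple[str, str, Dict[str, Any]]


class DatasetSketch:
    """In-memory stand-in for a Dataset (not the SDK class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: Dict[str, Item] = {}

    def append(self, items: List[Item]) -> int:
        """Store items keyed by reference ID and return how many were added."""
        new_ids = [ref_id for _, ref_id, _ in items]
        # Assumption: a reference ID identifies exactly one item per Dataset.
        if len(set(new_ids)) != len(new_ids) or any(r in self.items for r in new_ids):
            raise ValueError("duplicate reference ID")
        for item in items:
            self.items[item[1]] = item
        return len(items)


# 1. Create a Dataset.
dataset = DatasetSketch("my_dataset")

# 2. Construct items from a location, a reference ID, and optional metadata.
items: List[Item] = [
    (f"/data/frame_{i:03d}.jpg", f"frame_{i:03d}", {"index": i}) for i in range(3)
]

# 3. Optionally group items into a scene (here, just an ordered list of IDs).
scene = {"scene_001": [ref_id for _, ref_id, _ in items]}

# 4. Upload the items to the Dataset.
uploaded = dataset.append(items)
```

Checking for duplicates before storing anything keeps the sketch all-or-nothing: a rejected batch leaves the Dataset unchanged.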

Alternatively, if you have an existing labeling project with Scale, check out the Ingest From Labeling guide.

Check out the guides below for tutorials on uploading image data, video data, and LiDAR scene data into Nucleus!