Batch Export

We provide a batch-export endpoint for slices (and datasets) that will pull down all the data that is required for training a model.

  • all DatasetItems
  • all Annotations

To speed up the export, the following pieces of annotation data are not currently pulled down, although they may be added in the future based on user needs.

  • Annotation IDs
  • Annotation-level metadata

For now, this endpoint will time out for very large datasets or slices (>200k items); an async endpoint is coming soon. For item-only export, we support paginated export of dataset items for datasets or slices of any size via items_generator.

The ID of a slice or a dataset can be retrieved by inspecting the URL while using Nucleus. Dataset IDs begin with ds, and slice IDs begin with slc.
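As a small illustration of this prefix convention, the helper below (hypothetical, not part of the Nucleus SDK) classifies an ID string by its prefix:

```python
# Hypothetical helper, not part of the Nucleus SDK: classify a Nucleus ID
# using the prefix convention described above.
def id_kind(nucleus_id: str) -> str:
    if nucleus_id.startswith("slc"):
        return "slice"
    if nucleus_id.startswith("ds"):
        return "dataset"
    raise ValueError(f"unrecognized Nucleus ID: {nucleus_id!r}")

print(id_kind("ds_abc123"))   # dataset
print(id_kind("slc_def456"))  # slice
```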

Request Parameters

This endpoint does not take a payload.

Response Format

The response is a list of dictionaries. Each dictionary has the following format, which matches the format returned when retrieving a single dataset item.

Key          Type   Description
item         dict   A DatasetItem object.
annotations  dict   A dict whose keys are annotation types and whose values are arrays of Annotation objects of the corresponding type.
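To sketch how a response shaped like this might be consumed, the snippet below uses mocked rows with plain dicts standing in for DatasetItem and Annotation objects (illustrative only):

```python
# Mocked export rows: plain dicts stand in for the SDK's DatasetItem
# and Annotation objects, purely for illustration.
rows = [
    {"item": {"reference_id": "img_0"},
     "annotations": {"box": [{"label": "car"}, {"label": "person"}],
                     "polygon": [{"label": "road"}]}},
    {"item": {"reference_id": "img_1"},
     "annotations": {"box": []}},
]

def annotation_counts(rows):
    """Count annotations per item, summed across all annotation types."""
    return {
        row["item"]["reference_id"]:
            sum(len(anns) for anns in row["annotations"].values())
        for row in rows
    }

print(annotation_counts(rows))  # {'img_0': 3, 'img_1': 0}
```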

import nucleus

client = nucleus.NucleusClient("YOUR_API_TOKEN")

example_slice = client.get_slice("YOUR_SLICE_ID")
example_dataset = client.get_dataset("YOUR_DATASET_ID")

# Batch-export all items and annotations from a slice or a dataset.
exported_rows_from_slice = example_slice.items_and_annotations()
exported_rows_from_dataset = example_dataset.items_and_annotations()

# Each row pairs a DatasetItem with its annotations, keyed by annotation type.
image_url = exported_rows_from_dataset[0]['item'].image_location
box_annotations = exported_rows_from_dataset[0]['annotations']['box']

# Paginated item-only export works for datasets or slices of any size.
for item in example_dataset.items_generator():
    print(item.reference_id)