Dataset.append(items: List[DatasetItem], asynchronous: bool, update: bool) -> Union[dict, AsyncJob]
A Dataset can be populated with labeled and unlabeled data. Using Nucleus, you can filter down the data inside your dataset using custom metadata about your images. For instance, your local dataset may contain Sunny, Foggy, and Rainy folders of images. All of these images can be uploaded into a single Nucleus Dataset, with added metadata like {"weather": "Sunny"}.
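For example, using the Python client, you could construct DatasetItems tagged with weather metadata and append them in a single call. A minimal sketch, assuming the scale-nucleus Python client; the API key, dataset ID, bucket paths, and reference IDs are placeholders:

import nucleus

# Connect and fetch an existing dataset (IDs below are placeholders).
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("ds_bw6de8s84pe0vbn6p5zg")

# One DatasetItem per image, each tagged with custom metadata for later filtering.
items = [
    nucleus.DatasetItem(
        image_location="s3://your-bucket/sunny/0001.jpg",  # hypothetical path
        reference_id="sunny_0001",
        metadata={"weather": "Sunny"},
    ),
    nucleus.DatasetItem(
        image_location="s3://your-bucket/foggy/0001.jpg",  # hypothetical path
        reference_id="foggy_0001",
        metadata={"weather": "Foggy"},
    ),
]

response = dataset.append(items)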
To update an item's metadata, you can re-ingest the same items with the update argument set to True. Existing metadata will be overwritten for DatasetItems in the payload that share a reference_id with a previously uploaded DatasetItem. To retrieve your existing reference_ids, see Get Dataset Items.
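With the Python client, the overwrite would look roughly like this, reusing the dataset handle from the sketch above (paths and IDs remain placeholders):

# Re-ingest an item that shares a reference_id with an earlier upload.
# With update=True, its existing metadata is overwritten instead of skipped.
updated_item = nucleus.DatasetItem(
    image_location="s3://your-bucket/sunny/0001.jpg",  # hypothetical path
    reference_id="sunny_0001",                         # matches the earlier upload
    metadata={"weather": "Rainy", "camera": "front"},  # replaces the old metadata
)
dataset.append([updated_item], update=True)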
Remote vs. Local Data Upload
Remote uploads take in a list of DatasetItems, whereas local uploads must occur item by item, one API call at a time.
Uploading remotely hosted data
- Keep the content-type of the request as application/json.
- Specify the URL of the image location. Make sure Scale can access this URL.
- We currently support remote URLs with the prefixes gs:, s3:, http:, or https: (see the sketch after this list).
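As a hedged illustration of those prefixes through the Python client (bucket names and reference IDs are placeholders, and every URL must be readable by Scale):

# Each image_location scheme below is an accepted remote prefix.
remote_items = [
    nucleus.DatasetItem(image_location="gs://your-bucket/img_01.jpg", reference_id="img_01"),
    nucleus.DatasetItem(image_location="s3://your-bucket/img_02.jpg", reference_id="img_02"),
    nucleus.DatasetItem(image_location="https://example.com/img_03.jpg", reference_id="img_03"),
]
dataset.append(remote_items)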
Uploading from local storage
- Change the content-type of the request to multipart/form-data.
- In the image field, provide a local path on disk to the image.
- In the item field, provide information about the image, such as your self-defined reference_id and any associated metadata.
curl "https://api.scale.com/v1/nucleus/dataset/ds_bw6de8s84pe0vbn6p5zg/append" \
-u "YOUR_SCALE_API_KEY:" \
-H "Content-Type: multipart/form-data" \
-X POST \
-F "item={
\"reference_id\": \"image_ref_300000\",
\"metadata\": {
\"License Plate\": \"ZPH-J27\",
\"Recording Date\": \"2019-09-15\",
\"Recording Time\": \"14:24:21\",
\"weather\": \"sunny\",
\"camera\": \"back\"
}
};type=application/json" \
-F "image=@{PATH_TO_IMAGE}"
The asynchronous endpoint returns an AsyncJob object that can sleep until complete, return its current status, or return errors. The job runs in two stages: first, the metadata for the images is pulled into a reupload queue; then the images are processed in batches of 3000, with the status updated every 3000 images.
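A rough sketch of driving that job with the Python client; the method names follow the client's AsyncJob interface, so treat them as assumptions to verify against your installed version:

# Kick off an asynchronous append and block until the job finishes.
job = dataset.append(items, asynchronous=True)
job.sleep_until_complete()  # poll server-side status until the job completes

print(job.status())  # current stage and progress
print(job.errors())  # any per-item ingestion errors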