Updating Metadata for Scale-Imported Data
In this tutorial, we'll discuss how to update metadata for `DatasetItem`s that were imported from a Scale labeling project.
At a high level, the only requirement is a mapping from filepath to the metadata fields to update. This filepath should be the same one initially used to upload your data to the Scale labeling project, e.g. `s3://path/to/file.jpg`. The metadata fields to update should take the form of a Python dict, e.g. `{"color": "red", "new_field": "new_value"}`.
You can either add new metadata fields or update existing metadata field values.
If the field (key in the metadata dict) does not yet exist, the Nucleus API will append the new key-value pair as a queryable metadata field to the item.
If the field already exists on the item, the Nucleus API will replace the old value with the newly supplied value.
Currently, the Nucleus API does not support deletion of metadata fields.
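To make these semantics concrete, here is a minimal sketch (the field names are illustrative) of what happens to a single item's metadata during an update:

```python
existing_metadata = {"color": "red"}          # metadata currently on the item
update = {"color": "blue", "source": "cam1"}  # dict passed to the update call

# After the update, the item's metadata is the merge of the two, with the
# newly supplied values winning on key collisions:
#   {"color": "blue", "source": "cam1"}
# "color" was replaced; "source" was appended as a new queryable field.
# Neither key can subsequently be deleted via the API.
```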
Suppose we have a metadata field `color`, which can take on the values red, blue, yellow, pink, or green, that we want to attach to each image in our `Dataset` (which was imported from Scale). The workflow is as follows:
1. Internally, construct a mapping from each image filepath to the metadata fields you wish to add/update, e.g.:

   ```python
   filepath_to_metadata = {
       "s3://some/path/image_0.jpg": {"color": "red"},
       "s3://some/path/image_1.jpg": {"color": "blue"},
       ...
   }
   ```

2. Iterate through your `DatasetItem`s via the Nucleus API (see the API Reference).

3. Retrieve the `image_location` for each `DatasetItem` (or `pointcloud_location` for pointclouds).

4. Use the mapping from (1) to construct a new Python dict mapping each `DatasetItem.reference_id` to its dict of metadata fields to update, e.g.:

   ```python
   refid_to_metadata = {}
   for item in dataset.items_generator():
       refid_to_metadata[item.reference_id] = filepath_to_metadata[item.image_location]
   ```

   The resulting reference ID -> new metadata mapping should look like this:

   ```python
   >>> print(refid_to_metadata)
   {
       "61e878916666940043f06d20": {"color": "red"},
       "61e878916666940043f06d21": {"color": "blue"},
       ...
   }
   ```

5. Use `Dataset.update_item_metadata` and pass in the dict from (4).
   - To update scene-level metadata of `LidarScene`s or `VideoScene`s, use `Dataset.update_scene_metadata` instead.
Below is an example of the full pipeline code to update metadata for images imported from a Scale labeling project:
```python
import nucleus
# === Step 1 ===
# Construct mapping: filepath -> dict of metadata values to add/update (e.g. color)
filepath_to_metadata = {
    "s3://some/path/image_0.jpg": {"color": "red"},
    "s3://some/path/image_1.jpg": {"color": "blue"},
    "s3://some/path/image_2.jpg": {"color": "yellow"},
    "s3://some/path/image_3.jpg": {"color": "red", "new_field": "foo"},
    "s3://some/path/image_4.jpg": {"color": "pink", "new_field": "bar"},
}
# Alternatively, you can define a function:
def get_new_metadata_for_filepath(filepath: str) -> dict:
    """Fetches the dict of metadata fields to add/update for a given filepath."""
    pass
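# As a purely hypothetical sketch of such a function (the manifest file and
# its layout below are assumptions, not part of the Nucleus API), you might
# look each filepath up in a JSON manifest you maintain:
#
#     import json
#
#     def get_new_metadata_for_filepath(filepath: str) -> dict:
#         """Looks up the metadata dict for `filepath` in a local manifest."""
#         with open("metadata_manifest.json") as f:
#             manifest = json.load(f)  # {filepath: {field: value, ...}, ...}
#         return manifest[filepath]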
# === Steps 2-4 ===
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
imported_dataset = client.get_dataset("YOUR_DATASET_ID")
refid_to_metadata = {
    item.reference_id: filepath_to_metadata[item.image_location]
    for item in imported_dataset.items_generator()
}
# === Step 5 ===
imported_dataset.update_item_metadata(refid_to_metadata)
```
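As a quick check that the update took effect, you can re-export a few items and inspect their metadata. A minimal sketch, using the same `items_generator` call as above:

```python
# Print the metadata of the first few items to confirm the new fields landed.
for i, item in enumerate(imported_dataset.items_generator()):
    print(item.reference_id, item.metadata)
    if i >= 4:
        break
```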
As a sanity check, `DatasetItem`s imported from Scale tasks will have `reference_id = task_id` (which takes the form of an arbitrary hash, e.g. `61e878916666940043f06d20`). Thus you can also retrieve the same reference ID (task ID) -> filepath mapping using the Scale labeling APIs, rather than retrieving it via Nucleus's `DatasetItem` export API.
You can then compose this reference ID -> filepath mapping with your filepath -> new metadata mapping to form the desired reference ID -> new metadata mapping for use with `Dataset.update_item_metadata`, as sketched below.
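Here is a minimal sketch of that composition, assuming the `scaleapi` Python client; the project name, the `task.params["attachment"]` layout, and the attribute names should be verified against your own project and the `scaleapi` client docs:

```python
import scaleapi

scale_client = scaleapi.ScaleClient("YOUR_SCALE_API_KEY")

# Compose: task ID (= Nucleus reference_id) -> filepath -> new metadata.
# filepath_to_metadata is the mapping constructed in Step 1 above.
refid_to_metadata = {}
for task in scale_client.get_tasks(project_name="YOUR_PROJECT_NAME"):
    filepath = task.params["attachment"]  # assumed location of the original filepath
    if filepath in filepath_to_metadata:
        refid_to_metadata[task.id] = filepath_to_metadata[filepath]

imported_dataset.update_item_metadata(refid_to_metadata)
```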