Upload Metadata

Nucleus currently supports uploading metadata as a dictionary with each dataset item, scene, ground truth annotation, model prediction, and segmentation mask. The metadata schema is inferred and grouped separately for dataset items, scenes, ground truth annotations, and model predictions.

To update metadata on scenes or dataset items that have already been uploaded, we recommend using Dataset.update_scene_metadata and Dataset.update_item_metadata.

You can also re-run Dataset.append(), Dataset.annotate(), or Dataset.upload_model_predictions() with the parameter update=True. This will overwrite any existing metadata, deleting any metadata fields that are not present in the new payload. We have a cache keyed on image_url that will skip image re-uploads, so subsequent uploads to update metadata will be significantly faster.
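Because update=True replaces the stored metadata wholesale, any field you want to keep has to be present in the new payload. The following is a minimal sketch of that behavior in plain Python, with no Nucleus calls. The helper names overwrite_metadata and payload_preserving_existing are hypothetical and only model the semantics described above.

```python
from typing import Any, Dict

Metadata = Dict[str, Any]


def overwrite_metadata(existing: Metadata, new_payload: Metadata) -> Metadata:
    """Model of update=True: the new payload replaces the old metadata entirely."""
    # `existing` is intentionally ignored; fields missing from new_payload are lost.
    return dict(new_payload)


def payload_preserving_existing(existing: Metadata, changes: Metadata) -> Metadata:
    """Build a payload that keeps existing fields and applies the changes on top."""
    merged = dict(existing)
    merged.update(changes)
    return merged


existing = {"weather": "sunny", "speed_kph": 42}
changes = {"weather": "rainy"}

# Re-uploading only the changed field drops "speed_kph".
lossy = overwrite_metadata(existing, changes)

# Sending the merged payload keeps it.
safe = overwrite_metadata(existing, payload_preserving_existing(existing, changes))
```

In practice the merged dictionary would be what you pass as the metadata of the re-uploaded item, annotation, or prediction.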

Metadata types

Nucleus groups all numerical and text metadata fields into three types:

Numerical

All numerical metadata fields are treated as numerical metadata. These fields will have histogram visualizations generated on the insights page, and can be queried in the query bar.

Categorical

All string metadata fields are initially treated as categorical metadata. These fields will also have histogram visualizations on the insights page showing the frequency of each category throughout your dataset. They can also be queried, with autocomplete suggesting each category in the search bar. Once more than 250 unique values have been seen for the same metadata field, it will no longer be treated as categorical and will instead be treated as text metadata.

Text

String metadata fields that have more than 250 unique values will not have visualizations on the insights page, but will still be queryable in the search bar, without autocomplete suggestions.
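The three types above can be approximated locally before upload. This is a rough sketch of the rules as described on this page, not Nucleus's actual implementation. The function name classify_field and the "other" fallback label are assumptions made for illustration, and booleans are converted to the strings "true" and "false" as described under Other below.

```python
from typing import Any, Iterable

CATEGORICAL_LIMIT = 250  # unique-value threshold stated in the text


def classify_field(values: Iterable[Any]) -> str:
    """Return "numerical", "categorical", "text", or "other" for one metadata key."""
    # Booleans are stored as the strings "true"/"false", so normalize them first.
    normalized = [str(v).lower() if isinstance(v, bool) else v for v in values]
    if not normalized:
        return "other"
    if all(isinstance(v, (int, float)) for v in normalized):
        return "numerical"
    if all(isinstance(v, str) for v in normalized):
        unique_count = len(set(normalized))
        # More than 250 unique strings moves the field from categorical to text.
        return "categorical" if unique_count <= CATEGORICAL_LIMIT else "text"
    # Lists, dictionaries, and mixed types are not covered by the three types.
    return "other"
```

For example, a field with a few repeated city names would come back as "categorical", while a field holding a unique free-form comment per item would eventually come back as "text".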

Geographic

Coordinate metadata can be specified on Scene, DatasetItem, Annotation or Prediction objects. If the metadata dictionary contains the keys latitude and longitude (or lat and lon), then the geographic information will be automatically extracted for the respective object. Coordinates should be provided in WGS 84 as decimal degrees. For example, metadata = {"lat": 52.5, "lon": 13.3}.
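To check coordinates before uploading, a small local helper along these lines may be useful. It is only a sketch: extract_coordinates is a hypothetical name, the order in which the two key pairs are tried is an assumption, and the range check reflects the bounds of decimal-degree latitude and longitude rather than any validation Nucleus is known to perform.

```python
from typing import Any, Dict, Optional, Tuple

# Key pairs named in the text: (latitude key, longitude key).
COORDINATE_KEYS = (("latitude", "longitude"), ("lat", "lon"))


def extract_coordinates(metadata: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in decimal degrees, or None if no pair is present."""
    for lat_key, lon_key in COORDINATE_KEYS:
        if lat_key in metadata and lon_key in metadata:
            lat = float(metadata[lat_key])
            lon = float(metadata[lon_key])
            # Decimal-degree coordinates are bounded; reject values outside that range.
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
                raise ValueError(f"coordinates out of range: {lat}, {lon}")
            return lat, lon
    return None


berlin = extract_coordinates({"lat": 52.5, "lon": 13.3})
```

A metadata dictionary with only one of the two keys yields None here, since both are needed to form a coordinate.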

Context Attachments

The context_attachments key is reserved for Context Attachments, which are images displayed in the UI that provide additional context for the DatasetItem they describe. This is useful for augmenting low-fidelity dataset items for curation. Context attachments may be supplied as follows:

from nucleus import DatasetItem

item = DatasetItem(
  image_location="http://farm6.staticflickr.com/5295/5465771966_76f9773af1_z.jpg",
  reference_id="dog1",
  metadata={
    "context_attachments": [
      {
        "attachment": "http://farm1.staticflickr.com/107/309278012_7a1f67deaa_z.jpg", # Required
        "frame": 0,                                                                   # Optional
        "dims": { "width": 100, "height": 100 },                                      # Optional
        "camera_position": { "x": 100, "y": 100, "z": 100 },                          # Optional
        "metadata": { "foo": "bar" },                                                 # Optional
      },
      {
        "attachment": "http://farm9.staticflickr.com/8001/7679588594_4e51b76472_z.jpg",
      },
      # ... (no limit)
    ]
  }
)

Other

Nucleus will automatically cast boolean metadata fields to the strings "true" and "false", which can be queried as categorical fields.

All other types of metadata, such as lists and dictionaries, will still be stored and viewable in the image detail page, but will not support querying or visualizations.

🚧

Field Type Consistency

It is important that string and numerical metadata fields are consistent. If a metadata field has a string value for one object, then every other value stored under the same key should also be a string, and likewise for numerical values. If conflicting types are found, Nucleus will return an error during upload!
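Conflicts of this kind can be caught locally before an upload is attempted. The sketch below scans a batch of metadata dictionaries and reports keys that mix string and numerical values. The names kind_of and find_type_conflicts are hypothetical, and the check is an approximation of the rule stated above rather than the server-side validation itself.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def kind_of(value: Any) -> str:
    """Classify a single metadata value by its broad type."""
    # bool is a subclass of int in Python, so test for it first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numerical"
    if isinstance(value, str):
        return "string"
    return "other"


def find_type_conflicts(metadata_dicts: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return the keys whose values mix string and numerical kinds."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for metadata in metadata_dicts:
        for key, value in metadata.items():
            seen[key].add(kind_of(value))
    return {
        key: kinds
        for key, kinds in seen.items()
        if {"string", "numerical"} <= kinds
    }


batch = [
    {"speed": 10, "city": "Berlin"},
    {"speed": "fast", "city": "Paris"},
    {"speed": 12.5},
]
conflicts = find_type_conflicts(batch)
```

Here "speed" is reported because it holds both numbers and a string, while "city" is consistent and is left out of the result.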


What's Next

Query on your metadata in the dashboard!