Upload Metadata
Nucleus currently supports uploading metadata as a dictionary with each dataset item, scene, ground truth annotation, model prediction, and segmentation mask. The metadata schema is inferred and grouped separately for dataset items, scenes, ground truth annotations, and model predictions.
To update metadata on existing scenes and items, we recommend using Dataset.update_scene_metadata and Dataset.update_item_metadata.
You can also re-run Dataset.append(), Dataset.annotate(), or Dataset.upload_model_predictions() with the parameter update=True. This overwrites any existing metadata and deletes any metadata fields that are not present in the new payload. Nucleus keeps a cache keyed on image_url that skips image re-uploads, so subsequent uploads that only update metadata are significantly faster.
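Because an upload with update=True replaces the stored metadata wholesale, a payload that lists only the changed fields drops every other field. The sketch below shows one way to guard against that, assuming you keep a local copy of each item's current metadata. The helper merge_metadata and the sample values are hypothetical and are not part of the Nucleus client.

```python
from typing import Any, Dict


def merge_metadata(existing: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a full metadata payload: existing fields plus changed ones."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(changes)   # changed keys win over existing ones
    return merged


# Hypothetical local record of what is currently stored, keyed by reference_id.
current = {"dog1": {"weather": "sunny", "altitude_m": 120}}
changes = {"dog1": {"weather": "rainy"}}

payload = {
    ref_id: merge_metadata(current.get(ref_id, {}), new_fields)
    for ref_id, new_fields in changes.items()
}
print(payload)
```

The merged dictionaries can then be attached to the items you re-upload with update=True, so that untouched fields survive the overwrite.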
Metadata types
Nucleus groups all numerical and text metadata fields into three types:
Numerical
All numerical metadata fields are treated as numerical metadata. These fields have histogram visualizations generated on the insights page and can be queried in the query bar.
Categorical
All string metadata fields are initially treated as categorical metadata. These fields also have histogram visualizations on the insights page, showing the frequency of each category throughout your dataset. They can also be queried, with autocomplete suggesting each category in the search bar. Once more than 250 unique values have been seen for the same metadata field, the field is no longer treated as categorical and is treated as text metadata instead.
Text
String metadata fields that have more than 250 unique values do not have visualizations on the insights page, but remain queryable in the search bar, without autocomplete suggestions.
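The three rules above can be approximated locally to predict how a field is likely to be grouped before you upload it. This is only an illustration of the documented rules, not Nucleus's own implementation; classify_field and CATEGORICAL_LIMIT are hypothetical names.

```python
from typing import Iterable, Union

CATEGORICAL_LIMIT = 250  # threshold stated in the text above


def classify_field(values: Iterable[Union[int, float, str]]) -> str:
    """Classify one metadata field as 'numerical', 'categorical', or 'text'."""
    values = list(values)
    if not values:
        raise ValueError("no values to classify")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    if all(isinstance(v, str) for v in values):
        # More than 250 distinct strings turns the field into plain text.
        return "categorical" if len(set(values)) <= CATEGORICAL_LIMIT else "text"
    raise TypeError("mixed or unsupported value types")


print(classify_field([3, 4.5, 10]))
print(classify_field(["sunny", "rainy", "sunny"]))
print(classify_field([f"id-{i}" for i in range(251)]))
```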
Geographic
Coordinate metadata can be specified on Scene, DatasetItem, Annotation, or Prediction objects. If the metadata dictionary contains the keys latitude and longitude (or lat and lon), then the geographic information will be automatically extracted for the respective object. Coordinates should be provided in WGS 84 as decimal degrees. For example, metadata = {"lat": 52.5, "lon": 13.3}.
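A small local check can confirm that a metadata dictionary carries a recognized key pair and plausible WGS 84 values before upload. The sketch below is an assumption-laden illustration: extract_coordinates is a hypothetical helper, and the order in which the two key pairs are tried is a choice made here, since the text does not say which pair takes precedence.

```python
from typing import Any, Dict, Optional, Tuple


def extract_coordinates(metadata: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if a recognized key pair is present."""
    for lat_key, lon_key in (("latitude", "longitude"), ("lat", "lon")):
        if lat_key in metadata and lon_key in metadata:
            lat, lon = float(metadata[lat_key]), float(metadata[lon_key])
            # Valid decimal degrees fall within these ranges.
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
                raise ValueError(f"coordinates out of range: {lat}, {lon}")
            return lat, lon
    return None  # no recognized key pair found


print(extract_coordinates({"lat": 52.5, "lon": 13.3}))
```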
Context Attachments
The context_attachments
key is specifically reserved for Context Attachments, which are images displayed in the UI that provide additional context for the DatasetItem
they describe. This is useful for augmenting low fidelity dataset items for curation. Context attachments may be supplied as follows:
from nucleus import DatasetItem

item = DatasetItem(
    image_location="http://farm6.staticflickr.com/5295/5465771966_76f9773af1_z.jpg",
    reference_id="dog1",
    metadata={
        "context_attachments": [
            {
                "attachment": "http://farm1.staticflickr.com/107/309278012_7a1f67deaa_z.jpg",  # Required
                "frame": 0,  # Optional
                "dims": {"width": 100, "height": 100},  # Optional
                "camera_position": {"x": 100, "y": 100, "z": 100},  # Optional
                "metadata": {"foo": "bar"},  # Optional
            },
            {
                "attachment": "http://farm9.staticflickr.com/8001/7679588594_4e51b76472_z.jpg",
            },
            # ... (no limit)
        ]
    },
)
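Since attachment is the only required key in each entry, a quick pre-upload check can catch malformed lists early. The following is a minimal sketch; check_context_attachments is a hypothetical helper, and flagging unrecognized keys is a local convention adopted here, not documented Nucleus behavior.

```python
from typing import Any, Dict, List

OPTIONAL_KEYS = {"frame", "dims", "camera_position", "metadata"}


def check_context_attachments(attachments: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the entries look well formed."""
    problems = []
    for index, entry in enumerate(attachments):
        if not entry.get("attachment"):
            problems.append(f"entry {index}: missing required 'attachment' URL")
        unknown = set(entry) - OPTIONAL_KEYS - {"attachment"}
        if unknown:
            problems.append(f"entry {index}: unrecognized keys {sorted(unknown)}")
    return problems


print(check_context_attachments([{"frame": 0}]))
```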
Other
Nucleus will automatically cast boolean metadata fields as the strings "true" and "false", which can be queried as categorical fields.
All other types of metadata, such as lists and dictionaries, will still be stored and viewable in the image detail page, but will not support querying or visualizations.
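To preview how boolean fields will appear once stored, the cast can be imitated locally. This sketch only mirrors the behavior described above; preview_boolean_cast is a hypothetical helper and the cast itself happens on the Nucleus side.

```python
from typing import Any, Dict


def preview_boolean_cast(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Show metadata as it would look once booleans become 'true'/'false' strings."""
    return {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in metadata.items()
    }


print(preview_boolean_cast({"occluded": True, "night": False, "count": 1}))
```

Querying such a field then means matching against the strings "true" and "false" rather than a boolean literal.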
Field Type Consistency
It is important that string and numerical metadata fields are consistent: if a metadata field has a string value, then all metadata fields with the same key should also have string values, and vice versa for numerical metadata. If conflicting types are found, Nucleus will return an error during upload.
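Conflicts of this kind are easy to detect before uploading by scanning all metadata dictionaries in a batch. The sketch below is one possible pre-flight check; find_type_conflicts is a hypothetical helper, and counting booleans as strings is an assumption based on Nucleus storing them as "true" and "false".

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def _kind(value: Any) -> str:
    if isinstance(value, (bool, str)):
        return "string"      # assumption: booleans end up as strings
    if isinstance(value, (int, float)):
        return "numerical"
    return "other"           # lists, dicts, etc. are not checked here


def find_type_conflicts(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each conflicting key to the set of kinds seen for it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for metadata in records:
        for key, value in metadata.items():
            kind = _kind(value)
            if kind != "other":
                seen[key].add(kind)
    return {key: kinds for key, kinds in seen.items() if len(kinds) > 1}


batch = [{"speed": 10}, {"speed": "fast"}, {"city": "Berlin"}]
print(find_type_conflicts(batch))
```

An empty result means every key in the batch is used with a single kind of value.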