Adding Category Annotations and Predictions

Overview

In this guide, we'll walk through the steps to upload category ground truth annotations and model predictions to Nucleus.

Uploading Ground Truth Annotations and Model Predictions

You can upload category annotations and predictions to your dataset in much the same way as other annotation types.

from nucleus import (
  NucleusClient,
  CategoryAnnotation,
  CategoryPrediction,
)

# setup
client = NucleusClient("YOUR_SCALE_API_KEY")  # NucleusClient is imported directly above
dataset = client.get_dataset("ds_bw6de8s84pe0vbn6p5zg")

# create + upload ground truth annotation
category_gt = CategoryAnnotation(
  label="dress", 
  reference_id="image_1", 
  metadata={"dress_color": "navy"}
)

response = dataset.annotate(
  annotations=[category_gt],
  update=True,
  asynchronous=False # async is recommended, but sync jobs are easier to debug
)

# create model
model = client.create_model(
  name="My Model",
  reference_id="My-CNN",
  metadata={"timestamp": "121012401"}
)

# create + upload model prediction
category_pred = CategoryPrediction(
  label="dress",
  reference_id="image_1",
  confidence=0.5,
  class_pdf={"shirt": 0.4, "trousers": 0.1, "dress": 0.5}
)

job = dataset.upload_predictions(
  model=model,
  predictions=[category_pred],
  update=True,
  asynchronous=True # async is recommended, but sync jobs are easier to debug
)

# poll current status
job.status()

# block until upload completes
job.sleep_until_complete()
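
Before uploading, it can help to sanity-check the confidence and class_pdf values used in the prediction above. The following is a minimal sketch, not part of the Nucleus SDK: check_category_prediction is a hypothetical helper, and it assumes that class_pdf is meant to be a probability distribution over labels and that confidence equals the probability of the predicted label, as in the example. Whether Nucleus enforces either condition is not stated here.

```python
import math


def check_category_prediction(label, confidence, class_pdf, tol=1e-6):
    """Return a list of problems found; an empty list means the values look consistent.

    Hypothetical helper for local checks only. It does not call Nucleus.
    """
    problems = []
    if label not in class_pdf:
        problems.append(f"label {label!r} is missing from class_pdf")
    if any(p < 0 or p > 1 for p in class_pdf.values()):
        problems.append("class_pdf has a probability outside [0, 1]")
    total = sum(class_pdf.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"class_pdf sums to {total}, expected 1.0")
    # Assumption: confidence should match the probability of the predicted label.
    if label in class_pdf and not math.isclose(class_pdf[label], confidence, abs_tol=tol):
        problems.append("confidence differs from class_pdf[label]")
    return problems


# Values taken from the prediction in this guide.
print(check_category_prediction("dress", 0.5, {"shirt": 0.4, "trousers": 0.1, "dress": 0.5}))
```

An empty list means the values passed every check; any returned strings describe what to fix before creating the CategoryPrediction.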

Unlike with other annotation types, categorization does not require a call to the calculate model metrics endpoint to match predictions against ground truth annotations and compute metrics such as IOU. Sorting by metrics, as well as the evaluation plots and metrics on the Insights page, is refreshed automatically whenever you upload new category annotations or predictions. This works because, by default, each dataset item can have only one category ground truth annotation and one category prediction per model, which makes matching ground truth annotations to predictions trivial.
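
To see why the matching is trivial, consider the following sketch. It is an illustration only and does not reflect how Nucleus computes its metrics internally: match_categories and accuracy are hypothetical helpers, and the data is made up. With one label per item on each side, matching reduces to a lookup by reference_id.

```python
def match_categories(ground_truth, predictions):
    """Pair each prediction with the ground truth label of the same item.

    Both arguments map reference_id -> label. Items present on only one
    side are skipped, since there is nothing to compare them against.
    """
    shared = sorted(ground_truth.keys() & predictions.keys())
    return [(ref_id, ground_truth[ref_id], predictions[ref_id]) for ref_id in shared]


def accuracy(matches):
    """Fraction of matched items whose predicted label equals the ground truth."""
    if not matches:
        return None
    correct = sum(1 for _, gt_label, pred_label in matches if gt_label == pred_label)
    return correct / len(matches)


# Illustrative data: one ground truth label per item, one prediction per item for one model.
gt = {"image_1": "dress", "image_2": "shirt", "image_3": "trousers"}
preds = {"image_1": "dress", "image_2": "trousers"}

matches = match_categories(gt, preds)
print(matches)
print(accuracy(matches))
```

No geometric matching step is needed, unlike with boxes or polygons, where predictions must first be paired with annotations by overlap.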

Adding Taxonomies

Adding taxonomies to your dataset allows you to upload more than one category annotation and prediction to a single dataset item. The taxonomy ensures that category annotations and predictions are properly matched for model debugging. The taxonomy needs to be added to the dataset before you can upload category annotations and predictions from that taxonomy to the items in your dataset.

from nucleus import (
  NucleusClient,
  CategoryAnnotation,
  CategoryPrediction,
)

# setup
client = NucleusClient("YOUR_SCALE_API_KEY")  # NucleusClient is imported directly above
dataset = client.get_dataset("ds_bw6de8s84pe0vbn6p5zg")

# create taxonomy
dataset.add_taxonomy(
  taxonomy_name="clothing_type", 
  taxonomy_type="category", 
  labels=["shirt", "trousers", "dress"]
)

# create + upload ground truth annotation
category_gt = CategoryAnnotation(
  label="dress", 
  reference_id="image_1", 
  taxonomy_name="clothing_type", 
  metadata={"dress_color": "navy"}
)

response = dataset.annotate(
  annotations=[category_gt],
  update=True,
  asynchronous=False # async is recommended, but sync jobs are easier to debug
)

# create model
model = client.create_model(
  name="My Model",
  reference_id="My-CNN",
  metadata={"timestamp": "121012401"}
)

# create + upload model prediction
category_pred = CategoryPrediction(
  label="dress",
  reference_id="image_1",
  taxonomy_name="clothing_type",
  confidence=0.5,
  class_pdf={"shirt": 0.4, "trousers": 0.1, "dress": 0.5}
)

job = dataset.upload_predictions(
  model=model,
  predictions=[category_pred],
  update=True,
  asynchronous=True # async is recommended, but sync jobs are easier to debug
)

# poll current status
job.status()

# block until upload completes
job.sleep_until_complete()
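
Because annotations and predictions are tied to a taxonomy, it can be useful to check locally that every label belongs to that taxonomy before uploading. The sketch below assumes plain dictionaries with reference_id and label keys; find_unknown_labels is a hypothetical helper, not a Nucleus SDK function, and how Nucleus itself handles labels outside a taxonomy is not covered here.

```python
def find_unknown_labels(taxonomy_labels, records):
    """Return (reference_id, label) pairs whose label is not in the taxonomy.

    `records` is a list of dicts with "reference_id" and "label" keys
    (an assumed local format, chosen for illustration).
    """
    allowed = set(taxonomy_labels)
    return [
        (record["reference_id"], record["label"])
        for record in records
        if record["label"] not in allowed
    ]


clothing_type = ["shirt", "trousers", "dress"]  # labels from the taxonomy above
records = [
    {"reference_id": "image_1", "label": "dress"},
    {"reference_id": "image_2", "label": "skirt"},  # not in the taxonomy
]
print(find_unknown_labels(clothing_type, records))
```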

Multi-Category Annotations and Predictions

Currently, the best way to add multi-category annotations or predictions to an item is to use multiple taxonomies. Rather than a single taxonomy with x classes, you can add x binary taxonomies, each with two classes: True and False.

For example, consider a multi-category model that predicts the classes shirt, trousers, and dress. You would first add one True/False taxonomy for each of the three classes. To add a prediction for an item that belongs to both the shirt and dress classes, you would add True predictions for the shirt and dress taxonomies and a False prediction for the trousers taxonomy. Unfortunately, the confidence and class PDFs will not be calibrated across taxonomies; one way to get around this is to attach the actual PDF values or multiplicative weights as metadata.
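
The expansion described above can be sketched as follows. This is a hedged illustration, not an SDK feature: to_binary_predictions is a hypothetical helper, the taxonomies are assumed to be named after their classes, the labels are assumed to be the strings "True" and "False", and the raw_probability metadata key is made up for this example.

```python
def to_binary_predictions(reference_id, classes, positive, class_probs=None):
    """Expand one multi-category prediction into one True/False record per class.

    Each record is a plain dict whose keys mirror the keyword arguments
    used in the examples in this guide (label, reference_id, taxonomy_name).
    """
    unknown = set(positive) - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")

    records = []
    for cls in classes:
        record = {
            "reference_id": reference_id,
            "taxonomy_name": cls,  # assumption: one binary taxonomy per class, named after it
            "label": "True" if cls in positive else "False",
        }
        if class_probs is not None:
            # Keep the model's original probability as metadata, since
            # confidences are not calibrated across taxonomies.
            record["metadata"] = {"raw_probability": class_probs[cls]}
        records.append(record)
    return records


records = to_binary_predictions(
    reference_id="image_1",
    classes=["shirt", "trousers", "dress"],
    positive={"shirt", "dress"},
    class_probs={"shirt": 0.8, "trousers": 0.1, "dress": 0.7},
)
for record in records:
    print(record)
```

Each binary taxonomy would need to be added to the dataset first, for example with dataset.add_taxonomy(taxonomy_name="shirt", taxonomy_type="category", labels=["True", "False"]). The records could then be turned into CategoryPrediction objects, assuming CategoryPrediction accepts a metadata argument in the same way CategoryAnnotation does.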

Note that this is a temporary workaround; native multi-category support in Nucleus is coming soon!