Description

With this workflow, Nucleus lets you find rare items starting from a single manually identified positive. This makes it efficient to mine large numbers of rare edge cases, which can substantially improve model performance once they are added to the training data.

Prerequisites: a dataset with embeddings

Steps

Open an indexed dataset
Select an image of interest, e.g. one containing a police car
Click the “autotag” button and select “create new image autotag”
Enter a name and press “create tag” to proceed
From the grid, select more samples that are similar to the original image
Nucleus uses these samples to further refine the results shown in the grid
Continue refining until you are satisfied with the items shown in the grid
Once you are satisfied, press “review” and then “commit” to save the autotag
Nucleus uses these manually selected positives to calculate a similarity score for every item in the dataset
Click on the “Autotag” button in the top navigation and select “Manage Autotags”
Select your newly created autotag to view the similarity score distribution
Similarity scores are normalized to the range -1 to 1; the higher the score, the more similar the image
Click “Query Autotag”, adjust the query threshold as needed, and retrieve the similar images
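This page does not say exactly how Nucleus turns the manual positives into a score. The sketch below is a minimal illustration that assumes cosine similarity between each item's embedding and the centroid of the (unit-normalized) positive embeddings, which naturally produces values between -1 and 1. The function `score_items` and the in-memory embedding dictionaries are hypothetical and are not part of the Nucleus API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero-length embedding")
    return [x / norm for x in v]


def score_items(positives: List[Vector], items: Dict[str, Vector]) -> Dict[str, float]:
    """Hypothetical scoring: cosine similarity of each item to the centroid of the positives.

    Assumption: this is an illustrative stand-in, not Nucleus's actual algorithm.
    """
    if not positives:
        raise ValueError("need at least one manually selected positive")
    unit_pos = [_normalize(p) for p in positives]
    dim = len(unit_pos[0])
    # Average the unit-length positives, then renormalize the centroid.
    centroid = _normalize([sum(p[i] for p in unit_pos) / len(unit_pos) for i in range(dim)])
    scores: Dict[str, float] = {}
    for item_id, emb in items.items():
        dot = sum(a * b for a, b in zip(centroid, _normalize(emb)))
        # Clamp to [-1, 1] to absorb floating-point rounding.
        scores[item_id] = max(-1.0, min(1.0, dot))
    return scores
```

For example, `score_items([[2.0, 0.0]], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]})` scores `a` at 1.0, `b` at 0.0 and `c` at -1.0. Each refinement round in the grid corresponds, under this assumption, to calling the function again with a longer `positives` list, which shifts the centroid toward what the selected samples have in common.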

The final results show the samples whose similarity scores fall within the query thresholds. For more advanced Autotag usage, see the Autotag documentation.
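Querying an autotag amounts to filtering items by score. The sketch below assumes the query takes a lower threshold and an optional upper threshold, both within -1 to 1, and returns matching items from most to least similar. The function name `query_autotag` and its parameters are hypothetical and only illustrate the filtering step; they are not the Nucleus API.

```python
from typing import Dict, List


def query_autotag(scores: Dict[str, float], min_score: float, max_score: float = 1.0) -> List[str]:
    """Hypothetical query: return item IDs with min_score <= score <= max_score, best first."""
    if not -1.0 <= min_score <= max_score <= 1.0:
        raise ValueError("thresholds must satisfy -1 <= min_score <= max_score <= 1")
    hits = [(score, item_id) for item_id, score in scores.items() if min_score <= score <= max_score]
    # Sort by descending score; break ties by item ID for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in hits]
```

Raising `min_score` trades recall for precision: fewer items come back, but they are more likely to be true matches. An upper bound is useful for inspecting a band of borderline scores, e.g. `query_autotag(scores, 0.0, 0.8)`.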
