Find Rare Edge Cases

Gathering rare items based on adjustable visual similarity


With this workflow, Nucleus lets you find rare items starting from a single manually identified positive. This makes it an efficient way to mine large numbers of rare edge cases, which can improve model performance when added to the training data.

Prerequisites: a dataset with embeddings


  1. Open an indexed dataset
  2. Select any image of interest, e.g. one containing a police car
  3. Click the “autotag” button & select “create new image autotag”
  4. Enter a name and press “create tag” to proceed
  5. From the grid, select more samples which are similar to the original image
  6. Nucleus will use these samples to further refine the results being shown in the grid
  7. Continue refinement until you are satisfied with the items being shown on the grid
  8. Once you are satisfied, press “review” and “commit” the autotag
  9. Nucleus will use these manual positives to calculate a similarity score for every dataset item
  10. Click on the “Autotag” button in the top navigation and select “Manage Autotags”
  11. Select your newly created autotag to view the similarity score distribution
  12. Similarity is normalized to a range of -1 to 1. The higher the score, the more similar the image
  13. Click on “Query Autotag”, adjust the query threshold to your liking & get similar images
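
The scoring and thresholding in steps 9 to 13 can be pictured with a small, self-contained Python sketch. It is not Nucleus's implementation: this page does not say how Nucleus computes its scores, so the sketch assumes cosine similarity against the average embedding of the selected positives, chosen only because cosine similarity also falls between -1 and 1. The item ids, the 2-D embeddings, and the function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; lies within [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def score_items(embeddings, positive_ids):
    """Score every item against the centroid of the manual positives.

    Assumption: centroid-plus-cosine scoring stands in for whatever
    Nucleus does internally when an autotag is committed.
    """
    centroid = mean_vector([embeddings[i] for i in positive_ids])
    return {
        item_id: cosine_similarity(vector, centroid)
        for item_id, vector in embeddings.items()
    }

def query(scores, low, high=1.0):
    """Return ids whose score lies in [low, high], most similar first."""
    hits = [item_id for item_id, s in scores.items() if low <= s <= high]
    return sorted(hits, key=lambda item_id: scores[item_id], reverse=True)

# Toy embeddings (hypothetical); real embeddings have many more dimensions.
embeddings = {
    "police_car_1": [1.0, 0.1],
    "police_car_2": [0.9, 0.2],
    "taxi": [0.5, 0.8],
    "tree": [-1.0, 0.0],
}

# The two police cars play the role of the manually selected positives.
scores = score_items(embeddings, ["police_car_1", "police_car_2"])

# Analogue of adjusting the query threshold in step 13.
similar = query(scores, low=0.8)
print(similar)
```

Lowering `low` widens the result set (here, a threshold of 0.5 would also admit the taxi), which mirrors the effect of adjusting the query threshold on the number of images returned.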

The final results show the samples whose similarity score falls within the query thresholds. You can find out about more advanced Autotag usage here.