Data filtering for Scenario Tests
A standard ScenarioTest evaluates the attached metrics on all predictions and annotations of the selected dataset items in a slice. We introduced data filters so that users can specify the test data for evaluations in a more granular way. Using these filters, users can control exactly which annotations and predictions are taken into account by the tests.
Data filter types
The test data can be filtered using either a FieldFilter or a MetadataFilter. As the name suggests, a FieldFilter is applied to the attributes of annotations or predictions, e.g. the label class. A MetadataFilter is applied to the metadata stored in the key-value store within the annotations and predictions.
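For illustration, the sketch below constructs one filter of each type using the (field, comparator, value) signature that appears in the examples later in this section; the metadata key distance_to_camera is a hypothetical placeholder for a key in your own metadata.
from nucleus.metrics import FieldFilter, MetadataFilter
# FieldFilter: compares against an attribute of the annotation/prediction, e.g. the label class
car_labels_only = FieldFilter("label", "=", "car")
# MetadataFilter: compares against a key stored in the annotation/prediction metadata
# ("distance_to_camera" is a hypothetical metadata key used for illustration only)
nearby_objects_only = MetadataFilter("distance_to_camera", "<", 30)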
Applying filters to tests
The filters are attached to the tests by passing them as arguments to the evaluation functions. A test is defined on a selection of dataset items (through a Slice), while each evaluation can further narrow down the data used to compute its metrics. A user can therefore apply different filters to different evaluation functions.
Filters are defined in disjunctive normal form: the filters within an inner list are combined via AND, and the outer list (if present) combines the inner lists via OR. Written out in a SQL fashion, this evaluates as (cond AND cond AND cond ...) OR (cond AND cond AND cond ...) OR ..., as shown in the example for a 3D / cuboid dataset below:
import nucleus
from nucleus.metrics import MetadataFilter, FieldFilter
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
eval_functions = client.validate.eval_functions
test_slice_id = "slc_c96r9h6qr23g0e98zabc" # this is just a dummy ID, please use the ID for the slice you want to define the test on
# Defining filters for the relative distance of objects, the number of points within the cuboid and the object class
# relative_distance < 70 AND n_points > 15 AND label == 'pedestrian'
close_pedestrians = [MetadataFilter("relative_distance", "<", 70), MetadataFilter("n_points", ">", 15), FieldFilter("label", "=", "pedestrian")]
# relative_distance > 70 AND n_points > 5 AND label == 'pedestrian'
farther_away_pedestrians = [MetadataFilter("relative_distance", ">", 70), MetadataFilter("n_points", ">", 5), FieldFilter("label", "=", "pedestrian")]
# close_pedestrians OR farther_away_pedestrians
either_close_or_farther = [close_pedestrians, farther_away_pedestrians]
only_pedestrian_annotations = [FieldFilter("label", "=", "pedestrian")]
# Filters can be applied to both predictions and annotations
evaluation_functions = [
    eval_functions.cuboid_precision(prediction_filters=either_close_or_farther, annotation_filters=only_pedestrian_annotations),
    eval_functions.cuboid_recall(prediction_filters=either_close_or_farther, annotation_filters=only_pedestrian_annotations),
]
client.validate.create_scenario_test("filtered data test", test_slice_id, evaluation_functions=evaluation_functions)
After applying the filters and attaching the evaluation functions to a test, the tests can be evaluated in the standard way as described in the Evaluating Scenario Tests section.
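As a rough sketch of that step, assuming the evaluate_model_on_scenario_tests method described in the Evaluating Scenario Tests guide and a placeholder model ID, an evaluation run against the test defined above could look like this:
import nucleus
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
# "YOUR_MODEL_ID" is a placeholder; use the ID of the model you want to evaluate
job = client.validate.evaluate_model_on_scenario_tests(
    model_id="YOUR_MODEL_ID",
    scenario_test_names=["filtered data test"],
)
job.sleep_until_complete()  # the evaluation runs asynchronously and returns a job handle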
Segmentation-specific filters
We provide two dedicated filter types for segmentation masks: SegmentFieldFilter and SegmentMetadataFilter. These filter the underlying data based on the Segment entries contained in a SegmentationAnnotation or SegmentationPrediction.
If an annotation_filter is applied to a segmentation function, you are instructing it to ignore the areas of the image that don't pass the filter conditions; no false positives or false negatives will be counted in the filtered-out areas. For example, annotation_filters=[SegmentFieldFilter('label', 'not in', ['sky', 'tree'])] will ignore the areas annotated with the labels sky and tree.
import nucleus
from nucleus.metrics import SegmentFieldFilter
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
test_slice_id = "slc_insert_your_id" # this is just a dummy ID, please use the ID for the slice you want to define the test on
not_sky_and_trees = [SegmentFieldFilter("label", "not in", ["sky", "tree"])]
ef = client.validate.eval_functions
# NOTE: Here we're only filtering the underlying GT annotated image
configured_functions = [
    ef.seg_map(annotation_filters=not_sky_and_trees),
    ef.seg_precision(annotation_filters=not_sky_and_trees),
    ef.seg_recall(annotation_filters=not_sky_and_trees),
    ef.seg_fwavacc(annotation_filters=not_sky_and_trees),
    ef.seg_iou(annotation_filters=not_sky_and_trees),
]
client.validate.create_scenario_test("Not sky and trees GT", test_slice_id, evaluation_functions=configured_functions)
If a prediction_filter is applied to a segmentation function, certain predictions are ignored when computing the aggregate metrics. If we want to specifically monitor performance on animals, we can set up the filter prediction_filters=[SegmentFieldFilter('label', 'in', ['cat', 'dog', 'animal'])].
import nucleus
from nucleus.metrics import SegmentFieldFilter
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
test_slice_id = "slc_insert_your_id" # this is just a dummy ID, please use the ID for the slice you want to define the test on
animals_in_taxonomy = [SegmentFieldFilter('label', 'in', ['cat', 'dog', 'animal'])]
ef = client.validate.eval_functions
# NOTE: Here we're only filtering the predicted segmentation masks
configured_functions = [
    ef.seg_map(prediction_filters=animals_in_taxonomy),
    ef.seg_precision(prediction_filters=animals_in_taxonomy),
    ef.seg_recall(prediction_filters=animals_in_taxonomy),
    ef.seg_fwavacc(prediction_filters=animals_in_taxonomy),
    ef.seg_iou(prediction_filters=animals_in_taxonomy),
]
client.validate.create_scenario_test("Animals in taxonomy predictions", test_slice_id, evaluation_functions=configured_functions)
Both annotation_filters and prediction_filters can be combined to create very specific performance metrics.
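For example, the two segmentation filters shown above could be attached to the same evaluation function. The sketch below simply merges the earlier examples (the test name is arbitrary):
import nucleus
from nucleus.metrics import SegmentFieldFilter
client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
test_slice_id = "slc_insert_your_id"  # this is just a dummy ID, please use the ID for the slice you want to define the test on
ef = client.validate.eval_functions
# Ignore sky/tree areas in the ground truth AND only score animal predictions
not_sky_and_trees = [SegmentFieldFilter("label", "not in", ["sky", "tree"])]
animals_in_taxonomy = [SegmentFieldFilter("label", "in", ["cat", "dog", "animal"])]
configured_functions = [
    ef.seg_iou(annotation_filters=not_sky_and_trees, prediction_filters=animals_in_taxonomy),
]
client.validate.create_scenario_test("Animals outside sky and trees", test_slice_id, evaluation_functions=configured_functions)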