Creating Scenario Tests
A Validate ScenarioTest is a way to monitor model performance in critical scenarios. Each test is defined on a subset (Slice) of data and can consist of multiple evaluation metrics. Within a test, you can compare model performance either against other baseline models (e.g. on which metrics is model X better than model Y?) or against hard thresholds (e.g. is the IoU of model X > 0.8 on the slice of interest?).
In this guide we'll create a scenario test on a fictional Slice with several evaluation functions and walk through how baseline models and pass/fail thresholds can be set.
Interacting with Validate SDK
You can use the Validate Python SDK via the NucleusClient.validate module. You set up the SDK exactly as you would when interacting with Nucleus (see the Getting Started section). You can then list the existing scenario tests (the list will be empty at first) and the available evaluation functions.
import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)

# List the scenario tests already created for your org (empty at first)
existing_tests = client.validate.scenario_tests
print(existing_tests)

# List the evaluation functions available to attach to a test
eval_functions = client.validate.eval_functions
print(eval_functions)
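If tests already exist, each entry in the list is a ScenarioTest object. Below is a minimal sketch for inspecting them; the id and name attributes are assumptions about the ScenarioTest class and may differ across SDK versions.
# Inspect the tests returned above (attribute names assumed: id, name)
for test in existing_tests:
    print(test.id, test.name)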
Creating a new ScenarioTest
Selecting the slice
We select the Slice we want as the basis of our ScenarioTest just as we would normally select slices via the SDK. You can also find the Slice ID by viewing the slice in the UI and copying the slc_... ID from the URL.
# NOTE: This slice does not exist, please update with a valid
# slice ID from your dataset
pedestrians_slice = client.get_slice("slc_c2dfzaxyr4kh0na1ms")
Alternatively, you can list the slices associated with a given dataset.
# NOTE: This dataset does not exist, please update with a valid dataset ID
dataset = client.get_dataset("ds_c6k9faxtz45009103xz0")
# The dataset info includes the slices created on this dataset
dataset.info
Selecting the EvaluationFunction
Validate comes with a growing set of standard evaluation functions, which are listed on the AvailableEvaluationFunctions object returned by validate.eval_functions.public_functions. They are all accessible as members of the client.validate.eval_functions object (a short sketch of these attributes follows the list below). If the public evaluation functions don't satisfy your needs, please contact us and we'll help you set up private evaluation functions.
We currently support the following evaluation functions:
- 2D object detection: bounding box precision, recall, IoU, mAP
- 3D object detection: cuboid precision, recall, IoU (both 3D and birds-eye-view 2D)
- Image categorization: categorical F1 score
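As a quick sketch of how these map to SDK attributes: the bounding-box metrics used later in this guide are exposed directly on client.validate.eval_functions, and calling one returns the criterion object passed to a scenario test. The cuboid and categorization metrics are exposed in the same way, though their exact attribute names may vary by SDK version.
# Print every registered public evaluation function
print(client.validate.eval_functions.public_functions)
# The 2D detection metrics used in this guide are exposed as attributes;
# calling one produces the criterion attached to a scenario test
iou_criterion = client.validate.eval_functions.bbox_iou()
map_criterion = client.validate.eval_functions.bbox_map()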
Defining a ScenarioTest
We finally create the ScenarioTest with the pedestrians slice and the IoU and mean average precision evaluation functions. We add the two remaining metrics in a subsequent step; note that all of them could have been added at once in the initial list. Each test needs to contain at least one evaluation function upon instantiation.
scenario_test = client.validate.create_scenario_test(
    name="Pedestrians on a crosswalk",
    slice_id=pedestrians_slice.id,
    evaluation_functions=[
        client.validate.eval_functions.bbox_iou(),
        client.validate.eval_functions.bbox_map(),
    ],
)

# Evaluation functions can also be attached after creation
scenario_test.add_eval_function(client.validate.eval_functions.bbox_precision())
scenario_test.add_eval_function(client.validate.eval_functions.bbox_recall())

# Confirm all four metrics are now attached to the test
scenario_test.get_eval_functions()
Once set up, evaluations on the test can easily be run as described in the Evaluating Scenario Tests page.
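For completeness, here is a minimal sketch of kicking off such an evaluation, assuming your SDK version exposes evaluate_model_on_scenario_tests on client.validate; consult the Evaluating Scenario Tests page for the authoritative call.
# Placeholder model ID, replace with one of your uploaded models.
# evaluate_model_on_scenario_tests is an assumption about the SDK surface;
# see the Evaluating Scenario Tests page for the exact method.
job = client.validate.evaluate_model_on_scenario_tests(
    model_id="prj_c6rjnmyejnvg078j12r0",
    scenario_test_names=["Pedestrians on a crosswalk"],
)
job.sleep_until_complete()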
Editing an existing ScenarioTest
If you want to further refine the scenario test and get pass/fail insights into model performance, you can either (1) define a baseline model to compare against (requires 2+ uploaded models) or (2) define a manual threshold for the evaluation functions. Both approaches are introduced below.
Defining a baseline model
Go with this approach if you want to perform relative model comparisons against an existing baseline model. After setting the baseline, every other model will be compared against it on all evaluation functions attached to the scenario test of interest.
# List all of your available models in order to pick the baseline
client.list_models()
# From the returned models, pick the model_id of your chosen baseline and run
scenario_test.set_baseline_model("prj_c6rjnmyejnvg078j12r0")  # placeholder model_id, replace with your own
Setting a pass/fail threshold
If you not only want to compare against a baseline model but also want to evaluate your model against an absolute threshold for pass/fail decisions, that threshold can also be defined using the Validate SDK.
# Get the attached evaluation metrics
metrics = scenario_test.get_eval_functions()
# Set the same threshold for all of them
for m in metrics:
    m.set_threshold(0.6)