Evaluating Scenario Tests
In this guide we'll walk you through evaluating model predictions on subsets of your data using Validate ScenarioTests. Because scenario tests are carefully assembled and cover representative samples of the dataset, they make it easy to assess the performance of new model predictions and compare it against existing models.
As prerequisites for this guide you should know how to do the following (both steps are sketched briefly after this list):
- Upload model predictions (see Ground Truth Annotations in Nucleus and Model Predictions in Nucleus)
- Create ScenarioTests out of Nucleus Slices (see Creating Scenario Tests)
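For orientation, here is a minimal sketch of those two prerequisite steps, assuming an object detection dataset; the dataset ID, slice ID, model details, prediction values, and the bbox_iou evaluation function are placeholders and assumptions, so follow the linked guides for the authoritative workflows.
import nucleus
from nucleus import BoxPrediction
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
# Assumption: a dataset and a slice of it already exist; the IDs below are placeholders.
dataset = client.get_dataset("YOUR_DATASET_ID")
# 1. Register a model and upload its predictions (see Model Predictions in Nucleus).
model = client.create_model(name="my_model", reference_id="my_model_v1")
predictions = [
    BoxPrediction(
        label="car", x=100, y=100, width=50, height=50,
        reference_id="image_1", confidence=0.9,
    ),
]
dataset.upload_predictions(model, predictions)
# 2. Create a ScenarioTest from a Nucleus Slice (see Creating Scenario Tests).
# The evaluation function is an assumption; choose one that matches your task.
scenario_test = client.validate.create_scenario_test(
    name="my_scenario_test",
    slice_id="YOUR_SLICE_ID",
    evaluation_functions=[client.validate.eval_functions.bbox_iou()],
)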
How to evaluate Scenario Tests
Assuming that you have already uploaded several model predictions and created scenario tests, the workflow for evaluating those predictions on the assembled scenario tests looks as follows:
import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
# List the models that have already been uploaded to Nucleus
models = client.list_models()
print(models)
# List the scenario tests that have been created in Validate
scenario_tests = client.validate.scenario_tests
print(scenario_tests)
model_for_eval = models[0]  # assuming we want to evaluate the first model
job = client.validate.evaluate_model_on_scenario_tests(
    model_id=model_for_eval.id,  # running the evaluation of one specific model
    scenario_test_names=[scenario_tests[0].name, scenario_tests[1].name],  # evaluating on the first two scenario tests
)
job.sleep_until_complete()
This will trigger an asynchronous evaluation job. The status of this evaluation job can be accessed through the returned AsyncJob object.
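If you would rather not block on job.sleep_until_complete(), the sketch below polls the job instead. The exact keys and state strings returned by status() are assumptions here; print the payload and consult the AsyncJob reference for the authoritative interface.
import time
# Minimal sketch: poll the asynchronous evaluation job instead of blocking.
while True:
    status = job.status()  # dictionary describing the current job state
    print(status)
    # Assumption: the state string is reported under the "status" key.
    if status.get("status") in ("Completed", "Errored"):
        break
    time.sleep(30)  # wait before polling again
# Per-item errors, if any, can be retrieved once the job has finished.
print(job.errors())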
Completed evaluations
Once the evaluation job is completed, the evaluation history of each evaluated scenario test is updated accordingly.
eval_history = scenario_tests[0].get_eval_history()
print(eval_history)
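To compare several models on the same scenario tests, as mentioned in the introduction, you can repeat the evaluation for each model and then read back the evaluation history per test. The sketch below only reuses the calls shown above; the shape of the entries returned by get_eval_history() is not spelled out here, so print them to see which fields are available for comparison.
# Sketch: evaluate every uploaded model on all scenario tests, then compare.
test_names = [test.name for test in scenario_tests]
jobs = [
    client.validate.evaluate_model_on_scenario_tests(
        model_id=model.id,
        scenario_test_names=test_names,
    )
    for model in models
]
for job in jobs:
    job.sleep_until_complete()  # wait for all evaluations to finish
# Each entry in the evaluation history corresponds to one evaluation run.
for test in scenario_tests:
    print(test.name)
    for evaluation in test.get_eval_history():
        print(evaluation)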
In addition, the evaluation results will become accessible in the GUI under the section Model Evaluation -> Scenario Tests.