In this guide we'll walk you through evaluating model predictions on subsets of data using Validate ScenarioTests. Provided the scenario tests are carefully assembled and cover representative samples of the dataset, this lets you assess the performance of new model predictions and compare it against existing models.
As a prerequisite to this guide, you should learn how to:
- Upload model predictions (see Ground Truth Annotations in Nucleus and Model Predictions in Nucleus)
- Create ScenarioTests out of Nucleus Slices (see Creating Scenario Tests)
Assuming that you have already uploaded several model predictions and created scenario tests, the workflow for evaluating model predictions on the assembled scenario tests looks as follows:
    import nucleus

    client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")

    # List the models and scenario tests available to this API key
    models = client.list_models()
    print(models)
    scenario_tests = client.validate.scenario_tests
    print(scenario_tests)

    model_for_eval = models[0]  # assuming we want to evaluate the first model

    # Run the evaluation of one specific model on the first two scenario tests
    job = client.validate.evaluate_model_on_scenario_tests(
        model_id=model_for_eval.id,
        scenario_test_names=[scenario_tests[0].name, scenario_tests[1].name],
    )
    job.sleep_until_complete()  # block until the asynchronous job finishes
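Picking entries by list position is fragile once more models or scenario tests are added. As a minimal sketch, the hypothetical helper below (not part of the Nucleus client) selects objects by name instead. It only assumes that each object exposes a name attribute, as the scenario tests do in the snippet above; whether your model objects expose one as well is something to verify by printing them.

```python
from typing import Iterable, List, Sequence


def pick_by_name(objects: Iterable, wanted: Sequence[str]) -> List:
    """Return the objects whose .name is in `wanted`, in the requested order.

    Hypothetical helper: raises ValueError naming every requested entry
    that could not be found, instead of silently evaluating the wrong one.
    """
    by_name = {obj.name: obj for obj in objects}
    missing = [name for name in wanted if name not in by_name]
    if missing:
        raise ValueError(f"not found: {missing}; available: {sorted(by_name)}")
    return [by_name[name] for name in wanted]


# Usage with the objects from the snippet above (test names are placeholders):
# tests = pick_by_name(scenario_tests, ["rainy-night", "pedestrians"])
# scenario_test_names = [t.name for t in tests]
```

The resulting names can then be passed as scenario_test_names to evaluate_model_on_scenario_tests exactly as in the snippet above.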
This will trigger an asynchronous evaluation job. The status of this evaluation job can be accessed through the returned job object (job in the code above).
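If you would rather poll the job status yourself, for example to enforce a timeout, a loop along the following lines is one option. This is a sketch, not part of the Nucleus client: wait_for_job is a hypothetical helper, and the state names "Completed", "Errored" and "Cancelled" are assumptions that you should check against what your job actually reports.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("Completed",),            # assumed state name
    failed_states: Iterable[str] = ("Errored", "Cancelled"),  # assumed state names
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a done state, then return that state.

    Raises RuntimeError on a failed state and TimeoutError once the
    deadline has passed while the job is still in progress.
    """
    done, failed = set(done_states), set(failed_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_status()
        if state in done:
            return state
        if state in failed:
            raise RuntimeError(f"evaluation job ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage, assuming job.status() returns a dict with a "status" entry:
# wait_for_job(lambda: job.status()["status"], timeout_seconds=900)
```

Passing the status lookup as a callable keeps the loop independent of the exact shape of the job object, so only the one-line lambda needs adjusting.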
Once the evaluation job is completed, the evaluation history of the specific scenario test will be updated accordingly.
    eval_history = scenario_tests[0].get_eval_history()  # history of the first scenario test
    print(eval_history)
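To compare models on a scenario test, the evaluation history can be condensed into one number per model. The sketch below computes a pass rate per model and assumes that every history entry exposes model_id and passed attributes. Both field names are assumptions about the returned records, so inspect the printed history first and adjust them as needed.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pass_rate_by_model(evaluations: Iterable) -> Dict[str, float]:
    """Return the fraction of passed evaluations for each model id.

    Assumes each entry has `model_id` and `passed` attributes
    (hypothetical field names; verify against your eval history).
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for ev in evaluations:
        total[ev.model_id] += 1
        passed[ev.model_id] += bool(ev.passed)
    return {model_id: passed[model_id] / total[model_id] for model_id in total}


# Usage with the history fetched above:
# print(pass_rate_by_model(eval_history))
```

A model that has never been evaluated on the scenario test simply does not appear in the result, which avoids dividing by zero.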
In addition, the evaluation results will become accessible in the GUI under the section Model Evaluation -> Scenario Tests.