Evaluating Scenario Tests

In this guide we'll walk you through evaluating model predictions on subsets of data using Validate Scenario Tests. Because scenario tests are carefully assembled and cover representative samples of the dataset, they let you easily assess how new model predictions perform compared to existing models.

As a prerequisite to this guide, you should already know how to upload model predictions and create scenario tests.

How to evaluate Scenario Tests

Assuming that you have already uploaded several model predictions and created scenario tests, the following workflow shows how to evaluate those model predictions on the assembled scenario tests.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
# List the models uploaded to your Nucleus account
models = client.list_models()
print(models)

# List the scenario tests created under Validate
scenario_tests = client.validate.scenario_tests
print(scenario_tests)

model_for_eval = models[0] # assuming we want to evaluate the first model
job = client.validate.evaluate_model_on_scenario_tests(
    model_id=model_for_eval.id, # running the evaluation of one specific model
    scenario_test_names=[scenario_tests[0].name, scenario_tests[1].name], # evaluating on the first two scenario tests
)
job.sleep_until_complete()

This triggers an asynchronous evaluation job. The status of the evaluation job can be checked through the returned AsyncJob object.
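
If you prefer not to block on sleep_until_complete, you can poll the job instead. The sketch below uses the AsyncJob status() and errors() methods; the exact contents of the dictionary returned by status() are an assumption and may vary between SDK versions.

# Poll the job instead of blocking on sleep_until_complete().
# status() returns a dictionary describing the job's current state,
# and errors() lists any errors encountered so far.
print(job.status())
print(job.errors())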

Completed evaluations

Once the evaluation job completes, the evaluation history of the corresponding scenario test is updated accordingly.

eval_history = scenario_tests[0].get_eval_history()
print(eval_history)
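
You can also inspect the history entries programmatically. The sketch below assumes each entry exposes model_id and status attributes; verify the exact fields against the evaluation objects returned by your SDK version.

# Sketch: summarize past evaluations of this scenario test.
# The attribute names below (model_id, status) are assumptions;
# check the evaluation objects in your SDK version.
for evaluation in eval_history:
    print(evaluation.model_id, evaluation.status)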

In addition, the evaluation results will become accessible in the GUI under the section Model Evaluation -> Scenario Tests.