Creating a Test Suite
Intro
One of the core concepts within the Vals Platform is the Test Suite. A test suite is how you define what inputs you expect to receive from users, and how you expect your model to behave in response to those inputs.
You can view all of the Suites in your organization by going to platform.vals.ai/suite. To create a new Test Suite, click the “+ New Suite” button in the top right.
Adding Tests
Test Suites are composed of “Tests”. Each Test covers one unique input to your model. It also has one or more “checks”; each check looks for a specific thing in the model’s response. For example, you may have a check for grammar, or a check to make sure the model mentions a certain thing in its response.
To add a test, press “Add Test”. Then, you can enter the Test Input you would like to submit to your model. For example, if your application is a copilot for personal injury law, an example Test Input would be “How is fault determined in a car accident?”
Next, you can add your checks. Each check should verify one part or aspect of the model’s output. For example, in this instance, we want to make sure that the model’s output mentions that fault in an accident can be determined by looking at who, if anyone, broke traffic laws. We may also want to check that the model’s output is grammatically correct. We can codify both of these expectations as checks.
You should add multiple tests to your test suite, enough to give you confidence that your model is behaving as expected across a wide variety of user inputs.
There is a wide library of operators, each of which checks something different about the model’s output. See the full list of operators here.
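The relationship between a suite, its tests, and their checks can be pictured with a small data model. This is only an illustrative sketch, not the Vals SDK or the platform's storage format: the class names `Check` and `SuiteTest`, their fields, and the operator strings "includes" and "grammar" are all assumptions made for the example. The real operator names are the ones in the operator list.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    # Hypothetical operator name; real names come from the platform's operator list.
    operator: str
    # What the operator should look for, if the operator takes an argument.
    criteria: str = ""


@dataclass
class SuiteTest:
    # The single input this test submits to the model.
    test_input: str
    # One or more checks, each verifying one aspect of the model's output.
    checks: list[Check] = field(default_factory=list)


fault_test = SuiteTest(
    test_input="How is fault determined in a car accident?",
    checks=[
        Check(operator="includes", criteria="who, if anyone, broke traffic laws"),
        Check(operator="grammar"),
    ],
)

# A suite is a collection of such tests.
suite = [fault_test]

for test in suite:
    print(test.test_input, [check.operator for check in test.checks])
```

Keeping each check to a single aspect of the output, as in the two checks above, makes it easier to see which expectation failed when a test does not pass.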
Global Checks
Often, there are checks that should be applied to every test. For example, you may always want to check that the model cites its sources in the output.
Instead of manually adding the same check to every single test, you can press “Add Global Checks”. Any global check will be applied to every test in the suite.
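Conceptually, the checks that run for a given test are its own checks plus the suite's global checks. The sketch below illustrates that idea only. The function `effective_checks`, the dictionary layout, and the operator names are hypothetical, and the order in which the platform actually runs checks is not specified here.

```python
def effective_checks(test_checks, global_checks):
    """Return the checks that apply to one test: its own, followed by the global ones."""
    return list(test_checks) + list(global_checks)


# Hypothetical global check: every answer should cite its sources.
global_checks = [{"operator": "cites_sources", "criteria": ""}]

# Hypothetical per-test checks, keyed by test input.
tests = {
    "How is fault determined in a car accident?": [
        {"operator": "includes", "criteria": "traffic laws"},
    ],
    "What should I do after a minor collision?": [],
}

for test_input, checks in tests.items():
    operators = [c["operator"] for c in effective_checks(checks, global_checks)]
    print(test_input, operators)
```

Note that the second test has no checks of its own, yet the global check still applies to it.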
Augmentation
Sometimes, a small phrasing change in a question can make a noticeable difference in the model’s output. We provide a way to automatically test different phrasings of the same question.
To do this, simply go to the “Augment” dropdown in the top right, then select “Rephrase All”. It will create a copy of every test in the test suite, with a slightly rephrased test input. Rephrasals can be removed by pressing “Delete all rephrasals.”
Likewise, it is often useful to test how a model performs in different languages. We provide a way to automatically translate every test into a new language: simply select “Translate” from the “Augment” menu.
Importing a CSV File
If you already have a Test Suite used for evaluation, you can import it into Vals using the “Import from File” button. This allows you to select a CSV file that is parsed and added to the Test Suite. The columns of this CSV should be as follows:
Test Input: The input to the model.
Operator: The operator to use for the check.
Criteria: The criteria for the check.
Right Answers (optional): If using a “Right Answer”, the correct output you’re expecting should go here (see Right Answers for more information).
Make sure that the elements in the Operator column are entered exactly as they appear in the website drop-down. See here for a full example of the CSV file: Example CSV File
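If your existing tests live in code, a short script can produce a file with these columns. The following is a minimal sketch using Python's standard `csv` module. It assumes the file starts with a header row containing the column names listed above. The operator values "includes" and "grammar" are placeholders that must be replaced with the exact text shown in the website drop-down. Check the example CSV file for the authoritative layout.

```python
import csv
import io

# Column names as listed above; the header row itself is an assumption.
COLUMNS = ["Test Input", "Operator", "Criteria", "Right Answers"]

# Placeholder rows; operator names here are hypothetical.
rows = [
    {
        "Test Input": "How is fault determined in a car accident?",
        "Operator": "includes",
        "Criteria": "who, if anyone, broke traffic laws",
        "Right Answers": "",
    },
    {
        "Test Input": "What should I do after a minor collision?",
        "Operator": "grammar",
        "Criteria": "",
        "Right Answers": "",
    },
]


def build_csv(rows):
    """Serialize the rows to CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(rows)
print(csv_text)
```

Writing through `csv.DictWriter` rather than joining strings by hand keeps criteria that contain commas or quotes intact. The resulting text can be saved to a `.csv` file and selected with “Import from File”.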