Creating Test Suites with the SDK

Overview

In the SDK, each construct is represented as a Python object (a Pydantic model). To create a test suite, first construct a Suite object, then call its create() method.

Basic Test Suite Creation

Here’s a simple example of creating a test suite:

import asyncio

from vals import Suite, Test, Check

async def create_suite():
    suite = Suite(
        title="Test Suite",
        description="This is an example test suite.",
        tests=[
            Test(
                input_under_test="What is QSBS?",
                checks=[
                    Check(operator="equals", criteria="QSBS")
                ]
            )
        ],
    )
    # create() is a coroutine, so it must be awaited
    await suite.create()

    print("URL:", suite.url)

# Run the coroutine from a regular script
asyncio.run(create_suite())

This creates a simple test suite with a single test.

Tests with Files

The platform also supports files as test input. For example, you may want to test a model’s ability to answer questions about a contract or extract information from an image. To attach files to a test, pass their paths in files_under_test:

from vals import Suite, Test, Check

suite = Suite(
    title="My Suite with files",
    tests=[
        Test(
            input_under_test="Is there an MFN clause in this contract?",
            files_under_test=["path/to/file.docx"],
            checks=[Check(operator="equals", criteria="No")]
        )
    ]
)

Both the model and the operators will have access to the file content.
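Files are referenced by local path, so a mistyped path is easy to miss. The sketch below is plain Python, independent of the SDK; missing_files is a hypothetical helper, not an SDK function. It reports any path in files_under_test that does not point to an existing file, so you can catch the problem before building the suite:

```python
from pathlib import Path


# Hypothetical pre-flight helper; not part of the vals SDK.
def missing_files(paths: list[str]) -> list[str]:
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]


files_under_test = ["path/to/file.docx"]
missing = missing_files(files_under_test)
if missing:
    print("Missing files:", missing)
```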

Adding Context

You can also attach arbitrary information to each test beyond the input and files. For example, you may want to give the model a chat history, describe the user who asked the question, or specify where in an application the question was asked. Pass this information through the context parameter of Test:

from vals import Suite, Test, Check

suite = Suite(
    title="My Suite with context",
    tests=[
        Test(
            input_under_test="What is the MFN clause?",
            context={
                "user_email": "user@example.com",
                "message_history": [
                    {"role": "user", "content": "What can you help me with?"},
                    {"role": "assistant", "content": "I can help you with answering legal questions about contracts."},
                ]
            },
            checks=[Check(operator="equals", criteria="No")]
        )
    ]
)

NOTE: Context values can be either raw strings or JSON objects. JSON objects are parsed and pretty-printed in the UI.
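Because context values should be raw strings or JSON objects, it can help to confirm that a context dictionary survives JSON serialization before creating the suite; values such as sets or datetime objects do not. This is a stdlib-only sketch, and context_is_serializable is a hypothetical helper rather than an SDK function:

```python
import json


# Hypothetical helper; not part of the vals SDK.
def context_is_serializable(context: dict) -> bool:
    """Return True if the whole context dict can be encoded as JSON."""
    try:
        json.dumps(context)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set, datetime)
        # ValueError: circular reference
        return False
    return True


context = {
    "user_email": "user@example.com",
    "message_history": [
        {"role": "user", "content": "What can you help me with?"},
    ],
}
print(context_is_serializable(context))  # True
print(context_is_serializable({"seen": {1, 2}}))  # False
```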

Adding Tags

You can also add tags to a test. Tags are searchable in both the test suite and its run results, and the results include a performance breakdown by tag.

Test(
    input_under_test="What is the MFN clause?",
    tags=["contract", "mfn"],
    checks=[Check(operator="grammar")]
)
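The platform computes the per-tag breakdown for you. To make the idea concrete, here is a minimal sketch of one way such a breakdown could be computed; the result record shape and pass_rate_by_tag are illustrative assumptions, not the platform's actual method:

```python
from collections import defaultdict


# Hypothetical illustration; not the platform's implementation.
def pass_rate_by_tag(results: list[dict]) -> dict[str, float]:
    """Fraction of passing tests per tag.

    Each result is an assumed record: {"tags": [...], "passed": bool}.
    A test with several tags counts toward each of them.
    """
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for result in results:
        for tag in result["tags"]:
            totals[tag] += 1
            passes[tag] += int(result["passed"])
    return {tag: passes[tag] / totals[tag] for tag in totals}


results = [
    {"tags": ["contract", "mfn"], "passed": True},
    {"tags": ["contract"], "passed": False},
]
print(pass_rate_by_tag(results))  # {'contract': 0.5, 'mfn': 1.0}
```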

Adding Global Checks

If you want certain checks to run on every test, add them to the suite with the global_checks parameter. For example, to check the grammar of every test’s output by default:

suite = Suite(
    title="My Suite with global checks",
    global_checks=[
        Check(operator="grammar")
    ],
    tests=[...],
)
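Conceptually, every test is evaluated against its own checks plus each global check. The sketch below models that with plain dictionaries shaped like the JSON suite format described later on this page; effective_checks is a hypothetical helper, and the ordering (test checks first) is an assumption made only for illustration:

```python
# Hypothetical model of how global checks combine with a test's own checks.
def effective_checks(test: dict, global_checks: list[dict]) -> list[dict]:
    """Return the test's own checks followed by the suite-wide global checks."""
    return list(test.get("checks", [])) + list(global_checks)


global_checks = [{"operator": "grammar"}]
test = {
    "input_under_test": "Is there an MFN clause in this contract?",
    "checks": [{"operator": "equals", "criteria": "No"}],
}
print(effective_checks(test, global_checks))
# [{'operator': 'equals', 'criteria': 'No'}, {'operator': 'grammar'}]
```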

Advanced Check Modifiers

Each check supports a set of modifiers that change its behavior:

severity: weights some checks more heavily than others
examples: provides in-context examples of outputs that should pass or fail
extractor: extracts items from the output before the check is evaluated
conditional: runs the check only if another check passes
category: overrides the check’s default category (correctness, formatting, etc.). This is similar to tags, but works at the level of individual checks.

from vals import Check, CheckModifiers, ConditionalCheck, Example  # import path assumed

Check(
    operator="grammar",
    modifiers=CheckModifiers(
        # Weight three times as important as other checks
        severity=3,
        # In-context examples of outputs that should pass or fail
        examples=[
            Example(type="positive", text="This is an example of good grammar.")
        ],
        # Only evaluate part of the output
        extractor="Extract only the first paragraph",
        # Only run this check if the check below passes
        conditional=ConditionalCheck(operator="...", criteria="..."),
        # Override the category
        category="writing_quality"
    )
)

See the modifiers page for more information.
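This page does not specify how severities are combined into a score. Purely as an illustrative assumption, a severity-weighted pass rate treats a check with severity 3 as counting three times as much as a check with severity 1; weighted_score is hypothetical and is not the platform’s scoring function:

```python
# Hypothetical severity-weighted score; not the platform's formula.
def weighted_score(checks: list[tuple[bool, float]]) -> float:
    """Sum of severities of passing checks divided by the total severity.

    Each item is (passed, severity). Returns 0.0 when there are no checks.
    """
    total = sum(severity for _, severity in checks)
    if total == 0:
        return 0.0
    passed = sum(severity for ok, severity in checks if ok)
    return passed / total


# A severity-3 grammar check passes; two severity-1 checks fail: 3 / 5.
print(weighted_score([(True, 3), (False, 1), (False, 1)]))  # 0.6
```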

Downloading / Pulling a Test Suite

If a test suite already exists on the platform, you can pull it down to edit or save it locally. Copy the suite ID from the test suite page (it is also the last segment of the suite’s URL), then call Suite.from_id:

suite = await Suite.from_id("12345678-abcd-efgh-1234-0123456789")
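Since the suite ID is the last segment of the suite URL, you can also extract it programmatically. This stdlib sketch assumes only that the ID is the final path segment; suite_id_from_url is a hypothetical helper, and the example.com URL is a placeholder, not the platform’s real address:

```python
from urllib.parse import urlparse


# Hypothetical helper; assumes the suite ID is the final URL path segment.
def suite_id_from_url(url: str) -> str:
    """Return the last non-empty path segment, ignoring query and fragment."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


print(suite_id_from_url("https://example.com/suites/12345678-abcd-efgh-1234-0123456789"))
# 12345678-abcd-efgh-1234-0123456789
```

The returned string can then be passed to Suite.from_id.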

Updating a Test Suite

You can also push local changes back to the platform. For example, to change the global checks of an existing suite:

# Download suite locally
suite = await Suite.from_id("12345678-abcd-efgh-1234-0123456789")

# Update the suite
suite.global_checks = [Check(operator="grammar")]
await suite.update()

Loading a Test Suite from a File

Although creating a test suite with Python objects is preferred, you can also load one from a local JSON file with Suite.from_file():

suite = await Suite.from_file("path/to/test_suite.json")

Here is an example of what the test suite file looks like:

{
  "title": "My Test Suite",
  "description": "This is an example test suite.",
  "global_checks": [{ "operator": "grammar" }],
  "tests": [
    {
      "input_under_test": "What is QSBS?",
      "checks": [{ "operator": "includes", "criteria": "C Corporation" }]
    },
    {
      "input_under_test": "Does this contract have an MFN clause?",
      "context": { "user_email": "user@example.com" },
      "files_under_test": ["path/to/file.docx"],
      "checks": [{ "operator": "equals", "criteria": "No" }]
    }
  ]
}
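If you generate suite files from a script, you can write them with the standard json module and sanity-check them before calling Suite.from_file. This sketch treats title, input_under_test, and checks as required because every example on this page includes them; that is an assumption, and the SDK’s actual schema may be stricter or looser. write_suite_file and validate_suite_file are hypothetical helpers:

```python
import json
from pathlib import Path

# Assumed required keys, based on the examples on this page.
REQUIRED_TEST_KEYS = {"input_under_test", "checks"}


def write_suite_file(suite: dict, path: str) -> None:
    """Serialize a suite dictionary to a pretty-printed JSON file."""
    Path(path).write_text(json.dumps(suite, indent=2), encoding="utf-8")


def validate_suite_file(path: str) -> list[str]:
    """Return a list of problems found in a suite JSON file (empty if none)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = []
    if "title" not in data:
        problems.append("missing 'title'")
    for i, test in enumerate(data.get("tests", [])):
        missing = REQUIRED_TEST_KEYS - set(test)
        if missing:
            problems.append(f"test {i} missing {sorted(missing)}")
    return problems
```

An empty list from validate_suite_file means the file at least has the shape shown above.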

Using “Right Answers”

To add tests with right answers, set the golden_output field on each test. A full example:

from vals import RunParameters, Suite  # RunParameters import path assumed

suite = Suite(
    title="My Test Suite with Golden Outputs",
    tests=[{
        "input_under_test": "What is QSBS?",
        "checks": [],
        "golden_output": "QSBS stands for Qualified Small Business Stock, a designation in the U.S. tax code (Section 1202) that offers tax advantages to investors who hold eligible small business stock. If an individual holds QSBS for more than five years, they may be able to exclude up to 100% of the gains from the sale of the stock, subject to certain limits and qualifications. This tax incentive aims to encourage investments in small, innovative businesses. However, the stock must meet specific criteria, including being issued by a C-corporation in certain industries and having gross assets below $50 million when the stock was issued."
    }],
)

run = await suite.run(
    model="openai/gpt-4o-mini",
    wait_for_completion=True,
    parameters=RunParameters()
)
print("Run URL: ", run.url)

See the web app for more information.