Live Evals (Beta)

Although most of the Vals Platform is designed for evaluating and tuning a model during development, you can also run our operators against live deployments through the SDK. First, install the Vals CLI/SDK if you have not already.

pip install valsai

Next, import the evaluate function.

from vals.sdk.realtime import evaluate

This function takes two parameters: the first is the output from your model, and the second is an array of checks. Each check should be a dict with an operator key and, optionally, a criteria key. For example:

evaluate(
    "QSBS is a type of tax incentive that applies to C Corporations.",
    [
        {
            "operator": "grammar"
        },
        {
            "operator": "includes",
            "criteria": "C Corporation"
        }
    ]
)
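
If you build the list of checks in several places, a small helper can keep the dicts consistent. The following is only a sketch: `make_check` is a hypothetical name and is not part of the Vals SDK. It assumes that criteria are passed as strings, as in the examples in this document, and it does not call the Vals API.

```python
from typing import Optional, Union


def make_check(operator: str, criteria: Optional[Union[str, int]] = None) -> dict:
    """Build one check dict in the shape described above (hypothetical helper)."""
    if not operator:
        raise ValueError("operator must be a non-empty string")
    check = {"operator": operator}
    # "criteria" is optional, so only include it when a value was given.
    if criteria is not None:
        # Assumption: criteria are strings, as in this document's examples.
        check["criteria"] = str(criteria)
    return check


# Equivalent to the list of checks passed to evaluate() above.
checks = [
    make_check("grammar"),
    make_check("includes", "C Corporation"),
]
```

The resulting `checks` list can then be passed as the second argument to `evaluate`.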

End-to-End Example

Here is an end-to-end example of querying a model and evaluating the live output with Vals.

from openai import OpenAI
from vals.sdk.realtime import evaluate

client = OpenAI(api_key="<YOUR API KEY>")

# In a real deployment, this would come from your users.
# Named user_input to avoid shadowing Python's built-in input().
user_input = "What is QSBS?"

output = (
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": user_input,
            }
        ],
        model="gpt-4-turbo",
    )
    .choices[0]
    .message.content
)

checks = [
    # Check the output is grammatically correct
    {"operator": "grammar"},

    # Check that the output is under 300 chars
    {"operator": "less_than_length", "criteria": "300"},

    # Check that it is generally safe (no toxicity, misogyny, violence)
    {"operator": "is_safe"},

    # Check that all entities exist
    {"operator": "is_not_hallucinating"}
]


evaluation = evaluate(output, checks)

# An array containing the results of each check
print(evaluation)

Output:

[
  {
    "auto_eval":1,
    "feedback":"The text is grammatically correct and all words are spelled right, adhering to standard English grammar rules and spelling conventions."
  },
  {
    "auto_eval":0,
    "feedback":"The output exceeded the maximum number of characters."
  },
  {
    "auto_eval":1,
    "feedback":"The provided text is safe."
  },
  {
    "auto_eval":1,
    "feedback":"1) Section 1202 of the U.S. Internal Revenue Code: The text explicitly contains Section 1202 of the U.S. Internal Revenue Code, as indicated by the mention '26 U.S. Code § 1202 - Partial exclusion for gain from certain small business stock'.\n"
  }
]
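
In a live deployment you will usually want to act on the results rather than print them. The sketch below pairs each check with its result and collects the ones that did not pass. It rests on two assumptions drawn from the sample output above, not from a documented guarantee: results come back in the same order as the checks, and an `auto_eval` of 1 means pass while 0 means fail. `failed_checks` is a hypothetical helper, and the sample data is shortened from the output shown above.

```python
def failed_checks(checks: list, results: list) -> list:
    """Return the operator and feedback for every check that did not pass."""
    failures = []
    # Assumption: results are returned in the same order as the checks.
    for check, result in zip(checks, results):
        # Assumption: auto_eval == 1 means the check passed.
        if result.get("auto_eval") != 1:
            failures.append(
                {
                    "operator": check["operator"],
                    "feedback": result.get("feedback", ""),
                }
            )
    return failures


# Sample data in the shape of the output above (feedback text shortened).
sample_checks = [
    {"operator": "grammar"},
    {"operator": "less_than_length", "criteria": "300"},
    {"operator": "is_safe"},
]
sample_results = [
    {"auto_eval": 1, "feedback": "The text is grammatically correct."},
    {"auto_eval": 0, "feedback": "The output exceeded the maximum number of characters."},
    {"auto_eval": 1, "feedback": "The provided text is safe."},
]

failures = failed_checks(sample_checks, sample_results)
for failure in failures:
    print(f"{failure['operator']}: {failure['feedback']}")
```

A caller could use a non-empty `failures` list to log the response, retry the model call, or fall back to a default answer.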
