includes
: Determines whether the LLM output mentions a specified term or phrase, based on its semantic meaning.
satisfies_statement
: Checks whether the LLM output satisfies the specified criteria.
answers
(unary): Checks that the output answers the question provided in the test input. For instance, if the prompt question is "How does one gain US citizenship?" and the LLM answers with "I am an AI agent, unable to answer your question," this would fail the check because it did not give a sufficient answer to the question. This is used to make sure that questions that should be within the realm of answerable queries are legitimately answered.
not_answers
(unary): Conversely, this operator checks that the output avoids answering the question provided in the test input. For instance, if the prompt question is "Can you tell me a joke?" and the LLM answers with "I am a tax assistant, unable to complete your request," this would pass the check. This is used to make sure that questions that should not be answered by the agent are not attempted.
excludes
: Determines whether the LLM output avoids mentioning a specified term or phrase, based on its semantic meaning.
equals
: Determines whether the LLM output is semantically equivalent to a specified term or phrase.
includes_nearly_exactly
: A stricter version of includes: the output must include the text nearly verbatim, with tolerance for differences in formatting or very minor deviations in phrasing.
negative_answer
(unary): Checks that the output gives a negative (non-affirmative) answer.
includes_any_v2
(preferred): Checks the same thing as includes, but uses semicolon delimiters instead of comma delimiters. It evaluates each component of the inclusion independently, which leads to better performance.
equals_exactly
: Determines whether the output matches the words provided, in their exact form.
includes_exactly
: Determines whether the output includes the words provided, in their exact form.
regex
: Evaluates an arbitrary regex pattern against a given input string.
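As a sketch of what the regex operator does, the check below uses Python's re module. Whether the platform requires a full match or a partial (search-style) match is an assumption here; this sketch uses a partial match.

```python
import re

def regex_check(output: str, pattern: str) -> bool:
    """Pass if the regex pattern matches anywhere in the output (assumed
    search semantics; the platform's exact matching mode may differ)."""
    return re.search(pattern, output) is not None

print(regex_check("Case number: 2023-CV-0142", r"\d{4}-CV-\d{4}"))  # True
print(regex_check("No case number given.", r"\d{4}-CV-\d{4}"))      # False
```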
excludes_exactly
: Determines whether the output excludes the words provided, in their exact form.
is_not_hallucinating
(unary): Ensures that any entities found within the output can also be found verbatim via a Google search. This is especially useful for legal cases: for example, if the LLM mentions "Jeffords v. NY Police Department", the check will verify that this case actually exists.
no_legal_advice
(unary): A constitutional check that ensures the output does not explicitly give legal advice.
is_safe
(unary): Checks that the output does not contain content in any of the following categories: hate, harassment, self-harm, sexual content, or violence.
is_concise
(unary): Ensures that the output is concise.
is_coherent
(unary): Determines whether the output flows logically and makes sense within the context of the prompt.
progresses_conversation
(unary): Ensures the output moves the conversation forward.
grammar
(unary): Ensures the output is grammatically correct.
is_polite
(unary): Ensures the output does not contain any impolite language.
list_format
(unary): Ensures the output is formatted as a list or other enumeration.
valid_json
(unary): Ensures that the output is JSON-parseable.
less_than_length
: Ensures that the output is under a given number of characters.
valid_yaml
(unary): Ensures that the output is YAML parseable.
consistent_with_context
(unary): If the “Context” feature is used, this ensures that the LLM output does not contradict the information within the context provided.
consistent_with_docs
(unary): If any files are uploaded, this ensures that the LLM output does not contradict information contained within the files.
Be specific, as the model will only know what you provide it inside the prompt. There is a limited set of variables that you can pass into your prompt:
Variable | Description |
---|---|
{{input}} | The input provided to the LLM under test. Usually contains a question or statement you want the model to respond to |
{{output}} | The output from the model that was returned after responding to the input |
{{criteria}} | The criteria of the check, as defined within the test |
{{context.input}} | The input context that is defined within the test and shared between all checks |
{{context.output}} | Additional information about the model's response, such as its reasoning |
{{files}} | All files that were added to the test, passed into the prompt defined by the user |
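To show how these variables come together, here is a sketch of a custom-check prompt built from them. The template wording and the render helper are illustrative assumptions; the platform performs its own substitution with whatever prompt you define.

```python
# Hypothetical custom-check prompt using the supported variables.
# The {{...}} placeholders are filled in by the platform at evaluation time.
PROMPT_TEMPLATE = """\
You are grading a model response.

Question: {{input}}
Response: {{output}}
Criteria: {{criteria}}

Does the response satisfy the criteria? Answer PASS or FAIL."""

def render(template: str, variables: dict) -> str:
    """Simple stand-in for the platform's own substitution logic."""
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", value)
    return template

print(render(PROMPT_TEMPLATE, {
    "input": "How does one gain US citizenship?",
    "output": "Through naturalization, birth, or derivation.",
    "criteria": "Mentions naturalization",
}))
```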
You can also reference nested context fields, such as {{context.input.date}} or {{context.output.user_email}}. These fields come from the context that you define inside the test. We do not currently provide flexibility to use different keys depending on the test, so please ensure that all tests you use this operator with have the required context defined inside the prompt.
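A sketch of how a dotted variable like {{context.input.date}} resolves against test-defined context. The dict layout and field names here are hypothetical examples, not a prescribed test-definition format.

```python
# Hypothetical test context; the exact test-definition format may differ.
context = {
    "input": {"date": "2024-05-01"},
    "output": {"user_email": "jane@example.com"},
}

def resolve(path: str, ctx: dict):
    """Resolve a dotted variable like 'context.input.date' against the context."""
    node = ctx
    for part in path.split(".")[1:]:  # skip the leading 'context' segment
        node = node[part]
    return node

print(resolve("context.input.date", context))        # 2024-05-01
print(resolve("context.output.user_email", context)) # jane@example.com
```

If a test is missing a required key, this kind of lookup fails, which is why every test using such a variable must define the corresponding context.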
Output context is an additional type of context that we provide access to. This context is automatically supported by thinking models and can be accessed via {{context.output.reasoning}}.
If you want to use any other output context, you can define it inside the question-answer pair CSV when you import, as described here.
To pass uploaded documents to the model, use {{files}} in your prompt. This will pass in all the documents to the model, so please ensure that you are only using files you want shown to the model under test.