Operators

We have support for a variety of operators, described briefly below. Operators can be binary, which means they take additional information (we call this criteria), or they can be unary, which means that they do not take additional information.

For example, when you use the “includes” operator, the criteria is the text that you expect to find in the output. However, when you use the “is_concise” operator, which checks how verbose the output is, you would not provide additional information.

Semantic Operators

excludes: Determines whether the LLM output did not mention a specified term or phrase, based on its semantic meaning.

equals: Determines whether the LLM output is semantically equivalent to a specified term or phrase.

equal_intent: This operator determines if two pieces of text have the same intention behind them and the same key points, with tolerance for minor differences in wording or the addition of non-essential details.

includes: Determines whether the LLM output mentioned a specified term or phrase, based on its semantic meaning.

includes_any (deprecated): Determines whether the LLM output includes at least one of the provided terms or phrases, based on its semantic meaning. Options should be provided in the following format: [option a], [option b], [option c].

includes_any_v2 (preferred): This checks the same thing as includes_any, but it uses semicolon delimiters instead of comma delimiters. It evaluates each option independently, which leads to better performance.
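As a rough illustration of the delimiter format, here is a hypothetical Python sketch of how a semicolon-delimited criteria string might be split and each option evaluated independently. The real operator performs semantic matching with an LLM; a naive case-insensitive substring check stands in for that step here, and the function names are invented for this example.

```python
# Hypothetical sketch only: the real includes_any_v2 operator performs
# semantic matching with an LLM. A case-insensitive substring check
# stands in for that step here.

def parse_options(criteria: str) -> list[str]:
    """Split a semicolon-delimited criteria string into options."""
    return [opt.strip() for opt in criteria.split(";") if opt.strip()]

def includes_any_v2(output: str, criteria: str) -> bool:
    """Pass if at least one option is found in the output."""
    return any(opt.lower() in output.lower() for opt in parse_options(criteria))
```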

logical_includes: Similar to includes, but it strictly checks the logical equivalency of statements. For example, it would perform better at distinguishing between “A is true and B is true, or C is true and D is true” from “A is true or B is true, and C is true or D is true.”

includes_nearly_exactly: This is a stricter version of includes: the output must include the text nearly verbatim, with tolerance for differences in formatting or very minor deviations in phrasing.

satisfies_statement: Ensures that the criteria is a true statement about the output. For example, let’s say your LLM output was an essay. Your criteria might be “Every paragraph in the text starts with a topic sentence.”

answers: Checks that the output answers the question provided in the test input. For instance, if the prompt question is “How does one gain US Citizenship?” and the LLM answers with “I am an AI agent unable to answer your question,” then this would fail the check because it did not give a sufficient answer to the question. This is used to make sure that questions that should be within the realm of answerable queries are legitimately answered.

not_answers: Conversely, this operator checks that the output avoids answering the question provided in the test input. For instance, if the prompt question is “Can you tell me a joke?” and the LLM answers with “I am a tax assistant, unable to complete your request,” then this would pass the check. This is used to make sure that questions that should not be answered by the agent are not attempted.

consistent_phrasing: This is a niche operator, but useful if you want to make sure you use similar phrasing across an entire document. For example, let’s say that your legal document should always use the word “Thus”, and never the phrases “Therefore”, “As a Result”, “Consequently”, etc. If you use the ‘consistent_phrasing’ operator with the criteria ‘Thus’, it will fail if any of these synonyms are used.

consistent_with_json: The criteria is a piece of JSON that contains some data. This operator checks if the output of the model matches the piece of JSON data.

affirmative_answer (unary): If the prompt is a Yes / No question, this operator checks if the output answers affirmatively. For example, if your prompt is “Can felonies be prosecuted after two years?”, then this would pass if the output is “Yes.” However, unlike the includes operator, it would also pass if the output is “Felonies can be prosecuted after two years.”

negative_answer (unary): This is the inverse of affirmative_answer: if the prompt is a Yes / No question, it checks if the output answers negatively.

is_not_generated (unary): This operator checks to ensure that the output does not contain any signifiers that it may be generated by an LLM. For example, this could be templated items (e.g. {{name}} in an email), phrases like “As a large language model”, etc.

matches_tone: This ensures that the tone of the LLM output and the tone of the criteria are the same.

includes_only (binary): This operator is used to ensure that an output contains specific content, but does not include alternate answers or content in addition to that correct answer. The criteria can be a single value or, to check that an answer includes only a list of specific items, a semicolon-separated list of those items.

For instance, in a question asking what the 3 Unforgivable Curses are in Harry Potter, we would use the includes_only criteria “Avada Kedavra; Crucio; Imperio” to make sure that those were the only three answers provided.

This operator works by comparing lists (or single elements) of “right” answers. By default, this comparison is made against the list of key ideas pulled from the generated answer. You can specify how to extract this list for evaluation with the extractor modifier. This means that a test fails if either an element of the criteria was not found in the extracted list OR any element from the extracted list is not found in the expected criteria. Read more about using the extractor modifier in the extractor section.
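The bidirectional comparison described above can be sketched as follows. This is a simplified illustration, not the operator's actual implementation: in practice the extracted list of key ideas comes from an LLM (or the extractor modifier), and matching is semantic rather than exact string equality.

```python
# Simplified sketch of the includes_only comparison. In the real
# operator, `extracted` is the list of key ideas pulled from the
# generated answer, and matching is semantic rather than exact.

def includes_only(extracted: list[str], criteria: str) -> bool:
    """Fail if an expected item is missing from the extracted list,
    or the extracted list contains an item not in the criteria."""
    expected = {item.strip().lower() for item in criteria.split(";")}
    found = {item.strip().lower() for item in extracted}
    return expected == found
```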

Constitutional Checks

is_concise (unary): This is a constitutional check that uses a secondary LLM to evaluate whether the output is concise.

no_legal_advice (unary): This is a constitutional check that ensures the output does not explicitly give legal advice.

is_coherent (unary): This determines if the output logically flows together and makes sense within the context of the prompt.

is_polite (unary): Ensures the output does not contain any impolite language.

progresses_conversation (unary): Ensures the output moves the conversation forward.

grammar (unary): Ensures the output is grammatically correct.

Safety

is_safe: Checks that the output does not contain content in any of the following categories: hate, harassment, self-harm, sexual content, or violence.

safety_consistency: Checks that the safety of the criteria is the same as the safety of the output. So, if the criteria contains violent content, it would check that the output also contains violent content. This operator is useful to ensure that an LLM is not censoring sensitive content when that is not desired.

Basic Checks

includes_exactly: Determines whether the output includes the words provided, in their exact form.

excludes_exactly: Determines whether the output excludes the words provided, in their exact form.

equals_exactly: Determines whether the output matches the words provided, in their exact form.

less_than_length: Ensures that the output is under a given number of characters.

Syntax Check

valid_json (unary): Ensures that the output is parseable as JSON.
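In Python terms, “JSON parseable” simply means that parsing the output does not raise an error. A minimal sketch of such a check:

```python
import json

def is_valid_json(output: str) -> bool:
    """Return True if the output can be parsed as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False
```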

valid_yaml (unary): Ensures that the output is parseable as YAML.

regex: Evaluates an arbitrary regex pattern against a given input string.
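For example, a regex check might verify that the output contains a date or a case number. A minimal sketch follows; whether the real operator requires a full match or a partial match is an assumption, and re.search (partial match) is used here.

```python
import re

def regex_check(output: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the output."""
    return re.search(pattern, output) is not None
```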

matches_json_schema: Provide a JSON schema file to define the fields/attributes/types that are expected for an output and ensure that these expectations are met.
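Full JSON Schema validation is typically delegated to a library such as jsonschema. The sketch below is a deliberately minimal stand-in that checks only required top-level fields and their declared types, just to illustrate the idea; it is not the operator's implementation.

```python
import json

# Maps JSON Schema type names to Python types (simplified subset).
TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "array": list, "object": dict}

def matches_schema(output: str, schema: dict) -> bool:
    """Minimal sketch: check required top-level fields and their types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if any(field not in data for field in schema.get("required", [])):
        return False
    return all(
        isinstance(data[field], TYPE_MAP[spec["type"]])
        for field, spec in schema.get("properties", {}).items()
        if field in data
    )
```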

Format Checking

list_format: Ensures the output is formatted as a list or other enumeration.

paragraph_format: Ensures the output is formatted as one or multiple distinct paragraphs.

matches_format: Ensures that the output and the criteria are formatted in the same way.

Hallucination Checking

is_not_hallucinating: Ensures that any entities found within the output can also be found verbatim via a Google search. This is especially useful for legal cases: for example, if the LLM mentions “Jeffords v. NY Police Department”, this operator will determine whether the case actually exists.

consistent_with_files: If any files are uploaded, this ensures that the LLM output does not contradict information contained within the files.

consistent_with_context: If the “Context” feature is used, this ensures that the LLM output does not contradict the information within the context provided.

Example Operator Usage

You can see example checks for every operator by navigating to the test suites page, clicking “Import Quick Start Test Suite”, then choosing Suite Family: “Basic Examples” and “operator_examples”.