Generally, our platform works by defining an input and a set of operator/criteria pairs. There is, however, another way to run evaluations: define an input and a “right answer,” then compare the right answer to the LLM output to see whether they match.

We support this workflow in the platform.

How Right Answer Comparison Works

First, add a right answer to each test (see below for how to do this in the web app and SDK).

When you run the test suite, our system compares the LLM output to your right answer on three axes:

  • Content: compares the output to the right answer based on content similarity, passing if the contents are similar and failing if they are not.
  • Style: compares the style of the two texts, passing only if the styles are similar.
  • Format: checks that the outputs are formatted the same way.
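To build intuition for what pass/fail on these axes means, here is a purely illustrative sketch of the content and format checks. These heuristic functions are hypothetical stand-ins, not the platform’s actual comparison logic: content similarity is approximated with a character-level ratio, and “same format” is approximated by asking whether both strings are valid JSON or neither is. A style check would work analogously.

```python
import json
from difflib import SequenceMatcher

def content_matches(output: str, golden: str, threshold: float = 0.6) -> bool:
    # Stand-in for content similarity: character-level ratio via difflib.
    return SequenceMatcher(None, output.lower(), golden.lower()).ratio() >= threshold

def format_matches(output: str, golden: str) -> bool:
    # Stand-in format check: both strings are valid JSON, or neither is.
    def is_json(s: str) -> bool:
        try:
            json.loads(s)
            return True
        except ValueError:
            return False
    return is_json(output) == is_json(golden)

golden = "QSBS is stock issued by qualified small C corporations."
output = "QSBS is stock issued by qualified small C corporations, with tax benefits."

print(content_matches(output, golden))  # True: the contents overlap heavily
print(format_matches(output, golden))   # True: both are plain text, not JSON
```

In practice the platform performs these comparisons for you; the point of the sketch is only that each axis is an independent pass/fail judgment over the same output/right-answer pair.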

Using Right Answers in the Web App

To add a Right Answer, in the Add Test pane, press “Add Right Answer”.

Using Right Answers in the SDK

To add tests with right answers, use the golden_output field on the test. A full example is as follows:

# Import Test and Check from the SDK package (the module name depends on your installation).

test = Test(
    input_under_test="What is QSBS?",
    golden_output="Qualified Small Business Stock (QSBS) is a special type of stock issued by certain small, domestic C corporations that provides tax advantages to investors. If held for more than five years, gains from the sale of QSBS may be excluded from federal capital gains taxes, up to certain limits. To qualify, the stock must be acquired directly from a C corporation engaged in active business, with gross assets not exceeding $50 million at the time of issuance.",
    checks=[
        Check(
            operator="includes",
            criteria="C Corporation",
        )
    ],
)