Vals.ai CLI
Intro
The CLI is a tool for interacting with the Vals.ai platform from the command line. It suits simple workflows, such as running a test suite or pulling a run. For more complex workflows, use the SDK instead.
Install
Authentication
Make an account at platform.vals.ai and confirm your email.
To authenticate with the CLI, go to the Admin page of the console, and create an API key. Next, in your environment, run the following:
To make this permanent, add it to your shell profile (for example, .zshrc). If you manage your credentials with something other than environment variables, you can also set the API key with the following line of Python code:
Ensure that you never commit any credentials to any version control system such as Git.
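Before invoking the CLI from a script, it can help to confirm that the key is actually present in the environment. The following is a minimal sketch of such a check. The variable name VALS_API_KEY is an assumption, not something this page confirms, so substitute whatever name the platform tells you to export.

```python
import os


def api_key_is_set(env_var: str = "VALS_API_KEY") -> bool:
    """Return True if the named environment variable holds a non-empty value.

    The default name "VALS_API_KEY" is an assumption, not confirmed by this page.
    """
    return bool(os.environ.get(env_var, "").strip())


if __name__ == "__main__":
    if api_key_is_set():
        print("API key found in the environment.")
    else:
        print("API key not set; export it before running the CLI.")
```

The check only reads the environment and never prints the key itself, which keeps the credential out of logs.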
Overall Usage
The CLI is run as follows:
Use the --help flag at the top level and at the subcommand level for guidance.
Commands must be run from the same Python environment in which the CLI was installed with pip. Commands are grouped into subcommands. Currently, there are three subcommands:
vals suite: creating and updating tests and suites.
vals run: creating and querying runs and run results.
vals rag: RAG workflows (see the RAG Debug page for more information).
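The --help flag can also be called from a script to inspect the top level and each subcommand in turn. The sketch below is one possible wrapper, assuming the vals executable is on your PATH. The helper names help_command and show_help are hypothetical and are not part of the CLI or the SDK.

```python
import shutil
import subprocess


def help_command(*subcommands: str) -> list[str]:
    """Build the argv for `vals [subcommands...] --help`."""
    return ["vals", *subcommands, "--help"]


def show_help(*subcommands: str) -> int:
    """Run the help command and return its exit code, or -1 if `vals` is not on PATH."""
    if shutil.which("vals") is None:
        print("`vals` not found; activate the environment where the CLI is installed.")
        return -1
    return subprocess.run(help_command(*subcommands), check=False).returncode


if __name__ == "__main__":
    show_help()          # top-level help
    show_help("suite")   # help for the `suite` subcommand
    show_help("run")     # help for the `run` subcommand
```

Checking for the executable first gives a clearer message when the wrong environment is active than a raw FileNotFoundError would.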
Creating, Modifying, and Viewing Test Suites
You can create a test suite from a JSON file. The file uses the same format as the one used in the SDK.
The vals suite create command produces a link to the created test suite.
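Because the suite is defined in a JSON file, a quick syntax check before handing the file to vals suite create can catch malformed input early. The following is a minimal sketch. The function name load_suite_file is hypothetical, and the check covers JSON syntax only, not the test-suite schema, which this page does not describe.

```python
import json
from pathlib import Path


def load_suite_file(path: str) -> object:
    """Parse the file as JSON and return the parsed value.

    Raises ValueError with a readable message if the file is not valid JSON.
    This checks syntax only; it does not validate the test-suite schema.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

A file that parses cleanly can still be rejected by the platform if required fields are missing, so treat this as a first filter only.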
Updating a test suite follows a similar workflow, except that you also provide the ID of the test suite you want to update.
To pull a test suite locally (either to CSV or JSON), you can use the vals suite pull command.
To list all test suites, you can use the vals suite list command.
Running a Test Suite
To run a test suite, you can use the vals suite run command.
There are a number of optional flags you can provide to the command, in addition to the model flag:
--wait_for_completion: Wait for the run to complete before exiting the command.
--run-name: Provide a name for the run for easy identification.
--eval-model: Model to use as the LLM-as-judge.
To see the full list of flags, use the --help flag on the command.
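When vals suite run is driven from a script, the optional flags can be assembled programmatically. The sketch below builds only the optional part of the argument list. It assumes --wait_for_completion is a boolean switch and that --run-name and --eval-model each take one value as a separate token. The suite identifier and the model flag are left out because this page does not spell out their syntax, and optional_run_flags is a hypothetical helper name.

```python
from typing import Optional


def optional_run_flags(
    wait: bool = False,
    run_name: Optional[str] = None,
    eval_model: Optional[str] = None,
) -> list[str]:
    """Return the optional flags for `vals suite run` as a list of argv tokens.

    Flag spellings are copied from the documentation text; how each flag
    consumes its value is an assumption.
    """
    flags: list[str] = []
    if wait:
        flags.append("--wait_for_completion")
    if run_name is not None:
        flags += ["--run-name", run_name]
    if eval_model is not None:
        flags += ["--eval-model", eval_model]
    return flags


if __name__ == "__main__":
    # "nightly" is a placeholder run name.
    print(optional_run_flags(wait=True, run_name="nightly"))
    # prints ['--wait_for_completion', '--run-name', 'nightly']
```

Returning a list of tokens rather than a single string avoids shell-quoting problems when a run name contains spaces.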
Viewing and Querying Completed Runs
To list all of the runs you’ve done, you can use the vals run list command.
You can also view only runs for a single test suite:
You can pull the data for a single run similarly to how you pull a test suite:
Or, to pull as JSON: