To get started
- If you’re building a new suite, start with Importing Data.
- If you’ve already run an evaluation and want to analyze results, jump to Exporting Results.
Importing Data
Test Suite
A full test suite import includes tests, checks, context, tags, and any associated files. Only Test Input is required; all other fields are optional, so you can import inputs alone without defining any checks. Imported tests are appended after any existing tests in the suite.
If you want to import existing model outputs and run checks against them, see this page.

Supported formats: CSV, JSON, ZIP
If your tests include file attachments (documents, images, etc.), use ZIP. Attached files should be stored under documents/ inside the ZIP.
📎 View CSV Example · 📎 View CSV Example (with Right Answer) · 📎 View ZIP Example (Files)
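If you're assembling a ZIP import programmatically, here is a minimal Python sketch. Note that the docs only specify the documents/ prefix for attachments; the tests.csv filename and its placement at the ZIP root are assumptions, so verify the layout against the ZIP example above.

```python
import csv
import io
import zipfile

# Build the test definitions in memory.
# Assumption (not confirmed above): tests live in a tests.csv at the ZIP root.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Test Input", "Files"])
writer.writerow(["Summarize the attached contract.", "documents/doc1.pdf"])

with zipfile.ZipFile("suite_import.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("tests.csv", buf.getvalue())
    # Attachments must be stored under documents/ inside the ZIP.
    # "doc1.pdf" must exist locally for this line to run.
    zf.write("doc1.pdf", arcname="documents/doc1.pdf")
```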
Test Columns
These columns define an individual test: the input sent to the model and any supporting context. Only the Test Input field is required; a minimal import sketch follows the table below.
| Column | Type | Description |
|---|---|---|
| Test Input | str | The prompt or question sent to the LLM (e.g., What is burden shifting under Title VII?) |
| Right Answer * | str | The expected correct answer |
| Tags | str | Labels for organizing tests (e.g., math, law). Spread across rows. See Formatting Rules |
| Files ** | str | Filename or path to an attached file (e.g., documents/doc1.pdf) |
| Context Keys | str | Key for injecting context into the test (e.g., date) |
| Context Values | str | Corresponding value for the context key (e.g., 2024-01-01) |
* Right Answer or Checks are used. Learn more about Right Answer.
** Files will only work as expected for the .zip upload.
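Since only Test Input is required, an inputs-only import can be a single-column CSV. A minimal sketch (the tests.csv filename is illustrative):

```python
import csv

# Minimal import sketch: only the Test Input column is required,
# so an inputs-only suite is just one column of prompts.
prompts = [
    "What is burden shifting under Title VII?",
    "Where is the Bay Area located?",
]

# UTF-8 is the recommended encoding for imports.
with open("tests.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Test Input"])
    for prompt in prompts:
        writer.writerow([prompt])
```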
Check Columns
Checks define how LLM responses are evaluated within a test.
| Column | Type | Description |
|---|---|---|
| Operator | str | The evaluation method (e.g., includes, excludes) |
| Criteria | str | The value checked against the LLM response (e.g., age, sex, religion) |
Advanced Options
| Column | Type | Description |
|---|---|---|
| Weight | int | Numeric importance for scoring (e.g., 1, 2) |
| Category | str | Label grouping checks by purpose (e.g., Style, Correctness) |
| Extraction Prompt | str | Instructions for pulling a specific value from the LLM output before evaluating |
| Conditional Operator | str | Operator for conditional evaluation |
| Conditional Criteria | str | Criteria used in conditional evaluation |
| Example Type | str | positive (should pass) or negative (should fail) |
| Example Value | str | A sample value used for the check |
Tips for Formatting Imports
Spreading values down columns
For fields that support multiple values (like Tags or Checks), each value goes in its own row beneath the test, rather than being comma-separated in a single cell; a generation sketch follows the example table. Example:
| Test Id | Test Input | Tags | Operator | Criteria |
|---|---|---|---|---|
| 19025787-… | Where is the Bay Area located? | Bay | includes | California |
| | | Easy | includes_exactly | Northern California, United States |
| | | | excludes | Los Angeles |
| | | | excludes_exactly | Atlantic Ocean |
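A sketch for generating spread rows like those above with a plain CSV writer. Test Id is omitted since only Test Input is required; the filename is illustrative.

```python
import csv

# Spread multi-value fields (Tags, checks) down rows: the first row
# carries the Test Input; each extra tag or check gets its own row
# with the other cells left blank.
test_input = "Where is the Bay Area located?"
tags = ["Bay", "Easy"]
checks = [
    ("includes", "California"),
    ("includes_exactly", "Northern California, United States"),
    ("excludes", "Los Angeles"),
    ("excludes_exactly", "Atlantic Ocean"),
]

with open("tests.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Test Input", "Tags", "Operator", "Criteria"])
    n_rows = max(len(tags), len(checks), 1)
    for i in range(n_rows):
        writer.writerow([
            test_input if i == 0 else "",
            tags[i] if i < len(tags) else "",
            checks[i][0] if i < len(checks) else "",
            checks[i][1] if i < len(checks) else "",
        ])
```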
Encoding
We support multiple encoding types. UTF-8 is strongly recommended for compatibility.
Criteria requirement
All non-unary operators require a Criteria value. Leaving it blank will cause the import to fail; the validation sketch below can catch this before you upload.
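A quick pre-import validation sketch. Which operators count as unary is platform-defined; the set below is a placeholder, not the authoritative list.

```python
import csv

# Placeholder only: substitute the platform's actual unary operators.
UNARY_OPERATORS = {"is_json", "is_empty"}

def find_missing_criteria(path: str) -> list[int]:
    """Return row numbers where a non-unary operator has a blank Criteria."""
    bad_rows = []
    with open(path, newline="", encoding="utf-8") as f:
        # Row 1 is the header, so data rows start at 2.
        for i, row in enumerate(csv.DictReader(f), start=2):
            operator = (row.get("Operator") or "").strip()
            criteria = (row.get("Criteria") or "").strip()
            if operator and operator not in UNARY_OPERATORS and not criteria:
                bad_rows.append(i)
    return bad_rows
```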
Global checks
When importing a file with global checks and tests, include a blank row between the global checks section and the tests section.
Exporting Results
Test Suite
Supported formats: CSV, JSON, ZIP
Test Columns
| Column | Description |
|---|---|
| Test Id | Unique identifier for the test |
| Test Input | The prompt or question sent to the LLM (e.g., What is burden shifting under Title VII?) |
| Right Answer | The expected correct answer |
| Tags | Labels for organizing tests (e.g., math, law). Spread across rows. See Formatting Rules |
| Files | Filename or path to an attached file (e.g., documents/doc1.pdf) |
| Context Keys | Key for injecting context into the test (e.g., date) |
| Context Values | Corresponding value for the context key (e.g., 2024-01-01) |
Check Columns
Checks define how LLM responses are evaluated within a test.
| Column | Description |
|---|---|
| Operator | The evaluation method (e.g., includes, excludes) |
| Criteria | The value checked against the LLM response (e.g., age, sex, religion) |
| Weight | Numeric importance for scoring (e.g., 1, 2) |
| Category | Label grouping checks by purpose (e.g., Style, Correctness) |
| Extraction Prompt | Instructions for pulling a specific value from the LLM output before evaluating |
| Conditional Operator | Operator for conditional evaluation |
| Conditional Criteria | Criteria used in conditional evaluation |
| Example Type | positive (should pass) or negative (should fail) |
| Example Value | A sample value used for the check |
Auto Eval Results
Results are best reviewed directly in the platform. If you need to export them for custom reporting or offline storage, we support CSV and JSON.
CSV, JSON
We recommend CSV if the data needs to be reviewed by non-technical users, and JSON for any programmatic use case.
📎 View CSV Example · 📎 View JSON Example
Run Result
Top-level summary for the entire evaluation run.
| Column | Description |
|---|---|
| Run Id | Unique identifier for the run |
| Test Suite Id | ID of the suite that was run |
| Test Suite Title | Name of the suite |
| Run Status | Outcome of the run (e.g., success, error) |
| Run Error Message | Error message if the run failed |
| Run Error Analysis | LLM-generated analysis of failed check feedback |
| Completed At | When the run finished |
| Run Parameters | Configuration used during the run |
| Percent Of Checks Passed | Share of individual checks that passed |
| Amount Of Checks Passed | Count of checks that passed |
| Standard Deviation For Checks Passed | Variability in check pass rates |
| Percent Of Tests Passed | Share of tests where all checks passed |
| Amount Of Tests Passed | Count of fully passing tests |
| Standard Deviation For Tests Passed | Variability in test pass rates |
| Needs Review Percentage | Share of results flagged for human review |
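To pull the run summary into a script, a sketch using the documented column names. The run_results.csv filename and one-summary-row-per-run layout are assumptions; check the CSV example above for the exact file structure.

```python
import csv

# Read the run-level summary using the documented column names.
with open("run_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["Run Id"], row["Run Status"])
        print("Checks passed:", row["Percent Of Checks Passed"])
        print("Tests passed:", row["Percent Of Tests Passed"])
        print("Needs review:", row["Needs Review Percentage"])
```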
Test Results
Per-test breakdown of inputs, outputs, and token usage.
| Column | Description |
|---|---|
| Test Result Id | Unique identifier for this test result |
| Test Id | Identifier of the originating test |
| Test Status | Outcome (e.g., success, error) |
| Test Error Message | Error message if the test failed |
| Test Input | The prompt sent to the LLM |
| LLM Output | The response generated by the LLM |
| Files | Files passed to the LLM during the test |
| In Tokens | Number of input tokens consumed |
| Out Tokens | Number of output tokens generated |
| Duration | Time taken to generate the response (seconds) |
| Input Context Keys | Context keys used in this test |
| Input Context Values | Corresponding context values |
| Output Context Keys | Keys referencing extracted output context |
| Output Context Values | Extracted values from the LLM output |
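A sketch aggregating token usage and latency from a test-results export, using the columns above (the test_results.csv filename is illustrative):

```python
import csv

# Sum token counts and collect per-test durations from the export.
in_tokens = out_tokens = 0
durations = []
with open("test_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        in_tokens += int(row.get("In Tokens") or 0)
        out_tokens += int(row.get("Out Tokens") or 0)
        if row.get("Duration"):
            durations.append(float(row["Duration"]))

print(f"Total tokens: {in_tokens} in / {out_tokens} out")
if durations:
    print(f"Mean duration: {sum(durations) / len(durations):.2f}s")
```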
Check Results (Auto Eval)
Note: Check columns in an export differ from check columns in a suite definition. Export checks reflect evaluation outcomes, not configuration.
| Column | Description |
|---|---|
| Operator | The evaluation operator used |
| Criteria | The criteria evaluated against the LLM output |
| Auto Eval | Evaluation result (e.g., pass, fail; numeric scores left as-is) |
| Edited Auto Eval | Overridden score, if a human reviewer modified the result |
| Edited Auto Eval Feedback | Reviewer’s reason for the override |
| Confidence Level | Model’s confidence in its evaluation (e.g., high, low) |
| Feedback | LLM-generated explanation of the evaluation decision |
| Weight | The check’s scoring weight |
| Extractor | Value extracted from LLM output using the extraction prompt |
| Conditional Operator | Operator used in conditional evaluation |
| Conditional Criteria | Criteria for the conditional check |
| Category | Check category (e.g., Style, Correctness) |
| Example Type | positive or negative |
| Example Value | Example value associated with the check |
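A sketch listing reviewer overrides from a check-results export. It assumes a blank Edited Auto Eval means no override occurred; the filename is illustrative.

```python
import csv

# Surface checks where a human reviewer overrode the automated score.
with open("check_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if (row.get("Edited Auto Eval") or "").strip():
            print(row["Operator"], row["Criteria"])
            print("  auto:", row["Auto Eval"], "->", row["Edited Auto Eval"])
            print("  reason:", row["Edited Auto Eval Feedback"])
```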
Human Review Results
Export completed human review data for analyzing reviewer agreement, test-level feedback, and metric evaluations outside the platform. Supported formats: CSV
Only completed reviews will be included in exports.
📎 View Human Review Example
Run Review
| Column | Description |
|---|---|
| Run Id | Identifier of the evaluated run |
| Run Name | Display name of the run |
| Run Review Status | Review completion status (e.g., completed) |
| Run Review Created By | User who initiated the review |
| Run Review Created At | When the review was created |
| Run Review Completion Time | When the review was completed |
| Number Of Reviews | Number of reviewers assigned per test result |
| Assigned Reviewers | Users assigned to review |
| Pass Rate | Share of checks marked as pass by reviewers |
| Flagged Rate | Share of checks flagged by reviewers |
| Auto Eval ↔ Reviewer Agreement | Agreement rate between automated and human scores |
| Reviewer ↔ Reviewer Agreement | Agreement rate across reviewers |
Test Review
| Column | Description |
|---|---|
| Test Result Id | Identifier for the reviewed test result |
| Test Input | The original prompt |
| LLM Output | The LLM’s response |
| Files | Files included in the test |
| Completed At | When the review was submitted |
| Completed By | Reviewer who completed it |
| Test Review Feedback | Reviewer’s written feedback |
Human Review Check
| Column | Description |
|---|---|
| Check Type | Auto-eval review for checks that originated from auto eval; otherwise, the check is drawn from the review template |
| Metric Name | Name of the metric from the review template (blank for auto eval checks) |
| Operator | Evaluation operator |
| Criteria | Criteria evaluated |
| Auto Eval | Original automated score |
| Reviewer Response | Human reviewer’s score or feedback (e.g., pass, fail) |
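A rough agreement sketch comparing Auto Eval with Reviewer Response. It assumes both fields use comparable pass/fail labels; rows missing either score are skipped, and the filename is illustrative.

```python
import csv

# Compute a naive agreement rate between automated and human scores.
agree = total = 0
with open("human_review.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        auto = (row.get("Auto Eval") or "").strip().lower()
        human = (row.get("Reviewer Response") or "").strip().lower()
        if auto and human:  # skip rows missing either score
            total += 1
            agree += auto == human

print(f"Agreement: {agree}/{total}" if total else "No scored rows found")
```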
Troubleshooting
Common issues:
- Test Input is missing: Ensure every test has a value in the Test Input column.
- Global checks: If your file includes global checks alongside tests, leave a blank row between the global checks section and the tests section.
- Duplicate questions when importing Q&A pairs: Check for repeated rows in your CSV.
- Missing criteria: Non-unary operators require a Criteria value. Don't leave this blank.
- Values comma-separated in one cell instead of spread across rows: See Formatting Rules above.