Human Review

Overview

The Human Review system lets you queue test runs for manual evaluation by human reviewers, so you can validate model outputs beyond what automated checks cover.

Adding a Run to Review Queue

Queue a run for human review using the add_to_queue() method:

from vals import Run

run: Run = ...  # placeholder: an existing Run instance
await run.add_to_queue(
    assigned_reviewers=["[email protected]"],
    number_of_reviews=2,
    rereview_auto_eval=True
)

# Fetch the review after queuing
review = await run.review
print(f"Review status: {review.status}")

Parameters

assigned_reviewers - List of reviewer email addresses (empty list allows any reviewer)
number_of_reviews - How many reviewers will evaluate each test (default: 1)
rereview_auto_eval - Whether to re-run auto-evaluation after reviews (default: True)
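The parameters above can be collected and sanity-checked before they are passed to add_to_queue(). The following is a minimal sketch, not part of the vals SDK: build_queue_kwargs is a hypothetical helper, and the checks it performs (rejecting a review count below 1, trimming and de-duplicating reviewer emails) are assumptions about what a caller may want, not documented SDK behavior.

```python
from typing import Any, Dict, List, Optional


def build_queue_kwargs(
    assigned_reviewers: Optional[List[str]] = None,
    number_of_reviews: int = 1,
    rereview_auto_eval: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for run.add_to_queue()."""
    # Assumption: fewer than one review per test is never meaningful.
    if number_of_reviews < 1:
        raise ValueError("number_of_reviews must be at least 1")

    # Trim whitespace and drop duplicate emails while preserving order.
    # An empty list is passed through unchanged (any reviewer may pick up the run).
    cleaned = [email.strip() for email in (assigned_reviewers or [])]
    unique_reviewers = list(dict.fromkeys(cleaned))

    return {
        "assigned_reviewers": unique_reviewers,
        "number_of_reviews": number_of_reviews,
        "rereview_auto_eval": rereview_auto_eval,
    }
```

With an existing run, the result could then be unpacked into the call, for example: await run.add_to_queue(**build_queue_kwargs(reviewers, number_of_reviews=2)).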

Working with Reviews

Once a run is queued, access its review through the run.review cached property, which must be awaited:

from vals import SingleRunReview

review: SingleRunReview = await run.review

# Basic review information
print(f"Review ID: {review.id}")
print(f"Status: {review.status}")  # Pending, Archived, or Completed
print(f"Created by: {review.created_by}")
print(f"Number of reviews: {review.number_of_reviews}")
print(f"Assigned reviewers: {review.assigned_reviewers}")

# Assign different reviewers
await review.modify_queue(
    assigned_reviewers=["[email protected]"]
)

Key Properties

id - Review identifier (same as run.review_id)
status - Current review status (Pending, Archived, or Completed)
pass_rate_human_eval - Pass rate across all human reviews
agreement_rate_human_eval - Agreement rate between human reviewers
test_results - List of completed test results (cached property, requires await)
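As an illustration of how these properties might be combined, the sketch below formats a one-line summary of a review. summarize_review is a hypothetical helper, not an SDK function. It assumes that pass_rate_human_eval and agreement_rate_human_eval are fractions between 0 and 1 and may be None before any reviews are completed; neither assumption is confirmed by this page. A SimpleNamespace stands in for a real SingleRunReview so the example runs without the SDK.

```python
from types import SimpleNamespace


def summarize_review(review) -> str:
    """Hypothetical helper: one-line summary of a review's key properties."""

    def pct(value) -> str:
        # Assumption: rates are fractions in [0, 1], or None when unavailable.
        return "n/a" if value is None else f"{value:.1%}"

    return (
        f"Review {review.id} [{review.status}]: "
        f"pass rate {pct(review.pass_rate_human_eval)}, "
        f"agreement {pct(review.agreement_rate_human_eval)}"
    )


# Stand-in object with the attribute names listed above (values are made up).
fake_review = SimpleNamespace(
    id="review-123",
    status="Completed",
    pass_rate_human_eval=0.75,
    agreement_rate_human_eval=None,
)
print(summarize_review(fake_review))
```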

Working with Test Results

Access individual test result reviews to get detailed feedback:

# Get all test results from the review
test_results = await review.test_results

# Access individual test result
test_result = test_results[0]
print(f"Reviewed by: {test_result.reviewed_by}")
print(f"Number of reviews: {len(test_result.reviews)}")

# Get the first review for this test result
test_review = test_result.reviews[0]
print(f"Feedback: {test_review.feedback}")
print(f"Completed by: {test_review.completed_by}")
print(f"Status: {test_review.status}")

# Check auto-evaluation review values
for auto_eval in test_review.auto_eval_review_values:
    print(f"Auto eval: {auto_eval.auto_eval}")
    print(f"Human eval: {auto_eval.human_eval}")
    print(f"Flagged: {auto_eval.is_flagged}")

# Check custom review values (if using templates)
for custom_val in test_review.custom_review_values:
    # Inspect each custom review value; its contents depend on the review template
    print(custom_val)

Test Result Properties

reviewed_by - List of reviewer email addresses
reviews - List of all reviews for this test
test - The original test being reviewed
check_results - Auto-evaluated check results
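Because reviewed_by is a list of reviewer email addresses on each test result, it can be used to see how the review workload was distributed. The sketch below is a hypothetical helper (reviews_per_reviewer is not an SDK function) that tallies how many test results each reviewer appears on. Stand-in objects replace real test results so the example runs without the SDK.

```python
from collections import Counter
from types import SimpleNamespace


def reviews_per_reviewer(test_results) -> Counter:
    """Hypothetical helper: count test results per reviewer email."""
    counts: Counter = Counter()
    for test_result in test_results:
        # reviewed_by is described above as a list of reviewer email addresses.
        counts.update(test_result.reviewed_by)
    return counts


# Stand-in test results with made-up reviewer addresses.
fake_results = [
    SimpleNamespace(reviewed_by=["a@example.com", "b@example.com"]),
    SimpleNamespace(reviewed_by=["a@example.com"]),
]
for email, count in reviews_per_reviewer(fake_results).most_common():
    print(f"{email}: {count}")
```

In real use, the list would come from await review.test_results.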

Test Review Properties

feedback - Optional reviewer feedback
completed_by - Reviewer who completed this review
started_at / completed_at - Timestamps for when the review was started and completed
auto_eval_review_values - Human validation of auto-evaluations
custom_review_values - Custom template review data
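One use of auto_eval_review_values is to measure how often human reviewers disagree with the automated evaluation. The sketch below walks test results, their reviews, and each review's auto-eval values, and counts mismatches. count_disagreements is a hypothetical helper, and it assumes that auto_eval and human_eval hold directly comparable values (for example, both pass/fail flags), which this page does not specify. Stand-in objects are used so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Tuple


def count_disagreements(test_results) -> Tuple[int, int]:
    """Hypothetical helper: return (disagreements, total comparisons)."""
    disagreements = 0
    total = 0
    for test_result in test_results:
        for test_review in test_result.reviews:
            for value in test_review.auto_eval_review_values:
                total += 1
                # Assumption: the two fields can be compared with != directly.
                if value.auto_eval != value.human_eval:
                    disagreements += 1
    return disagreements, total


# Stand-in data: one test result, one review, two auto-eval values (made up).
fake_results = [
    SimpleNamespace(
        reviews=[
            SimpleNamespace(
                auto_eval_review_values=[
                    SimpleNamespace(auto_eval=1, human_eval=1, is_flagged=False),
                    SimpleNamespace(auto_eval=1, human_eval=0, is_flagged=True),
                ]
            )
        ]
    )
]
mismatched, compared = count_disagreements(fake_results)
print(f"{mismatched} of {compared} auto-evaluations were overridden by a human")
```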
