Overview

The Human Review system allows you to queue test runs for manual evaluation by human reviewers. This provides a way to validate model outputs beyond automated checks.

Accessing a Run’s Review

Once a run has been queued for human review (via the web app), you can access the review through the SDK:
from vals import Run

run: Run = ...
review = await run.review
print(f"Review status: {review.status}")
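
Because run.review has to be awaited, the snippet above only works inside a coroutine. The following is a minimal sketch of wrapping it in an async function and driving it with asyncio.run. FakeRun is a hypothetical stand-in used only so the example runs without a real queued run, and the string status value is an assumption; with the SDK you would pass an actual vals Run instead.

```python
import asyncio
from types import SimpleNamespace


class FakeRun:
    """Hypothetical stand-in for vals.Run; it only mimics the awaitable `review` property."""

    @property
    def review(self):
        async def _load():
            # Assumption: status is shown here as a plain string for illustration.
            return SimpleNamespace(status="Pending")

        return _load()


async def print_review_status(run) -> str:
    # Awaiting is only legal inside a coroutine, hence the async wrapper.
    review = await run.review
    print(f"Review status: {review.status}")
    return review.status


# Drive the coroutine from synchronous code.
status = asyncio.run(print_review_status(FakeRun()))
```

In an environment that already runs an event loop (for example a notebook), you would await print_review_status(run) directly instead of calling asyncio.run.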

Working with Reviews

Awaiting the cached property run.review returns a SingleRunReview object that exposes the review's metadata:
from vals import SingleRunReview

review: SingleRunReview = await run.review

# Basic review information
print(f"Review ID: {review.id}")
print(f"Status: {review.status}")  # Pending, Archived, or Completed
print(f"Created by: {review.created_by}")
print(f"Number of reviews: {review.number_of_reviews}")
print(f"Assigned reviewers: {review.assigned_reviewers}")

Key Properties

  • id - Same as run.review_id
  • status - Current review status (Pending, Archived, or Completed)
  • pass_rate_human_eval - Pass rate across all human reviews
  • agreement_rate_human_eval - Agreement rate between human reviewers
  • agreement_rate_auto_eval - Agreement rate between human and auto evaluations
  • flagged_rate - Rate of flagged test results
  • created_at - When the review queue was created
  • completed_time - When the review was completed (if applicable)
  • test_results - List of completed test results (cached property, requires await)
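
As an illustration of how the properties above might be collected for logging or reporting, here is a small sketch. The helper review_summary and the SimpleNamespace stand-in (including all of its values) are hypothetical and not part of the SDK; the helper only reads the attribute names listed above and makes no assumption about the type or scale of the rate values.

```python
from types import SimpleNamespace

# Aggregate metrics named in the property list above.
METRIC_FIELDS = (
    "pass_rate_human_eval",
    "agreement_rate_human_eval",
    "agreement_rate_auto_eval",
    "flagged_rate",
)


def review_summary(review) -> dict:
    """Collect the documented review properties into a plain dict."""
    summary = {
        "id": review.id,
        "status": str(review.status),
        "created_at": review.created_at,
    }
    for name in METRIC_FIELDS:
        # Use None when a metric is missing or not yet available.
        summary[name] = getattr(review, name, None)
    # completed_time only applies to reviews that have been completed.
    completed = getattr(review, "completed_time", None)
    if completed is not None:
        summary["completed_time"] = completed
    return summary


# Hypothetical stand-in object; every value here is made up for the example.
fake_review = SimpleNamespace(
    id="review-123",
    status="Pending",
    created_at="2025-01-01T00:00:00Z",
    pass_rate_human_eval=None,
    agreement_rate_human_eval=None,
    agreement_rate_auto_eval=None,
    flagged_rate=None,
    completed_time=None,
)

summary = review_summary(fake_review)
print(summary)
```

With a real review you would call review_summary(await run.review) from inside a coroutine.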

Working with Test Results

Access individual test result reviews to get detailed feedback:
# Get all test results from the review
test_results = await review.test_results

# Access individual test result
test_result = test_results[0]
print(f"Reviewed by: {test_result.reviewed_by}")
print(f"Number of reviews: {len(test_result.reviews)}")

# Get the first review for this test result
test_review = test_result.reviews[0]
print(f"Feedback: {test_review.feedback}")
print(f"Completed by: {test_review.completed_by}")
print(f"Status: {test_review.status}")

# Inspect the reviewed check results (human validation of auto-evaluations)
for reviewed_check in test_review.reviewed_check_results:
    print(f"Auto eval: {reviewed_check.auto_eval}")
    print(f"Human eval: {reviewed_check.human_eval}")
    print(f"Flagged: {reviewed_check.is_flagged}")

You can also access the first review for a test result using the convenience property:
# Shortcut: get the first review directly
first_review = test_result.review  # equivalent to test_result.reviews[0]
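
To look across every review rather than just the first one, the loops above can be combined into a single tally. This is a sketch under stated assumptions: tally_checks and fake_check are hypothetical helpers, the SimpleNamespace objects stand in for SDK objects, and auto_eval and human_eval are assumed to be directly comparable values. It is not necessarily how the SDK computes agreement_rate_auto_eval or flagged_rate.

```python
from types import SimpleNamespace


def tally_checks(test_results) -> dict:
    """Count reviewed checks, human/auto disagreements, and flagged checks."""
    total = disagreements = flagged = 0
    for test_result in test_results:
        for test_review in test_result.reviews:
            for check in test_review.reviewed_check_results:
                total += 1
                # Assumption: the two evaluations can be compared with !=.
                if check.auto_eval != check.human_eval:
                    disagreements += 1
                if check.is_flagged:
                    flagged += 1
    return {"total": total, "disagreements": disagreements, "flagged": flagged}


def fake_check(auto_eval, human_eval, is_flagged=False):
    """Hypothetical stand-in for one reviewed check result."""
    return SimpleNamespace(
        auto_eval=auto_eval, human_eval=human_eval, is_flagged=is_flagged
    )


# Made-up data: two test results, three reviewed checks in total.
fake_results = [
    SimpleNamespace(
        reviews=[
            SimpleNamespace(
                reviewed_check_results=[fake_check(1, 1), fake_check(1, 0, True)]
            )
        ]
    ),
    SimpleNamespace(
        reviews=[SimpleNamespace(reviewed_check_results=[fake_check(0, 0)])]
    ),
]

tally = tally_checks(fake_results)
print(tally)  # {'total': 3, 'disagreements': 1, 'flagged': 1}
```

With real data you would pass the list returned by await review.test_results.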

Test Result Properties

  • reviewed_by - List of reviewer email addresses
  • reviews - List of all reviews for this test
  • review - Convenience property returning the first review
  • test - The original test being reviewed
  • check_results - Auto-evaluated check results

Test Review Properties

  • feedback - Optional reviewer feedback
  • completed_by - Reviewer who completed this review
  • started_at / completed_at - When this review was started and when it was completed
  • reviewed_check_results - Human validation of auto-evaluations
  • custom_review_values - Custom template review data (when using custom review templates)