Overview

The Human Review system allows you to queue test runs for manual evaluation by human reviewers. This provides a way to validate model outputs beyond automated checks.

Adding a Run to Review Queue

Queue a run for human review using the add_to_queue() method:
from vals import Run

run: Run = ...  # an existing Run object
await run.add_to_queue(
    assigned_reviewers=["reviewer@company.com"],
    number_of_reviews=2,
    rereview_auto_eval=True
)
# Fetch the review after queuing
review = await run.review
print(f"Review status: {review.status}")

Parameters

  • assigned_reviewers - List of reviewer email addresses (empty list allows any reviewer)
  • number_of_reviews - How many reviewers will evaluate each test (default: 1)
  • rereview_auto_eval - Whether to re-run auto-evaluation after reviews (default: True)
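The fan-out implied by these parameters can be sketched in plain Python (illustrative only, not SDK code): with number_of_reviews=2, each test in the run receives two independent reviews, and an empty assigned_reviewers list means any reviewer may claim a slot. Here a hypothetical candidate pool is cycled round-robin just to show the volume of work created:

```python
from itertools import cycle

def fan_out_reviews(test_ids, reviewers, number_of_reviews=1):
    """Illustrate how review slots multiply: each test gets
    `number_of_reviews` independent review slots. The round-robin
    assignment is purely for illustration; the platform decides
    actual assignment."""
    pool = cycle(reviewers)
    slots = []
    for test_id in test_ids:
        for _ in range(number_of_reviews):
            slots.append((test_id, next(pool)))
    return slots

slots = fan_out_reviews(
    test_ids=["t1", "t2", "t3"],
    reviewers=["a@company.com", "b@company.com"],
    number_of_reviews=2,
)
print(len(slots))  # 3 tests x 2 reviews = 6 review slots
```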

Working with Reviews

Once a run is queued, you can access the review through the run.review cached property:
from vals import SingleRunReview

review: SingleRunReview = await run.review

# Basic review information
print(f"Review ID: {review.id}")
print(f"Status: {review.status}")  # Pending, Archived, or Completed
print(f"Created by: {review.created_by}")
print(f"Number of reviews: {review.number_of_reviews}")
print(f"Assigned reviewers: {review.assigned_reviewers}")
# Assign different reviewers
await review.modify_queue(
    assigned_reviewers=["new-reviewer@company.com"]
)

Key Properties

  • id - Same as run.review_id
  • status - Current review status (Pending, Archived, or Completed)
  • pass_rate_human_eval - Pass rate across all human reviews
  • agreement_rate_human_eval - Agreement rate between human reviewers
  • test_results - List of completed test results (cached property, requires await)
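As a mental model (not SDK code — the platform's exact definitions may differ), pass_rate_human_eval and agreement_rate_human_eval can be understood roughly as follows, given per-test lists of human pass/fail verdicts:

```python
def pass_rate(verdicts_per_test):
    """Fraction of all individual human verdicts that are a pass.
    `verdicts_per_test` is a list of per-test verdict lists,
    e.g. [[True, True], [True, False]]."""
    flat = [v for verdicts in verdicts_per_test for v in verdicts]
    return sum(flat) / len(flat)

def agreement_rate(verdicts_per_test):
    """Fraction of tests on which all human reviewers agreed."""
    unanimous = [len(set(v)) == 1 for v in verdicts_per_test]
    return sum(unanimous) / len(unanimous)

verdicts = [[True, True], [True, False], [False, False]]
print(pass_rate(verdicts))       # 3 of 6 verdicts pass -> 0.5
print(agreement_rate(verdicts))  # 2 of 3 tests unanimous
```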

Working with Test Results

Access individual test result reviews to get detailed feedback:
# Get all test results from the review
test_results = await review.test_results

# Access individual test result
test_result = test_results[0]
print(f"Reviewed by: {test_result.reviewed_by}")
print(f"Number of reviews: {len(test_result.reviews)}")

# Get the first review for this test result
test_review = test_result.reviews[0]
print(f"Feedback: {test_review.feedback}")
print(f"Completed by: {test_review.completed_by}")
print(f"Status: {test_review.status}")
# Check auto-evaluation review values
for auto_eval in test_review.auto_eval_review_values:
    print(f"Auto eval: {auto_eval.auto_eval}")
    print(f"Human eval: {auto_eval.human_eval}")
    print(f"Flagged: {auto_eval.is_flagged}")

# Check custom review values (if using templates)
for custom_val in test_review.custom_review_values:
    # Structure depends on your review template definition
    pass
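Building on the loop above, one useful summary is how often humans disagreed with, or flagged, the auto-evaluation. A self-contained sketch over plain dictionaries (the field names mirror the auto_eval_review_values properties; the real objects are SDK classes):

```python
# Each entry mirrors one auto_eval_review_value: the automated
# verdict, the human verdict, and whether the reviewer flagged it.
review_values = [
    {"auto_eval": 1, "human_eval": 1, "is_flagged": False},
    {"auto_eval": 1, "human_eval": 0, "is_flagged": True},
    {"auto_eval": 0, "human_eval": 0, "is_flagged": False},
]

# Checks where the human verdict overturned the automated one
disagreements = [v for v in review_values if v["auto_eval"] != v["human_eval"]]
# Checks explicitly flagged by a reviewer
flagged = [v for v in review_values if v["is_flagged"]]

print(f"{len(disagreements)} of {len(review_values)} checks overturned by humans")
print(f"{len(flagged)} flagged for follow-up")
```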

Test Result Properties

  • reviewed_by - List of reviewer email addresses
  • reviews - List of all reviews for this test
  • test - The original test being reviewed
  • check_results - Auto-evaluated check results

Test Review Properties

  • feedback - Optional reviewer feedback
  • completed_by - Reviewer who completed this review
  • completed_at / started_at - Timestamps
  • auto_eval_review_values - Human validation of auto-evaluations
  • custom_review_values - Custom template review data
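For example, the timestamps make it easy to compute how long a review took. A minimal sketch, assuming the timestamps arrive as datetime objects or ISO-8601 strings (how the SDK represents them is an assumption here):

```python
from datetime import datetime

def review_duration(started_at, completed_at):
    """Return the elapsed review time in seconds. Accepts datetime
    objects or ISO-8601 strings (an assumption about the SDK types)."""
    if isinstance(started_at, str):
        started_at = datetime.fromisoformat(started_at)
    if isinstance(completed_at, str):
        completed_at = datetime.fromisoformat(completed_at)
    return (completed_at - started_at).total_seconds()

seconds = review_duration(
    "2024-05-01T10:00:00+00:00",
    "2024-05-01T10:12:30+00:00",
)
print(seconds)  # 750.0
```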