Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.vals.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The Human Review system allows you to queue test runs for manual evaluation by human reviewers. This provides a way to validate model outputs beyond automated checks.

Accessing a Run’s Review

Once a run has been queued for human review (via the web app), you can access the review through the SDK:
from vals import Run

run: Run = ...
review = await run.review
print(f"Review status: {review.status}")

Working with Reviews

Once a run is queued, you can access the review through the run.review cached property:
from vals import SingleRunReview

review: SingleRunReview = await run.review

# Basic review information
print(f"Review ID: {review.id}")
print(f"Status: {review.status}")  # Pending, Archived, or Completed
print(f"Created by: {review.created_by}")
print(f"Number of reviews: {review.number_of_reviews}")
print(f"Assigned reviewers: {review.assigned_reviewers}")

Key Properties

  • id - Same as run.review_id
  • status - Current review status (Pending, Archived, or Completed)
  • pass_rate_human_eval - Pass rate across all human reviews
  • agreement_rate_human_eval - Agreement rate between human reviewers
  • agreement_rate_auto_eval - Agreement rate between human and auto evaluations
  • flagged_rate - Rate of flagged test results
  • created_at - When the review queue was created
  • completed_time - When the review was completed (if applicable)
  • test_results - List of completed test results (cached property, requires await)

Working with Test Results

Access individual test result reviews to get detailed feedback:
# Get all test results from the review
test_results = await review.test_results

# Access individual test result
test_result = test_results[0]
print(f"Reviewed by: {test_result.reviewed_by}")
print(f"Number of reviews: {len(test_result.reviews)}")

# Get the first review for this test result
test_review = test_result.reviews[0]
print(f"Feedback: {test_review.feedback}")
print(f"Completed by: {test_review.completed_by}")
print(f"Status: {test_review.status}")
# Check reviewed check results (human validation of auto-evaluations)
for reviewed_check in test_review.reviewed_check_results:
    print(f"Auto eval: {reviewed_check.auto_eval}")
    print(f"Human eval: {reviewed_check.human_eval}")
    print(f"Flagged: {reviewed_check.is_flagged}")
You can also access the first review for a test result using the convenience property:
# Shortcut: get the first review directly
first_review = test_result.review  # equivalent to test_result.reviews[0]

Test Result Properties

  • reviewed_by - List of reviewer email addresses
  • reviews - List of all reviews for this test
  • review - Convenience property returning the first review
  • test - The original test being reviewed
  • check_results - Auto-evaluated check results

Test Review Properties

  • feedback - Optional reviewer feedback
  • completed_by - Reviewer who completed this review
  • completed_at / started_at - Timestamps
  • reviewed_check_results - Human validation of auto-evaluations
  • custom_review_values - Custom template review data (when using custom review templates)