Overview
The Human Review system lets you queue test runs for manual evaluation by human reviewers. This provides a way to validate model outputs beyond automated checks.
Accessing a Run’s Review
Once a run has been queued for human review (via the web app), you can access the review through the SDK.
Working with Reviews
Once a run is queued, you can access the review through the run.review cached property.
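The access pattern described above can be sketched as follows. This is a minimal illustration, not the SDK's own code: `get_review`, `resolve`, and the `fake_run` stand-in are hypothetical names introduced here, and the sketch assumes that `run.review_id` is `None` for a run that was never queued. Because this page does not say whether `run.review` must be awaited, the helper accepts either a plain value or an awaitable.

```python
import asyncio
import inspect
from types import SimpleNamespace


async def resolve(value):
    """Return value unchanged, awaiting it first if it is awaitable."""
    if inspect.isawaitable(value):
        return await value
    return value


async def get_review(run):
    """Return the review attached to a run, or None if none is attached.

    Assumption: a run that was never queued has review_id set to None.
    """
    if getattr(run, "review_id", None) is None:
        return None
    return await resolve(run.review)


# Hypothetical stand-in for a run object; a real run would come from the SDK.
fake_run = SimpleNamespace(
    review_id="rev-123",
    review=SimpleNamespace(id="rev-123", status="Pending"),
)

review = asyncio.run(get_review(fake_run))
print(review.id, review.status)
```

With a real run object, the same helper would be called from inside your own async code instead of through `asyncio.run`.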
Key Properties
id - Same as run.review_id
status - Current review status (Pending, Archived, or Completed)
pass_rate_human_eval - Pass rate across all human reviews
agreement_rate_human_eval - Agreement rate between human reviewers
agreement_rate_auto_eval - Agreement rate between human and auto evaluations
flagged_rate - Rate of flagged test results
created_at - When the review queue was created
completed_time - When the review was completed (if applicable)
test_results - List of completed test results (cached property, requires await)
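As an illustration of how these properties might be read together, the sketch below gathers the headline metrics into a plain dictionary. `ReviewSnapshot` is a hypothetical stand-in that mirrors a subset of the property names listed above, and `summarize` is an illustrative helper, not part of the SDK. The field types and the idea that unavailable metrics are `None` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ReviewSnapshot:
    """Hypothetical stand-in mirroring some of the review properties."""
    id: str
    status: str
    pass_rate_human_eval: Optional[float]
    agreement_rate_auto_eval: Optional[float]
    flagged_rate: Optional[float]
    created_at: datetime
    completed_time: Optional[datetime] = None


def summarize(review) -> dict:
    """Collect the headline metrics of a review into a plain dict."""
    summary = {
        "id": review.id,
        "status": str(review.status),
        "pass_rate_human_eval": review.pass_rate_human_eval,
        "agreement_rate_auto_eval": review.agreement_rate_auto_eval,
        "flagged_rate": review.flagged_rate,
    }
    # completed_time is only set once the review has been completed.
    if review.completed_time is not None:
        summary["turnaround"] = review.completed_time - review.created_at
    return summary


snapshot = ReviewSnapshot(
    id="rev-123",
    status="Completed",
    pass_rate_human_eval=0.8,
    agreement_rate_auto_eval=0.9,
    flagged_rate=0.05,
    created_at=datetime(2025, 1, 1, 9, 0),
    completed_time=datetime(2025, 1, 1, 11, 30),
)
summary = summarize(snapshot)
print(summary)
```

Since `summarize` only reads attributes, it should work with any object that exposes the same property names.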
Working with Test Results
Access individual test result reviews to get detailed feedback.
Test Result Properties
reviewed_by - List of reviewer email addresses
reviews - List of all reviews for this test
review - Convenience property returning the first review
test - The original test being reviewed
check_results - Auto-evaluated check results
Test Review Properties
feedback - Optional reviewer feedback
completed_by - Reviewer who completed this review
completed_at/started_at - Timestamps
reviewed_check_results - Human validation of auto-evaluations
custom_review_values - Custom template review data (when using custom review templates)
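The test result and test review properties above could be combined as in the following sketch, which walks every completed test result and collects the written feedback. `collect_feedback` and the stand-in objects are hypothetical. The only attribute names used are the ones listed on this page (`test_results`, `reviews`, `feedback`, `completed_by`), and treating `completed_by` as a plain string is an assumption.

```python
import asyncio
import inspect
from types import SimpleNamespace


async def collect_feedback(review):
    """Return (reviewer, feedback) pairs for each review that left feedback."""
    test_results = review.test_results
    # test_results is documented as requiring await; also accept a plain list.
    if inspect.isawaitable(test_results):
        test_results = await test_results

    pairs = []
    for result in test_results:
        for test_review in result.reviews:
            if test_review.feedback:  # feedback is optional
                pairs.append((test_review.completed_by, test_review.feedback))
    return pairs


async def _fake_test_results():
    """Hypothetical stand-in for the awaitable test_results property."""
    return [
        SimpleNamespace(reviews=[
            SimpleNamespace(completed_by="a@example.com",
                            feedback="Answer cites the wrong clause."),
            SimpleNamespace(completed_by="b@example.com", feedback=None),
        ]),
        SimpleNamespace(reviews=[]),
    ]


fake_review = SimpleNamespace(test_results=_fake_test_results())
feedback = asyncio.run(collect_feedback(fake_review))
print(feedback)
```

Iterating over `reviews` instead of the `review` convenience property keeps feedback from every reviewer, not only the first one.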