
Send your data to subject matter experts

Use labeling queues when you want a subject matter expert or third party to label spans without exposing the full traces view. Reviewers get a focused interface showing only what they need to annotate. Once labeled, those examples become the ground truth dataset you use to validate evaluators and run experiments.
Ask Alyx to create a labeling queue, send data to it, and optionally annotate data. For example:
  • “Send spans where latency is over 5 seconds to my Slow Response labeling queue”
  • “Send spans where hallucination eval scored 0 to the Hallucination Review queue”
[Screenshot: Tracing view with an eval filter applied; the Alyx sidebar suggests sending low-scoring hallucination spans to the Hallucination Review labeling queue]
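The filters in the prompts above reduce to simple predicates over span records. This is an illustrative sketch only — the span fields (`id`, `latency_s`, `hallucination_score`) and helper names are hypothetical, not a real SDK schema; Alyx handles this selection for you.

```python
# Hypothetical span records, mimicking the kind of data Alyx filters
# when you ask for "spans where latency is over 5 seconds".
spans = [
    {"id": "a1", "latency_s": 2.3, "hallucination_score": 1},
    {"id": "b2", "latency_s": 6.1, "hallucination_score": 0},
    {"id": "c3", "latency_s": 7.8, "hallucination_score": 1},
]

def slow_spans(spans, threshold_s=5.0):
    """Select spans whose latency exceeds the threshold (in seconds)."""
    return [s for s in spans if s["latency_s"] > threshold_s]

def failed_hallucination_eval(spans):
    """Select spans where the hallucination eval scored 0."""
    return [s for s in spans if s["hallucination_score"] == 0]

# Spans matching either prompt would be routed to the named queue.
print([s["id"] for s in slow_spans(spans)])               # → ['b2', 'c3']
print([s["id"] for s in failed_hallucination_eval(spans)])  # → ['b2']
```

Either result set is what would land in the reviewer's queue — only the matching spans, not the full traces view.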

Build a ground truth dataset

A ground truth dataset is a curated set of labeled examples that captures the range of behaviors your system should and should not produce. It gives you a stable benchmark for validating automated evaluators and a reusable dataset to run experiments against as your prompts and models evolve.
Ask Alyx to create a dataset from spans of interest, append spans to an existing dataset, or suggest examples that cover edge cases for your rubric. Example prompts:
  • “Create a dataset from the spans I filtered in this trace view and include inputs and outputs”
  • “Append these high-error spans to my regression benchmark dataset”
  • “Suggest 20 diverse examples for a golden dataset based on my last week’s traces”
[Screenshot: Tracing view with a span filter applied; the Alyx sidebar offers to create a golden dataset from factual spans, with a preview table and an Accept and Create Dataset action]
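Once reviewers have labeled the queued spans, the resulting ground truth dataset is just labeled input/output pairs you can score an evaluator against. The sketch below is hypothetical — the record fields and the `validate_evaluator` helper are illustrative, not part of any SDK — but it shows how such a dataset serves as a benchmark.

```python
# Hypothetical labeled spans produced by reviewers in a labeling queue.
ground_truth = [
    {"input": "What is the capital of France?", "output": "Paris", "label": "factual"},
    {"input": "Who wrote Hamlet?", "output": "Christopher Marlowe", "label": "hallucination"},
    {"input": "Water boils at 100 C at sea level.", "output": "True", "label": "factual"},
]

def validate_evaluator(evaluator, ground_truth):
    """Return the evaluator's accuracy against reviewer labels."""
    correct = sum(
        1 for ex in ground_truth
        if evaluator(ex["input"], ex["output"]) == ex["label"]
    )
    return correct / len(ground_truth)

# A toy evaluator standing in for an automated hallucination check.
def toy_evaluator(inp, out):
    return "hallucination" if out == "Christopher Marlowe" else "factual"

accuracy = validate_evaluator(toy_evaluator, ground_truth)
print(accuracy)  # → 1.0
```

Because the labels are human-verified, the same dataset can be reused to re-run experiments as your prompts and models change, with evaluator accuracy tracked against a stable benchmark.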