Send your data to subject matter experts
Use labeling queues when you want a subject matter expert or third party to label spans without exposing the full traces view. Reviewers get a focused interface showing only what they need to annotate. Once labeled, those examples become the ground truth dataset you use to validate evaluators and run experiments.
Ask Alyx to create a labeling queue, send data to it, and optionally annotate data. For example:
- “Send spans where latency is over 5 seconds to my Slow Response labeling queue”
- “Send spans where hallucination eval scored 0 to the Hallucination Review queue”
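The routing logic those prompts express can be sketched in plain Python. This is a hypothetical illustration only: the span fields (`latency_ms`, `eval_scores`) and queue names are assumptions for the example, not the platform's actual span schema or API.

```python
# Hypothetical sketch of the filtering behind the example prompts above.
# Field names (latency_ms, eval_scores) are assumed, not a real schema.
from dataclasses import dataclass, field

@dataclass
class Span:
    id: str
    latency_ms: float
    eval_scores: dict = field(default_factory=dict)

def route_to_queues(spans):
    """Group span IDs into labeling-queue buckets using the example criteria."""
    queues = {"Slow Response": [], "Hallucination Review": []}
    for span in spans:
        if span.latency_ms > 5000:  # "latency is over 5 seconds"
            queues["Slow Response"].append(span.id)
        if span.eval_scores.get("hallucination") == 0:  # "hallucination eval scored 0"
            queues["Hallucination Review"].append(span.id)
    return queues

spans = [
    Span("a", 6200),
    Span("b", 800, {"hallucination": 0}),
    Span("c", 1200, {"hallucination": 1}),
]
print(route_to_queues(spans))
# {'Slow Response': ['a'], 'Hallucination Review': ['b']}
```

A span can match more than one criterion, so in this sketch it would land in both queues; in practice you would phrase the prompt so each review queue gets only the spans its reviewers are qualified to label.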

Build a ground truth dataset
A ground truth dataset is a curated set of labeled examples that captures the range of behaviors your system should and should not produce. It gives you a stable benchmark for validating automated evaluators and a reusable dataset to run experiments against as your prompts and models evolve.
Ask Alyx to create a dataset from spans of interest, append spans to an existing dataset, or suggest examples that cover edge cases for your rubric. Example prompts:
- “Create a dataset from the spans I filtered in this trace view and include inputs and outputs”
- “Append these high-error spans to my regression benchmark dataset”
- “Suggest 20 diverse examples for a golden dataset based on my last week’s traces”
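Conceptually, building a ground truth dataset means keeping only the spans a reviewer actually labeled and storing each one's input, output, and label. The sketch below illustrates that idea; the record fields (`input`, `output`, `label`) are assumed for the example and are not the platform's real dataset schema.

```python
# Hypothetical sketch: turning labeled spans into a ground-truth dataset.
# The dict keys (input, output, label) are illustrative, not a real schema.
def build_ground_truth(labeled_spans):
    """Keep only reviewer-labeled spans; store input, output, and label."""
    return [
        {"input": s["input"], "output": s["output"], "label": s["label"]}
        for s in labeled_spans
        if s.get("label") is not None  # drop spans no one has reviewed yet
    ]

labeled_spans = [
    {"input": "What is 2+2?", "output": "4", "label": "correct"},
    {"input": "Capital of France?", "output": "Lyon", "label": "hallucination"},
    {"input": "Unreviewed span", "output": "...", "label": None},
]
dataset = build_ground_truth(labeled_spans)
print(len(dataset))
# 2
```

Because unlabeled spans are excluded, the resulting dataset stays a stable benchmark: every example carries a human judgment you can compare automated evaluator scores against.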





