What is a task?
A task connects your evaluator to a data source and defines what to score and how often. You create an evaluator once and reuse it across tasks — pointing it at different projects, datasets, or experiments. Results attach automatically and surface in your project or experiment. Most teams start with a one-time backfill on historical data to establish a baseline, then set up an ongoing task from there. Before creating a task, make sure you have traces flowing into Arize and an LLM provider configured. See AI Provider Integrations.
Start from real traces
Before automating, review real interactions in your tracing project to understand where things go wrong. Group failure patterns into a taxonomy — each category can map to an evaluator or filter. To capture those categories as structured labels, see Human review.
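One lightweight way to keep that taxonomy actionable is to record it as a plain mapping from failure category to the evaluator (and optional filter) you plan to create for it. This is an illustrative sketch with made-up names, not an Arize API:

```python
# Hypothetical failure taxonomy: each category maps to the evaluator
# you would build for it and the span filter that scopes it.
failure_taxonomy = {
    "hallucination": {"evaluator": "hallucination_judge", "filter": "span_kind == 'LLM'"},
    "wrong_tool_call": {"evaluator": "tool_call_correctness", "filter": "span_kind == 'TOOL'"},
    "off_topic": {"evaluator": "relevance_judge", "filter": None},
}

# Each entry becomes a candidate task: one evaluator, one filter.
for category, plan in failure_taxonomy.items():
    print(f"{category}: evaluate with {plan['evaluator']}")
```

Keeping the taxonomy in one place makes it easy to check that every observed failure mode has a corresponding evaluator before you automate.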
Create a task
There are several ways to create a task and run your evaluator on traces.
- By Arize Skills
- By Alyx
- By UI
- By Code
Use the arize-evaluator skill to create and trigger tasks via the ax CLI without leaving your editor. Install the Arize skills plugin in your coding agent if you have not already, then ask your agent:
- “Create a continuous task to run my hallucination evaluator on my project”
- “Trigger a backfill eval run on my project for the last 7 days”
- “Set up a task that only evaluates LLM spans”

Task configuration
Sampling rate
| Rate | When to use |
|---|---|
| 100% | Low-volume or critical applications where you want to evaluate every trace |
| 10–50% | High-volume applications balancing cost and coverage |
| 1–5% | Very high-volume applications where representative sampling is enough |
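To see what these rates mean in practice, here is a minimal sketch of probabilistic sampling over hypothetical span records. This is not how Arize implements sampling internally, just an illustration of the behavior a rate implies:

```python
import random

def sample_spans(spans, rate, seed=None):
    """Keep roughly `rate` fraction of spans for evaluation (rate in 0.0-1.0)."""
    rng = random.Random(seed)
    return [span for span in spans if rng.random() < rate]

# A 10% rate on 1,000 spans keeps a count near 100 (sampling is probabilistic).
spans = [{"id": i} for i in range(1000)]
sampled = sample_spans(spans, rate=0.10, seed=42)
print(len(sampled))
```

The trade-off in the table follows directly: at high volume, a lower rate keeps evaluation cost proportional to the sample size while the sample remains statistically representative.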
Filters
Use filters to target specific subsets of your data:
- Span kind: Only evaluate specific span types (for example LLM spans)
- Model name: Only evaluate spans from a specific model
- Metadata: Only evaluate spans with certain metadata tags
- Span attributes: Filter on any span attribute
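Conceptually, each filter is a predicate that a span must pass before the evaluator runs on it. The sketch below shows that idea on hypothetical span dictionaries; the attribute key `llm.model_name` follows OpenInference naming conventions but is an assumption here, not a guaranteed field:

```python
def matches_filters(span, span_kind=None, model_name=None, metadata=None):
    """Return True if a span passes every supplied filter criterion."""
    if span_kind and span.get("span_kind") != span_kind:
        return False
    if model_name and span.get("attributes", {}).get("llm.model_name") != model_name:
        return False
    if metadata:
        span_meta = span.get("metadata", {})
        # All requested metadata key/value pairs must match.
        if any(span_meta.get(k) != v for k, v in metadata.items()):
            return False
    return True
```

Criteria combine with AND semantics: a task filtering on both span kind and model name evaluates only spans that satisfy both.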

Run evals continuously
For tasks that use Run continuously on new data, evaluators from the Eval Hub (including pre-built LLM judge templates) run on incoming traces on a rolling schedule. When you create a task and add an evaluator, you can pick a template from the hub before mapping columns and saving. On the Evaluators page, the Running Eval Tasks tab lists every task, its target and evaluators, a snapshot of the last few runs, and View Logs when you need execution details.
Viewing results
Once a task runs, evaluation results attach automatically to your spans. Open any trace in the Tracing view and use the evaluation panel on each span to inspect labels, scores, and explanations. To check task status, view run timing, see counts of successes and errors, or troubleshoot a failed run, navigate to the Running Eval Tasks tab on the Evaluators page and open any task. From the logs you can also click View Traces to jump directly to the evaluated spans with the same filters applied.
