The ax tasks commands are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.
The ax tasks commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.
ax tasks list
List evaluation tasks, optionally filtered by space, project, dataset, or type.
```bash
ax tasks list [--space <name-or-id>] [--project <name-or-id>] [--dataset <name-or-id>] [--name <filter>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]
```
| Option | Description |
|---|---|
| `--space` | Filter tasks by space name or ID |
| `--project` | Filter tasks by project name or ID |
| `--dataset` | Filter tasks by dataset name or ID |
| `--name` | Case-insensitive substring filter on task name |
| `--task-type` | Filter by type: `template_evaluation` or `code_evaluation` |
| `--limit` | Maximum number of results to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |
Examples:
```bash
ax tasks list --space sp_abc123
ax tasks list --space sp_abc123 --task-type template_evaluation
ax tasks list --project proj_abc123 --output tasks.json
```
ax tasks create
Create a new evaluation task. Either --project or --dataset must be provided, but not both. Required options will be prompted interactively if not passed as flags.
```bash
ax tasks create \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project <name-or-id> | --dataset <name-or-id>)
```
| Option | Description |
|---|---|
| `--name` | Task name (must be unique within the space) |
| `--task-type` | `template_evaluation` or `code_evaluation` |
| `--evaluators` | JSON array of evaluator objects (see format below) |
| `--project` | Target project name or ID; mutually exclusive with `--dataset` |
| `--dataset` | Target dataset name or ID; mutually exclusive with `--project` |
| `--space` | Space name or ID (required when resolving `--project` or `--dataset` by name) |
| `--experiment-ids` | Comma-separated experiment global IDs (required for dataset-based tasks) |
| `--sampling-rate` | Fraction of spans to evaluate, 0–1 (project-based tasks only) |
| `--is-continuous` / `--no-continuous` | Run the task continuously on incoming data |
| `--query-filter` | Task-level SQL-style filter applied to all evaluators |
Evaluators JSON format:
```json
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
```
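When a task uses several evaluators, generating the `--evaluators` array can be less error-prone than typing it by hand. The following is a minimal bash sketch, assuming only the three fields shown above and evaluator IDs that need no JSON escaping; `build_evaluators_json` is a hypothetical helper, not part of the `ax` CLI.

```bash
# Hypothetical helper (not part of the ax CLI): print an --evaluators JSON array
# with one object per evaluator ID, leaving query_filter and column_mappings null.
# Assumes the IDs contain no characters that need JSON escaping (quotes, backslashes).
build_evaluators_json() {
  local out="[" sep="" id
  for id in "$@"; do
    out+="${sep}{\"evaluator_id\": \"${id}\", \"query_filter\": null, \"column_mappings\": null}"
    sep=", "
  done
  printf '%s]\n' "$out"
}
```

Its output can then be passed as `--evaluators "$(build_evaluators_json ev_abc123 ev_def456)"`.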
Examples:

Project-based task (continuous):
```bash
ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1
```

Dataset-based task:
```bash
ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous
```
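In scripts, experiment IDs often arrive as a list rather than as one comma-separated string. Here is a minimal bash sketch, using only the `ax tasks create` flags documented above, that joins its trailing arguments into the `--experiment-ids` value. The function name `create_dataset_task`, its argument order, and the fixed `template_evaluation` type are assumptions made for illustration; the dataset is passed by ID, so `--space` is omitted.

```bash
# Hypothetical wrapper around `ax tasks create` for dataset-based tasks.
# Usage: create_dataset_task <name> <dataset-id> <evaluator-id> <experiment-id>...
create_dataset_task() {
  if [ "$#" -lt 4 ]; then
    echo "usage: create_dataset_task <name> <dataset-id> <evaluator-id> <experiment-id>..." >&2
    return 2
  fi
  local name="$1" dataset="$2" evaluator_id="$3"
  shift 3
  local experiment_ids
  # Join the remaining arguments with commas: exp_a exp_b -> exp_a,exp_b
  experiment_ids="$(IFS=,; printf '%s' "$*")"
  ax tasks create \
    --name "$name" \
    --task-type template_evaluation \
    --dataset "$dataset" \
    --experiment-ids "$experiment_ids" \
    --evaluators "[{\"evaluator_id\": \"${evaluator_id}\"}]" \
    --no-continuous
}
```

Calling `create_dataset_task "Experiment Evaluation" ds_xyz789 ev_abc123 exp_abc123 exp_def456` issues the same command as the dataset-based example above.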
ax tasks get
Get a task by name or ID.
```bash
ax tasks get <name-or-id>
```
Example:
```bash
ax tasks get task_abc123
```
ax tasks trigger-run
Trigger an on-demand run for a task. The run starts in pending status. Pass --wait to block until the run reaches a terminal state.
```bash
ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--wait] [--poll-interval <s>] [--timeout <s>]
```
| Option | Description |
|---|---|
| `--data-start-time` | ISO 8601 start of the data window to evaluate |
| `--data-end-time` | ISO 8601 end of the data window (defaults to now) |
| `--max-spans` | Maximum number of spans to process (default: 10,000) |
| `--override-evaluations` / `--no-override-evaluations` | Re-evaluate data that already has labels |
| `--experiment-ids` | Comma-separated experiment global IDs (dataset-based tasks only) |
| `--wait` / `-w` | Block until the run reaches a terminal state |
| `--poll-interval` | Seconds between polling attempts when using `--wait` (default: 5) |
| `--timeout` | Maximum seconds to wait when using `--wait` (default: 600) |
Examples:
```bash
# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
```
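Because `--data-start-time` and `--data-end-time` take ISO 8601 timestamps, a rolling window such as "the last 24 hours" has to be computed before the call. A minimal sketch, assuming bash and GNU `date` (BSD and macOS `date` use different flags); `trigger_recent_run` is a hypothetical helper name.

```bash
# Hypothetical helper: trigger a run over the last N hours (default 24), then wait.
# Requires GNU date for "-d @<epoch>"; BSD/macOS date uses different flags.
trigger_recent_run() {
  local task_id="$1" hours="${2:-24}"
  local now start end
  now="$(date -u +%s)"
  # Both ISO 8601 UTC timestamps (e.g. 2024-01-01T00:00:00Z) come from one clock reading.
  end="$(date -u -d "@${now}" +%Y-%m-%dT%H:%M:%SZ)"
  start="$(date -u -d "@$((now - hours * 3600))" +%Y-%m-%dT%H:%M:%SZ)"
  ax tasks trigger-run "$task_id" \
    --data-start-time "$start" \
    --data-end-time "$end" \
    --wait
}
```

For example, `trigger_recent_run task_abc123 6` evaluates the last six hours and blocks until the run reaches a terminal state.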
ax tasks list-runs
List runs for a task, with optional status filtering.
```bash
ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]
```
| Option | Description |
|---|---|
| `--status` | Filter by run status: `pending`, `running`, `completed`, `failed`, `cancelled` |
| `--limit` | Maximum number of results to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |
Examples:
```bash
ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json
```
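The `--status` and `--output` options combine naturally in a loop, for example to save failed runs for several tasks. A minimal bash sketch; the helper name `export_runs` and the `<task-id>-<status>-runs.json` file naming are hypothetical, and the status check simply mirrors the five values listed above.

```bash
# Hypothetical helper: save runs with a given status for several tasks,
# one JSON file per task, using the documented --status and --output options.
export_runs() {
  local status="$1"
  shift
  # Reject anything outside the documented status values before calling the CLI.
  case "$status" in
    pending|running|completed|failed|cancelled) ;;
    *)
      echo "export_runs: unknown status '$status'" >&2
      return 2
      ;;
  esac
  local task_id
  for task_id in "$@"; do
    ax tasks list-runs "$task_id" --status "$status" \
      --output "${task_id}-${status}-runs.json" || return 1
  done
}
```

For example, `export_runs failed task_abc123 task_def456` would write `task_abc123-failed-runs.json` and `task_def456-failed-runs.json`.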
ax tasks get-run
Get a task run by its global ID.
```bash
ax tasks get-run <run-id>
```
Example:
```bash
ax tasks get-run run_abc123
```
ax tasks cancel-run
Cancel a task run. Only valid when the run is pending or running.
```bash
ax tasks cancel-run <run-id> [--force]
```
| Option | Description |
|---|---|
| `--force` | Skip the confirmation prompt |
Examples:
```bash
ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force
```
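`--force` matters in scripts, where the interactive confirmation prompt would block. The sketch below cancels several runs in one go; it assumes `ax tasks cancel-run` exits with a non-zero status when a cancel is rejected (for example, because the run is no longer pending or running), and `cancel_runs` is a hypothetical helper name.

```bash
# Hypothetical helper: cancel several runs without the confirmation prompt.
# Assumes `ax tasks cancel-run` exits non-zero when a cancel is rejected,
# e.g. because the run is no longer pending or running.
cancel_runs() {
  local run_id failed=0
  for run_id in "$@"; do
    if ! ax tasks cancel-run "$run_id" --force; then
      echo "cancel_runs: could not cancel $run_id" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every cancellation succeeded.
  [ "$failed" -eq 0 ]
}
```

Each run is attempted even if an earlier one fails, so one finished run does not stop the rest from being cancelled.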
ax tasks wait-for-run
Poll a task run until it reaches a terminal state (completed, failed, or cancelled). Exits with an error if the run does not reach a terminal state within the timeout.
```bash
ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]
```
| Option | Description |
|---|---|
| `--poll-interval` | Seconds between polling attempts (default: 5) |
| `--timeout` | Maximum seconds to wait before failing (default: 600) |
Examples:
```bash
ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
```
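Because `wait-for-run` exits with an error when the timeout expires, its exit status can gate a CI step. A minimal bash sketch, assuming the run ID is already known; the helper name `wait_for_run_or_fail` is hypothetical, and whether a run that ends as `failed` also yields a non-zero exit is not stated here, so check that before relying on it. With the defaults (a 5-second interval and a 600-second timeout), a wait involves at most roughly 120 status checks.

```bash
# Hypothetical CI step: block on a run and propagate a non-zero exit status,
# e.g. when the run does not reach a terminal state before the timeout.
wait_for_run_or_fail() {
  local run_id="$1" timeout="${2:-600}"
  local rc=0
  ax tasks wait-for-run "$run_id" --poll-interval 10 --timeout "$timeout" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "wait_for_run_or_fail: run $run_id did not finish cleanly (exit $rc)" >&2
  fi
  return "$rc"
}
```

In a pipeline, `wait_for_run_or_fail run_abc123 300 || exit 1` fails the job when the wait does not succeed.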