Tasks - Arize AX Docs

The `ax tasks` commands are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.

The `ax tasks` commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.

`ax tasks list`

List tasks, optionally filtered by space, project, dataset, name, or type.

ax tasks list [--space <name-or-id>] [--project <name-or-id>] [--dataset <name-or-id>] [--name <filter>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]

| Option | Description |
| --- | --- |
| `--space` | Filter tasks by space name or ID |
| `--project` | Filter tasks by project name or ID |
| `--dataset` | Filter tasks by dataset name or ID |
| `--name` | Case-insensitive substring filter on task name |
| `--task-type` | Filter by type: `template_evaluation`, `code_evaluation`, or `run_experiment` |
| `--limit` | Maximum number of results to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |

Examples:

ax tasks list --space sp_abc123
ax tasks list --space sp_abc123 --task-type template_evaluation
ax tasks list --project proj_abc123 --output tasks.json
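
The `--limit` and `--cursor` options imply cursor-based pagination. The following Python sketch shows one way to walk every page. It rests on assumptions: `fetch_page` is a hypothetical callable standing in for a single `ax tasks list --cursor ...` call, and the `(items, next_cursor)` return shape is invented for illustration, since this page does not describe how the CLI reports the next cursor.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (items, next_cursor). next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def iter_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield items from every page, following cursors until none is returned."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Stand-in data so the sketch runs without the CLI.
_PAGES = {
    None: ([{"name": "task-a"}, {"name": "task-b"}], "cursor-2"),
    "cursor-2": ([{"name": "task-c"}], None),
}


def fake_fetch(cursor: Optional[str]):
    return _PAGES[cursor]


names = [task["name"] for task in iter_all(fake_fetch)]
print(names)
```

In real use, `fake_fetch` would be replaced by code that runs the CLI and extracts the items and the next cursor from its output.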

`ax tasks create`

Create a new task. The command dispatches internally based on `--task-type`. Evaluation tasks (`template_evaluation` or `code_evaluation`) require exactly one of `--project` or `--dataset`. Run-experiment tasks (`run_experiment`) require `--dataset` and `--run-configuration`.

ax tasks create \
  --name <name> \
  --task-type <type> \
  [--evaluators <json-array>] \
  [--run-configuration <json>] \
  (--project <name-or-id> | --dataset <name-or-id>)

| Option | Description |
| --- | --- |
| `--name` | Task name (must be unique within the space) |
| `--task-type` | `template_evaluation`, `code_evaluation`, or `run_experiment` |
| `--evaluators` | JSON array of evaluator objects (required for evaluation tasks; see format below) |
| `--run-configuration` | JSON object (or `@file.json`) specifying the run configuration (required for `run_experiment` tasks) |
| `--project` | Target project name or ID; mutually exclusive with `--dataset` (evaluation tasks only) |
| `--space` | Space name or ID (required when resolving `--project` or `--dataset` by name) |
| `--dataset` | Target dataset name or ID; mutually exclusive with `--project` for evaluation tasks; required for `run_experiment` tasks |
| `--experiment-ids` | Comma-separated experiment global IDs (evaluation tasks only) |
| `--sampling-rate` | Fraction of spans to evaluate, 0–1 (project evaluation tasks only) |
| `--is-continuous` / `--no-continuous` | Run task continuously on incoming data (evaluation tasks only) |
| `--query-filter` | Task-level SQL-style filter applied to all evaluators (evaluation tasks only) |

Evaluators JSON format:

[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
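
Hand-writing the JSON array inside shell quotes is easy to get wrong, so it can help to generate it. The Python sketch below builds the `--evaluators` value from the three keys shown above. The helper names are hypothetical, and treating `column_mappings` as a dictionary is an assumption, because its shape is not specified here.

```python
import json
from typing import Optional


def evaluator_entry(evaluator_id: str,
                    query_filter: Optional[str] = None,
                    column_mappings: Optional[dict] = None) -> dict:
    """One element of the --evaluators array, using the keys shown above."""
    return {
        "evaluator_id": evaluator_id,
        "query_filter": query_filter,
        # Assumption: the shape of column_mappings is not documented here.
        "column_mappings": column_mappings,
    }


def evaluators_arg(*entries: dict) -> str:
    """Serialize entries to the JSON array string passed to --evaluators."""
    return json.dumps(list(entries))


arg = evaluators_arg(evaluator_entry("ev_abc123"))
print(arg)
```

Python's `None` serializes to JSON `null`, which matches the format above.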

Run configuration JSON format (run_experiment tasks):

{
  "experiment_type": "llm_generation",
  "ai_integration_id": "...",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}

Examples:

Project-based evaluation task (continuous):

ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1

Dataset-based evaluation task:

ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous

Run-experiment task:

ax tasks create \
  --name "GPT-4o Summarization" \
  --task-type run_experiment \
  --dataset ds_xyz789 \
  --run-configuration '{"experiment_type": "llm_generation", "ai_integration_id": "ai_abc", "model_name": "gpt-4o", "messages": [{"role": "user", "content": "{{input}}"}]}'
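
The argument rules for `ax tasks create` can be checked before the CLI is invoked. The Python sketch below encodes only the rules stated in this section. The function name is hypothetical, and the CLI's own validation and error messages may differ.

```python
from typing import List, Optional

EVALUATION_TYPES = {"template_evaluation", "code_evaluation"}


def create_rule_violations(task_type: str, *,
                           project: Optional[str] = None,
                           dataset: Optional[str] = None,
                           evaluators: Optional[str] = None,
                           run_configuration: Optional[str] = None) -> List[str]:
    """Return the documented `ax tasks create` rules that the arguments break."""
    problems: List[str] = []
    if task_type in EVALUATION_TYPES:
        # Exactly one of project/dataset: both missing or both set is an error.
        if (project is None) == (dataset is None):
            problems.append("provide exactly one of --project or --dataset")
        if evaluators is None:
            problems.append("--evaluators is required for evaluation tasks")
    elif task_type == "run_experiment":
        if dataset is None:
            problems.append("--dataset is required for run_experiment tasks")
        if run_configuration is None:
            problems.append("--run-configuration is required for run_experiment tasks")
    else:
        problems.append(f"unknown task type: {task_type}")
    return problems


ok = create_rule_violations("template_evaluation", project="proj_abc123",
                            evaluators='[{"evaluator_id": "ev_abc123"}]')
bad = create_rule_violations("run_experiment", project="proj_abc123")
print(ok)
print(bad)
```

An empty list means the arguments satisfy the rules checked here. It does not guarantee that the platform accepts the task.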

`ax tasks create-evaluation`

Create a new evaluation task (`template_evaluation` or `code_evaluation`). Requires `--name`, `--task-type`, `--evaluators`, and one of `--project` / `--dataset`.

ax tasks create-evaluation \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project <name-or-id> | --dataset <name-or-id>)

| Option | Description |
| --- | --- |
| `--name` | Task name (must be unique within the space) |
| `--task-type` | `template_evaluation` or `code_evaluation` |
| `--evaluators` | JSON array of evaluator objects (see format above) |
| `--project` | Target project name or ID; mutually exclusive with `--dataset` |
| `--space` | Space name or ID (required when using a project name) |
| `--dataset` | Target dataset name or ID; mutually exclusive with `--project` |
| `--experiment-ids` | Comma-separated experiment global IDs (required for dataset-based tasks) |
| `--sampling-rate` | Fraction of data to evaluate, 0–1 (project tasks only) |
| `--is-continuous` / `--no-continuous` | Run task continuously on incoming data |
| `--query-filter` | Task-level query filter applied to all evaluators |

Example:

ax tasks create-evaluation \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --sampling-rate 0.1 \
  --is-continuous

`ax tasks create-run-experiment`

Create a new `run_experiment` task. Requires `--name`, `--dataset`, and `--run-configuration`.

ax tasks create-run-experiment \
  --name <name> \
  --dataset <name-or-id> \
  --run-configuration <json>

| Option | Description |
| --- | --- |
| `--name` | Task name (must be unique within the space) |
| `--dataset` | Dataset name or ID to run experiments against |
| `--run-configuration` | JSON object (or `@file.json`) specifying the run configuration |
| `--space` | Space name or ID |

Example:

ax tasks create-run-experiment \
  --name "GPT-4o Summarization" \
  --dataset ds_xyz789 \
  --run-configuration @./run_config.json
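
The `@file.json` form avoids quoting a long JSON object on the command line. The Python sketch below is one way to produce such a file. The helper name is hypothetical and the configuration values are the placeholders from the examples on this page. The sketch writes to a temporary directory only so that it runs without side effects.

```python
import json
import tempfile
from pathlib import Path

# Same keys as the run-configuration example; values are placeholders.
run_config = {
    "experiment_type": "llm_generation",
    "ai_integration_id": "ai_abc",
    "model_name": "gpt-4o",
    "messages": [{"role": "user", "content": "{{input}}"}],
}


def write_run_config(config: dict, path: Path) -> str:
    """Write config as JSON and return the @file argument for --run-configuration."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return f"@{path}"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "run_config.json"
    file_arg = write_run_config(run_config, target)
    # Read the file back to confirm it holds the same configuration.
    round_trip = json.loads(target.read_text(encoding="utf-8"))

print(file_arg.startswith("@"), round_trip == run_config)
```

In practice the file would be written next to your project, for example as `./run_config.json`, and passed as `--run-configuration @./run_config.json`.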

`ax tasks get`

Get a task by name or ID.

ax tasks get <name-or-id>

Example:

ax tasks get task_abc123

`ax tasks update`

Update mutable fields on an existing task. The SDK auto-dispatches based on the task’s type; providing a field invalid for the resolved task type raises an error. At least one field must be provided.

ax tasks update <name-or-id> [--space <name-or-id>] [--name <name>] [--sampling-rate <n>] [--is-continuous|--no-continuous] [--query-filter <expr>] [--evaluators <json>] [--run-configuration <json>]

| Option | Description |
| --- | --- |
| `--space`, `-s` | Space name or ID (required when resolving task by name) |
| `--name`, `-n` | New task display name |
| `--sampling-rate` | Sampling rate between 0 and 1 (evaluation tasks only) |
| `--is-continuous` / `--no-continuous` | Whether the task runs continuously (evaluation tasks only) |
| `--query-filter` | Task-level query filter (evaluation tasks only). Pass `--query-filter ""` to clear the existing filter. |
| `--evaluators` | JSON array replacing the full evaluator list (evaluation tasks only; same shape as `ax tasks create --evaluators`) |
| `--run-configuration` | JSON object (or `@file.json`) replacing the run configuration (`run_experiment` tasks only). The entire stored config is atomically replaced. |
Example:

ax tasks update task_abc123 --name "Relevance Monitor v2" --sampling-rate 0.25
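
Two details of `ax tasks update` are easy to miss: at least one field must be provided, and an empty `--query-filter` clears the filter rather than leaving it unchanged. The Python sketch below builds an argument list that keeps "unchanged" (`None`) distinct from "clear" (`""`). The function name is hypothetical, and only three of the documented fields are covered to keep the sketch short.

```python
from typing import List, Optional


def update_args(task: str, *,
                name: Optional[str] = None,
                sampling_rate: Optional[float] = None,
                query_filter: Optional[str] = None) -> List[str]:
    """Build argv for `ax tasks update`; None means leave the field unchanged."""
    args = ["ax", "tasks", "update", task]
    if name is not None:
        args += ["--name", name]
    if sampling_rate is not None:
        if not 0 <= sampling_rate <= 1:
            raise ValueError("sampling rate must be between 0 and 1")
        args += ["--sampling-rate", str(sampling_rate)]
    if query_filter is not None:
        # An empty string is passed through on purpose: it clears the filter.
        args += ["--query-filter", query_filter]
    if len(args) == 4:
        # Nothing was appended after the task name or ID.
        raise ValueError("at least one field must be provided")
    return args


print(update_args("task_abc123", name="Relevance Monitor v2", sampling_rate=0.25))
print(update_args("task_abc123", query_filter=""))
```

Passing the resulting list to a process launcher as separate arguments, rather than joining it into one shell string, keeps the empty filter value intact.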

`ax tasks delete`

Delete a task and its associated configuration. This operation is irreversible.

ax tasks delete <name-or-id> [--space <name-or-id>] [--force]

| Option | Description |
| --- | --- |
| `--space`, `-s` | Space name or ID (required when resolving task by name) |
| `--force`, `-f` | Skip the confirmation prompt |

Example:

ax tasks delete task_abc123 --force

`ax tasks trigger-run`

Trigger an on-demand run for a task. The run starts in `pending` status. The SDK auto-dispatches based on the task’s type; providing a flag invalid for the resolved task type raises an error. Pass `--wait` to block until the run reaches a terminal state.

ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--example-ids <ids>] [--evaluation-task-ids <ids>] [--experiment-name <name>] [--dataset-version-id <id>] [--max-examples <n>] [--tracing-metadata <json>] [--wait] [--poll-interval <s>] [--timeout <s>]

| Option | Description |
| --- | --- |
| `--data-start-time` | ISO 8601 start of the data window to evaluate (evaluation tasks only) |
| `--data-end-time` | ISO 8601 end of the data window (evaluation tasks only, defaults to now) |
| `--max-spans` | Maximum number of spans to process (evaluation tasks only, default: 10,000) |
| `--override-evaluations` / `--no-override-evaluations` | Re-evaluate data that already has labels (evaluation tasks only) |
| `--experiment-ids` | Comma-separated experiment global IDs (dataset-based evaluation tasks only) |
| `--example-ids` | Comma-separated dataset example global IDs to run against (`run_experiment` tasks only). Mutually exclusive with `--max-examples`. |
| `--evaluation-task-ids` | Comma-separated task global IDs of evaluation tasks to trigger after the experiment run completes (`run_experiment` tasks only) |
| `--experiment-name` | Display name for the experiment to be created (required for `run_experiment` tasks) |
| `--dataset-version-id` | Dataset version global ID (base64); defaults to the latest version (`run_experiment` tasks only) |
| `--max-examples` | Maximum number of examples to run (`run_experiment` tasks only) |
| `--tracing-metadata` | JSON object (or `@file.json`) of key/value pairs attached to experiment traces (`run_experiment` tasks only) |
| `--wait`, `-w` | Block until the run reaches a terminal state |
| `--poll-interval` | Seconds between polling attempts when using `--wait` (default: 5) |
| `--timeout` | Maximum seconds to wait when using `--wait` (default: 600) |

Examples:

# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
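
Rolling windows such as "the last 24 hours" need ISO 8601 timestamps computed at run time. The Python sketch below produces the two window flags in the same `Z`-suffixed UTC form used in the example above. The helper names are hypothetical, and which other ISO 8601 variants the CLI accepts is not stated on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def window_args(end: datetime, hours: int) -> List[str]:
    """Flags for a data window covering the `hours` before `end`."""
    start = end - timedelta(hours=hours)
    return ["--data-start-time", iso_utc(start), "--data-end-time", iso_utc(end)]


# A fixed end time keeps the example reproducible.
end = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(window_args(end, 24))
```

For a live window, `end` would be `datetime.now(timezone.utc)`.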

`ax tasks list-runs`

List runs for a task, with optional status filtering.

ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]

| Option | Description |
| --- | --- |
| `--status` | Filter by run status: `pending`, `running`, `completed`, `failed`, `cancelled` |
| `--limit` | Maximum number of results to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |

Examples:

ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json

`ax tasks get-run`

Get a task run by its global ID.

ax tasks get-run <run-id>

Example:

ax tasks get-run run_abc123

`ax tasks cancel-run`

Cancel a task run. Only valid when the run is pending or running.

ax tasks cancel-run <run-id> [--force]

| Option | Description |
| --- | --- |
| `--force` | Skip the confirmation prompt |

Examples:

ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force

`ax tasks wait-for-run`

Poll a task run until it reaches a terminal state (`completed`, `failed`, or `cancelled`). Exits with an error if the run does not complete within the timeout.

ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]

| Option | Description |
| --- | --- |
| `--poll-interval` | Seconds between polling attempts (default: 5) |
| `--timeout` | Maximum seconds to wait before failing (default: 600) |

Examples:

ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
