Evaluators - Arize AX Docs

The `ax evaluators` commands are currently in beta. The API may change without notice. A one-time warning is emitted on first use.

The `ax evaluators` commands let you create and manage LLM-as-judge evaluators and their versions on the Arize platform.

`ax evaluators list`

List evaluators, optionally filtered by space or by name.

ax evaluators list [--space <id>] [--name <filter>] [--limit <n>] [--cursor <cursor>]

| Option | Description |
| --- | --- |
| `--space` | Filter evaluators by space name or ID |
| `--name` | Case-insensitive substring filter on evaluator name |
| `--limit` | Maximum number of results to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |

Examples:

ax evaluators list --space sp_abc123
ax evaluators list --space sp_abc123 --output evaluators.json
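For scripted use, the optional flags above can be assembled programmatically. The following Python sketch is illustrative only: `build_list_args` is a hypothetical helper, not part of the CLI, and it only builds the argument list without running the command or parsing its output. The cursor value shown is a placeholder.

```python
import shlex
from typing import List, Optional


def build_list_args(
    space: Optional[str] = None,
    name: Optional[str] = None,
    limit: Optional[int] = None,
    cursor: Optional[str] = None,
) -> List[str]:
    """Assemble an `ax evaluators list` argument list (hypothetical helper)."""
    args = ["ax", "evaluators", "list"]
    if space is not None:
        args += ["--space", space]
    if name is not None:
        args += ["--name", name]
    if limit is not None:
        args += ["--limit", str(limit)]
    if cursor is not None:
        args += ["--cursor", cursor]
    return args


# First page, then a follow-up page. "CURSOR_FROM_PREVIOUS_PAGE" is a
# placeholder for whatever cursor the previous call returned.
first_page = build_list_args(space="sp_abc123", name="relev", limit=50)
next_page = build_list_args(
    space="sp_abc123", name="relev", limit=50, cursor="CURSOR_FROM_PREVIOUS_PAGE"
)
print(shlex.join(first_page))
print(shlex.join(next_page))
```

Building the command as a list keeps each value a single argument, so names containing spaces do not need manual quoting.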

`ax evaluators create-template-evaluator`

Create a new template (LLM-as-judge) evaluator with an initial version. Required options will be prompted interactively if not passed as flags.

ax evaluators create-template-evaluator \
  --name <name> \
  --space <id> \
  --commit-message <message> \
  --template-name <name> \
  --template <template-string> \
  --ai-integration-id <id> \
  --model-name <model>

| Option | Description |
| --- | --- |
| `--name`, `-n` | Evaluator name (must be unique within the space) |
| `--space`, `-s` | Space name or ID to create the evaluator in |
| `--commit-message` | Commit message for the initial version |
| `--template-name` | Eval column name (alphanumeric, spaces, hyphens, underscores) |
| `--template` | Prompt template string with `{{variable}}` placeholders |
| `--ai-integration-id` | AI integration global ID (base64) |
| `--model-name` | Model name (e.g. `gpt-4o`) |
| `--description` | Optional evaluator description |
| `--include-explanations` | Include reasoning explanation alongside the score (flag) |
| `--use-function-calling` | Prefer structured function-call output when supported (flag) |
| `--invocation-params` | JSON object of model invocation parameters (e.g. `'{"temperature": 0}'`) |
| `--provider-params` | JSON object of provider-specific parameters |
| `--classification-choices` | JSON object mapping choice labels to numeric scores (e.g. `'{"relevant":1,"irrelevant":0}'`). Omit for freeform output. |
| `--direction` | Optimization direction: `maximize`, `minimize`, or `none` |
| `--data-granularity` | Data granularity: `span`, `trace`, or `session` |
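Before submitting a template, it may be useful to check the `--template-name` value and list the `{{variable}}` placeholders the prompt references. The Python sketch below is a local pre-check under stated assumptions, not the platform's own validation: both regular expressions and both helper names are hypothetical, and the name rule is read as ASCII letters and digits only.

```python
import re
from typing import List

# Assumption: "alphanumeric, spaces, hyphens, underscores" means ASCII only.
TEMPLATE_NAME_RE = re.compile(r"[A-Za-z0-9 _-]+")
# Matches {{variable}} placeholders such as {{input.value}}.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def is_valid_template_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return TEMPLATE_NAME_RE.fullmatch(name) is not None


def find_placeholders(template: str) -> List[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: List[str] = []
    for match in PLACEHOLDER_RE.finditer(template):
        variable = match.group(1)
        if variable not in seen:
            seen.append(variable)
    return seen


template = (
    "Is the response relevant to the query?\n"
    "Query: {{input.value}}\n"
    "Response: {{output.value}}"
)
print(find_placeholders(template))
print(is_valid_template_name("Relevance"))
print(is_valid_template_name("Relevance/v2"))
```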

Example:

ax evaluators create-template-evaluator \
  --name "Relevance" \
  --space sp_abc123 \
  --commit-message "Initial version" \
  --template-name "Relevance" \
  --template "Is the response relevant to the query?\nQuery: {{input.value}}\nResponse: {{output.value}}" \
  --ai-integration-id ai_xyz789 \
  --model-name gpt-4o \
  --include-explanations \
  --invocation-params '{"temperature": 0}' \
  --classification-choices '{"relevant":1,"irrelevant":0}'
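Hand-writing JSON inside shell quotes is easy to get wrong, so one option is to serialize the `--invocation-params` and `--classification-choices` values from native data structures. This is a minimal Python sketch; `encode_classification_choices` and `json_flag_args` are hypothetical helper names, and the numeric check is a local safeguard based on the option description rather than a documented CLI rule.

```python
import json
from numbers import Real
from typing import Dict, List


def encode_classification_choices(choices: Dict[str, float]) -> str:
    """Serialize label -> score pairs, rejecting non-numeric scores."""
    for label, score in choices.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, Real):
            raise TypeError(f"score for {label!r} must be a number, got {score!r}")
    return json.dumps(choices, separators=(",", ":"))


def json_flag_args(invocation_params: dict, choices: Dict[str, float]) -> List[str]:
    """Return the two JSON-valued flags as ready-to-use arguments."""
    return [
        "--invocation-params", json.dumps(invocation_params),
        "--classification-choices", encode_classification_choices(choices),
    ]


flags = json_flag_args({"temperature": 0}, {"relevant": 1, "irrelevant": 0})
print(flags)
```

The resulting list can be appended to the rest of the `create-template-evaluator` arguments and passed to `subprocess.run` without going through a shell.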

`ax evaluators create-code-evaluator`

Create a new code evaluator with an initial version. Use `--code-type managed` for a built-in check (`MatchesRegex`, `JSONParseable`, `ContainsAnyKeyword`, `ContainsAllKeywords`, `ExactMatch`) or `--code-type custom` to supply Python.

ax evaluators create-code-evaluator \
  --name <name> \
  --space <id> \
  --commit-message <message> \
  --code-type managed \
  --code-name <name> \
  --variables <json-array> \
  --managed-evaluator <kind>

| Option | Description |
| --- | --- |
| `--name`, `-n` | Evaluator name (must be unique within the space) |
| `--space`, `-s` | Space name or ID to create the evaluator in |
| `--commit-message` | Commit message for the initial version |
| `--code-type` | `managed` (built-in) or `custom` (user Python) |
| `--code-name` | Eval column name |
| `--variables` | JSON array of span attribute names to pass into the evaluator (e.g. `'["output"]'`). Inline JSON or a `@file` path. |
| `--managed-evaluator` | Built-in evaluator (when `--code-type managed`): `MatchesRegex`, `JSONParseable`, `ContainsAnyKeyword`, `ContainsAllKeywords`, or `ExactMatch` |
| `--code` | Python source (when `--code-type custom`). Inline or `@path/to/evaluator.py`. |
| `--imports` | Optional Python import block for `--code-type custom`. Inline or `@path/to/imports.py`. |
| `--static-params` | JSON array of static parameters. Each item: `{name, type: STRING\|STRING_ARRAY\|REGEX, default_value}`. Inline JSON or a `@file` path. |
| `--query-filter` | Optional filter query applied before evaluation |
| `--data-granularity` | Data granularity: `span`, `trace`, or `session` |
| `--description` | Optional evaluator description |

Example:

ax evaluators create-code-evaluator \
  --name "JSON Parseable" \
  --space sp_abc123 \
  --commit-message "Initial version" \
  --code-type managed \
  --code-name "json_parseable" \
  --variables '["output"]' \
  --managed-evaluator JSONParseable
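For `--static-params`, the `@file` form avoids long inline JSON. The Python sketch below writes a parameter list to a file and returns the matching `@file` argument. It is an illustration under assumptions: `write_static_params` is a hypothetical helper, the parameter name `pattern` is invented for the example, and treating all three keys as required is an interpretation of the item shape in the table.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Types listed in the --static-params description.
ALLOWED_TYPES = {"STRING", "STRING_ARRAY", "REGEX"}
# Assumption: every item carries all three keys.
REQUIRED_KEYS = {"name", "type", "default_value"}


def write_static_params(params: List[Dict[str, Any]], path: Path) -> str:
    """Validate static params, write them as JSON, and return the `@file` argument."""
    for param in params:
        missing = REQUIRED_KEYS - set(param)
        if missing:
            raise ValueError(f"static param missing keys: {sorted(missing)}")
        if param["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {param['type']}")
    path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return f"@{path}"


# "pattern" is a hypothetical parameter name used only for illustration.
params = [{"name": "pattern", "type": "REGEX", "default_value": r"^\d{4}-\d{2}-\d{2}$"}]
out_path = Path(tempfile.mkdtemp()) / "static_params.json"
file_arg = write_static_params(params, out_path)
print(file_arg)
```

The returned string can be passed directly as the value of `--static-params`.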

`ax evaluators get`

Get an evaluator by name or ID, with its resolved version.

ax evaluators get <name-or-id> [--space <id>] [--version-id <id>]

| Option | Description |
| --- | --- |
| `--space` | Space name or ID (required when using evaluator name instead of ID) |
| `--version-id` | Specific version ID to retrieve (default: latest version) |

Examples:

ax evaluators get ev_abc123
ax evaluators get "Relevance" --space sp_abc123
ax evaluators get ev_abc123 --version-id evv_xyz789

`ax evaluators update`

Update an evaluator’s name or description. At least one of `--name` or `--description` is required.

ax evaluators update <name-or-id> [--space <id>] [--name <name>] [--description <desc>]

| Option | Description |
| --- | --- |
| `--space` | Space name or ID (required when using evaluator name instead of ID) |
| `--name` | New evaluator name |
| `--description` | New evaluator description |

Example:

ax evaluators update ev_abc123 --name "Relevance v2" --description "Updated scoring rubric"
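In automation, the "at least one of `--name` or `--description`" rule can be checked before the CLI is invoked. This is a small Python sketch with a hypothetical helper, `build_update_args`, that mirrors the rule locally; it does not run the command.

```python
from typing import List, Optional


def build_update_args(
    evaluator: str,
    new_name: Optional[str] = None,
    description: Optional[str] = None,
    space: Optional[str] = None,
) -> List[str]:
    """Assemble an `ax evaluators update` argument list (hypothetical helper)."""
    if new_name is None and description is None:
        raise ValueError("provide at least one of new_name or description")
    args = ["ax", "evaluators", "update", evaluator]
    if space is not None:
        # Needed when `evaluator` is a name rather than an ID.
        args += ["--space", space]
    if new_name is not None:
        args += ["--name", new_name]
    if description is not None:
        args += ["--description", description]
    return args


update_args = build_update_args(
    "ev_abc123", new_name="Relevance v2", description="Updated scoring rubric"
)
print(update_args)
```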

`ax evaluators delete`

Delete an evaluator and all its versions. This operation is irreversible.

ax evaluators delete <name-or-id> [--space <id>] [--force]

| Option | Description |
| --- | --- |
| `--space` | Space name or ID (required when using evaluator name instead of ID) |
| `--force` | Skip the confirmation prompt |

Examples:

ax evaluators delete ev_abc123
ax evaluators delete ev_abc123 --force
ax evaluators delete "Relevance" --space sp_abc123 --force

`ax evaluators list-versions`

List all versions of an evaluator.

ax evaluators list-versions <name-or-id> [--space <id>] [--limit <n>] [--cursor <cursor>]

| Option | Description |
| --- | --- |
| `--space` | Space name or ID (required when using evaluator name instead of ID) |
| `--limit` | Maximum number of versions to return (default: 15) |
| `--cursor` | Pagination cursor for the next page |

Example:

ax evaluators list-versions ev_abc123

`ax evaluators create-template-evaluator-version`

Create a new template version of an existing template evaluator. Versions are immutable once created; the new version becomes the latest immediately. Required options will be prompted interactively if not passed as flags.

ax evaluators create-template-evaluator-version <name-or-id> \
  --commit-message <message> \
  --template-name <name> \
  --template <template-string> \
  --ai-integration-id <id> \
  --model-name <model> \
  [--space <id>]

| Option | Description |
| --- | --- |
| `--space`, `-s` | Space name or ID (required when using evaluator name instead of ID) |
| `--commit-message` | Commit message describing the changes in this version |
| `--template-name` | Eval column name |
| `--template` | Updated prompt template string with `{{variable}}` placeholders |
| `--ai-integration-id` | AI integration global ID (base64) |
| `--model-name` | Model name (e.g. `gpt-4o`) |
| `--include-explanations` | Include reasoning explanation alongside the score (flag) |
| `--use-function-calling` | Prefer structured function-call output when supported (flag) |
| `--invocation-params` | JSON object of model invocation parameters |
| `--provider-params` | JSON object of provider-specific parameters |
| `--classification-choices` | JSON object mapping choice labels to numeric scores. Omit for freeform output. |
| `--direction` | Optimization direction: `maximize`, `minimize`, or `none` |
| `--data-granularity` | Data granularity: `span`, `trace`, or `session` |

Example:

ax evaluators create-template-evaluator-version ev_abc123 \
  --commit-message "Improved prompt for edge cases" \
  --template-name "Relevance" \
  --template "Rate the relevance of the response on a scale of 0 to 1.\nQuery: {{input.value}}\nResponse: {{output.value}}" \
  --ai-integration-id ai_xyz789 \
  --model-name gpt-4o
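The example above writes `\n` inside a double-quoted shell string, which most shells pass through as a backslash followed by `n`; how the CLI treats that sequence is not stated here. One way to sidestep the question is to keep the prompt in a text file and pass its contents, with real line breaks, as a single argument. The Python sketch below assumes a hypothetical file name (`relevance_prompt.txt`) and helper (`build_version_args`), and it only builds the argument list.

```python
import tempfile
from pathlib import Path
from typing import List


def build_version_args(evaluator: str, template_path: Path, commit_message: str) -> List[str]:
    """Read the prompt from a file and assemble the new-version arguments."""
    template = template_path.read_text(encoding="utf-8")
    return [
        "ax", "evaluators", "create-template-evaluator-version", evaluator,
        "--commit-message", commit_message,
        "--template-name", "Relevance",
        "--template", template,
        "--ai-integration-id", "ai_xyz789",
        "--model-name", "gpt-4o",
    ]


# Write a sample prompt file so the sketch is self-contained.
prompt_path = Path(tempfile.mkdtemp()) / "relevance_prompt.txt"
prompt_path.write_text(
    "Rate the relevance of the response on a scale of 0 to 1.\n"
    "Query: {{input.value}}\n"
    "Response: {{output.value}}",
    encoding="utf-8",
)

version_args = build_version_args("ev_abc123", prompt_path, "Improved prompt for edge cases")
print(version_args[version_args.index("--template") + 1])
```

To execute the command, the list can be handed to `subprocess.run(version_args, check=True)`, which requires the `ax` CLI to be installed and configured.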

`ax evaluators create-code-evaluator-version`

Create a new code version of an existing code evaluator.

ax evaluators create-code-evaluator-version <name-or-id> \
  --commit-message <message> \
  --code-type managed \
  --code-name <name> \
  --variables <json-array> \
  --managed-evaluator <kind> \
  [--space <id>]

| Option | Description |
| --- | --- |
| `--space`, `-s` | Space name or ID (required when using evaluator name instead of ID) |
| `--commit-message` | Commit message describing the changes in this version |
| `--code-type` | `managed` (built-in) or `custom` (user Python) |
| `--code-name` | Eval column name |
| `--variables` | JSON array of span attribute names to pass into the evaluator. Inline JSON or a `@file` path. |
| `--managed-evaluator` | Built-in evaluator (when `--code-type managed`) |
| `--code` | Python source (when `--code-type custom`). Inline or `@path/to/evaluator.py`. |
| `--imports` | Optional Python import block for `--code-type custom`. Inline or `@path/to/imports.py`. |
| `--static-params` | JSON array of static parameters. Inline JSON or a `@file` path. |
| `--query-filter` | Optional filter query applied before evaluation |
| `--data-granularity` | Data granularity: `span`, `trace`, or `session` |

Example:

ax evaluators create-code-evaluator-version ev_abc123 \
  --commit-message "Updated managed evaluator" \
  --code-type managed \
  --code-name "json_parseable" \
  --variables '["output"]' \
  --managed-evaluator JSONParseable
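The `--code-type` value determines which companion flags apply: `--managed-evaluator` for `managed`, and `--code` (plus optional `--imports`) for `custom`. The Python sketch below encodes that pairing as a local pre-check. `code_type_args` is a hypothetical helper, and rejecting mixed flags is a stricter assumption than anything the CLI is documented to enforce.

```python
from typing import List, Optional

# Built-in evaluators listed for --managed-evaluator.
MANAGED_EVALUATORS = {
    "MatchesRegex",
    "JSONParseable",
    "ContainsAnyKeyword",
    "ContainsAllKeywords",
    "ExactMatch",
}


def code_type_args(
    code_type: str,
    managed_evaluator: Optional[str] = None,
    code: Optional[str] = None,
    imports: Optional[str] = None,
) -> List[str]:
    """Return the code-type flags, checking that they fit together."""
    if code_type == "managed":
        if managed_evaluator not in MANAGED_EVALUATORS:
            raise ValueError(f"unknown managed evaluator: {managed_evaluator!r}")
        if code is not None or imports is not None:
            raise ValueError("--code and --imports apply only to --code-type custom")
        return ["--code-type", "managed", "--managed-evaluator", managed_evaluator]
    if code_type == "custom":
        if code is None:
            raise ValueError("--code is needed with --code-type custom")
        if managed_evaluator is not None:
            raise ValueError("--managed-evaluator applies only to --code-type managed")
        args = ["--code-type", "custom", "--code", code]
        if imports is not None:
            args += ["--imports", imports]
        return args
    raise ValueError(f"code_type must be 'managed' or 'custom', got {code_type!r}")


print(code_type_args("managed", managed_evaluator="JSONParseable"))
print(code_type_args("custom", code="@path/to/evaluator.py", imports="@path/to/imports.py"))
```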

`ax evaluators get-version`

Get a specific evaluator version by ID.

ax evaluators get-version <version-id>

Example:

ax evaluators get-version evv_xyz789
