Configure cost tracking to monitor LLM spend by model, provider, and token type
Know what every request costs. A single agent might chain five LLM calls — without cost tracking, you’re guessing what you’re spending. Arize AX calculates cost for every span, aggregates it at the trace level, and lets you filter, monitor, and optimize from there.
Arize AX calculates the cost of every LLM call in your traces — at the trace level (total cost of a request) and at the span level (cost of each individual LLM call). Use it to:
Spot which requests or agents are expensive and why
Track spend across models and providers over time
Catch cost spikes before they become budget problems
Compare cost/quality tradeoffs between different models
Arize includes default cost configurations for common models (GPT-4o, Claude, Gemini, Mistral, and more), so in many cases no setup is required to get started.
If there is a default you’d like us to add, reach out to support@arize.com.
Arize AX tracks token usage via standard OpenInference attributes on your LLM spans:
| Attribute | Description |
| --- | --- |
| `llm.token_count.prompt` | Number of tokens in the prompt |
| `llm.token_count.completion` | Number of tokens in the completion |
| `llm.token_count.total` | Total number of tokens (prompt + completion) |
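For example, an LLM span might carry attributes like the following. The attribute names are the standard OpenInference conventions above; the plain dict is illustrative — in real instrumentation these would be set through your tracing SDK (e.g. via `span.set_attribute`):

```python
# Illustrative span attributes using OpenInference token-count conventions.
# In real code these are set on a span by your instrumentation library;
# a plain dict is shown here for clarity.
span_attributes = {
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 1200,
    "llm.token_count.completion": 350,
    "llm.token_count.total": 1550,  # prompt + completion
}

# The total should always equal prompt + completion.
assert (
    span_attributes["llm.token_count.total"]
    == span_attributes["llm.token_count.prompt"]
    + span_attributes["llm.token_count.completion"]
)
```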
Cost is calculated based on these token counts and the cost configuration for the model. The system supports multiple token types for detailed cost breakdowns:
| Token Type | Category | Description |
| --- | --- | --- |
| `input` | Prompt | Regular input tokens |
| `cache` | Prompt | Cached prompt tokens |
| `cache_read` | Prompt | Cache read tokens |
| `cache_write` | Prompt | Cache write tokens |
| `cache_input` | Prompt | Cached input tokens |
| `output` | Completion | Regular output tokens |
| `reasoning` | Completion | Reasoning tokens (e.g., o1/o3 models) |
| `audio` | Both | Audio tokens |
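A simple sketch of how per-type token counts combine with per-type rates. The rates here are hypothetical (in Arize AX they come from the model’s cost configuration) and are quoted per million tokens:

```python
# Hypothetical per-million-token rates for one model; real rates come
# from the cost configuration, not from this example.
RATES_PER_MILLION = {
    "input": 2.50,
    "cache_read": 1.25,
    "output": 10.00,
    "reasoning": 10.00,
}

def span_cost(token_counts: dict) -> float:
    """Sum cost over each token type present on the span."""
    return sum(
        count / 1_000_000 * RATES_PER_MILLION[token_type]
        for token_type, count in token_counts.items()
    )

cost = span_cost({"input": 1200, "output": 350})
# 1200/1e6 * 2.50 + 350/1e6 * 10.00 = 0.003 + 0.0035 = 0.0065
```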
Cost configurations also support tiered pricing: volume-based pricing where the cost per token changes once total token counts cross defined thresholds.
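A minimal sketch of tiered pricing under assumed thresholds. The tier boundaries and rates below are hypothetical, not Arize defaults:

```python
# Hypothetical tiers: (upper threshold in tokens, rate per million tokens).
# Tokens falling within each tier are billed at that tier's rate.
TIERS = [
    (1_000_000, 3.00),       # first 1M tokens
    (10_000_000, 2.00),      # next 9M tokens
    (float("inf"), 1.00),    # everything beyond 10M
]

def tiered_cost(total_tokens: int) -> float:
    """Bill tokens tier by tier until the total is exhausted."""
    cost = 0.0
    billed = 0
    for threshold, rate in TIERS:
        in_tier = min(total_tokens, threshold) - billed
        if in_tier <= 0:
            break
        cost += in_tier / 1_000_000 * rate
        billed += in_tier
    return cost

# 2.5M tokens: 1M at $3.00 + 1.5M at $2.00 = $6.00
```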
At the trace level, Arize aggregates cost across all LLM spans in the trace. This provides a complete view of how much it cost to serve a given request end-to-end.
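Trace-level cost is simply the sum of the span-level costs within the trace; a sketch with hypothetical span data:

```python
# Hypothetical spans in one trace, each with a precomputed span-level cost.
trace_spans = [
    {"name": "plan", "cost": 0.0065},
    {"name": "retrieve", "cost": 0.0012},
    {"name": "answer", "cost": 0.0101},
]

# Trace-level cost aggregates every LLM span in the trace.
trace_cost = sum(span["cost"] for span in trace_spans)
# 0.0065 + 0.0012 + 0.0101 = 0.0178
```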
We extract the model name from your trace using the following fallback order:
1. `llm.model_name` (primary)
2. `llm.invocation_parameters.model` (fallback 1)
3. `metadata.model` (fallback 2)
Optionally, if you provide a provider, we’ll match that as well (e.g., differentiating OpenAI vs Azure OpenAI for gpt-4).
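The fallback order above can be sketched as a small helper. The function name is hypothetical, and the attributes are shown as flat dict keys for illustration:

```python
def extract_model_name(attributes: dict):
    """Return the model name using the documented fallback order,
    or None if no candidate attribute is present."""
    for key in (
        "llm.model_name",                   # primary
        "llm.invocation_parameters.model",  # fallback 1
        "metadata.model",                   # fallback 2
    ):
        value = attributes.get(key)
        if value:
            return value
    return None

extract_model_name({"metadata.model": "gpt-4o"})  # → "gpt-4o"
```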
Each token type (e.g., input, output, audio) is matched against the configuration, and cost is calculated per million tokens (1M-token unit basis).
Important: Cost is not retroactive. To track costs, you must configure pricing before ingesting traces.