Configure cost tracking to monitor LLM spend by model, provider, and token type
Know what every request costs. A single agent might chain five LLM calls — without cost tracking, you’re guessing what you’re spending. Arize AX calculates cost for every span, aggregates it at the trace level, and lets you filter, monitor, and optimize from there.
Arize AX calculates the cost of every LLM call in your traces — at the trace level (total cost of a request) and at the span level (cost of each individual LLM call). Use it to:
Spot which requests or agents are expensive and why
Track spend across models and providers over time
Catch cost spikes before they become budget problems
Compare cost/quality tradeoffs between different models
Arize includes default cost configurations for common models (GPT-4o, Claude, Gemini, Mistral, and more), so in many cases no setup is required to get started.
If there is a default you’d like us to add, reach out to support@arize.com.
Arize AX tracks token usage via standard OpenInference attributes on your LLM spans:
| Attribute | Description |
| --- | --- |
| `llm.token_count.prompt` | Number of tokens in the prompt |
| `llm.token_count.completion` | Number of tokens in the completion |
| `llm.token_count.total` | Total number of tokens (prompt + completion) |
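For example, an LLM span might carry attributes like the following. The attribute names are the standard OpenInference conventions above; the plain dict is illustrative — in real instrumentation these would be set through your tracing SDK (e.g. via `span.set_attribute`):

```python
# Illustrative span attributes using OpenInference token-count conventions.
# In real code these are set on a span by your instrumentation library;
# a plain dict is shown here for clarity.
span_attributes = {
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 1200,
    "llm.token_count.completion": 350,
    "llm.token_count.total": 1550,  # prompt + completion
}

# The total should always equal prompt + completion.
assert (
    span_attributes["llm.token_count.total"]
    == span_attributes["llm.token_count.prompt"]
    + span_attributes["llm.token_count.completion"]
)
```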
Cost is calculated based on these token counts and the cost configuration for the model. The system supports multiple token types for detailed cost breakdowns:
| Token Type | Category | Description |
| --- | --- | --- |
| `input` | Prompt | Regular input tokens |
| `cache` | Prompt | Cached prompt tokens |
| `cache_read` | Prompt | Cache read tokens |
| `cache_write` | Prompt | Cache write tokens |
| `cache_input` | Prompt | Cached input tokens |
| `output` | Completion | Regular output tokens |
| `reasoning` | Completion | Reasoning tokens (e.g., o1/o3 models) |
| `audio` | Both | Audio tokens |
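A simple sketch of how per-type token counts combine with per-type rates. The rates here are hypothetical (in Arize AX they come from the model’s cost configuration) and are quoted per million tokens:

```python
# Hypothetical per-million-token rates for one model; real rates come
# from the cost configuration, not from this example.
RATES_PER_MILLION = {
    "input": 2.50,
    "cache_read": 1.25,
    "output": 10.00,
    "reasoning": 10.00,
}

def span_cost(token_counts: dict) -> float:
    """Sum cost over each token type present on the span."""
    return sum(
        count / 1_000_000 * RATES_PER_MILLION[token_type]
        for token_type, count in token_counts.items()
    )

cost = span_cost({"input": 1200, "output": 350})
# 1200/1e6 * 2.50 + 350/1e6 * 10.00 = 0.003 + 0.0035 = 0.0065
```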
Cost configurations also support tiered pricing: volume-based pricing where the cost per token changes once total token counts cross defined thresholds.
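A minimal sketch of tiered pricing under assumed thresholds. The tier boundaries and rates below are hypothetical, not Arize defaults:

```python
# Hypothetical tiers: (upper threshold in tokens, rate per million tokens).
# Tokens falling within each tier are billed at that tier's rate.
TIERS = [
    (1_000_000, 3.00),       # first 1M tokens
    (10_000_000, 2.00),      # next 9M tokens
    (float("inf"), 1.00),    # everything beyond 10M
]

def tiered_cost(total_tokens: int) -> float:
    """Bill tokens tier by tier until the total is exhausted."""
    cost = 0.0
    billed = 0
    for threshold, rate in TIERS:
        in_tier = min(total_tokens, threshold) - billed
        if in_tier <= 0:
            break
        cost += in_tier / 1_000_000 * rate
        billed += in_tier
    return cost

# 2.5M tokens: 1M at $3.00 + 1.5M at $2.00 = $6.00
```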
At the trace level, Arize aggregates cost across all LLM spans in the trace. This provides a complete view of how much it cost to serve a given request end-to-end.
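Trace-level cost is simply the sum of the span-level costs within the trace; a sketch with hypothetical span data:

```python
# Hypothetical spans in one trace, each with a precomputed span-level cost.
trace_spans = [
    {"name": "plan", "cost": 0.0065},
    {"name": "retrieve", "cost": 0.0012},
    {"name": "answer", "cost": 0.0101},
]

# Trace-level cost aggregates every LLM span in the trace.
trace_cost = sum(span["cost"] for span in trace_spans)
# 0.0065 + 0.0012 + 0.0101 = 0.0178
```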
We extract the model name from your trace using the following fallback order:
1. `llm.model_name` (primary)
2. `llm.invocation_parameters.model` (fallback 1)
3. `metadata.model` (fallback 2)
Optionally, if you provide a provider, we’ll match that as well (e.g., differentiating OpenAI vs Azure OpenAI for gpt-4).
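The fallback order above can be sketched as a small helper. The function name is hypothetical, and the attributes are shown as flat dict keys for illustration:

```python
def extract_model_name(attributes: dict):
    """Return the model name using the documented fallback order,
    or None if no candidate attribute is present."""
    for key in (
        "llm.model_name",                   # primary
        "llm.invocation_parameters.model",  # fallback 1
        "metadata.model",                   # fallback 2
    ):
        value = attributes.get(key)
        if value:
            return value
    return None

extract_model_name({"metadata.model": "gpt-4o"})  # → "gpt-4o"
```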
Each token type (e.g., input, output, audio) is matched against the configuration, and cost is calculated per million tokens (1M-token unit basis).
Important: Cost is not retroactive. To track costs, you must configure pricing before ingesting traces.