Know what every request costs. A single agent might chain five LLM calls — without cost tracking, you’re guessing what you’re spending. Arize AX calculates cost for every span, aggregates it at the trace level, and lets you filter, monitor, and optimize from there.

What is Cost Tracking?

Arize AX calculates the cost of every LLM call in your traces — at the trace level (total cost of a request) and at the span level (cost of each individual LLM call). Use it to:
  • Spot which requests or agents are expensive and why
  • Track spend across models and providers over time
  • Catch cost spikes before they become budget problems
  • Compare cost/quality tradeoffs between different models
Arize includes default cost configurations for common models (GPT-4o, Claude, Gemini, Mistral, and more), making it easy to get started with no setup required in many cases.
If there is a default you’d like us to add, reach out to support@arize.com.

Token Tracking

Arize AX tracks token usage via standard OpenInference attributes on your LLM spans:
Attribute                     Description
llm.token_count.prompt        Number of tokens in the prompt
llm.token_count.completion    Number of tokens in the completion
llm.token_count.total         Total number of tokens (prompt + completion)
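As a concrete example, here is what these attributes might look like when attached to an LLM span via manual instrumentation. They are shown as a plain dict for clarity; with the OpenTelemetry SDK you would pass each key to `span.set_attribute`. The values are illustrative.

```python
# Token-count attributes following the OpenInference semantic conventions above.
token_attributes = {
    "llm.token_count.prompt": 1200,     # tokens sent in the prompt
    "llm.token_count.completion": 300,  # tokens generated in the completion
    "llm.token_count.total": 1500,      # prompt + completion
}

# Sanity check: total should equal prompt + completion.
assert token_attributes["llm.token_count.total"] == (
    token_attributes["llm.token_count.prompt"]
    + token_attributes["llm.token_count.completion"]
)
```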
Cost is calculated based on these token counts and the cost configuration for the model. The system supports multiple token types for detailed cost breakdowns:
Token Type     Category     Description
input          Prompt       Regular input tokens
cache          Prompt       Cached prompt tokens
cache_read     Prompt       Cache read tokens
cache_write    Prompt       Cache write tokens
cache_input    Prompt       Cached input tokens
output         Completion   Regular output tokens
reasoning      Completion   Reasoning tokens (e.g., o1/o3 models)
audio          Both         Audio tokens
Cost configs also support tiered pricing: volume-based rates where the cost per token changes once the total token count crosses configured thresholds. These token counts are the inputs Arize uses to calculate cost.
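To make tiered pricing concrete, here is one possible interpretation: a graduated schedule where each tranche of tokens is billed at its tier’s rate. This is a sketch under that assumption; the exact semantics of Arize’s tier thresholds may differ.

```python
def tiered_cost(tokens: int, tiers: list[tuple[float, float]]) -> float:
    """Graduated tiered-pricing sketch (an assumption, not Arize's exact logic).

    tiers: list of (upper_threshold, price_per_1m_tokens), in ascending order.
    Tokens falling within each tier are billed at that tier's rate.
    """
    cost, prev = 0.0, 0
    for threshold, price_per_1m in tiers:
        billable = min(tokens, threshold) - prev
        if billable <= 0:
            break
        cost += billable * price_per_1m / 1_000_000
        prev = threshold
    return cost

# Hypothetical schedule: first 1M tokens at $3.00/1M, everything beyond at $1.50/1M.
tiers = [(1_000_000, 3.00), (float("inf"), 1.50)]
print(tiered_cost(1_500_000, tiers))  # 3.00 + 0.75 = 3.75
```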

How Cost Tracking Works

When a span is received, Arize AX determines cost as follows:
  1. If the span already includes cost attributes (set by the client), those values are used as-is.
  2. Otherwise, the system looks up a cost configuration by matching llm.model_name and llm.provider.
  3. The matching config’s per-token rates are applied to the span’s token counts.
  4. Cost configs are cached with a 10-minute TTL for performance.
Cost attributes on spans:
Attribute                          Description
llm.cost.prompt                    Total prompt cost
llm.cost.completion                Total completion cost
llm.cost.total                     Total cost
llm.cost.prompt_details.*          Cost breakdown by prompt token type
llm.cost.completion_details.*      Cost breakdown by completion token type
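Step 1 above means you can bypass Arize’s calculation entirely by computing cost client-side and setting these attributes on the span yourself. A minimal sketch, using hypothetical per-1M-token rates:

```python
# Hypothetical rates; substitute your provider's actual pricing.
PRICE_PER_1M_PROMPT = 2.50
PRICE_PER_1M_COMPLETION = 10.00

prompt_tokens, completion_tokens = 1200, 300

# If these attributes are present on the span, Arize AX uses them as-is
# instead of computing cost from token counts.
cost_attributes = {
    "llm.cost.prompt": prompt_tokens * PRICE_PER_1M_PROMPT / 1_000_000,
    "llm.cost.completion": completion_tokens * PRICE_PER_1M_COMPLETION / 1_000_000,
}
cost_attributes["llm.cost.total"] = (
    cost_attributes["llm.cost.prompt"] + cost_attributes["llm.cost.completion"]
)
```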

Set Up Cost Tracking

1. Use a Default (Zero Setup)

If your model and provider match a default, Arize automatically applies the correct pricing — no action needed.

2. Customize a Default

To tweak an existing config (e.g., apply discounts):
  • Go to Settings > Cost Tracking > Configuration
  • Click Options > Clone on a default config
  • Edit fields like token type cost or provider name
[Screenshot: customizing a cost tracking config in Arize AX]

3. Create from Scratch

To define your own model config:
  • Click Add New
  • Enter the model name (required)
  • Optionally enter the provider
  • Specify cost per 1 million tokens for each token type
  • Assign each token type to Prompt or Completion
Cost configs are saved at the organization level.
[Screenshot: creating a custom cost tracking config]

Using Cost Data

Once configured, cost data is available across the platform.

Filtering and Monitoring

All cost attributes are available throughout the platform and can be used to:
  • Filter traces or spans where cost exceeds a defined threshold
  • Create monitors for high-cost traces or model behavior anomalies
  • Build dashboards based on specific token types or cost groupings

Trace-Level Visualization

At the trace level, Arize aggregates cost across all LLM spans in the trace. This provides a complete view of how much it cost to serve a given request end-to-end.
[Screenshot: trace-level cost aggregated across all LLM spans in a request in Arize AX]

Span-Level Visualization

You can also inspect cost at the individual span level, including a breakdown by token type. This allows you to:
  • Pinpoint expensive steps in the LLM pipeline
  • Analyze the relative contribution of different token categories (e.g., reasoning, cache, image)
[Screenshot: LLM span Attributes tab in Arize AX showing the llm.cost breakdown with prompt, completion, prompt_details, completion_details, and total]

Lookup Logic

To determine cost:
  1. We extract the model name from your trace using the following fallback order:
    • llm.model_name (Primary)
    • llm.invocation_parameters.model (Fallback 1)
    • metadata.model (Fallback 2)
  2. Optionally, if you provide a provider, we’ll match that as well (e.g., differentiating OpenAI vs Azure OpenAI for gpt-4).
  3. Each token type (e.g., prompt, completion, audio) is matched against the configuration, and cost is calculated from the config’s per-1-million-token rate.
Important: Cost is not retroactive. To track costs, you must configure pricing before ingesting traces.

Supported Token Types and Semantic Conventions

You can send any token types using OpenInference semantic conventions. Below are the supported fields:

Prompt Tokens

Token Type                                   Field Name
Prompt (includes all input subtypes)         llm.token_count.prompt
Prompt details                               llm.token_count.prompt_details
Audio                                        llm.token_count.prompt_details.audio
Image                                        llm.token_count.prompt_details.image
Cache Input                                  llm.token_count.prompt_details.cache_input
Cache Read                                   llm.token_count.prompt_details.cache_read
Cache Write                                  llm.token_count.prompt_details.cache_write

Completion Tokens

Token Type                                   Field Name
Completion (includes all output subtypes)    llm.token_count.completion
Audio                                        llm.token_count.completion_details.audio
Reasoning                                    llm.token_count.completion_details.reasoning
Image                                        llm.token_count.completion_details.image

Total Tokens (Optional)

Token Type                                   Field Name
Total (prompt + completion)                  llm.token_count.total

Custom Token Types

You can also define custom token types under either prompt_details or completion_details. Just make sure to:
  • Use semantic naming
  • Include a matching token type and cost in your configuration
Each token type you send will have its cost calculated, provided a matching token type is defined in your configuration.
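For instance, a custom subtype could be reported under prompt_details. The name `tool_call` and the counts below are hypothetical; cost is only computed for the subtype if a token type with that name and a per-1M rate exist in your cost configuration.

```python
# Custom token subtype under prompt_details, following the OpenInference
# naming pattern shown above. "tool_call" is a hypothetical example; any
# semantic name works as long as your cost config defines a matching
# token type and rate.
attributes = {
    "llm.token_count.prompt": 900,
    "llm.token_count.prompt_details.tool_call": 150,
}
```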

Next step

Configure your OpenTelemetry tracer for production — batch processing, routing, and resource attributes:

Next: Configure Your Tracer