1. Connect to Phoenix
Before running evals, make sure Phoenix is running and you have sent traces to your project. For more step-by-step instructions, check out the Get Started guide and the Get Started with Tracing guide.
- Phoenix Cloud: Log in, create a space, navigate to the Settings page in your space, and create your API keys. Set your environment variables. You can find your collector endpoint on the same page; it is https://app.phoenix.arize.com/s/ followed by your space name.
- Local (Self-hosted): Launch your Phoenix instance, navigate to Settings, and copy your hostname to use as your collector endpoint.
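For example, here is a minimal sketch of setting those environment variables in Python before running anything else; the API key and space name below are placeholders:

```python
import os

# Placeholders -- substitute your own Phoenix API key and collector endpoint.
os.environ["PHOENIX_API_KEY"] = "your-phoenix-api-key"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space-name"
```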
2. Install Phoenix Evals
You'll need to install the evals library that is part of Phoenix.
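Assuming you are installing from PyPI, the evals package (plus the lightweight Phoenix client used later and the OpenAI SDK used as the judge model) can be installed with pip:

```bash
pip install -q arize-phoenix-evals arize-phoenix-client openai
```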
3. Pull your trace data
Since we are running our evaluations on the trace data from our first project, we'll need to pull that data into our code.
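A minimal sketch of pulling spans into a pandas DataFrame, assuming the phoenix.client Client and its spans.get_spans_dataframe method; the project name "default" is a placeholder for whichever project you sent traces to:

```python
from phoenix.client import Client

# The client reads PHOENIX_COLLECTOR_ENDPOINT and PHOENIX_API_KEY from the environment.
px_client = Client()

# Pull the spans from your tracing project into a pandas DataFrame.
# "default" is a placeholder -- use your own project name.
spans_df = px_client.spans.get_spans_dataframe(project_identifier="default")
print(spans_df.head())
```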
4. Define your evaluator
In this example, we will define, create, and run our own evaluator. There are a number of different evaluators you can run, but this quickstart walks through an LLM-as-a-Judge evaluator.

1) Define your LLM judge model. We'll use OpenAI as our evaluation model for this example, but Phoenix also supports a number of other models. If you haven't yet added your OpenAI API key from the previous step, first add it to your environment.

2) Define your evaluators. We will set up a Q&A correctness evaluator with the LLM of choice. First, define the LLM-as-a-Judge prompt template: most LLM-as-a-judge evaluations can be framed as a classification task where the output is one of two or more categorical labels. Then define the classification evaluator itself; a sketch of both pieces follows.
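Here is a minimal sketch of that setup. The LLM wrapper and ClassificationEvaluator names follow the newer phoenix.evals API, and the prompt template, label set, and column names are illustrative assumptions rather than the exact Phoenix templates:

```python
import os

from phoenix.evals import ClassificationEvaluator
from phoenix.evals.llm import LLM

# Make sure the judge model can authenticate (placeholder value).
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# 1) The judge model. The provider/model names here are illustrative;
#    Phoenix supports other providers as well.
judge = LLM(provider="openai", model="gpt-4o")

# 2) An LLM-as-a-judge prompt framed as a binary classification task.
#    {input}, {output}, and {reference} are assumed to match columns in your
#    span dataframe (or to be mapped to them).
QA_CORRECTNESS_TEMPLATE = """
You are given a question, a reference answer, and a submitted answer.
Decide whether the submitted answer correctly answers the question,
using the reference answer as ground truth.
Respond with a single word: "correct" or "incorrect".

Question: {input}
Reference answer: {reference}
Submitted answer: {output}
"""

# Map each label to a numeric score so results can be aggregated.
qa_correctness_evaluator = ClassificationEvaluator(
    name="qa_correctness",
    llm=judge,
    prompt_template=QA_CORRECTNESS_TEMPLATE,
    choices={"correct": 1.0, "incorrect": 0.0},
)
```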
5. Run your evaluations
Now that we have defined our evaluator, we’re ready to evaluate our traces.
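A minimal sketch of running the evaluator over the pulled spans, assuming an evaluate_dataframe helper in phoenix.evals and the spans_df and qa_correctness_evaluator objects defined in the previous steps:

```python
from phoenix.evals import evaluate_dataframe

# Run the evaluator against every span row; the resulting dataframe keeps the
# span identifiers alongside each label, score, and explanation.
results_df = evaluate_dataframe(
    dataframe=spans_df,
    evaluators=[qa_correctness_evaluator],
)
print(results_df.head())
```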
6. Log results to Phoenix
You'll now be able to see your evaluations in your project view. First, format the evaluation results for logging using the to_annotation_dataframe utility, then log the annotations to Phoenix.
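A minimal sketch of that final step, assuming to_annotation_dataframe lives in phoenix.evals.utils and that the phoenix.client Client exposes spans.log_span_annotations_dataframe; treat both names as assumptions to verify against your installed versions:

```python
from phoenix.client import Client
from phoenix.evals.utils import to_annotation_dataframe

# Reshape the evaluation results into the annotation format Phoenix expects.
annotation_df = to_annotation_dataframe(results_df)

# Log the annotations back to Phoenix so they appear alongside your traces.
px_client = Client()
px_client.spans.log_span_annotations_dataframe(
    dataframe=annotation_df,
    annotator_kind="LLM",
)
```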
