Log Evaluation Results

This guide shows how to send LLM evaluation results stored in dataframes to Phoenix.

An evaluation must have a name (e.g. "Q&A Correctness"), and its DataFrame must contain identifiers for the subject of evaluation, e.g. a span or a document (more on that below), plus values in at least one of the score, label, or explanation columns. See Evaluations for more information.
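
Only one of the three value columns is required. As a minimal sketch with pandas (the span ID below is a placeholder, not a real identifier):

import pandas as pd

# A minimal valid evaluation: an identifier (span_id) plus one value column (score).
# The span ID is a placeholder for illustration.
eval_df = pd.DataFrame(
    {"score": [1]},
    index=pd.Index(["5B8EF798A381"], name="span_id"),
)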

Connect to Phoenix

Initialize the Phoenix client to connect to your Phoenix instance:

from phoenix.client import Client

# Initialize client - automatically reads from environment variables:
# PHOENIX_BASE_URL and PHOENIX_API_KEY (if using Phoenix Cloud)
client = Client()

# Or explicitly configure for your Phoenix instance:
# client = Client(base_url="https://your-phoenix-instance.com", api_key="your-api-key")
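
If you prefer to configure through the environment, you can set the variables before constructing the client. A minimal sketch, using placeholder values:

import os

# Placeholder values; substitute your own instance URL and key.
os.environ["PHOENIX_BASE_URL"] = "https://your-phoenix-instance.com"
os.environ["PHOENIX_API_KEY"] = "your-api-key"  # only needed if your instance requires auth

from phoenix.client import Client

client = Client()  # reads the variables set above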

Span Evaluations

A dataframe of span evaluations would look similar to the table below. It must contain span_id as an index or as a column. Once ingested, Phoenix uses the span_id to associate the evaluation with its target span.

span_id      | label     | score | explanation
------------ | --------- | ----- | -----------------------
5B8EF798A381 | correct   | 1     | "this is correct ..."
E19B7EC3GG02 | incorrect | 0     | "this is incorrect ..."

The evaluations dataframe can be sent to Phoenix as follows. Note that the name of the evaluation must be supplied through the annotation_name= parameter. In this case we name it "Q&A Correctness".

client.spans.log_span_annotations_dataframe(
    dataframe=qa_correctness_eval_df,
    annotation_name="Q&A Correctness",
    annotator_kind="LLM",
)

Document Evaluations

A dataframe of document evaluations would look something like the table below. It must contain span_id and document_position as either indices or columns. document_position is the document's (zero-based) index in the span's list of retrieved documents. Once ingested, Phoenix uses the span_id and document_position to associate the evaluation with its target span and document.

span_id      | document_position | label      | score | explanation
------------ | ----------------- | ---------- | ----- | -------------
5B8EF798A381 | 0                 | relevant   | 1     | "this is ..."
5B8EF798A381 | 1                 | irrelevant | 0     | "this is ..."
E19B7EC3GG02 | 0                 | relevant   | 1     | "this is ..."

The evaluations dataframe can be sent to Phoenix as follows. Note that the name of the evaluation must be supplied through the annotation_name= parameter. In this case we name it "Relevance".

client.spans.log_document_annotations_dataframe(
    dataframe=document_relevance_eval_df,
    annotation_name="Relevance",
    annotator_kind="LLM",
)

Logging Multiple Evaluation DataFrames

With the client, multiple sets of evaluations can be logged by making a separate function call for each.

client.spans.log_span_annotations_dataframe(
    dataframe=qa_correctness_eval_df,
    annotation_name="Q&A Correctness",
    annotator_kind="LLM",
)
client.spans.log_document_annotations_dataframe(
    dataframe=document_relevance_eval_df,
    annotation_name="Relevance",
    annotator_kind="LLM",
)
client.spans.log_span_annotations_dataframe(
    dataframe=hallucination_eval_df,
    annotation_name="Hallucination",
    annotator_kind="LLM",
)
# ... continue with additional evaluations as needed
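
If you have several span-level evaluation dataframes, you can also batch the calls in a loop; a small sketch, assuming each dataframe is indexed by span_id as in the examples above:

# Map each evaluation name to its dataframe, then log them all.
span_eval_dfs = {
    "Q&A Correctness": qa_correctness_eval_df,
    "Hallucination": hallucination_eval_df,
}
for name, df in span_eval_dfs.items():
    client.spans.log_span_annotations_dataframe(
        dataframe=df,
        annotation_name=name,
        annotator_kind="LLM",
    )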
