Advanced options for running experiments via code
Set up asynchronous experiments
Experiments can be run either synchronously or asynchronously.
We recommend:
Synchronous: Slower but easier to debug. While you are building your tests, synchronous runs are inherently easier to debug. Start with synchronous runs, then convert them to asynchronous.
Asynchronous: Faster. Use this when the timing and speed of your tests matter. Making the tasks and/or evals asynchronous can speed up your runs by 10x.
A synchronous experiment runs each task one after another; an asynchronous experiment runs them in parallel.
Here are the code differences between the two. You just need to add the async keyword before your function definitions, prefix the function names with async_, and call nest_asyncio.apply(). Asynchronous runs rely on the concurrency parameter of run_experiment, so if you'd like to run them faster, set it to a higher number.
# Sync task
def prompt_gen_task(example):
    print('running task sync')

# Sync evaluation
def evaluate_hallu(output, dataset_row):
    print('running eval sync')

# Run experiment
experiment1 = arize_client.run_experiment(
    space_id=space_id,
    dataset_id=dataset_id,
    task=prompt_gen_task,
    evaluators=[evaluate_hallu],
    experiment_name="test"
)
###############
import nest_asyncio
nest_asyncio.apply()

# Async task
async def async_prompt_gen_task(example):
    print('running task async')

# Async evaluation
async def async_evaluate_hallu(output, dataset_row):
    print('running eval async')

# Same run experiment function, with a higher concurrency
experiment1 = arize_client.run_experiment(
    space_id=space_id,
    dataset_id=dataset_id,
    task=async_prompt_gen_task,
    evaluators=[async_evaluate_hallu],
    experiment_name="test",
    concurrency=10
)
Sampling a dataset for an experiment
Running a test on a dataset sometimes requires running on random or stratified samples of the dataset. Arize supports sampling by letting teams download the dataset as a dataframe, which can be sampled before running the experiment.
# Get dataset as Dataframe
dataset_df = arize_client.get_dataset(space_id=SPACE_ID, dataset_name=dataset_name)
# Any sampling methods you want on a DF
sampled_df = dataset_df.sample(n=100) # Sample 100 rows randomly
# Sample 10% of rows randomly
sampled_df = dataset_df.sample(frac=0.1)
# Create proportional sampling based on the original dataset's class label distribution
stratified_sampled_df = dataset_df.groupby('class_label', group_keys=False).apply(lambda x: x.sample(frac=0.1))
# Select every 10th row
systematic_sampled_df = dataset_df.iloc[::10, :]
# Run Experiment on sampled_df
arize_client.run_experiment(space_id, dataset_name, sampled_df, taskfn, evaluators)
An experiment is only matched up with the data that was run against it. You can run experiments with different samples of the same dataset, and the platform will take care of tracking and visualization.
Any complex sampling method that can be applied to a dataframe can be used for sampling.
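As a minimal sketch building on the call pattern above (the variables and samples are illustrative), you could run the same task and evaluators against two different samples of one dataset, with each experiment tracked against only the rows it ran on:
# Two different samples of the same dataset
random_sample_df = dataset_df.sample(frac=0.1)
systematic_sample_df = dataset_df.iloc[::10, :]

# Each experiment is matched only with the rows it was run on
arize_client.run_experiment(space_id, dataset_name, random_sample_df, taskfn, evaluators)
arize_client.run_experiment(space_id, dataset_name, systematic_sample_df, taskfn, evaluators)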
Tracing your experiment
When running experiments, arize_client.run_experiment() will produce a task span attached to the experiment. If you want to add more traces to the experiment run, you can instrument any part of that experiment code and those spans will be attached below the task span. Arize tracers instrumented on experiment code will automatically trace the experiments into the platform.
Tracing Using Explicit Spans
from opentelemetry import trace

# Outer function will be traced by Arize with a span
def task_add_1(dataset_row):
    tracer = trace.get_tracer(__name__)
    # Start the span for the function
    with tracer.start_as_current_span("test_function") as span:
        # Extract the number from the dataset row
        num = dataset_row['attributes.my_number']
        # Set 'num' as a span attribute
        span.set_attribute("dataset.my_number", num)
        # Return the incremented number
        return num + 1
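For reference, here is a minimal sketch of running this traced task as an experiment, reusing the run_experiment call shown earlier; the evaluator and experiment name are hypothetical. The span created inside task_add_1 is attached under the experiment's task span.
def evaluate_increment(output, dataset_row):
    # Hypothetical evaluator: checks that the task returned the row's number plus one
    return output == dataset_row['attributes.my_number'] + 1

experiment = arize_client.run_experiment(
    space_id=space_id,
    dataset_id=dataset_id,
    task=task_add_1,
    evaluators=[evaluate_increment],
    experiment_name="traced-task-example"
)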
Tracing Using Auto-Instrumentor
# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

# Automatic instrumentation --- This will trace all tasks below with LLM calls
OpenAIInstrumentor().instrument()

task_prompt_template = "Answer in a few words: {question}"
openai_client = OpenAI()

def task(dataset_row) -> str:
    question = dataset_row["question"]
    message_content = task_prompt_template.format(question=question)
    response = openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": message_content}]
    )
    return response.choices[0].message.content
Experiments SDK differences in Arize AX vs Phoenix OSS
There are subtle differences between the experiments SDK in Arize AX vs. Phoenix OSS, but the base concepts are the same.
You can check out a full notebook example of each. The example below runs an experiment to write a haiku and evaluates its tone using an LLM eval.
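As an illustration only, here is a minimal sketch of what such an experiment might look like on Arize AX, reusing the arize_client, openai_client, space_id, and dataset_id from the sections above; the dataset column, prompts, and experiment name are assumptions, not the notebook's exact code.
# Task: write a haiku about the topic in each dataset row (column name assumed)
def write_haiku(dataset_row) -> str:
    topic = dataset_row["topic"]
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Write a haiku about {topic}"}],
    )
    return response.choices[0].message.content

# LLM eval: ask the model to label the tone of the generated haiku
def evaluate_tone(output, dataset_row) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Label the tone of this haiku as positive, neutral, or negative:\n{output}",
        }],
    )
    return response.choices[0].message.content

experiment = arize_client.run_experiment(
    space_id=space_id,
    dataset_id=dataset_id,
    task=write_haiku,
    evaluators=[evaluate_tone],
    experiment_name="haiku-tone-example"
)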