Run playground experiments on datasets

Experiment with your datasets in the prompt playground

When modifying a prompt in the playground, you can test the new prompt across a dataset of examples to confirm that performance is steadily improving on challenging examples without regressing on core business use cases.
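
For intuition, the loop the playground automates looks roughly like the sketch below. This is a minimal illustration only, not the playground's actual API: `call_model`, the template format, and the dataset shape are all hypothetical placeholders.

```python
# Minimal sketch of testing a candidate prompt across a dataset.
# `call_model`, the template, and the dataset shape are hypothetical
# placeholders, not the playground's actual API.

CANDIDATE_TEMPLATE = "Answer concisely: {question}"

dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call via your provider's SDK."""
    return "stubbed response for: " + prompt  # replace with a real call

def run_template(template: str, examples: list[dict]) -> list[dict]:
    """Render the template for each example and collect the model's outputs."""
    results = []
    for example in examples:
        fields = {k: v for k, v in example.items() if k != "expected"}
        results.append({**example, "output": call_model(template.format(**fields))})
    return results

outputs = run_template(CANDIDATE_TEMPLATE, dataset)
```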

You can also add evaluations to these runs to track and measure improvements.
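
An evaluation can be as simple as a function that scores each output against the example's expected answer. The evaluator below (a substring match) is a hypothetical illustration reusing the results shape from the sketch above; real evaluations may be model-graded or use other metrics.

```python
def exact_match(output: str, expected: str) -> float:
    """Hypothetical evaluator: 1.0 if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0

def evaluate(results: list[dict]) -> float:
    """Score each run and return the aggregate (mean) metric."""
    for row in results:
        row["score"] = exact_match(row["output"], row["expected"])
    return sum(row["score"] for row in results) / len(results)

mean_score = evaluate(outputs)  # `outputs` from the previous sketch
```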

Save playground outputs as an experiment

After iterating on a template in the playground and observing improvements across a dataset of examples, you can save the results as an experiment for further analysis and comparison.

Saved outputs can then be A/B tested against other experiment templates on the same dataset. By comparing outputs side by side and aggregating metrics, teams can collaborate efficiently and align on the model best suited for a production workflow, ensuring decisions are based on both qualitative examples and quantitative metrics.
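
Conceptually, an A/B comparison aggregates the same metric for each template over the same dataset. The sketch below reuses the hypothetical helpers from the earlier sketches; the template strings and score values are illustrative only.

```python
def compare_templates(templates: dict[str, str], examples: list[dict]) -> dict[str, float]:
    """Run each named template over the same dataset and aggregate its score."""
    return {
        name: evaluate(run_template(template, examples))
        for name, template in templates.items()
    }

scores = compare_templates(
    {
        "baseline": "Q: {question}\nA:",
        "candidate": "Answer concisely: {question}",
    },
    dataset,
)
print(scores)  # aggregate metric per template; the numbers are illustrative
```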

After the experiment is created, select 'View Experiment' to open the experiments page, where you can review and analyze the saved results in detail.
