Regression Testing

When evaluating LLM applications, it is important to be able to track how your system performs over time. In this guide, we will show you how to use LangSmith's comparison view to track regressions in your application and drill down into the specific runs that improved or regressed over time.


In the LangSmith comparison view, runs that regressed on your specified feedback key relative to your baseline experiment are highlighted in red, while runs that improved are highlighted in green. At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment.
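The classification behind that view can be sketched in plain Python. This is an illustrative example, not LangSmith's internal implementation: given per-example feedback scores from a baseline and a candidate experiment, each run is bucketed as improved, regressed, or unchanged, and the column counts are the sizes of those buckets.

```python
# Illustrative sketch (not LangSmith internals): classify each run in a
# candidate experiment against the baseline on one feedback key.

baseline_scores = {"ex-1": 0.9, "ex-2": 0.4, "ex-3": 0.7}   # baseline experiment
candidate_scores = {"ex-1": 0.6, "ex-2": 0.8, "ex-3": 0.7}  # newer experiment

improved, regressed, unchanged = [], [], []
for example_id, baseline in baseline_scores.items():
    candidate = candidate_scores[example_id]
    if candidate > baseline:
        improved.append(example_id)    # highlighted green in the UI
    elif candidate < baseline:
        regressed.append(example_id)   # highlighted red in the UI
    else:
        unchanged.append(example_id)

print(f"{len(improved)} improved, {len(regressed)} regressed")
```

The counts printed at the end correspond to the numbers shown at the top of each experiment column.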


Baseline Experiment

In order to track regressions, you need a baseline experiment against which to compare. This will be automatically assigned as the first experiment in your comparison, but you can change it from the dropdown at the top of the page.


Select Feedback Key

You will also want to select the feedback key on which you would like to focus. This can be selected via another dropdown at the top of the page. Again, one will be assigned by default, but you can adjust it as needed.


Filter to Regressions or Improvements

Click on the regressions or improvements buttons on the top of each column to filter to the runs that regressed or improved in that specific experiment.

Try it out

To get started with regression testing, try running a no-code experiment in our prompt playground or check out the Evaluation Quick Start Guide to get started with the SDK.
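As a rough sketch of the SDK path, the snippet below defines a target function and a custom evaluator whose returned `"key"` becomes the feedback key you select in the comparison view. The dataset name, function names, and evaluator signature here are illustrative assumptions, and the `evaluate()` call is commented out because it requires a LangSmith API key; see the Evaluation Quick Start Guide for the authoritative usage.

```python
# Hedged sketch: running an experiment with the LangSmith SDK so it can later
# be compared against a baseline. Names like "qa-dataset", target_v1, and
# exact_match are hypothetical placeholders.

def target_v1(inputs: dict) -> dict:
    # Placeholder for one version of your application.
    return {"answer": inputs["question"].upper()}

def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    # Custom evaluator: "key" is the feedback key shown in the comparison
    # view; "score" is the value compared against the baseline.
    return {
        "key": "correctness",
        "score": outputs["answer"] == reference_outputs["answer"],
    }

# Requires a LangSmith API key and an existing dataset:
# from langsmith import evaluate
# evaluate(target_v1, data="qa-dataset", evaluators=[exact_match],
#          experiment_prefix="baseline")
```

Running this twice with different target functions (and distinct `experiment_prefix` values) produces two experiments you can open side by side in the comparison view.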
