Set up online evaluations
Online evaluations provide real-time feedback on your production traces. This is useful for continuously monitoring the performance of your application: identifying issues, measuring improvements, and ensuring consistent quality over time.
There are two types of online evaluations supported in LangSmith:
- LLM-as-a-judge: Use an LLM to evaluate your traces. This is a scalable way to apply human-like judgment to your outputs (e.g., toxicity, hallucination, correctness).
- Custom code: Write an evaluator in Python directly in LangSmith. This is often used to validate the structure or statistical properties of your data.
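To make the custom-code option concrete, here is a minimal sketch of what such an evaluator might look like. It assumes the evaluator receives a run as a dictionary with an `outputs` field and returns a feedback dictionary with `key` and `score` entries; the function name `structure_evaluator` and the `run` shape are illustrative assumptions, not a guaranteed LangSmith signature. This example checks that the model output is valid JSON containing required keys:

```python
import json

def structure_evaluator(run: dict) -> dict:
    """Hypothetical custom-code evaluator (illustrative shape, not the
    exact LangSmith contract): checks that the run's output parses as
    JSON and contains the required top-level keys."""
    required = {"answer", "sources"}
    try:
        payload = json.loads(run["outputs"]["output"])
    except (KeyError, TypeError, json.JSONDecodeError):
        # Missing or malformed output fails the structural check.
        return {"key": "valid_structure", "score": 0}
    score = 1 if required.issubset(payload) else 0
    return {"key": "valid_structure", "score": score}

# Example run with a well-formed JSON output
sample_run = {"outputs": {"output": '{"answer": "42", "sources": []}'}}
print(structure_evaluator(sample_run))  # {'key': 'valid_structure', 'score': 1}
```

Because this kind of check is deterministic and cheap, it suits custom code better than an LLM-as-a-judge, which is reserved for judgments that need semantic understanding.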
Online evaluations are configured using automation rules.