Evaluate an LLM Application
Evaluating the performance of your LLM application is a critical step in the development process. LangSmith makes it easy to run evaluations and track evaluation performance over time. This section provides guidance on how to evaluate the performance of your LLM application.
Run an evaluation
At a high level, the evaluation process involves the following steps:
- Define your LLM application or target task.
- Create or select a dataset to evaluate your LLM application. Your evaluation criteria may or may not require expected outputs in the dataset.
- Configure evaluators to score the outputs of your LLM application, sometimes against expected outputs.
- Run the evaluation and view the results.
The following example walks through evaluating a very simple LLM pipeline used as a classifier to label input text as "Toxic" or "Not toxic".
Step 1: Define your target task
In this case, we are defining a simple evaluation target consisting of an LLM pipeline that classifies text as toxic or non-toxic. We've optionally enabled tracing to capture the inputs and outputs of each step in the pipeline.
To understand how to annotate your code for tracing, please refer to this guide.
- Python
- TypeScript
from langsmith import traceable, wrappers
from openai import Client

openai = wrappers.wrap_openai(Client())

@traceable
def label_text(text):
    messages = [
        {
            "role": "system",
            "content": "Please review the user query below and determine if it contains any form of toxic behavior, such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does, and 'Not toxic' if it doesn't.",
        },
        {"role": "user", "content": text},
    ]
    result = openai.chat.completions.create(
        messages=messages, model="gpt-4o-mini", temperature=0
    )
    return result.choices[0].message.content
import { OpenAI } from "openai";
import { wrapOpenAI } from "langsmith/wrappers";
import { traceable } from "langsmith/traceable";

const client = wrapOpenAI(new OpenAI());

const labelText = traceable(
  async (text: string) => {
    const result = await client.chat.completions.create({
      messages: [
        {
          role: "system",
          content: "Please review the user query below and determine if it contains any form of toxic behavior, such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does, and 'Not toxic' if it doesn't.",
        },
        { role: "user", content: text },
      ],
      model: "gpt-4o-mini",
      temperature: 0,
    });
    return result.choices[0].message.content;
  },
  { name: "labelText" }
);
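Before wiring this into an evaluation, it can help to sanity-check the target function directly. A minimal check (Python shown; assumes your OPENAI_API_KEY is set in the environment):

# Quick manual check of the target function using inputs from the dataset in Step 2.
print(label_text("Shut up, idiot"))           # expected: Toxic
print(label_text("I had a great day today"))  # expected: Not toxic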
Step 2: Create or select a dataset
In this case, we are creating a dataset to evaluate the performance of our LLM application. The dataset contains examples of toxic and non-toxic text.
Each Example in the dataset contains three dictionaries / objects:
- outputs: The reference labels or other context found in your dataset
- inputs: The inputs to your pipeline
- metadata: Any other metadata you have stored in that example within the dataset
These dictionaries / objects can have arbitrary keys and values, but the keys must be consistent across all examples in the dataset. The values in the examples can also take any form, such as strings, numbers, lists, or dictionaries, but for this example, we are simply using strings.
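For this dataset, a single example might look roughly like the following (illustrative only; the "source" metadata key/value is hypothetical):

# Illustrative shape of one example in this dataset (not actual API output):
example = {
    "inputs": {"text": "Shut up, idiot"},   # what gets passed to your pipeline
    "outputs": {"label": "Toxic"},          # the reference label
    "metadata": {"source": "manual"},       # hypothetical metadata key/value
}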
- Python
- TypeScript
from langsmith import Client

client = Client()

# Create a dataset
examples = [
    ("Shut up, idiot", "Toxic"),
    ("You're a wonderful person", "Not toxic"),
    ("This is the worst thing ever", "Toxic"),
    ("I had a great day today", "Not toxic"),
    ("Nobody likes you", "Toxic"),
    ("This is unacceptable. I want to speak to the manager.", "Not toxic"),
]

dataset_name = "Toxic Queries"
dataset = client.create_dataset(dataset_name=dataset_name)
inputs, outputs = zip(
    *[({"text": text}, {"label": label}) for text, label in examples]
)
client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
import { Client } from "langsmith";

const langsmith = new Client();

// Create a dataset
const toxicExamples = [
  ["Shut up, idiot", "Toxic"],
  ["You're a wonderful person", "Not toxic"],
  ["This is the worst thing ever", "Toxic"],
  ["I had a great day today", "Not toxic"],
  ["Nobody likes you", "Toxic"],
  ["This is unacceptable. I want to speak to the manager.", "Not toxic"],
];

const [inputs, outputs] = toxicExamples.reduce<
  [Array<{ input: string }>, Array<{ outputs: string }>]
>(
  ([inputs, outputs], item) => [
    [...inputs, { input: item[0] }],
    [...outputs, { outputs: item[1] }],
  ],
  [[], []]
);

const datasetName = "Toxic Queries";
const toxicDataset = await langsmith.createDataset(datasetName);
await langsmith.createExamples({ inputs, outputs, datasetId: toxicDataset.id });
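As an optional sanity check (Python shown), you can list the examples you just created; client.list_examples is also used later in this guide for filtering:

# Confirm the examples landed in the dataset.
for example in client.list_examples(dataset_name=dataset_name):
    print(example.inputs, example.outputs)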
Step 3: Configure evaluators to score the outputs
In this case, we are using a dead-simple evaluator that compares the output of our LLM pipeline to the expected output in the dataset. Writing evaluators is discussed in more detail in the following section.
- Python
- TypeScript
from langsmith.schemas import Example, Run

def correct_label(root_run: Run, example: Example) -> dict:
    # The target function returns a bare string, which `evaluate` stores under the
    # "output" key; the dataset stores the reference label under "label".
    score = root_run.outputs.get("output") == example.outputs.get("label")
    return {"score": int(score), "key": "correct_label"}
import type { EvaluationResult } from "langsmith/evaluation";
import type { Run, Example } from "langsmith/schemas";

// Row-level evaluator: compare the run's output to the dataset's reference label,
// both stored under the "outputs" key here.
function correctLabel(rootRun: Run, example: Example): EvaluationResult {
  const score = rootRun.outputs?.outputs === example.outputs?.outputs;
  return { key: "correct_label", score };
}
Step 4: Run the evaluation and view the results
You can use the evaluate method in Python and TypeScript to run an evaluation.
At its simplest, the evaluate method takes the following arguments:
- a function that takes an input dictionary or object and returns an output dictionary or object
- data - the name OR UUID of the LangSmith dataset to evaluate on, or an iterator of examples
- evaluators - a list of evaluators to score the outputs of the function
- experiment_prefix - a string to prefix the experiment name with. A name will be generated if not provided.
- Python
- TypeScript
from langsmith.evaluation import evaluate

dataset_name = "Toxic Queries"

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=dataset_name,
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
    description="Testing the baseline system.",  # optional
)
import { evaluate } from "langsmith/evaluation";

const datasetName = "Toxic Queries";

await evaluate((inputs) => labelText(inputs["input"]), {
  data: datasetName,
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
});
Each invocation of evaluate produces an experiment which is bound to the dataset and can be viewed in the LangSmith UI.
Evaluation scores are stored against each individual output produced by the target task as feedback, with the name and score configured in the evaluator.
If you've annotated your code for tracing, you can open the trace of each row in a side panel view.
Use custom evaluators
At a high level, an evaluator judges an invocation of your LLM application against a reference example, and returns an evaluation score.
In LangSmith evaluators, we represent this process as a function that takes in a Run (representing the LLM app invocation) and an Example (representing the data point to evaluate), and returns Feedback (representing the evaluator's score of the LLM app invocation).
Here is an example of a very simple custom evaluator that compares the output of a model to the expected output in the dataset:
- Python
- TypeScript
from langsmith.schemas import Example, Run

def correct_label(root_run: Run, example: Example) -> dict:
    score = root_run.outputs.get("output") == example.outputs.get("label")
    return {"score": int(score), "key": "correct_label"}
import type { EvaluationResult } from "langsmith/evaluation";
import type { Run, Example } from "langsmith/schemas";

// Row-level evaluator
function correctLabel(rootRun: Run, example: Example): EvaluationResult {
  const score = rootRun.outputs?.outputs === example.outputs?.outputs;
  return { key: "correct_label", score };
}
If the "key" field is not provided, the default key name will be the name of the evaluator function.
- Evaluate on intermediate steps: The run object comes from LangSmith tracing, so it lets you inspect any traced intermediate steps of your LLM application. See this guide for more info on evaluating intermediate steps; a minimal sketch follows this list.
- Return multiple scores: You can return multiple scores from a single evaluator. Please check out the example below for more information.
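For example, here is a minimal sketch of an evaluator that looks at an intermediate step. It assumes the traced label_text function from Step 1 shows up as a child run named "label_text"; adjust the name to match your own traced steps:

from langsmith.schemas import Example, Run

def label_step_ran(root_run: Run, example: Example) -> dict:
    # Find the traced "label_text" child run and score whether it produced any output.
    label_run = next(
        (run for run in (root_run.child_runs or []) if run.name == "label_text"),
        None,
    )
    score = label_run is not None and bool(label_run.outputs)
    return {"key": "label_step_ran", "score": int(score)}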
Evaluate on a particular version of a dataset
Before diving into this content, it might be helpful to read the guide on versioning datasets. Additionally, it might be helpful to read the guide on fetching examples.
You can take advantage of the fact that evaluate allows passing in an iterable of examples to evaluate on a particular version of a dataset.
Simply use list_examples / listExamples to fetch examples from a particular version tag using as_of / asOf.
- Python
- TypeScript
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=client.list_examples(dataset_name=dataset_name, as_of="latest"),
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
import { evaluate } from "langsmith/evaluation";

await evaluate((inputs) => labelText(inputs["input"]), {
  data: langsmith.listExamples({
    datasetName: datasetName,
    asOf: "latest",
  }),
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
});
Evaluate on a subset of a dataset
Before diving into this content, it might be helpful to read the guide on fetching examples.
You can use the list_examples / listExamples method to fetch a subset of examples from a dataset to evaluate on. You can refer to the guide above to learn more about the different ways to fetch examples.
One common workflow is to fetch examples that have a certain metadata key-value pair.
- Python
- TypeScript
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=client.list_examples(dataset_name=dataset_name, metadata={"desired_key": "desired_value"}),
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
import { evaluate } from "langsmith/evaluation";

await evaluate((inputs) => labelText(inputs["input"]), {
  data: langsmith.listExamples({
    datasetName: datasetName,
    metadata: { desired_key: "desired_value" },
  }),
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
});
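Note that the examples created earlier in this guide carry no metadata, so a filter like the one above would return nothing for them. A sketch of attaching metadata at creation time (Python; the example text and the "desired_key"/"desired_value" pair are illustrative, and this assumes create_examples accepts a parallel metadata list, as in recent Python SDK versions):

# Attach metadata when creating examples so you can filter on it later.
client.create_examples(
    inputs=[{"text": "You are a genius"}],
    outputs=[{"label": "Not toxic"}],
    metadata=[{"desired_key": "desired_value"}],  # one metadata dict per example
    dataset_id=dataset.id,
)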
Evaluate on a dataset split
Before reading, it might be useful to check out the guide on creating/managing dataset splits.
You can use the list_examples / listExamples method to evaluate on one or multiple splits of your dataset. The splits param takes a list of the splits you would like to evaluate.
- Python
- TypeScript
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=client.list_examples(dataset_name=dataset_name, splits=["test", "training"]),
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
import { evaluate } from "langsmith/evaluation";

await evaluate((inputs) => labelText(inputs["input"]), {
  data: langsmith.listExamples({
    datasetName: datasetName,
    splits: ["test", "training"],
  }),
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
});
Evaluate on a dataset with repetitions
The optional num_repetitions param to the evaluate function allows you to specify how many times to run/evaluate each example in your dataset. For instance, if you have 5 examples and set num_repetitions=5, each example will be run 5 times, for a total of 25 runs. This can be useful for reducing noise in systems prone to high variability, such as agents.
- Python
- TypeScript
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=dataset_name,
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
    num_repetitions=3,
)
import { evaluate } from "langsmith/evaluation";

await evaluate((inputs) => labelText(inputs["input"]), {
  data: datasetName,
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
  numRepetitions: 3,
});
Use a summary evaluator
Some metrics can only be defined at the level of the entire experiment, as opposed to on its individual runs.
For example, you may want to compute the overall pass rate or F1 score of your evaluation target across all examples in the dataset.
These are called summary_evaluators. Instead of taking in a single Run and Example, these evaluators take a list of each.
Below, we'll implement a very simple summary evaluator that computes overall pass rate:
- Python
- TypeScript
from langsmith.schemas import Example, Run

def summary_eval(runs: list[Run], examples: list[Example]) -> dict:
    correct = 0
    for i, run in enumerate(runs):
        if run.outputs["output"] == examples[i].outputs["label"]:
            correct += 1
    if correct / len(runs) > 0.5:
        return {"key": "pass", "score": True}
    else:
        return {"key": "pass", "score": False}
import { Run, Example } from "langsmith/schemas";

// Experiment-level evaluator: overall pass rate across all runs.
function summaryEval(runs: Run[], examples: Example[]) {
  let correct = 0;
  for (let i = 0; i < runs.length; i++) {
    // The run's output and the dataset's reference label are both stored under "outputs" here.
    if (runs[i].outputs?.["outputs"] === examples[i].outputs?.["outputs"]) {
      correct += 1;
    }
  }
  return { key: "pass", score: correct / runs.length > 0.5 };
}
You can then pass this evaluator to the evaluate method as follows:
- Python
- TypeScript
results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=dataset_name,
    evaluators=[correct_label],
    summary_evaluators=[summary_eval],
    experiment_prefix="Toxic Queries",
)
await evaluate((inputs) => labelText(inputs["input"]), {
  data: datasetName,
  evaluators: [correctLabel],
  summaryEvaluators: [summaryEval],
  experimentPrefix: "Toxic Queries",
});
In the LangSmith UI, you'll see the summary evaluator's score displayed with the corresponding key.
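The intro to this section also mentions F1 score. As a further illustration, here is a minimal sketch of a summary evaluator that computes F1 for the "Toxic" class, assuming the same "output" / "label" keys used in the Python examples above:

from langsmith.schemas import Example, Run

def f1_score_toxic(runs: list[Run], examples: list[Example]) -> dict:
    # Treat "Toxic" as the positive class and count true/false positives and false negatives.
    tp = fp = fn = 0
    for run, example in zip(runs, examples):
        predicted = run.outputs["output"]
        expected = example.outputs["label"]
        if predicted == "Toxic" and expected == "Toxic":
            tp += 1
        elif predicted == "Toxic":
            fp += 1
        elif expected == "Toxic":
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"key": "f1_toxic", "score": f1}

You could pass this alongside summary_eval in the summary_evaluators list.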
Evaluate a LangChain runnable
You can configure a LangChain runnable to be evaluated by passing runnable.invoke to the evaluate method in Python, or just the runnable itself in TypeScript.
First, define your LangChain runnable:
- Python
- TypeScript
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "Please review the user query below and determine if it contains any form of toxic behavior, such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does, and 'Not toxic' if it doesn't."),
    ("user", "{text}"),
])
chat_model = ChatOpenAI()
output_parser = StrOutputParser()

chain = prompt | chat_model | output_parser
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Please review the user query below and determine if it contains any form of toxic behavior, such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does, and 'Not toxic' if it doesn't."],
  ["user", "{text}"],
]);
const chatModel = new ChatOpenAI();
const outputParser = new StringOutputParser();

const chain = prompt.pipe(chatModel).pipe(outputParser);
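If you like, you can sanity-check the chain locally before evaluating it (Python shown; assumes your OpenAI credentials are configured):

# The prompt's input variable is "text", which matches the dataset's input key.
print(chain.invoke({"text": "Nobody likes you"}))  # expected: Toxic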
Then, pass the runnable.invoke method to the evaluate method. Note that the input variables of the runnable must match the keys of the example inputs.
- Python
- TypeScript
from langsmith.evaluation import evaluate

results = evaluate(
    chain.invoke,
    data=dataset_name,
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
import { evaluate } from "langsmith/evaluation";

await evaluate(chain, {
  data: datasetName,
  evaluators: [correctLabel],
  experimentPrefix: "Toxic Queries",
});
The runnable is traced appropriately for each output.
Return multiple scores
In most cases, each evaluator returns a single numeric or categorical score under a single key. Alternatively, you can return multiple evaluation metrics from a single evaluator. This is useful if your metrics share intermediate values: for example, precision and recall rely on the same true/false positive and negative counts, or you may have an LLM generate multiple metrics in a single shot.
To return multiple scores, simply return a dictionary/object of the following form:
{
    "results": [
        {"key": string, "score": number},
        {"key": string, "score": number},
        # You may log as many as you wish
    ]
}
Each of these dictionaries can contain any or all of the feedback fields; check out the linked document for more information.
Example:
- Python
- TypeScript
from langsmith.schemas import Example, Run

def multiple_scores(root_run: Run, example: Example) -> dict:
    # Your evaluation logic here
    return {
        "results": [
            {"key": "precision", "score": 0.8},
            {"key": "recall", "score": 0.9},
            {"key": "f1", "score": 0.85},
        ]
    }
import type { Run, Example } from "langsmith/schemas";

function multipleScores(rootRun: Run, example: Example) {
  // Your evaluation logic here
  return {
    results: [
      { key: "precision", score: 0.8 },
      { key: "recall", score: 0.9 },
      { key: "f1", score: 0.85 },
    ],
  };
}
Rows from the resulting experiment will display each of the scores.