
Working with feedback

This guide will walk you through feedback in LangSmith. For more end-to-end examples incorporating feedback into a workflow, see the LangSmith Cookbook.

Feedback comes in two forms: automated and human. Both are key to delivering a consistently high-quality application experience.

Automated feedback is generated any time you use the run_on_dataset method to test your component on a dataset or any time you call client.evaluate_run(run_id, run_evaluator). This guide will focus on human feedback using a common workflow.

There are two ways to log human feedback:

  1. Clicking "Rate Run" on a run's trace in the web app.
  2. Using the LangSmith client (or the REST API).

Below is the method signature from the client.

from langsmith import Client

client = Client()

# Method signature (annotated for reference, not runnable as written):
client.create_feedback(
    run_id: str | UUID,
    key: str,
    score: float | int | bool | None = None,
    value: float | int | bool | str | dict | None = None,
    correction: str | dict | None = None,
    comment: str | None = None,
    source_info: Dict[str, Any] | None = None,
    feedback_source_type: "API" | "MODEL" = "API",
)

Feedback requires a "key" that names the feedback and a run_id that identifies the run it applies to. Typically, you will also want a "score" to quantify the feedback and facilitate comparisons. Feedback can contain any of the following optional fields:

  • score: The score to rate this run on the metric or aspect.
  • value: The display value or non-numeric value for this feedback.
  • correction: The correct ground truth for this run.
  • comment: A comment about this feedback, or additional reasoning about why it received this score.
  • source_info: Information about the source of this feedback (e.g., a user ID, model type, tags, etc.)
  • feedback_source_type: The type of feedback source, either API or MODEL.
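
As an illustration of how these fields fit together, the sketch below assembles the keyword arguments for a create_feedback call. The helper name build_feedback_kwargs and the example values are hypothetical; only the field names come from the list above. It drops unset fields so that only what you supply is sent, while keeping meaningful falsy values such as a score of 0 or False.

```python
from typing import Any, Dict, Optional, Union


def build_feedback_kwargs(
    run_id: str,
    key: str,
    score: Union[float, int, bool, None] = None,
    value: Union[float, int, bool, str, dict, None] = None,
    correction: Union[str, dict, None] = None,
    comment: Optional[str] = None,
    source_info: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect feedback fields, omitting unset ones."""
    if not key:
        raise ValueError("feedback requires a non-empty key")
    optional = {
        "score": score,
        "value": value,
        "correction": correction,
        "comment": comment,
        "source_info": source_info,
    }
    kwargs: Dict[str, Any] = {"run_id": run_id, "key": key}
    # Compare against None so that score=0 and score=False are preserved.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_feedback_kwargs(
    run_id="<run_id>",
    key="thumbs_up",
    score=True,
    comment="Answer was concise and correct.",
    source_info={"user_id": "<user_id>"},
)
# With a configured client, this would be sent as:
#   client.create_feedback(**kwargs)
```

Building the arguments separately from the call makes it straightforward to log or validate feedback before it leaves your application.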

Example workflow

In addition to general production observability, user feedback is a key ingredient to continually improving your LLM application. A simple workflow for this is:

  1. Enable tracing and use the LangSmith client to save user feedback.
  2. Filter runs based on feedback and other automated metrics.
  3. Export runs to a dataset to use for evaluation and training.

1. Log feedback for a run

Assuming you've already set up tracing as described in the LangSmith quick start, you can add user feedback to a run by capturing the run ID and including it in the create_feedback call.

To get the run ID for a given call, use the RunCollectorCallbackHandler (or the collect_runs() context manager). In Python, if you are not using the collector, you will have to specify include_run_info=True when calling your chain or LLM. Below is an example:

export LANGCHAIN_API_KEY=<your_api_key>
export LANGCHAIN_PROJECT=<your_project>

from langchain import chat_models, prompts, callbacks
from langchain.schema import output_parser
from langsmith import Client

# Prompt -> chat model -> string output
chain = (
    prompts.ChatPromptTemplate.from_messages(
        [
            ("system", "You are a conversational bot."),
            ("user", "{input}"),
        ]
    )
    | chat_models.ChatOpenAI()
    | output_parser.StrOutputParser()
)

client = Client()

def main():
    # collect_runs() records the traced runs so we can read the run ID afterwards
    with callbacks.collect_runs() as cb:
        for tok in chain.stream({"input": "Hi, I'm Clara"}):
            print(tok, end="", flush=True)
        run_id = cb.traced_runs[0].id
    # ... User copies the generated response
    client.create_feedback(run_id, "did_copy", score=True)
    # ... User clicks a thumbs up button
    client.create_feedback(run_id, "thumbs_up", score=True)

if __name__ == "__main__":
    main()
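
In an application, calls like these are usually triggered by UI events. The sketch below maps event names to feedback keys and scores. The event names, the FEEDBACK_FOR_EVENT table, and log_user_event are all hypothetical, and the function assumes only that the client exposes create_feedback(run_id, key, score=...) as used above. A small recording stub stands in for the real client so the snippet runs without network access.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from UI events to (feedback key, score).
FEEDBACK_FOR_EVENT: Dict[str, Tuple[str, bool]] = {
    "copy": ("did_copy", True),
    "thumbs_up": ("thumbs_up", True),
    "thumbs_down": ("thumbs_up", False),
}


def log_user_event(client: Any, run_id: str, event: str) -> bool:
    """Log feedback for a known UI event; return False for unknown events."""
    if event not in FEEDBACK_FOR_EVENT:
        return False
    key, score = FEEDBACK_FOR_EVENT[event]
    client.create_feedback(run_id, key, score=score)
    return True


class RecordingClient:
    """Stand-in for langsmith.Client that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def create_feedback(self, run_id: str, key: str, **kwargs: Any) -> None:
        self.calls.append((run_id, key, kwargs))


stub = RecordingClient()
log_user_event(stub, "<run_id>", "thumbs_down")
print(stub.calls)
```

Recording a thumbs down as a False score under the same "thumbs_up" key is one possible design choice; it keeps both outcomes under a single key so they can be compared by score.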

2. Filter runs based on feedback

Once you've collected runs, you can select some to analyze or export to a dataset. It's easiest to do this directly in the LangSmith web app, but you can also use the SDK. Below, we select all runs that received a 'thumbs_up' from the user.

runs = client.list_runs(
    project_name="<your_project>",
    filter='and(eq(feedback_key, "thumbs_up"), eq(feedback_score, 1.0))',
)
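
The filter argument is a plain string, so it can be assembled programmatically. The helper below is a hypothetical convenience, not part of the LangSmith SDK; it only reproduces the and(eq(...), eq(...)) form used in the call above.

```python
def feedback_filter(key: str, score: float) -> str:
    """Hypothetical helper: build a filter for runs with this feedback key and score."""
    if '"' in key:
        # The key is embedded in double quotes, so reject keys that would break the string.
        raise ValueError("feedback key must not contain double quotes")
    return f'and(eq(feedback_key, "{key}"), eq(feedback_score, {float(score)}))'


thumbs_up_filter = feedback_filter("thumbs_up", 1)
print(thumbs_up_filter)
```

The resulting string can be passed as filter=thumbs_up_filter to client.list_runs.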

3. Export runs to dataset

After reviewing the runs or filtering for other metrics, you can export them to a dataset for further analysis, evaluation, or even for training new models. The easiest way to do so is in the web app: go to the project, select the runs, and click "Add to dataset". You can also do this with the SDK:

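dataset = client.create_dataset(
    dataset_name="Thumbs Up Runs",
    description="Runs that received a thumbs up from the user",
)
for run in runs:
    # Store each run's inputs and outputs as an example in the new dataset
    client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)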

Once you've created a dataset, you can use it to evaluate new prompts, to train a new LLM, or for other purposes.
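
When exporting with the SDK, it can help to skip runs that have no recorded outputs (for example, runs that errored) before creating examples. The sketch below is one way to do that. runs_to_examples is a hypothetical helper, and it assumes each run object exposes inputs and outputs attributes; SimpleNamespace objects stand in for real runs so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def runs_to_examples(runs: Iterable[Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn runs into example dicts, skipping runs without outputs."""
    examples = []
    for run in runs:
        if not run.outputs:
            # Nothing to use as a reference output, so leave this run out.
            continue
        examples.append({"inputs": run.inputs, "outputs": run.outputs})
    return examples


fake_runs = [
    SimpleNamespace(inputs={"input": "Hi, I'm Clara"}, outputs={"output": "Hello, Clara!"}),
    SimpleNamespace(inputs={"input": "Hi"}, outputs=None),
]
examples = runs_to_examples(fake_runs)
# Each dict could then be written to a dataset, e.g.
#   client.create_example(dataset_id=dataset.id, **example)
```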