Querying Feedback

LangSmith makes it easy to fetch feedback associated with your runs. The Run object itself carries aggregate feedback_stats on its body, which may be all you need. If you want the full feedback metadata (or the complete list of feedback objects), you can use the SDK or API to list feedback based on run IDs, feedback keys, and feedback source types.
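For example, here is a minimal sketch of the first option, reading the aggregate stats directly off a run (the run ID is a placeholder, and the exact contents of feedback_stats depend on the feedback you have logged):

from langsmith import Client

client = Client()

# Read a single run; "<run_id>" is a placeholder for a real run ID
run = client.read_run("<run_id>")

# feedback_stats is a dict keyed by feedback key, with aggregate values
# such as the count and average score, e.g. {"correctness": {"n": 2, "avg": 1.0}}
print(run.feedback_stats)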

Using the list_feedback method in the SDK or /feedback endpoint in the API, you can fetch feedback to analyze. Most simple requests can be satisfied using the following arguments:

  • run_ids: Fetch feedback for specific runs by providing their IDs.
  • feedback_key: Filter feedback by specific keys, such as 'correctness' or 'quality'.
  • feedback_source_type: Filter feedback by source type, such as 'model' for model-generated feedback or 'api' for feedback submitted via the API.

All the examples below assume you have created a LangSmith client and configured it with your API key to connect to the LangSmith server.

from langsmith import Client

client = Client()

Below are some examples of ways to list feedback using the available arguments:

List all feedback for a specific run

run_feedback = client.list_feedback(run_ids=["<run_id>"])

List all feedback with a specific key

correctness_feedback = client.list_feedback(feedback_key=["correctness"])

List all model-generated feedback

model_feedback = client.list_feedback(feedback_source_type=["model"])
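These arguments can also be combined. For example, a sketch (with a placeholder run ID) that fetches only model-generated 'correctness' feedback for a single run:

# Combine filters: model-generated 'correctness' feedback for one run
combined_feedback = client.list_feedback(
    run_ids=["<run_id>"],
    feedback_key=["correctness"],
    feedback_source_type=["model"],
)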

Use Cases

Here are a few common use cases for querying feedback:

Compare model-generated and human feedback for a set of runs

After querying for a set of runs, you can compare the model-generated and human feedback for those specific runs:

# Query for runs
runs = client.list_runs(project_name="<your_project>", filter='gt(start_time, "2023-07-15T00:00:00Z")')

# Extract run IDs
run_ids = [run.id for run in runs]

# Fetch model-generated feedback for the runs
model_feedback = client.list_feedback(run_ids=run_ids, feedback_source_type=["model"])

# Fetch human feedback for the runs
human_feedback = client.list_feedback(run_ids=run_ids, feedback_source_type=["api"])
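From here, one simple way to compare the two sources is to average their numeric scores. The helper below is a hypothetical sketch (it assumes the feedback scores are numeric and skips feedback without a score):

# Hypothetical helper: average the numeric scores from a feedback iterator
def average_score(feedback_iter):
    scores = [f.score for f in feedback_iter if f.score is not None]
    return sum(scores) / len(scores) if scores else None

print("Model average:", average_score(model_feedback))
print("Human average:", average_score(human_feedback))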

Analyze feedback for a specific key and set of runs

If you're interested in analyzing feedback for a specific key, such as 'correctness' or 'quality', for a set of runs, you can query for the runs and then filter the feedback by key:

# Query for runs
runs = client.list_runs(project_name="<your_project>", filter='gt(start_time, "2023-07-15T00:00:00Z")')

# Extract run IDs
run_ids = [run.id for run in runs]

# Fetch correctness feedback for the runs
correctness_feedback = client.list_feedback(run_ids=run_ids, feedback_key=["correctness"])

# Analyze the correctness scores
scores = [feedback.score for feedback in correctness_feedback]
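From there, you can summarize the scores however you like. For instance, a quick sketch that computes the count and mean (again assuming numeric scores and skipping feedback without a score):

# Summarize the correctness scores, skipping any feedback without a score
numeric_scores = [s for s in scores if s is not None]
if numeric_scores:
    print(f"n={len(numeric_scores)}, mean={sum(numeric_scores) / len(numeric_scores):.2f}")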
