How to query traces and runs
LangSmith makes it easy to query for traces and runs. In addition to the filtering experience presented in the UI, you can also use the SDK or API to query for traces and runs.
Using the list_runs
method in the SDK or /runs/query
endpoint in the API, you can filter runs to analyze and export. Most simple requests can be satisfied using simple top level arguments:
Keys | Description |
---|---|
project_id / project_name | The project(s) to fetch runs from - can be a single project or a list of projects. |
trace_id | Fetch runs that are part of a specific trace. |
run_type | The type of run to get, such as llm , chain , tool , retriever , etc. |
dataset_name / dataset_id | Fetch runs that are associated with an example row in the specified dataset. This is useful for comparing prompts or models over a given dataset. |
reference_example_id | Fetch runs that are associated with a specific example row. This is useful for comparing prompts or models on a given input. |
parent_run_id | Fetch runs that are children of a given run. This is useful for fetching runs grouped together using the context manager or for fetching an agent trajectory. |
error | Fetch runs that errored or did not error. |
run_ids | Fetch runs with a given list of run ids. Note: This will ignore all other filtering arguments. |
filter | Fetch runs that match a given structured filter statement. See the run filtering guide below for more information. |
trace_filter | Filter to apply to the ROOT run in the trace tree. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of the root run within a trace. |
tree_filter | Filter to apply to OTHER runs in the trace tree, including sibling and child runs. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of any run within a trace. |
is_root | Only return root runs. |
select | Select the fields to return in the response. By default, all fields are returned. |
query (experimental) | Query the experimental natural language API, which translates your query into a filter statement. |
Using keyword arguments
For simple queries, such as filtering by project, run time, name, or run ID's, you can directly use keyword arguments in the list_runs method. These correspond directly to query params in the REST API. All the examples below assume you have created a LangSmith client and configured it with your API key to connect to the LangSmith server.
- Python
- TypeScript
from langsmith import Client
client = Client()
import { Client, Run } from "langsmith";
const client = new Client();
Below are some examples of ways to list runs using keyword arguments:
List all runs in a project
- Python
- TypeScript
project_runs = client.list_runs(project_name="<your_project>")
// Download runs in a project
const projectRuns: Run[] = [];
for await (const run of client.listRuns({
projectName: "<your_project>",
})) {
projectRuns.push(run);
};
List LLM and Chat runs in the last 24 hours
- Python
- TypeScript
todays_llm_runs = client.list_runs(
project_name="<your_project>",
start_time=datetime.now() - timedelta(days=1),
run_type="llm",
)
const todaysLlmRuns: Run[] = [];
for await (const run of client.listRuns({
projectName: "<your_project>",
startTime: new Date(Date.now() - 1000 * 60 * 60 * 24),
runType: "llm",
})) {
todaysLlmRuns.push(run);
};
List traces in a project
Root runs (or run traces), are runs that have no parents. These are assigned an 'execution_order' of 1. You can use this to filter for root runs.
- Python
- TypeScript
root_runs = client.list_runs(
project_name="<your_project>",
is_root=True
)
const rootRuns: Run[] = [];
for await (const run of client.listRuns({
projectName: "<your_project>",
isRoot: 1,
})) {
rootRuns.push(run);
};
List runs without errors
- Python
- TypeScript
correct_runs = client.list_runs(project_name="<your_project>", error=False)
const correctRuns: Run[] = [];
for await (const run of client.listRuns({
projectName: "<your_project>",
error: false,
})) {
correctRuns.push(run);
};
List runs by run ID
If you have a list of run IDs, you can list them directly:
- Python
- TypeScript
run_ids = ['a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836','9398e6be-964f-4aa4-8ae9-ad78cd4b7074']
selected_runs = client.list_runs(id=run_ids)
const runIds = [
"a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836",
"9398e6be-964f-4aa4-8ae9-ad78cd4b7074",
];
const selectedRuns: Run[] = [];
for await (const run of client.listRuns({
id: runIds,
})) {
selectedRuns.push(run);
};
If you provide a list of run IDs in the way described above, it will ignore all other filtering arguments like project_name
, run_type
, etc. and directly return the runs matching the given IDs.
Run Filtering
Listing runs with query params is useful for simple queries, but doesn't support many common needs, such as filtering by metadata, tags, or other fields.
LangSmith supports a filter query language to permit more complex filtering operations when fetching runs. This guide will provide a high level overview of the grammar as well as a few examples of when it can be useful.
If you'd prefer a more visual guide, you can get a taste of the language by viewing the table of runs on any of your projects' pages. We provide some recommended filters to get you started that you can copy and use the SDK.
Grammar
The filtering grammar is based on common comparators on fields in the run object. Supported comparators include:
gte
(greater than or equal to)gt
(greater than)lte
(less than or equal to)lt
(less than)eq
(equal to)neq
(not equal to)has
(check if run contains a tag or metadata json blob)search
(search for a substring in a string field)
Additionally, you can combine multiple comparisons through and
and or
operators.
These can be applied on fields of the run object, such as its id
, name
, run_type
, start_time
/ end_time
, latency
, total_tokens
, error
, execution_order
, tags
, and any associated feedback through feedback_key
and feedback_score
.
Examples
The following examples assume you have configured your environment appropriately and have runs stored in LangSmith.
List all runs called "extractor" whose root of the trace was assigned feedback "user_score" score of 1
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='eq(name, "extractor")',
trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))'
)
client.listRuns({
projectName: "<your_project>",
filter: 'eq(name, "extractor")',
traceFilter: 'and(eq(feedback_key, "user_score"), eq(feedback_score, 1))'
})
List runs with "star_rating" key whose score is greater than 4
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='and(eq(feedback_key, "star_rating"), gt(feedback_score, 4))'
)
client.listRuns({
projectName: "<your_project>",
filter: 'and(eq(feedback_key, "star_rating"), gt(feedback_score, 4))'
})
List runs that took longer than 5 seconds to complete
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='gt(latency, "5s")')
client.listRuns({projectName: "<your_project>", filter: 'gt(latency, "5s")'})
List all runs where total_tokens
is greater than 5000
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='gt(total_tokens, 5000)')
client.listRuns({projectName: "<your_project>", filter: 'gt(total_tokens, 5000)'})
List all runs that have "error" not equal to null
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='neq(error, null)')
client.listRuns({projectName: "<your_project>", filter: 'neq(error, null)'})
List all runs where start_time
is greater than a specific timestamp
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='gt(start_time, "2023-07-15T12:34:56Z")')
client.listRuns({projectName: "<your_project>", filter: 'gt(start_time, "2023-07-15T12:34:56Z")'})
List all runs that contain the string "substring"
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='search("substring")')
client.listRuns({projectName: "<your_project>", filter: 'search("substring")'})
List all runs that are tagged with the git hash "2aa1cf4"
- Python
- TypeScript
client.list_runs(project_name="<your_project>", filter='has(tags, "2aa1cf4")')
client.listRuns({projectName: "<your_project>", filter: 'has(tags, "2aa1cf4")'})
List all "chain" type runs that took more than 10 seconds and
had total_tokens
greater than 5000
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='and(eq(run_type, "chain"), gt(latency, 10), gt(total_tokens, 5000))'
)
client.listRuns({
projectName: "<your_project>",
filter: 'and(eq(run_type, "chain"), gt(latency, 10), gt(total_tokens, 5000))'
})
List all runs that started after a specific timestamp and either
have "error" not equal to null or a "Correctness" feedback score equal to 0
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='and(gt(start_time, "2023-07-15T12:34:56Z"), or(neq(error, null), and(eq(feedback_key, "Correctness"), eq(feedback_score, 0.0))))'
)
client.listRuns({
projectName: "<your_project>",
filter: 'and(gt(start_time, "2023-07-15T12:34:56Z"), or(neq(error, null), and(eq(feedback_key, "Correctness"), eq(feedback_score, 0.0))))'
})
Complex query: List all runs where tags
include "experimental" or "beta" and
latency
is greater than 2 seconds
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='and(or(has(tags, "experimental"), has(tags, "beta")), gt(latency, 2))'
)
client.listRuns({
projectName: "<your_project>",
filter: 'and(or(has(tags, "experimental"), has(tags, "beta")), gt(latency, 2))'
})
Search trace trees by full text You can use the search()
function without
any specific field to do a full text search across all string fields in a run. This allows you to quickly find traces that match a search term.
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='search("image classification")'
)
client.listRuns({
projectName: "<your_project>",
filter: 'search("image classification")'
})
Check for presence of metadata
If you want to check for the presence of metadata, you can use the eq
operator, optionally with an and
statement to match by value. This is useful if you want to log more structured information
about your runs.
- Python
- TypeScript
to_search = {
"user_id": ""
}
# Check for any run with the "user_id" metadata key
client.list_runs(
project_name="default",
filter="eq(metadata_key, 'user_id')"
)
# Check for runs with user_id=4070f233-f61e-44eb-bff1-da3c163895a3
client.list_runs(
project_name="default",
filter="and(eq(metadata_key, 'user_id'), eq(metadata_value, '4070f233-f61e-44eb-bff1-da3c163895a3'))"
)
// Check for any run with the "user_id" metadata key
client.listRuns({
projectName: 'default',
filter: `eq(metadata_key, 'user_id')`
});
// Check for runs with user_id=4070f233-f61e-44eb-bff1-da3c163895a3
client.listRuns({
projectName: 'default',
filter: `and(eq(metadata_key, 'user_id'), eq(metadata_value, '4070f233-f61e-44eb-bff1-da3c163895a3'))`
});
Check for environment details in metadata.
A common pattern is to add environment information to your traces via metadata. If you want to filter for runs containing environment metadata, you can use the same pattern as above:
- Python
- TypeScript
client.list_runs(
project_name="default",
filter="and(eq(metadata_key, 'environment'), eq(metadata_value, 'production'))"
)
client.listRuns({
projectName: 'default',
filter: `and(eq(metadata_key, 'environment'), eq(metadata_value, 'production'))`
});
Check for conversation ID in metadata
Another common way to associate traces in the same conversation is by using a shared conversation ID. If you want to filter runs based on a conversation ID in this way, you can search for that ID in the metadata.
- Python
- TypeScript
client.list_runs(
project_name="default",
filter="and(eq(metadata_key, 'conversation_id'), eq(metadata_value, 'a1b2c3d4-e5f6-7890'))"
)
client.listRuns({
projectName: 'default',
filter: `and(eq(metadata_key, 'conversation_id'), eq(metadata_value, 'a1b2c3d4-e5f6-7890'))`
});
Combine multiple filters
If you want to combine multiple conditions to refine your search, you can use the and
operator along with other
filtering functions. Here's how you can search for runs named "ChatOpenAI" that also
have a specific conversation_id
in their metadata:
- Python
- TypeScript
client.list_runs(
project_name="default",
filter="and(eq(name, 'ChatOpenAI'), eq(metadata_key, 'conversation_id'), eq(metadata_value, '69b12c91-b1e2-46ce-91de-794c077e8151'))"
)
client.listRuns({
projectName: 'default',
filter: `and(eq(name, 'ChatOpenAI'), eq(metadata_key, 'conversation_id'), eq(metadata_value, '69b12c91-b1e2-46ce-91de-794c077e8151'))`
});
Tree Filter
List all runs named "RetrieveDocs" whose root run has a "user_score" feedback of 1 and any run in the full trace is named "ExpandQuery".
This type of query is useful if you want to extract a specific run conditional on various states or steps being reached within the trace.
- Python
- TypeScript
client.list_runs(
project_name="<your_project>",
filter='eq(name, "RetrieveDocs")',
trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
tree_filter='eq(name, "ExpandQuery")'
)
client.listRuns({
projectName: "<your_project>",
filter: 'eq(name, "RetrieveDocs")',
traceFilter: 'and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
treeFilter: 'eq(name, "ExpandQuery")'
})
Advanced Examples
Export flattened trace view with child tool usage
The following Python example demonstrates how to export a flattened view of traces, including information on the tools (from nested runs) used by the agent within each trace. This can be used to analyze the behavior of your agents across multiple traces.
This example queries all tool runs within a specified number of days and groups them by their parent (root) run ID. It then fetches the relevant information for each root run, such as the run name, inputs, outputs, and combines that information with the child run information.
To optimize the query, the example:
- Selects only the necessary fields when querying tool runs to reduce query time.
- Fetches root runs in batches while processing tool runs concurrently.
- Python
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime, timedelta
from langsmith import Client
from tqdm.auto import tqdm
client = Client()
project_name = "my-project"
num_days = 30
# List all tool runs
tool_runs = client.list_runs(
project_name=project_name,
start_time=datetime.now() - timedelta(days=num_days),
run_type="tool",
# We don't need to fetch inputs, outputs, and other values that # may increase the query time
select=["trace_id", "name", "run_type"],
)
data = []
futures: list[Future] = []
trace_cursor = 0
trace_batch_size = 50
tool_runs_by_parent = defaultdict(lambda: defaultdict(set))
# Do not exceed rate limit
with ThreadPoolExecutor(max_workers=2) as executor:
# Group tool runs by parent run ID
for run in tqdm(tool_runs):
# Collect all tools invoked within a given trace
tool_runs_by_parent[run.trace_id]["tools_involved"].add(run.name)
# maybe send a batch of parent run IDs to the server
# this lets us query for the root runs in batches
# while still processing the tool runs
if len(tool_runs_by_parent) % trace_batch_size == 0:
if this_batch := list(tool_runs_by_parent.keys())[
trace_cursor : trace_cursor + trace_batch_size
]:
trace_cursor += trace_batch_size
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=["name", "inputs", "outputs", "run_type"],
)
)
if this_batch := list(tool_runs_by_parent.keys())[trace_cursor:]:
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=["name", "inputs", "outputs", "run_type"],
)
)
for future in tqdm(futures):
root_runs = future.result()
for root_run in root_runs:
root_data = tool_runs_by_parent[root_run.id]
data.append(
{
"run_id": root_run.id,
"run_name": root_run.name,
"run_type": root_run.run_type,
"inputs": root_run.inputs,
"outputs": root_run.outputs,
"tools_involved": list(root_data["tools_involved"]),
}
)
# (Optional): Convert to a pandas DataFrame
import pandas as pd
df = pd.DataFrame(data)
df.head()
Export retriever inputs/outputs for traces with a specific feedback score
This query is useful if you want to fine-tune embeddings or diagnose end-to-end system performance issues based on retriever behavior. The following Python example demonstrates how to export retriever inputs and outputs within traces that have a specific feedback score.
- Python
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime, timedelta
import pandas as pd
from langsmith import Client
from tqdm.auto import tqdm
client = Client()
project_name = "your-project-name"
num_days = 1
# List all tool runs
retriever_runs = client.list_runs(
project_name=project_name,
start_time=datetime.now() - timedelta(days=num_days),
run_type="retriever",
# This time we do want to fetch the inputs and outputs, since they
# may be adjusted by query expansion steps.
select=["trace_id", "name", "run_type", "inputs", "outputs"],
trace_filter='eq(feedback_key, "user_score")',
)
data = []
futures: list[Future] = []
trace_cursor = 0
trace_batch_size = 50
retriever_runs_by_parent = defaultdict(lambda: defaultdict(list))
# Do not exceed rate limit
with ThreadPoolExecutor(max_workers=2) as executor:
# Group retriever runs by parent run ID
for run in tqdm(retriever_runs):
# Collect all retriever calls invoked within a given trace
for k, v in run.inputs.items():
retriever_runs_by_parent[run.trace_id][f"retriever.inputs.{k}"].append(v)
for k, v in (run.outputs or {}).items():
# Extend the docs
retriever_runs_by_parent[run.trace_id][f"retriever.outputs.{k}"].extend(v)
# maybe send a batch of parent run IDs to the server
# this lets us query for the root runs in batches
# while still processing the retriever runs
if len(retriever_runs_by_parent) % trace_batch_size == 0:
if this_batch := list(retriever_runs_by_parent.keys())[
trace_cursor : trace_cursor + trace_batch_size
]:
trace_cursor += trace_batch_size
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=[
"name",
"inputs",
"outputs",
"run_type",
"feedback_stats",
],
)
)
if this_batch := list(retriever_runs_by_parent.keys())[trace_cursor:]:
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=["name", "inputs", "outputs", "run_type"],
)
)
for future in tqdm(futures):
root_runs = future.result()
for root_run in root_runs:
root_data = retriever_runs_by_parent[root_run.id]
feedback = {
f"feedback.{k}": v.get("avg")
for k, v in (root_run.feedback_stats or {}).items()
}
inputs = {f"inputs.{k}": v for k, v in root_run.inputs.items()}
outputs = {f"outputs.{k}": v for k, v in (root_run.outputs or {}).items()}
data.append(
{
"run_id": root_run.id,
"run_name": root_run.name,
**inputs,
**outputs,
**feedback,
**root_data,
}
)
# (Optional): Convert to a pandas DataFrame
import pandas as pd
df = pd.DataFrame(data)
df.head()