
Exporting a Test to CSV

If you want to export test/experiment results to a CSV for later analysis or reporting, you can use the client's beta get_test_results utility.

The basic code snippet is as follows:

import langsmith

client = langsmith.Client()

# Project here is the test / experiment name
df = client.get_test_results(project_name="My Project")
df.to_csv("results.csv")

For more control over the structure and content of the fields, check out the Downloading Feedback and Examples notebook.
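
If you'd rather build the table yourself, here is a minimal sketch of that approach, assuming the standard client methods list_runs and list_feedback; the flattened column names and the custom_results.csv filename are just illustrative choices, not part of the API.

import pandas as pd
from langsmith import Client

client = Client()

rows = []
for run in client.list_runs(project_name="My Project"):
    row = {
        "id": str(run.id),
        "error": run.error,
        # Flatten inputs and outputs into one column per field
        **{f"input.{k}": v for k, v in (run.inputs or {}).items()},
        **{f"output.{k}": v for k, v in (run.outputs or {}).items()},
    }
    # Attach any feedback scores recorded against this run
    for fb in client.list_feedback(run_ids=[run.id]):
        row[f"feedback.{fb.key}"] = fb.score
    rows.append(row)

pd.DataFrame(rows).to_csv("custom_results.csv", index=False)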

We will review a quick example below.

%pip install -U langsmith langchain pandas openai
import os
import uuid

# Adjust if self-hosted
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_API_KEY"] = "YOUR API KEY"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"

Example Dataset

Our toy example will be testing whether the chain can compute the nth Fibonacci number. You can skip this section if you already have a test you wish to export.

def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

import openai
from langchain.smith import RunEvalConfig
from langsmith import Client, traceable
from langsmith.wrappers import wrap_openai

client = Client()
openai_client = wrap_openai(openai.Client())

# Dataset

test_name = "My test data"
dataset_name = f"My Dataset - {uuid.uuid4().hex[:6]}"
ds = client.create_dataset(dataset_name=dataset_name)
client.create_examples(
    inputs=[{"n": i} for i in range(10)],
    outputs=[{"expected": fibonacci(i)} for i in range(10)],
    dataset_id=ds.id,
)


# Evaluator
def exact_match(run, example):
    score = run.outputs["output"] == example.outputs["expected"]
    return {"score": score}


eval_config = RunEvalConfig(evaluators=[exact_match])

# Model/chain we're testing


@traceable
def llm_fibonacci(n: int):
    completion = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Compute the {n}'th Fibonacci number. "
                "Think step by step, then print the output. Finally, return the number at the end on a newline."
                " The final line MUST be parseable with python's int() function.",
            }
        ],
    )
    result = completion.choices[0].message.content
    return {"output": int(result.split()[-1].strip())}


# Evaluate
test_results = client.run_on_dataset(
    dataset_name=dataset_name,
    project_name=test_name,
    llm_or_chain_factory=llm_fibonacci,
    evaluation=eval_config,
)
# You could write a CSV directly with
# test_results.to_dataframe().to_csv(...)
# but we assume you want to export after the fact.
View the evaluation results for project 'My test data' at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/b9ecc8d0-1e0c-4094-999c-cfdca9472c7b/compare?selectedSessions=11155a34-0a71-49fb-a445-5fa62867e467

View all tests for Dataset My Dataset - 855dcc at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/b9ecc8d0-1e0c-4094-999c-cfdca9472c7b
[------------------------------------------------->] 10/10

Export

df = client.get_test_results(project_name=test_name)
df.to_csv("results.csv", index=False)
!cat results.csv
reference.expected,input.n,outputs.output,feedback.exact_match,execution_time,error,id
13,7,7,0.0,1.426419,,11193006-9afb-4896-8f01-22575a505e74
1,1,55,0.0,5.490039,,d88763a0-d07d-42b3-aaf9-47fc0fd39968
21,8,21,1.0,1.262267,,019a26a6-709b-4db9-9017-18c4373502ff
1,2,1,1.0,2.711957,,930f4bb2-3c05-4e29-9c23-96d718877ea6
34,9,9,0.0,1.861771,,a134dc53-9a68-4528-abf9-fcb5da99af98
3,4,3,1.0,3.174816,,3e6551a3-b293-4c3e-ab6c-cc6c598625b2
5,5,5,1.0,1.634533,,ca084860-155f-483a-9a21-5b0d36b30998
2,3,2,1.0,2.088853,,8e56a900-4364-4f8c-b296-5bd216b52241
0,0,0,1.0,1.809274,,86e44d51-1253-441b-a672-2b5dda04a14f
8,6,5,0.0,4.030962,,1e9147fe-6e33-411a-9f04-98083fb6a489
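
Since the export is a flat table, a quick pandas pass can summarize the run. This snippet is just illustrative, though the column names come straight from the CSV above.

import pandas as pd

df = pd.read_csv("results.csv")

# Overall pass rate for the exact_match evaluator (0.6 for the run above)
print(df["feedback.exact_match"].mean())
# Mean latency in seconds
print(df["execution_time"].mean())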

Conclusion

Congrats! You've exported a flat table of your test results. The beta get_test_results utility makes it easy to export your LangSmith test results to a CSV file. This can be handy if you want to:

  • Perform custom analysis and interpretation of the evaluation metrics
  • Create visualizations using tools like matplotlib or plotly to better understand performance (see the sketch after this list)
  • Share the test results with partners, leaders, or other stakeholders
  • Include the exported data in research papers or reports
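
For instance, here is a minimal matplotlib sketch that plots the per-example exact_match scores from the CSV we just wrote; the chart and filename are purely illustrative, and you'll need matplotlib installed (the setup cell above doesn't cover it).

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("results.csv").sort_values("input.n")

# One bar per test input: 1.0 means the model got that Fibonacci number right
plt.bar(df["input.n"], df["feedback.exact_match"])
plt.xlabel("n")
plt.ylabel("exact_match score")
plt.title("Exact match by input")
plt.savefig("exact_match.png")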

We plan to add first-class support for this type of reporting in the UI, but there's no need to wait: the API and client let you do it today.

