Run an evaluation with large file inputs

LangSmith supports creating dataset examples with file attachments, which you can consume when running evals over that dataset.

Attachments are most useful when working with LLM applications that require multimodal inputs or produce multimodal outputs. While multimodal data can be base64 encoded and uploaded as part of an example's inputs/outputs, base64 encodings are fairly space-inefficient relative to the underlying binary data, which makes them slower to upload to and download from LangSmith. By using attachments you can speed up uploads and downloads and get nicer renderings of different file types in the LangSmith UI.

Create examples with attachments

Using the SDK

To upload examples with attachments using the SDK, use the create_examples / update_examples Python methods or the uploadExamplesMultipart / updateExamplesMultipart TypeScript methods.

Requires langsmith>=0.3.13

import requests
import uuid
from pathlib import Path
from langsmith import Client

# Publicly available test files
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
wav_url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"

# Fetch the files as bytes
pdf_bytes = requests.get(pdf_url).content
wav_bytes = requests.get(wav_url).content

# Create the dataset
ls_client = Client()
dataset_name = "attachment-test-dataset"
dataset = ls_client.create_dataset(
    dataset_name=dataset_name,
    description="Test dataset for evals with publicly available attachments",
)

# Define an example with attachments
example_id = uuid.uuid4()
example = {
    "id": example_id,
    "inputs": {
        "audio_question": "What is in this audio clip?",
        "image_question": "What is in this image?",
    },
    "outputs": {
        "audio_answer": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
        "image_answer": "A mug with a blanket over it.",
    },
    "attachments": {
        "my_pdf": {"mime_type": "application/pdf", "data": pdf_bytes},
        "my_wav": {"mime_type": "audio/wav", "data": wav_bytes},
        # Example of an attachment specified via a local file path:
        # "my_img": {"mime_type": "image/png", "data": Path(__file__).parent / "my_img.png"},
    },
}

# Create the example
ls_client.create_examples(
    dataset_id=dataset.id,
    examples=[example],
    # Uncomment this flag if you'd like to upload attachments from local files:
    # dangerously_allow_filesystem=True
)
Uploading from filesystem

In addition to being passed in as bytes, attachments can be specified as paths to local files. To do so, pass in a path as the attachment data value and set dangerously_allow_filesystem=True:

client.create_examples(..., dangerously_allow_filesystem=True)
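For example, here is a minimal sketch that uploads an image attachment from disk. It assumes a local my_img.png file next to your script and reuses the ls_client and dataset created above (Path is already imported there); the example contents are illustrative.

local_example = {
    "inputs": {"image_question": "What is in this image?"},
    "outputs": {"image_answer": "A mug with a blanket over it."},
    "attachments": {
        # The path is resolved and the file contents are read at upload time
        "my_img": {"mime_type": "image/png", "data": Path(__file__).parent / "my_img.png"},
    },
}

ls_client.create_examples(
    dataset_id=dataset.id,
    examples=[local_example],
    # Required whenever attachment data is given as a local file path
    dangerously_allow_filesystem=True,
)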

Once you upload examples with attachments, you can view them in the LangSmith UI. Each attachment will be rendered as a file with a preview, making it easy to inspect the contents.

From the UI

From existing runs

When adding runs to a LangSmith dataset, attachments can be selectively propagated from the source run to the destination example. To learn more, please see this guide.

From scratch

You can also create examples with attachments directly from the LangSmith UI. Click the + Example button in the Examples tab of the dataset page, then attach the files you want using the "Upload Files" button:

Run evaluations

Once you have a dataset that contains examples with file attachments, you can run evaluations that process these attachments.

Define a target function

Now that we have a dataset that includes examples with attachments, we can define a target function to run over these examples. The following example uses OpenAI's GPT-4o models to answer questions about an image and an audio clip.

In order to consume the attachments associated with an example, the target function you are evaluating must have two positional arguments: the first must be called inputs and the second must be called attachments.

  • The inputs argument is a dictionary that contains the input data for the example, excluding the attachments.
  • The attachments argument is a dictionary that maps each attachment name to a dictionary containing a presigned URL, the MIME type, and a reader over the file's bytes. You can use either the presigned URL or the reader to get the file contents. Each value in the attachments dictionary has the following structure:
{
    "presigned_url": str,
    "mime_type": str,
    "reader": BinaryIO
}
from langsmith.wrappers import wrap_openai

import base64
from openai import OpenAI

client = wrap_openai(OpenAI())

# Define target function that uses attachments
def file_qa(inputs, attachments):
    # Read the audio bytes from the reader and encode them in base64
    audio_reader = attachments["my_wav"]["reader"]
    audio_b64 = base64.b64encode(audio_reader.read()).decode("utf-8")
    audio_completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": inputs["audio_question"]},
                    {
                        "type": "input_audio",
                        "input_audio": {"data": audio_b64, "format": "wav"},
                    },
                ],
            }
        ],
    )

    # Most models support taking in an image URL directly in addition to base64 encoded images
    # You can pipe the image pre-signed URL directly to the model
    image_url = attachments["my_img"]["presigned_url"]
    image_completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": inputs["image_question"]},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url},
                    },
                ],
            }
        ],
    )

    return {
        "audio_answer": audio_completion.choices[0].message.content,
        "image_answer": image_completion.choices[0].message.content,
    }

Define custom evaluators

In addition to using attachments inside your target function, you can also use them inside your evaluators, as shown below. The same rules as above determine whether an evaluator receives attachments.

The evaluator below uses an LLM to judge whether the image description is consistent with the image. To learn more about how to define LLM-based evaluators, please see this guide.

# Assumes you've installed pydantic
from pydantic import BaseModel

def valid_image_description(outputs: dict, attachments: dict) -> bool:
    """Use an LLM to judge if the image description and image are consistent."""

    instructions = """
    Does the description of the following image make sense?
    Please carefully review the image and the description to determine if the description is valid."""

    class Response(BaseModel):
        description_is_valid: bool

    image_url = attachments["my_img"]["presigned_url"]
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": instructions},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": outputs["image_answer"]},
                ],
            },
        ],
        response_format=Response,
    )

    return response.choices[0].message.parsed.description_is_valid

ls_client.evaluate(
    file_qa,
    data=dataset_name,
    evaluators=[valid_image_description],
)

Update examples with attachments

Using the SDK

In the code above, we showed how to add examples with attachments to a dataset. It is also possible to update these same examples using the SDK.

As with existing examples, datasets are versioned when you update them with attachments. Therefore, you can navigate to the dataset version history to see the changes made to each example. To learn more, please see this guide.
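Because of this versioning, you can also read examples as they existed before an update. A minimal sketch, assuming the as_of parameter of list_examples (which accepts a datetime or a dataset version tag) and the ls_client and dataset from above:

from datetime import datetime, timezone

# Record a timestamp before applying any updates
before_update = datetime.now(timezone.utc)

# ... apply updates to the examples ...

# Read the examples as they existed at that earlier point in time
old_examples = list(ls_client.list_examples(dataset_id=dataset.id, as_of=before_update))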

When updating an example with attachments, you can update attachments in a few different ways:

  • Pass in new attachments
  • Rename existing attachments
  • Delete existing attachments

Note that:

  • Any existing attachments that are not explicitly renamed or retained will be deleted.
  • An error will be raised if you pass in a non-existent attachment name to retain or rename.
  • New attachments take precedence over existing attachments if the same attachment name appears in both the attachments and attachments_operations fields.
example_update = {
    "id": example_id,
    "attachments": {
        # These are net new attachments
        "my_new_file": ("text/plain", b"foo bar"),
    },
    # Any attachments not in rename/retain will be deleted.
    # In this case, that would be "my_img" if we uploaded it.
    "attachments_operations": {
        # Retained attachments will stay exactly the same
        "retain": ["my_pdf"],
        # Renaming attachments preserves the original data
        "rename": {
            "my_wav": "my_new_wav",
        },
    },
}

ls_client.update_examples(dataset_id=dataset.id, updates=[example_update])

From the UI

Attachment Size Limit

Attachments are limited to 20MB in size in the UI.

When editing an example in the UI, you can upload new attachments, rename and delete existing attachments, and use the quick reset button to restore the attachments to what previously existed on the example. No changes are saved until you click submit.

