test

langsmith.testing._internal.test(func: Callable) → Callable
langsmith.testing._internal.test(*, id: UUID | None = None, output_keys: Sequence[str] | None = None, client: Client | None = None, test_suite_name: str | None = None) → Callable[[Callable], Callable]

Trace a pytest test case in LangSmith.

This decorator is used to trace a pytest test to LangSmith. It ensures that the necessary example data is created and associated with the test function. The decorated function will be executed as a test case, and the results will be recorded and reported by LangSmith.

Parameters:
  • id (UUID | None) – A unique identifier for the test case. If not provided, an ID will be generated based on the test function’s module and name.

  • output_keys (Sequence[str] | None) – A list of keys to be considered as the output keys for the test case. These keys will be extracted from the test function’s inputs and stored as the expected outputs.

  • client (Client | None) – An instance of the LangSmith client to be used for communication with the LangSmith service. If not provided, a default client will be used.

  • test_suite_name (str | None) – The name of the test suite to which the test case belongs. If not provided, the test suite name will be determined based on the environment or the package name.

Returns:

The decorated test function.

Return type:

Callable
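
For illustration, here is a minimal sketch of passing these parameters through the decorator. The client configuration and suite name below are placeholders (not values from this page), and it assumes the pytest.mark.langsmith mark forwards these keyword arguments to test() the same way it does for id and output_keys in the examples below.

import uuid

import pytest
from langsmith import Client

# Placeholder values for illustration; assumes LangSmith credentials
# (e.g. LANGSMITH_API_KEY) are configured in the environment.
custom_client = Client()
stable_id = uuid.uuid4()


@pytest.mark.langsmith(
    id=str(stable_id),              # custom example identifier
    client=custom_client,           # client used to record results
    test_suite_name="smoke-tests",  # placeholder suite name
)
def test_subtraction():
    assert 10 - 3 == 7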

Environment:
  • LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and costs during testing. Recommended to commit the cache files to your repository for faster CI/CD runs. Requires the ‘langsmith[vcr]’ package to be installed.

  • LANGSMITH_TEST_TRACKING: Set this variable to the path of a directory to enable caching of test results. This is useful for re-running tests without re-executing the code. Requires the ‘langsmith[vcr]’ package.
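
For example, here is a minimal sketch of enabling the request cache from a conftest.py; the cache directory is an arbitrary choice that mirrors the tests/cassettes path used later on this page.

# conftest.py
import os

# Cache API calls to disk so repeated test runs are faster and cheaper.
os.environ.setdefault("LANGSMITH_TEST_CACHE", "tests/cassettes")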

Example

For basic usage, simply decorate a test function with @pytest.mark.langsmith. Under the hood, this will call the test() function:

import pytest


# Equivalently can decorate with `test` directly:
# from langsmith import test
# @test
@pytest.mark.langsmith
def test_addition():
    assert 3 + 4 == 7

Any traced code (such as code decorated with @traceable or wrapped with the wrap_* functions) will be traced within the test case for improved visibility and debugging.

import pytest
from langsmith import traceable


@traceable
def generate_numbers():
    return 3, 4


@pytest.mark.langsmith
def test_nested():
    # Traced code will be included in the test case
    a, b = generate_numbers()
    assert a + b == 7

LLM calls are expensive! Cache requests by setting LANGSMITH_TEST_CACHE=path/to/cache. Check in these files to speed up CI/CD pipelines, so your results only change when your prompt or requested model changes.

Note that this will require that you install langsmith with the vcr extra:

pip install -U "langsmith[vcr]"

Caching is faster if you install libyaml. See https://vcrpy.readthedocs.io/en/latest/installation.html#speed for more details.

# import os
# os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"
import openai
import pytest
from langsmith import wrappers

oai_client = wrappers.wrap_openai(openai.Client())


@pytest.mark.langsmith
def test_openai_says_hello():
    # Traced code will be included in the test case
    response = oai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"},
        ],
    )
    assert "hello" in response.choices[0].message.content.lower()

LLMs are stochastic. Naive assertions are flaky. You can use langsmith’s expect to score and make approximate assertions on your results.

import pytest
from langsmith import expect
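# Reuses the wrapped `oai_client` defined in the previous example.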


@pytest.mark.langsmith
def test_output_semantically_close():
    response = oai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"},
        ],
    )
    # The embedding_distance call logs the embedding distance to LangSmith
    expect.embedding_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # The following optional assertion logs a
        # pass/fail score to LangSmith
        # and raises an AssertionError if the assertion fails.
    ).to_be_less_than(1.0)
    # Compute damerau_levenshtein distance
    expect.edit_distance(
        prediction=response.choices[0].message.content,
        reference="Hello!",
        # And then log a pass/fail score to LangSmith
    ).to_be_less_than(1.0)

The @test decorator works natively with pytest fixtures. The values will populate the “inputs” of the corresponding example in LangSmith.

import pytest


@pytest.fixture
def some_input():
    return "Some input"


@pytest.mark.langsmith
def test_with_fixture(some_input: str):
    assert "input" in some_input

You can still use pytest.mark.parametrize() as usual to run multiple test cases using the same test function.

import pytest


@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.parametrize(
    "a, b, expected",
    [
        (1, 2, 3),
        (3, 4, 7),
    ],
)
def test_addition_with_multiple_inputs(a: int, b: int, expected: int):
    assert a + b == expected

By default, each test case will be assigned a consistent, unique identifier based on the function name and module. You can also provide a custom identifier using the id argument:

import pytest
import uuid

example_id = uuid.uuid4()


@pytest.mark.langsmith(id=str(example_id))
def test_multiplication():
    assert 3 * 4 == 12

By default, all test inputs are saved as “inputs” to a dataset. You can specify the output_keys argument to persist those keys within the dataset’s “outputs” fields.

import pytest


@pytest.fixture
def expected_output():
    return "input"


@pytest.mark.langsmith(output_keys=["expected_output"])
def test_with_expected_output(some_input: str, expected_output: str):
    assert expected_output in some_input

To run these tests, use the pytest CLI, or call the test functions directly:

test_output_semantically_close()
test_addition()
test_nested()
test_with_fixture("Some input")
test_with_expected_output("Some input", "Some")
test_multiplication()
test_openai_says_hello()
test_addition_with_multiple_inputs(1, 2, 3)
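
Equivalently, pytest can be invoked programmatically. A minimal sketch, where the test file path is a placeholder for wherever the tests above are defined:

import pytest

# Placeholder path; point this at the module containing the tests above.
exit_code = pytest.main(["-v", "tests/test_example.py"])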