def test(*args: Any, **kwargs: Any) -> Callable:
    """Trace a pytest test case in LangSmith.

    This decorator is used to trace a pytest test to LangSmith. It ensures that the
    necessary example data is created and associated with the test function. The
    decorated function will be executed as a test case, and the results will be
    recorded and reported by LangSmith.

    Args:
        - id (Optional[uuid.UUID]): A unique identifier for the test case. If not
            provided, an ID will be generated based on the test function's module
            and name.
        - output_keys (Optional[Sequence[str]]): A list of keys to be considered as
            the output keys for the test case. These keys will be extracted from the
            test function's inputs and stored as the expected outputs.
        - client (Optional[ls_client.Client]): An instance of the LangSmith client
            to be used for communication with the LangSmith service. If not
            provided, a default client will be used.
        - test_suite_name (Optional[str]): The name of the test suite to which the
            test case belongs. If not provided, the test suite name will be
            determined based on the environment or the package name.

    Returns:
        Callable: The decorated test function.

    Environment:
        - LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save
            time and costs during testing. Recommended to commit the cache files to
            your repository for faster CI/CD runs. Requires the 'langsmith[vcr]'
            package to be installed.
        - LANGSMITH_TEST_TRACKING: Set this variable to 'false' to disable LangSmith
            test tracking entirely; the test still runs, but no examples or results
            are recorded.

    Example:
        For basic usage, simply decorate a test function with
        `@pytest.mark.langsmith`. Under the hood this will call the `test` function:

        .. code-block:: python

            import pytest

            # Equivalently can decorate with `test` directly:
            # from langsmith import test
            # @test
            @pytest.mark.langsmith
            def test_addition():
                assert 3 + 4 == 7

        Any code that is traced (such as code traced with `@traceable` or the
        `wrap_*` functions) will be traced within the test case for improved
        visibility and debugging.

        .. code-block:: python

            import pytest
            from langsmith import traceable


            @traceable
            def generate_numbers():
                return 3, 4


            @pytest.mark.langsmith
            def test_nested():
                # Traced code will be included in the test case
                a, b = generate_numbers()
                assert a + b == 7

        LLM calls are expensive! Cache requests by setting
        `LANGSMITH_TEST_CACHE=path/to/cache`. Check in these files to speed up CI/CD
        pipelines, so your results only change when your prompt or requested model
        changes.

        Note that this requires installing langsmith with the `vcr` extra:
        `pip install -U "langsmith[vcr]"`

        Caching is faster if you install libyaml. See
        https://vcrpy.readthedocs.io/en/latest/installation.html#speed for more
        details.

        .. code-block:: python

            # os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"
            import openai
            import pytest
            from langsmith import wrappers

            oai_client = wrappers.wrap_openai(openai.Client())


            @pytest.mark.langsmith
            def test_openai_says_hello():
                # Traced code will be included in the test case
                response = oai_client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": "Say hello!"},
                    ],
                )
                assert "hello" in response.choices[0].message.content.lower()

        LLMs are stochastic, so naive assertions are flaky. You can use LangSmith's
        `expect` to score and make approximate assertions on your results.

        .. code-block:: python

            import pytest
            from langsmith import expect


            @pytest.mark.langsmith
            def test_output_semantically_close():
                response = oai_client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": "Say hello!"},
                    ],
                )
                # The embedding_distance call logs the embedding distance to LangSmith
                expect.embedding_distance(
                    prediction=response.choices[0].message.content,
                    reference="Hello!",
                    # The following optional assertion logs a
                    # pass/fail score to LangSmith
                    # and raises an AssertionError if the assertion fails.
                ).to_be_less_than(1.0)
                # Compute damerau_levenshtein distance
                expect.edit_distance(
                    prediction=response.choices[0].message.content,
                    reference="Hello!",
                    # And then log a pass/fail score to LangSmith
                ).to_be_less_than(1.0)

        The `@test` decorator works natively with pytest fixtures. The values will
        populate the "inputs" of the corresponding example in LangSmith.

        .. code-block:: python

            import pytest


            @pytest.fixture
            def some_input():
                return "Some input"


            @pytest.mark.langsmith
            def test_with_fixture(some_input: str):
                assert "input" in some_input

        You can still use `pytest.mark.parametrize()` as usual to run multiple test
        cases using the same test function.

        .. code-block:: python

            import pytest


            @pytest.mark.langsmith(output_keys=["expected"])
            @pytest.mark.parametrize(
                "a, b, expected",
                [
                    (1, 2, 3),
                    (3, 4, 7),
                ],
            )
            def test_addition_with_multiple_inputs(a: int, b: int, expected: int):
                assert a + b == expected

        By default, each test case will be assigned a consistent, unique identifier
        based on the function name and module. You can also provide a custom
        identifier using the `id` argument:

        .. code-block:: python

            import pytest
            import uuid

            example_id = uuid.uuid4()


            @pytest.mark.langsmith(id=str(example_id))
            def test_multiplication():
                assert 3 * 4 == 12

        By default, all test inputs are saved as "inputs" to a dataset. You can
        specify the `output_keys` argument to persist those keys within the
        dataset's "outputs" fields.

        .. code-block:: python

            import pytest


            @pytest.fixture
            def expected_output():
                return "input"


            @pytest.mark.langsmith(output_keys=["expected_output"])
            def test_with_expected_output(some_input: str, expected_output: str):
                assert expected_output in some_input

        To run these tests, use the pytest CLI, or run the test functions directly.

        .. code-block:: python

            test_output_semantically_close()
            test_addition()
            test_nested()
            test_with_fixture("Some input")
            test_with_expected_output("Some input", "Some")
            test_multiplication()
            test_openai_says_hello()
            test_addition_with_multiple_inputs(1, 2, 3)

    """
    langtest_extra = _UTExtra(
        id=kwargs.pop("id", None),
        output_keys=kwargs.pop("output_keys", None),
        client=kwargs.pop("client", None),
        test_suite_name=kwargs.pop("test_suite_name", None),
        cache=ls_utils.get_cache_dir(kwargs.pop("cache", None)),
    )
    if kwargs:
        warnings.warn(f"Unexpected keyword arguments: {kwargs.keys()}")
    disable_tracking = ls_utils.test_tracking_is_disabled()
    if disable_tracking:
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'."
            " Skipping LangSmith test tracking."
        )

    def decorator(func: Callable) -> Callable:
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(
                *test_args: Any, request: Any = None, **test_kwargs: Any
            ):
                if disable_tracking:
                    return await func(*test_args, **test_kwargs)
                await _arun_test(
                    func,
                    *test_args,
                    pytest_request=request,
                    **test_kwargs,
                    langtest_extra=langtest_extra,
                )

            return async_wrapper

        @functools.wraps(func)
        def wrapper(*test_args: Any, request: Any = None, **test_kwargs: Any):
            if disable_tracking:
                return func(*test_args, **test_kwargs)
            _run_test(
                func,
                *test_args,
                pytest_request=request,
                **test_kwargs,
                langtest_extra=langtest_extra,
            )

        return wrapper

    if args and callable(args[0]):
        return decorator(args[0])
    return decorator
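# Because ``test`` checks whether it was applied bare (``@test``) or called with
# options (``@test(output_keys=[...])``), both decoration styles resolve to the same
# wrapper. A minimal sketch of the two equivalent entry points (hypothetical test
# bodies, kept in comments so nothing executes at import time):
#
#     from langsmith import test
#
#     @test  # bare decoration: args[0] is the test function itself
#     def test_sum():
#         assert 1 + 2 == 3
#
#     @test(output_keys=["expected"])  # called with options: returns `decorator`
#     def test_sum_expected(expected: int = 3):
#         assert 1 + 2 == expected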
## Private functions


def _get_experiment_name(test_suite_name: str) -> str:
    # If this is a pytest-xdist multi-process run then we need to create the same
    # experiment name across processes. We can do this by accessing the
    # PYTEST_XDIST_TESTRUNUID env var.
    if os.environ.get("PYTEST_XDIST_TESTRUNUID") and importlib.util.find_spec("xdist"):
        id_name = test_suite_name + os.environ["PYTEST_XDIST_TESTRUNUID"]
        id_ = str(uuid.uuid5(uuid.NAMESPACE_DNS, id_name).hex[:8])
    else:
        id_ = str(uuid.uuid4().hex[:8])
    if os.environ.get("LANGSMITH_EXPERIMENT"):
        prefix = os.environ["LANGSMITH_EXPERIMENT"]
    else:
        prefix = ls_utils.get_tracer_project(False) or "TestSuiteResult"
    name = f"{prefix}:{id_}"
    return name


def _get_test_suite_name(func: Callable) -> str:
    test_suite_name = ls_utils.get_env_var("TEST_SUITE")
    if test_suite_name:
        return test_suite_name
    repo_name = ls_env.get_git_info()["repo_name"]
    try:
        mod = inspect.getmodule(func)
        if mod:
            return f"{repo_name}.{mod.__name__}"
    except BaseException:
        logger.debug("Could not determine test suite name from file path.")
    raise ValueError("Please set the LANGSMITH_TEST_SUITE environment variable.")


def _get_test_suite(
    client: ls_client.Client, test_suite_name: str
) -> ls_schemas.Dataset:
    if client.has_dataset(dataset_name=test_suite_name):
        return client.read_dataset(dataset_name=test_suite_name)
    else:
        repo = ls_env.get_git_info().get("remote_url") or ""
        description = "Test suite"
        if repo:
            description += f" for {repo}"
        try:
            return client.create_dataset(
                dataset_name=test_suite_name,
                description=description,
                metadata={"__ls_runner": "pytest"},
            )
        except ls_utils.LangSmithConflictError:
            return client.read_dataset(dataset_name=test_suite_name)


def _start_experiment(
    client: ls_client.Client,
    test_suite: ls_schemas.Dataset,
) -> ls_schemas.TracerSession:
    experiment_name = _get_experiment_name(test_suite.name)
    try:
        return client.create_project(
            experiment_name,
            reference_dataset_id=test_suite.id,
            description="Test Suite Results.",
            metadata={
                "revision_id": ls_env.get_langchain_env_var_metadata().get(
                    "revision_id"
                ),
                "__ls_runner": "pytest",
            },
        )
    except ls_utils.LangSmithConflictError:
        return client.read_project(project_name=experiment_name)


def _get_example_id(
    func: Callable, inputs: Optional[dict], suite_id: uuid.UUID
) -> Tuple[uuid.UUID, str]:
    try:
        file_path = str(Path(inspect.getfile(func)).relative_to(Path.cwd()))
    except ValueError:
        # Fall back to module name if file path is not available
        file_path = func.__module__
    identifier = f"{suite_id}{file_path}::{func.__name__}"
    # If parametrized test, need to add inputs to identifier:
    if hasattr(func, "pytestmark") and any(
        m.name == "parametrize" for m in func.pytestmark
    ):
        identifier += _stringify(inputs)
    return uuid.uuid5(uuid.NAMESPACE_DNS, identifier), identifier[len(str(suite_id)) :]


def _end_tests(test_suite: _LangSmithTestSuite):
    # Registered via atexit: flush pending work and tag the dataset version with
    # the current git commit/branch.
    git_info = ls_env.get_git_info() or {}
    test_suite.shutdown()
    dataset_version = test_suite.get_dataset_version()
    dataset_id = test_suite._dataset.id
    test_suite.client.update_project(
        test_suite.experiment_id,
        metadata={
            **git_info,
            "dataset_version": dataset_version,
            "revision_id": ls_env.get_langchain_env_var_metadata().get("revision_id"),
            "__ls_runner": "pytest",
        },
    )
    if dataset_version and git_info["commit"] is not None:
        test_suite.client.update_dataset_tag(
            dataset_id=dataset_id,
            as_of=dataset_version,
            tag=f'git:commit:{git_info["commit"]}',
        )
    if dataset_version and git_info["branch"] is not None:
        test_suite.client.update_dataset_tag(
            dataset_id=dataset_id,
            as_of=dataset_version,
            tag=f'git:branch:{git_info["branch"]}',
        )


VT = TypeVar("VT", bound=Optional[dict])


def _serde_example_values(values: VT) -> VT:
    if values is None:
        return values
    bts = ls_client._dumps_json(values)
    return _orjson.loads(bts)


class _LangSmithTestSuite:
    # Per-test-suite singleton that owns the LangSmith dataset ("test suite") and
    # experiment (tracer project), and submits results and feedback on a
    # background executor.
    _instances: Optional[dict] = None
    _lock = threading.RLock()

    def __init__(
        self,
        client: Optional[ls_client.Client],
        experiment: ls_schemas.TracerSession,
        dataset: ls_schemas.Dataset,
    ):
        self.client = client or rt.get_cached_client()
        self._experiment = experiment
        self._dataset = dataset
        self._dataset_version: Optional[datetime.datetime] = dataset.modified_at
        self._executor = ls_utils.ContextThreadPoolExecutor()
        atexit.register(_end_tests, self)

    @property
    def id(self):
        return self._dataset.id

    @property
    def experiment_id(self):
        return self._experiment.id

    @property
    def experiment(self):
        return self._experiment

    @classmethod
    def from_test(
        cls,
        client: Optional[ls_client.Client],
        func: Callable,
        test_suite_name: Optional[str] = None,
    ) -> _LangSmithTestSuite:
        client = client or rt.get_cached_client()
        test_suite_name = test_suite_name or _get_test_suite_name(func)
        with cls._lock:
            if not cls._instances:
                cls._instances = {}
            if test_suite_name not in cls._instances:
                test_suite = _get_test_suite(client, test_suite_name)
                experiment = _start_experiment(client, test_suite)
                cls._instances[test_suite_name] = cls(client, experiment, test_suite)
        return cls._instances[test_suite_name]

    @property
    def name(self):
        return self._experiment.name

    def get_dataset_version(self):
        return self._dataset_version

    def submit_result(
        self,
        run_id: uuid.UUID,
        error: Optional[str] = None,
        skipped: bool = False,
        pytest_plugin: Any = None,
        pytest_nodeid: Any = None,
    ) -> None:
        if skipped:
            score = None
            status = "skipped"
        elif error:
            score = 0
            status = "failed"
        else:
            score = 1
            status = "passed"
        if pytest_plugin and pytest_nodeid:
            pytest_plugin.update_process_status(pytest_nodeid, {"status": status})
        self._executor.submit(self._submit_result, run_id, score)

    def _submit_result(self, run_id: uuid.UUID, score: Optional[int]) -> None:
        self.client.create_feedback(run_id, key="pass", score=score)

    def sync_example(
        self,
        example_id: uuid.UUID,
        *,
        inputs: Optional[dict] = None,
        outputs: Optional[dict] = None,
        metadata: Optional[dict] = None,
        pytest_plugin=None,
        pytest_nodeid=None,
    ) -> None:
        inputs = inputs or {}
        if pytest_plugin and pytest_nodeid:
            update = {"inputs": inputs, "reference_outputs": outputs}
            update = {k: v for k, v in update.items() if v is not None}
            pytest_plugin.update_process_status(pytest_nodeid, update)
        metadata = metadata.copy() if metadata else metadata
        inputs = _serde_example_values(inputs)
        outputs = _serde_example_values(outputs)
        try:
            example = self.client.read_example(example_id=example_id)
        except ls_utils.LangSmithNotFoundError:
            example = self.client.create_example(
                example_id=example_id,
                inputs=inputs,
                outputs=outputs,
                dataset_id=self.id,
                metadata=metadata,
                created_at=self._experiment.start_time,
            )
        else:
            if (
                (inputs != example.inputs)
                or (outputs is not None and outputs != example.outputs)
                or (metadata is not None and metadata != example.metadata)
                or str(example.dataset_id) != str(self.id)
            ):
                self.client.update_example(
                    example_id=example.id,
                    inputs=inputs,
                    outputs=outputs,
                    metadata=metadata,
                    dataset_id=self.id,
                )
                example = self.client.read_example(example_id=example.id)
        if self._dataset_version is None:
            self._dataset_version = example.modified_at
        elif (
            example.modified_at
            and self._dataset_version
            and example.modified_at > self._dataset_version
        ):
            self._dataset_version = example.modified_at

    def _submit_feedback(
        self,
        run_id: ID_TYPE,
        feedback: Union[dict, list],
        pytest_plugin: Any = None,
        pytest_nodeid: Any = None,
        **kwargs: Any,
    ):
        feedback = feedback if isinstance(feedback, list) else [feedback]
        for fb in feedback:
            if pytest_plugin and pytest_nodeid:
                val = fb["score"] if "score" in fb else fb["value"]
                pytest_plugin.update_process_status(
                    pytest_nodeid, {"feedback": {fb["key"]: val}}
                )
            self._executor.submit(
                self._create_feedback, run_id=run_id, feedback=fb, **kwargs
            )

    def _create_feedback(self, run_id: ID_TYPE, feedback: dict, **kwargs: Any) -> None:
        self.client.create_feedback(run_id, **feedback, **kwargs)

    def shutdown(self):
        self._executor.shutdown()

    def end_run(
        self,
        run_tree,
        example_id,
        outputs,
        reference_outputs,
        pytest_plugin=None,
        pytest_nodeid=None,
    ) -> Future:
        return self._executor.submit(
            self._end_run,
            run_tree=run_tree,
            example_id=example_id,
            outputs=outputs,
            reference_outputs=reference_outputs,
            pytest_plugin=pytest_plugin,
            pytest_nodeid=pytest_nodeid,
        )

    def _end_run(
        self,
        run_tree,
        example_id,
        outputs,
        reference_outputs,
        pytest_plugin,
        pytest_nodeid,
    ) -> None:
        # TODO: remove this hack so that run durations are correct
        # Ensure example is fully updated
        self.sync_example(example_id, inputs=run_tree.inputs, outputs=reference_outputs)
        run_tree.end(outputs=outputs)
        run_tree.patch()


class _TestCase:
    # Per-test bookkeeping: ties a run ID and a dataset example ID together and
    # relays inputs/outputs/feedback to the test suite and the pytest output plugin.
    def __init__(
        self,
        test_suite: _LangSmithTestSuite,
        example_id: uuid.UUID,
        run_id: uuid.UUID,
        pytest_plugin: Any = None,
        pytest_nodeid: Any = None,
        inputs: Optional[dict] = None,
        reference_outputs: Optional[dict] = None,
    ) -> None:
        self.test_suite = test_suite
        self.example_id = example_id
        self.run_id = run_id
        self.pytest_plugin = pytest_plugin
        self.pytest_nodeid = pytest_nodeid
        self._logged_reference_outputs: Optional[dict] = None
        self.inputs = inputs
        self.reference_outputs = reference_outputs

        if pytest_plugin and pytest_nodeid:
            pytest_plugin.add_process_to_test_suite(
                test_suite._dataset.name, pytest_nodeid
            )
        if inputs:
            self.log_inputs(inputs)
        if reference_outputs:
            self.log_reference_outputs(reference_outputs)

    def sync_example(
        self, *, inputs: Optional[dict] = None, outputs: Optional[dict] = None
    ) -> None:
        self.test_suite.sync_example(
            self.example_id,
            inputs=inputs,
            outputs=outputs,
            pytest_plugin=self.pytest_plugin,
            pytest_nodeid=self.pytest_nodeid,
        )

    def submit_feedback(self, *args, **kwargs: Any):
        self.test_suite._submit_feedback(
            *args,
            **{
                **kwargs,
                **dict(
                    pytest_plugin=self.pytest_plugin,
                    pytest_nodeid=self.pytest_nodeid,
                ),
            },
        )

    def log_inputs(self, inputs: dict) -> None:
        if self.pytest_plugin and self.pytest_nodeid:
            self.pytest_plugin.update_process_status(
                self.pytest_nodeid, {"inputs": inputs}
            )

    def log_outputs(self, outputs: dict) -> None:
        if self.pytest_plugin and self.pytest_nodeid:
            self.pytest_plugin.update_process_status(
                self.pytest_nodeid, {"outputs": outputs}
            )

    def log_reference_outputs(self, reference_outputs: dict) -> None:
        self._logged_reference_outputs = reference_outputs
        if self.pytest_plugin and self.pytest_nodeid:
            self.pytest_plugin.update_process_status(
                self.pytest_nodeid, {"reference_outputs": reference_outputs}
            )

    def submit_test_result(
        self,
        error: Optional[str] = None,
        skipped: bool = False,
    ) -> None:
        return self.test_suite.submit_result(
            self.run_id,
            error=error,
            skipped=skipped,
            pytest_plugin=self.pytest_plugin,
            pytest_nodeid=self.pytest_nodeid,
        )

    def start_time(self) -> None:
        if self.pytest_plugin and self.pytest_nodeid:
            self.pytest_plugin.update_process_status(
                self.pytest_nodeid, {"start_time": time.time()}
            )

    def end_time(self) -> None:
        if self.pytest_plugin and self.pytest_nodeid:
            self.pytest_plugin.update_process_status(
                self.pytest_nodeid, {"end_time": time.time()}
            )

    def end_run(self, run_tree, outputs: Any) -> None:
        if not (outputs is None or isinstance(outputs, dict)):
            outputs = {"output": outputs}
        self.test_suite.end_run(
            run_tree,
            self.example_id,
            outputs,
            reference_outputs=self._logged_reference_outputs,
            pytest_plugin=self.pytest_plugin,
            pytest_nodeid=self.pytest_nodeid,
        )


_TEST_CASE = contextvars.ContextVar[Optional[_TestCase]]("_TEST_CASE", default=None)


class _UTExtra(TypedDict, total=False):
    client: Optional[ls_client.Client]
    id: Optional[uuid.UUID]
    output_keys: Optional[Sequence[str]]
    test_suite_name: Optional[str]
    cache: Optional[str]


def _get_test_repr(func: Callable, sig: inspect.Signature) -> str:
    name = getattr(func, "__name__", None) or ""
    description = getattr(func, "__doc__", None) or ""
    if description:
        description = f" - {description.strip()}"
    return f"{name}{sig}{description}"


def _create_test_case(
    func: Callable,
    *args: Any,
    pytest_request: Any,
    langtest_extra: _UTExtra,
    **kwargs: Any,
) -> _TestCase:
    client = langtest_extra["client"] or rt.get_cached_client()
    output_keys = langtest_extra["output_keys"]
    signature = inspect.signature(func)
    inputs = rh._get_inputs_safe(signature, *args, **kwargs) or None
    outputs = None
    if output_keys:
        outputs = {}
        if not inputs:
            msg = (
                "'output_keys' should only be specified when marked test function "
                "has input arguments."
            )
            raise ValueError(msg)
        for k in output_keys:
            outputs[k] = inputs.pop(k, None)
    test_suite = _LangSmithTestSuite.from_test(
        client, func, langtest_extra.get("test_suite_name")
    )
    example_id, example_name = _get_example_id(func, inputs, test_suite.id)
    example_id = langtest_extra["id"] or example_id
    pytest_plugin = (
        pytest_request.config.pluginmanager.get_plugin("langsmith_output_plugin")
        if pytest_request
        else None
    )
    pytest_nodeid = pytest_request.node.nodeid if pytest_request else None
    if pytest_plugin:
        pytest_plugin.test_suite_urls[test_suite._dataset.name] = (
            cast(str, test_suite._dataset.url)
            + "/compare?selectedSessions="
            + str(test_suite.experiment_id)
        )
    test_case = _TestCase(
        test_suite,
        example_id,
        run_id=uuid.uuid4(),
        inputs=inputs,
        reference_outputs=outputs,
        pytest_plugin=pytest_plugin,
        pytest_nodeid=pytest_nodeid,
    )
    return test_case


def _run_test(
    func: Callable,
    *test_args: Any,
    pytest_request: Any,
    langtest_extra: _UTExtra,
    **test_kwargs: Any,
) -> None:
    test_case = _create_test_case(
        func,
        *test_args,
        **test_kwargs,
        pytest_request=pytest_request,
        langtest_extra=langtest_extra,
    )
    _TEST_CASE.set(test_case)

    def _test():
        test_case.start_time()
        with rh.trace(
            name=getattr(func, "__name__", "Test"),
            run_id=test_case.run_id,
            reference_example_id=test_case.example_id,
            inputs=test_case.inputs,
            project_name=test_case.test_suite.name,
            exceptions_to_handle=(SkipException,),
            _end_on_exit=False,
        ) as run_tree:
            try:
                result = func(*test_args, **test_kwargs)
            except SkipException as e:
                test_case.submit_test_result(error=repr(e), skipped=True)
                test_case.end_run(run_tree, {"skipped_reason": repr(e)})
                raise e
            except BaseException as e:
                test_case.submit_test_result(error=repr(e))
                test_case.end_run(run_tree, None)
                raise e
            else:
                test_case.end_run(run_tree, result)
            finally:
                test_case.end_time()
        try:
            test_case.submit_test_result()
        except BaseException as e:
            logger.warning(
                f"Failed to create feedback for run_id {test_case.run_id}:\n{e}"
            )

    if langtest_extra["cache"]:
        cache_path = Path(langtest_extra["cache"]) / f"{test_case.test_suite.id}.yaml"
    else:
        cache_path = None
    current_context = rh.get_tracing_context()
    metadata = {
        **(current_context["metadata"] or {}),
        **{
            "experiment": test_case.test_suite.experiment.name,
            "reference_example_id": str(test_case.example_id),
        },
    }
    with rh.tracing_context(
        **{**current_context, "metadata": metadata}
    ), ls_utils.with_optional_cache(
        cache_path, ignore_hosts=[test_case.test_suite.client.api_url]
    ):
        _test()


async def _arun_test(
    func: Callable,
    *test_args: Any,
    pytest_request: Any,
    langtest_extra: _UTExtra,
    **test_kwargs: Any,
) -> None:
    test_case = _create_test_case(
        func,
        *test_args,
        **test_kwargs,
        pytest_request=pytest_request,
        langtest_extra=langtest_extra,
    )
    _TEST_CASE.set(test_case)

    async def _test():
        test_case.start_time()
        with rh.trace(
            name=getattr(func, "__name__", "Test"),
            run_id=test_case.run_id,
            reference_example_id=test_case.example_id,
            inputs=test_case.inputs,
            project_name=test_case.test_suite.name,
            exceptions_to_handle=(SkipException,),
            _end_on_exit=False,
        ) as run_tree:
            try:
                result = await func(*test_args, **test_kwargs)
            except SkipException as e:
                test_case.submit_test_result(error=repr(e), skipped=True)
                test_case.end_run(run_tree, {"skipped_reason": repr(e)})
                raise e
            except BaseException as e:
                test_case.submit_test_result(error=repr(e))
                test_case.end_run(run_tree, None)
                raise e
            else:
                test_case.end_run(run_tree, result)
            finally:
                test_case.end_time()
        try:
            test_case.submit_test_result()
        except BaseException as e:
            logger.warning(
                f"Failed to create feedback for run_id {test_case.run_id}:\n{e}"
            )

    if langtest_extra["cache"]:
        cache_path = Path(langtest_extra["cache"]) / f"{test_case.test_suite.id}.yaml"
    else:
        cache_path = None
    current_context = rh.get_tracing_context()
    metadata = {
        **(current_context["metadata"] or {}),
        **{
            "experiment": test_case.test_suite.experiment.name,
            "reference_example_id": str(test_case.example_id),
        },
    }
    with rh.tracing_context(
        **{**current_context, "metadata": metadata}
    ), ls_utils.with_optional_cache(
        cache_path, ignore_hosts=[test_case.test_suite.client.api_url]
    ):
        await _test()


# For backwards compatibility
unit = test
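# Note on example identity: ``_get_example_id`` above derives a deterministic ID by
# hashing the suite ID plus ``<relative file path>::<function name>`` (plus the
# stringified inputs for parametrized tests) with ``uuid.uuid5``, so re-running the
# same test updates the existing dataset example rather than creating a new one.
# A minimal sketch of the same idea (hypothetical suite ID and file path):
#
#     import uuid
#
#     suite_id = uuid.UUID(int=0)
#     identifier = f"{suite_id}tests/test_math.py::test_addition"
#     first = uuid.uuid5(uuid.NAMESPACE_DNS, identifier)
#     second = uuid.uuid5(uuid.NAMESPACE_DNS, identifier)
#     assert first == second  # stable across runs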
def log_inputs(inputs: dict, /) -> None:
    """Log run inputs from within a pytest test run.

    .. warning::

        This API is in beta and might change in future versions.

    Should only be used in pytest tests decorated with @pytest.mark.langsmith.

    Args:
        inputs: Inputs to log.

    Example:
        .. code-block:: python

            import pytest
            from langsmith import testing as t


            @pytest.mark.langsmith
            def test_foo() -> None:
                x = 0
                y = 1
                t.log_inputs({"x": x, "y": y})
                assert foo(x, y) == 2

    """
    if ls_utils.test_tracking_is_disabled():
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'. Skipping log_inputs."
        )
        return
    run_tree = rh.get_current_run_tree()
    test_case = _TEST_CASE.get()
    if not run_tree or not test_case:
        msg = (
            "log_inputs should only be called within a pytest test decorated with "
            "@pytest.mark.langsmith, and with tracing enabled (by setting the "
            "LANGSMITH_TRACING environment variable to 'true')."
        )
        raise ValueError(msg)
    run_tree.add_inputs(inputs)
    test_case.log_inputs(inputs)
def log_outputs(outputs: dict, /) -> None:
    """Log run outputs from within a pytest test run.

    .. warning::

        This API is in beta and might change in future versions.

    Should only be used in pytest tests decorated with @pytest.mark.langsmith.

    Args:
        outputs: Outputs to log.

    Example:
        .. code-block:: python

            import pytest
            from langsmith import testing as t


            @pytest.mark.langsmith
            def test_foo() -> None:
                x = 0
                y = 1
                result = foo(x, y)
                t.log_outputs({"foo": result})
                assert result == 2

    """
    if ls_utils.test_tracking_is_disabled():
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'. Skipping log_outputs."
        )
        return
    run_tree = rh.get_current_run_tree()
    test_case = _TEST_CASE.get()
    if not run_tree or not test_case:
        msg = (
            "log_outputs should only be called within a pytest test decorated with "
            "@pytest.mark.langsmith, and with tracing enabled (by setting the "
            "LANGSMITH_TRACING environment variable to 'true')."
        )
        raise ValueError(msg)
    outputs = _dumpd(outputs)
    run_tree.add_outputs(outputs)
    test_case.log_outputs(outputs)
def log_reference_outputs(reference_outputs: dict, /) -> None:
    """Log example reference outputs from within a pytest test run.

    .. warning::

        This API is in beta and might change in future versions.

    Should only be used in pytest tests decorated with @pytest.mark.langsmith.

    Args:
        reference_outputs: Reference outputs to log.

    Example:
        .. code-block:: python

            import pytest
            from langsmith import testing


            @pytest.mark.langsmith
            def test_foo() -> None:
                x = 0
                y = 1
                expected = 2
                testing.log_reference_outputs({"foo": expected})
                assert foo(x, y) == expected

    """
    if ls_utils.test_tracking_is_disabled():
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'."
            " Skipping log_reference_outputs."
        )
        return
    test_case = _TEST_CASE.get()
    if not test_case:
        msg = (
            "log_reference_outputs should only be called within a pytest test "
            "decorated with @pytest.mark.langsmith."
        )
        raise ValueError(msg)
    test_case.log_reference_outputs(reference_outputs)
def log_feedback(
    feedback: Optional[Union[dict, list[dict]]] = None,
    /,
    *,
    key: Optional[str] = None,
    score: Optional[Union[int, bool, float]] = None,
    value: Optional[Union[str, int, float, bool]] = None,
    **kwargs: Any,
) -> None:
    """Log run feedback from within a pytest test run.

    .. warning::

        This API is in beta and might change in future versions.

    Should only be used in pytest tests decorated with @pytest.mark.langsmith.

    Args:
        feedback: A feedback dict (or list of dicts) to log directly. Mutually
            exclusive with 'key', 'score', and 'value'.
        key: Feedback name.
        score: Numerical feedback value.
        value: Categorical feedback value.
        kwargs: Any other Client.create_feedback args.

    Example:
        .. code-block:: python

            import pytest
            from langsmith import testing as t


            @pytest.mark.langsmith
            def test_foo() -> None:
                x = 0
                y = 1
                expected = 2
                result = foo(x, y)
                t.log_feedback(key="right_type", score=isinstance(result, int))
                assert result == expected

    """
    if ls_utils.test_tracking_is_disabled():
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'. Skipping log_feedback."
        )
        return
    if feedback and any((key, score, value)):
        msg = "Must specify one of 'feedback' and ('key', 'score', 'value'), not both."
        raise ValueError(msg)
    elif not (feedback or key):
        msg = "Must specify at least one of 'feedback' or ('key', 'score', 'value')."
        raise ValueError(msg)
    elif key:
        feedback = {"key": key}
        if score is not None:
            feedback["score"] = score
        if value is not None:
            feedback["value"] = value
    else:
        pass
    run_tree = rh.get_current_run_tree()
    test_case = _TEST_CASE.get()
    if not run_tree or not test_case:
        msg = (
            "log_feedback should only be called within a pytest test decorated with "
            "@pytest.mark.langsmith, and with tracing enabled (by setting the "
            "LANGSMITH_TRACING environment variable to 'true')."
        )
        raise ValueError(msg)
    if run_tree.session_name == "evaluators" and run_tree.metadata.get(
        "reference_run_id"
    ):
        run_id = run_tree.metadata["reference_run_id"]
        run_tree.add_outputs(
            feedback if isinstance(feedback, dict) else {"feedback": feedback}
        )
        kwargs["source_run_id"] = run_tree.id
    else:
        run_id = run_tree.trace_id
    test_case.submit_feedback(run_id, cast(Union[list, dict], feedback), **kwargs)
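# ``log_feedback`` also accepts a prebuilt feedback dict (or list of dicts) as the
# positional argument instead of ``key``/``score``/``value``, per the validation
# above. A minimal sketch (hypothetical feedback key, kept in comments so nothing
# executes at import time):
#
#     import pytest
#     from langsmith import testing as t
#
#     @pytest.mark.langsmith
#     def test_feedback_dict() -> None:
#         result = 2
#         t.log_feedback({"key": "right_type", "score": int(isinstance(result, int))})
#         assert result == 2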
@contextlib.contextmanager
def trace_feedback(
    *, name: str = "Feedback"
) -> Generator[Optional[run_trees.RunTree], None, None]:
    """Trace the computation of a pytest run feedback as its own run.

    .. warning::

        This API is in beta and might change in future versions.

    Args:
        name: Feedback run name. Defaults to "Feedback".

    Example:
        .. code-block:: python

            import openai
            import pytest
            from langsmith import testing as t
            from langsmith import wrappers

            oai_client = wrappers.wrap_openai(openai.Client())


            @pytest.mark.langsmith
            def test_openai_says_hello():
                # Traced code will be included in the test case
                text = "Say hello!"
                response = oai_client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": text},
                    ],
                )
                t.log_inputs({"text": text})
                t.log_outputs({"response": response.choices[0].message.content})
                t.log_reference_outputs({"response": "hello!"})

                # Use this context manager to trace any steps used for generating
                # evaluation feedback separately from the main application logic
                with t.trace_feedback():
                    grade = oai_client.chat.completions.create(
                        model="gpt-4o-mini",
                        messages=[
                            {
                                "role": "system",
                                "content": "Return 1 if 'hello' is in the user message and 0 otherwise.",
                            },
                            {
                                "role": "user",
                                "content": response.choices[0].message.content,
                            },
                        ],
                    )
                    # Make sure to log relevant feedback within the context for the
                    # trace to be associated with this feedback.
                    t.log_feedback(
                        key="llm_judge", score=float(grade.choices[0].message.content)
                    )

                assert "hello" in response.choices[0].message.content.lower()

    """  # noqa: E501
    if ls_utils.test_tracking_is_disabled():
        logger.info(
            "LANGSMITH_TEST_TRACKING is set to 'false'. Skipping log_feedback."
        )
        yield None
        return
    parent_run = rh.get_current_run_tree()
    test_case = _TEST_CASE.get()
    if not parent_run or not test_case:
        msg = (
            "trace_feedback should only be called within a pytest test decorated "
            "with @pytest.mark.langsmith, and with tracing enabled (by setting the "
            "LANGSMITH_TRACING environment variable to 'true')."
        )
        raise ValueError(msg)
    metadata = {
        "experiment": test_case.test_suite.experiment.name,
        "reference_example_id": test_case.example_id,
        "reference_run_id": parent_run.id,
    }
    with rh.trace(
        name=name,
        inputs=parent_run.outputs,
        parent="ignore",
        project_name="evaluators",
        metadata=metadata,
    ) as run_tree:
        yield run_tree