Client#
- class langsmith.client.Client(api_url: str | None = None, *, api_key: str | None = None, retry_config: Retry | None = None, timeout_ms: int | Tuple[int, int] | None = None, web_url: str | None = None, session: Session | None = None, auto_batch_tracing: bool = True, anonymizer: Callable[[dict], dict] | None = None, hide_inputs: Callable[[dict], dict] | bool | None = None, hide_outputs: Callable[[dict], dict] | bool | None = None, info: dict | LangSmithInfo | None = None, api_urls: Dict[str, str] | None = None)[source]#
Client for interacting with the LangSmith API.
Initialize a Client instance.
- Parameters:
api_url (str or None, default=None) – URL for the LangSmith API. Defaults to the LANGCHAIN_ENDPOINT environment variable or https://api.smith.langchain.com if not set.
api_key (str or None, default=None) – API key for the LangSmith API. Defaults to the LANGCHAIN_API_KEY environment variable.
retry_config (Retry or None, default=None) – Retry configuration for the HTTPAdapter.
timeout_ms (int, tuple[int, int], or None, default=None) – Timeout for the HTTPAdapter. Can also be a 2-tuple of (connect timeout, read timeout) to set them separately.
web_url (str or None, default=None) – URL for the LangSmith web app. Default is auto-inferred from the ENDPOINT.
session (requests.Session or None, default=None) – The session to use for requests. If None, a new session will be created.
anonymizer (Optional[Callable[[dict], dict]]) – A function applied for masking serialized run inputs and outputs, before sending to the API.
hide_inputs (Callable[[dict], dict] or bool or None, default=None) – Whether to hide run inputs when tracing with this client. If True, hides the entire inputs. If a function, applied to all run inputs when creating runs (see the example below).
hide_outputs (Callable[[dict], dict] or bool or None, default=None) – Whether to hide run outputs when tracing with this client. If True, hides the entire outputs. If a function, applied to all run outputs when creating runs.
info (Optional[ls_schemas.LangSmithInfo]) – The information about the LangSmith API. If not provided, it will be fetched from the API.
api_urls (Optional[Dict[str, str]]) – A dictionary of write API URLs and their corresponding API keys. Useful for multi-tenant setups. Data is only read from the first URL in the dictionary. However, ONLY Runs are written (POST and PATCH) to all URLs in the dictionary. Feedback, sessions, datasets, examples, annotation queues and evaluation results are only written to the first.
auto_batch_tracing (bool, default=True) – Whether to automatically batch runs for tracing in a background thread.
- Raises:
LangSmithUserError – If the API key is not provided when using the hosted service, or if both api_url and api_urls are provided.
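Example: a minimal construction sketch. The API key value, timeout values, and masking helper below are illustrative placeholders rather than values required by the API.
>>> from langsmith import Client
>>> def mask_inputs(inputs: dict) -> dict:
...     # Hypothetical helper: drop a sensitive field before runs are uploaded.
...     return {k: v for k, v in inputs.items() if k != "ssn"}
>>> client = Client(
...     api_key="lsv2_...",  # placeholder; normally read from LANGCHAIN_API_KEY
...     timeout_ms=(5000, 30000),  # (connect timeout, read timeout)
...     hide_inputs=mask_inputs,
... )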
Attributes
api_url
api_key
retry_config
timeout_ms
session
tracing_sample_rate
tracing_queue
info
Get the information about the LangSmith API.
Methods
__init__([api_url, api_key, retry_config, ...]) – Initialize a Client instance.
add_runs_to_annotation_queue(queue_id, *, ...) – Add runs to an annotation queue with the specified queue ID.
aevaluate(target, /[, data, evaluators, ...]) – Evaluate an async target system on a given dataset.
aevaluate_run(run, evaluator, *[, ...]) – Evaluate a run asynchronously.
arun_on_dataset(dataset_name, ...[, ...]) – Asynchronously run the Chain or language model on a dataset.
batch_ingest_runs([create, update, pre_sampled]) – Batch ingest/upsert multiple runs in the Langsmith system.
cleanup() – Manually trigger cleanup of the background thread.
clone_public_dataset(token_or_url, *[, ...]) – Clone a public dataset to your own langsmith tenant.
create_annotation_queue(*, name[, ...]) – Create an annotation queue on the LangSmith API.
create_chat_example(messages[, generations, ...]) – Add an example (row) to a Chat-type dataset.
create_commit(prompt_identifier, object, *) – Create a commit for an existing prompt.
create_comparative_experiment(name, ...[, ...]) – Create a comparative experiment on the LangSmith API.
create_dataset(dataset_name, *[, ...]) – Create a dataset in the LangSmith API.
create_example(inputs[, dataset_id, ...]) – Create a dataset example in the LangSmith API.
create_example_from_run(run[, dataset_id, ...]) – Add an example (row) to a dataset from a run.
create_examples(*, inputs[, outputs, ...]) – Create examples in a dataset.
create_feedback(run_id, key, *[, score, ...]) – Create a feedback in the LangSmith API.
create_feedback_from_token(token_or_url[, ...]) – Create feedback from a presigned token or URL.
create_llm_example(prompt[, generation, ...]) – Add an example (row) to an LLM-type dataset.
create_presigned_feedback_token(run_id, ...) – Create a pre-signed URL to send feedback data to.
create_presigned_feedback_tokens(run_id, ...) – Create a pre-signed URL to send feedback data to.
create_project(project_name, *[, ...]) – Create a project on the LangSmith API.
create_prompt(prompt_identifier, *[, ...]) – Create a new prompt.
create_run(name, inputs, run_type, *[, ...]) – Persist a run to the LangSmith API.
delete_annotation_queue(queue_id) – Delete an annotation queue with the specified queue ID.
delete_dataset(*[, dataset_id, dataset_name]) – Delete a dataset from the LangSmith API.
delete_example(example_id) – Delete an example by ID.
delete_feedback(feedback_id) – Delete a feedback by ID.
delete_project(*[, project_name, project_id]) – Delete a project from LangSmith.
delete_prompt(prompt_identifier) – Delete a prompt.
delete_run_from_annotation_queue(queue_id, ...) – Delete a run from an annotation queue with the specified queue ID and run ID.
diff_dataset_versions([dataset_id, dataset_name]) – Get the difference between two versions of a dataset.
evaluate(target, /[, data, evaluators, ...]) – Evaluate a target system on a given dataset.
evaluate_run(run, evaluator, *[, ...]) – Evaluate a run.
get_prompt(prompt_identifier) – Get a specific prompt by its identifier.
get_run_from_annotation_queue(queue_id, *, index) – Get a run from an annotation queue at the specified index.
get_run_stats(*[, id, trace, parent_run, ...]) – Get aggregate statistics over queried runs.
get_run_url(*, run[, project_name, project_id]) – Get the URL for a run.
get_test_results(*[, project_id, project_name]) – Read the record-level information from an experiment into a Pandas DF.
has_dataset(*[, dataset_name, dataset_id]) – Check whether a dataset exists in your tenant.
has_project(project_name, *[, project_id]) – Check if a project exists.
index_dataset(*, dataset_id[, tag]) – Enable dataset indexing.
like_prompt(prompt_identifier) – Like a prompt.
list_annotation_queues(*[, queue_ids, name, ...]) – List the annotation queues on the LangSmith API.
list_dataset_splits(*[, dataset_id, ...]) – Get the splits for a dataset.
list_dataset_versions(*[, dataset_id, ...]) – List dataset versions.
list_datasets(*[, dataset_ids, data_type, ...]) – List the datasets on the LangSmith API.
list_examples([dataset_id, dataset_name, ...]) – Retrieve the example rows of the specified dataset.
list_feedback(*[, run_ids, feedback_key, ...]) – List the feedback objects on the LangSmith API.
list_presigned_feedback_tokens(run_id, *[, ...]) – List the feedback ingest tokens for a run.
list_projects([project_ids, name, ...]) – List projects from the LangSmith API.
list_prompt_commits(prompt_identifier, *[, ...]) – List commits for a given prompt.
list_prompts(*[, limit, offset, is_public, ...]) – List prompts with pagination.
list_runs(*[, project_id, project_name, ...]) – List runs from the LangSmith API.
list_shared_examples(share_token, *[, ...]) – Get shared examples.
list_shared_projects(*, dataset_share_token) – List shared projects.
list_shared_runs(share_token[, run_ids]) – Get shared runs.
multipart_ingest([create, update, pre_sampled]) – Batch ingest/upsert multiple runs in the Langsmith system.
pull_prompt(prompt_identifier, *[, ...]) – Pull a prompt and return it as a LangChain PromptTemplate.
pull_prompt_commit(prompt_identifier, *[, ...]) – Pull a prompt object from the LangSmith API.
push_prompt(prompt_identifier, *[, object, ...]) – Push a prompt to the LangSmith API.
read_annotation_queue(queue_id) – Read an annotation queue with the specified queue ID.
read_dataset(*[, dataset_name, dataset_id]) – Read a dataset from the LangSmith API.
read_dataset_openai_finetuning([dataset_id, ...]) – Download a dataset in OpenAI Jsonl format and load it as a list of dicts.
read_dataset_shared_schema([dataset_id, ...]) – Retrieve the shared schema of a dataset.
read_dataset_version(*[, dataset_id, ...]) – Get dataset version by as_of or exact tag.
read_example(example_id, *[, as_of]) – Read an example from the LangSmith API.
read_feedback(feedback_id) – Read a feedback from the LangSmith API.
read_project(*[, project_id, project_name, ...]) – Read a project from the LangSmith API.
read_run(run_id[, load_child_runs]) – Read a run from the LangSmith API.
read_run_shared_link(run_id) – Retrieve the shared link for a specific run.
read_shared_dataset(share_token) – Get shared datasets.
read_shared_run(share_token[, run_id]) – Get shared runs.
request_with_retries(method, pathname, *[, ...]) – Send a request with retries.
run_is_shared(run_id) – Get share state for a run.
run_on_dataset(dataset_name, ...[, ...]) – Run the Chain or language model on a dataset.
share_dataset([dataset_id, dataset_name]) – Get a share link for a dataset.
share_run(run_id, *[, share_id]) – Get a share link for a run.
similar_examples(inputs, /, *, limit, dataset_id) – Retrieve the dataset examples whose inputs best match the current inputs.
unlike_prompt(prompt_identifier) – Unlike a prompt.
unshare_dataset(dataset_id) – Delete share link for a dataset.
unshare_run(run_id) – Delete share link for a run.
update_annotation_queue(queue_id, *, name[, ...]) – Update an annotation queue with the specified queue_id.
update_dataset_splits(*[, dataset_id, ...]) – Update the splits for a dataset.
update_dataset_tag(*[, dataset_id, dataset_name]) – Update the tags of a dataset.
update_example(example_id, *[, inputs, ...]) – Update a specific example.
update_examples(*, example_ids[, inputs, ...]) – Update multiple examples.
update_examples_multipart(*, dataset_id[, ...]) – Upload examples.
update_feedback(feedback_id, *[, score, ...]) – Update a feedback in the LangSmith API.
update_project(project_id, *[, name, ...]) – Update a LangSmith project.
update_prompt(prompt_identifier, *[, ...]) – Update a prompt's metadata.
update_run(run_id, *[, name, end_time, ...]) – Update a run in the LangSmith API.
upload_csv(csv_file, input_keys, output_keys, *) – Upload a CSV file to the LangSmith API.
upload_dataframe(df, name, input_keys, ...) – Upload a dataframe as individual examples to the LangSmith API.
upload_examples_multipart(*, dataset_id[, ...]) – Upload examples.
upsert_examples_multipart(*[, upserts]) – Upsert examples.
- __init__(api_url: str | None = None, *, api_key: str | None = None, retry_config: Retry | None = None, timeout_ms: int | Tuple[int, int] | None = None, web_url: str | None = None, session: Session | None = None, auto_batch_tracing: bool = True, anonymizer: Callable[[dict], dict] | None = None, hide_inputs: Callable[[dict], dict] | bool | None = None, hide_outputs: Callable[[dict], dict] | bool | None = None, info: dict | LangSmithInfo | None = None, api_urls: Dict[str, str] | None = None) None [source]#
Initialize a Client instance.
- Parameters:
api_url (str or None, default=None) – URL for the LangSmith API. Defaults to the LANGCHAIN_ENDPOINT environment variable or https://api.smith.langchain.com if not set.
api_key (str or None, default=None) – API key for the LangSmith API. Defaults to the LANGCHAIN_API_KEY environment variable.
retry_config (Retry or None, default=None) – Retry configuration for the HTTPAdapter.
timeout_ms (int, tuple[int, int], or None, default=None) – Timeout for the HTTPAdapter. Can also be a 2-tuple of (connect timeout, read timeout) to set them separately.
web_url (str or None, default=None) – URL for the LangSmith web app. Default is auto-inferred from the ENDPOINT.
session (requests.Session or None, default=None) – The session to use for requests. If None, a new session will be created.
anonymizer (Optional[Callable[[dict], dict]]) – A function applied for masking serialized run inputs and outputs, before sending to the API.
hide_inputs (Callable[[dict], dict] or bool or None, default=None) – Whether to hide run inputs when tracing with this client. If True, hides the entire inputs. If a function, applied to all run inputs when creating runs.
hide_outputs (Callable[[dict], dict] or bool or None, default=None) – Whether to hide run outputs when tracing with this client. If True, hides the entire outputs. If a function, applied to all run outputs when creating runs.
info (Optional[ls_schemas.LangSmithInfo]) – The information about the LangSmith API. If not provided, it will be fetched from the API.
api_urls (Optional[Dict[str, str]]) – A dictionary of write API URLs and their corresponding API keys. Useful for multi-tenant setups. Data is only read from the first URL in the dictionary. However, ONLY Runs are written (POST and PATCH) to all URLs in the dictionary. Feedback, sessions, datasets, examples, annotation queues and evaluation results are only written to the first.
auto_batch_tracing (bool, default=True) – Whether to automatically batch runs for tracing in a background thread.
- Raises:
LangSmithUserError – If the API key is not provided when using the hosted service, or if both api_url and api_urls are provided.
- Return type:
None
- add_runs_to_annotation_queue(queue_id: UUID | str, *, run_ids: List[UUID | str]) None [source]#
Add runs to an annotation queue with the specified queue ID.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue.
run_ids (List[ID_TYPE]) – The IDs of the runs to be added to the annotation queue.
- Return type:
None
- async aevaluate(target: ATARGET_T | AsyncIterable[dict] | Runnable | str | uuid.UUID | schemas.TracerSession, /, data: DATA_T | AsyncIterable[schemas.Example] | Iterable[schemas.Example] | None = None, evaluators: Sequence[EVALUATOR_T | AEVALUATOR_T] | None = None, summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None, metadata: dict | None = None, experiment_prefix: str | None = None, description: str | None = None, max_concurrency: int | None = 0, num_repetitions: int = 1, blocking: bool = True, experiment: schemas.TracerSession | str | uuid.UUID | None = None, upload_results: bool = True, **kwargs: Any) AsyncExperimentResults [source]#
Evaluate an async target system on a given dataset.
- Parameters:
target (AsyncCallable[[dict], dict] | AsyncIterable[dict] | Runnable | EXPERIMENT_T | Tuple[EXPERIMENT_T, EXPERIMENT_T]) – The target system or experiment(s) to evaluate. Can be an async function that takes a dict and returns a dict, a langchain Runnable, an existing experiment ID, or a two-tuple of experiment IDs.
data (Union[DATA_T, AsyncIterable[schemas.Example]]) – The dataset to evaluate on. Can be a dataset name, a list of examples, an async generator of examples, or an async iterable of examples.
evaluators (Optional[Sequence[EVALUATOR_T]]) – A list of evaluators to run on each example. Defaults to None.
summary_evaluators (Optional[Sequence[SUMMARY_EVALUATOR_T]]) – A list of summary evaluators to run on the entire dataset. Defaults to None.
metadata (Optional[dict]) – Metadata to attach to the experiment. Defaults to None.
experiment_prefix (Optional[str]) – A prefix to provide for your experiment name. Defaults to None.
description (Optional[str]) – A description of the experiment.
max_concurrency (int | None) – The maximum number of concurrent evaluations to run. If None then no limit is set. If 0 then no concurrency. Defaults to 0.
num_repetitions (int) – The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1.
blocking (bool) – Whether to block until the evaluation is complete. Defaults to True.
experiment (Optional[schemas.TracerSession]) – An existing experiment to extend. If provided, experiment_prefix is ignored. For advanced usage only.
load_nested – Whether to load all child runs for the experiment. Default is to only load the top-level root runs. Should only be specified when evaluating an existing experiment.
upload_results (bool) –
kwargs (Any) –
- Returns:
An async iterator over the experiment results.
- Return type:
AsyncIterator[ExperimentResultRow]
- Environment:
- LANGSMITH_TEST_CACHE: If set, API calls will be cached to disk to save time and
cost during testing. Recommended to commit the cache files to your repository for faster CI/CD runs. Requires the ‘langsmith[vcr]’ package to be installed.
Examples
>>> import asyncio
>>> from langsmith import Client
>>> client = Client()
>>> dataset = client.clone_public_dataset(
...     "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
... )
>>> dataset_name = "Evaluate Examples"
Basic usage:
>>> def accuracy(outputs: dict, reference_outputs: dict) -> dict:
...     # Row-level evaluator for accuracy.
...     pred = outputs["response"]
...     expected = reference_outputs["answer"]
...     return {"score": expected.lower() == pred.lower()}
>>> def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
...     # Experiment-level evaluator for precision.
...     # TP / (TP + FP)
...     predictions = [out["response"].lower() for out in outputs]
...     expected = [ref["answer"].lower() for ref in reference_outputs]
...     # yes and no are the only possible answers
...     tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
...     fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
...     return {"score": tp / (tp + fp)}
>>> async def apredict(inputs: dict) -> dict:
...     # This can be any async function or just an API call to your app.
...     await asyncio.sleep(0.1)
...     return {"response": "Yes"}
>>> results = asyncio.run(
...     client.aevaluate(
...         apredict,
...         data=dataset_name,
...         evaluators=[accuracy],
...         summary_evaluators=[precision],
...         experiment_prefix="My Experiment",
...         description="Evaluate the accuracy of the model asynchronously.",
...         metadata={
...             "my-prompt-version": "abcd-1234",
...         },
...     )
... )
View the evaluation results for experiment:...
Evaluating over only a subset of the examples using an async generator:
>>> async def example_generator():
...     examples = client.list_examples(dataset_name=dataset_name, limit=5)
...     for example in examples:
...         yield example
>>> results = asyncio.run(
...     client.aevaluate(
...         apredict,
...         data=example_generator(),
...         evaluators=[accuracy],
...         summary_evaluators=[precision],
...         experiment_prefix="My Subset Experiment",
...         description="Evaluate a subset of examples asynchronously.",
...     )
... )
View the evaluation results for experiment:...
Streaming each prediction to debug more easily and eagerly:
>>> results = asyncio.run(
...     client.aevaluate(
...         apredict,
...         data=dataset_name,
...         evaluators=[accuracy],
...         summary_evaluators=[precision],
...         experiment_prefix="My Streaming Experiment",
...         description="Streaming predictions for debugging.",
...         blocking=False,
...     )
... )
View the evaluation results for experiment:...
>>> async def aenumerate(iterable):
...     async for elem in iterable:
...         print(elem)
>>> asyncio.run(aenumerate(results))
Running without concurrency:
>>> results = asyncio.run(
...     client.aevaluate(
...         apredict,
...         data=dataset_name,
...         evaluators=[accuracy],
...         summary_evaluators=[precision],
...         experiment_prefix="My Experiment Without Concurrency",
...         description="This was run without concurrency.",
...         max_concurrency=0,
...     )
... )
View the evaluation results for experiment:...
Using Async evaluators:
>>> async def helpfulness(outputs: dict) -> dict:
...     # Row-level evaluator for helpfulness.
...     await asyncio.sleep(5)  # Replace with your LLM API call
...     return {"score": outputs["output"] == "Yes"}
>>> results = asyncio.run(
...     client.aevaluate(
...         apredict,
...         data=dataset_name,
...         evaluators=[helpfulness],
...         summary_evaluators=[precision],
...         experiment_prefix="My Helpful Experiment",
...         description="Applying async evaluators example.",
...     )
... )
View the evaluation results for experiment:...
New in version 0.2.0.
- async aevaluate_run(run: ls_schemas.Run | str | uuid.UUID, evaluator: ls_evaluator.RunEvaluator, *, source_info: Dict[str, Any] | None = None, reference_example: ls_schemas.Example | str | dict | uuid.UUID | None = None, load_child_runs: bool = False) ls_evaluator.EvaluationResult [source]#
Evaluate a run asynchronously.
- Parameters:
run (Run or str or UUID) – The run to evaluate.
evaluator (RunEvaluator) – The evaluator to use.
source_info (Dict[str, Any] or None, default=None) – Additional information about the source of the evaluation to log as feedback metadata.
reference_example (Optional Example or UUID, default=None) – The example to use as a reference for the evaluation. If not provided, the run’s reference example will be used.
load_child_runs (bool, default=False) – Whether to load child runs when resolving the run ID.
- Returns:
The evaluation result object created by the evaluation.
- Return type:
ls_evaluator.EvaluationResult
- async arun_on_dataset(dataset_name: str, llm_or_chain_factory: Any, *, evaluation: Any | None = None, concurrency_level: int = 5, project_name: str | None = None, project_metadata: Dict[str, Any] | None = None, dataset_version: datetime | str | None = None, verbose: bool = False, input_mapper: Callable[[Dict], Any] | None = None, revision_id: str | None = None, **kwargs: Any) Dict[str, Any] [source]#
Asynchronously run the Chain or language model on a dataset.
Deprecated since version 0.1.0: This method is deprecated. Use langsmith.aevaluate() instead.
- Parameters:
dataset_name (str) –
llm_or_chain_factory (Any) –
evaluation (Any | None) –
concurrency_level (int) –
project_name (str | None) –
project_metadata (Dict[str, Any] | None) –
dataset_version (datetime | str | None) –
verbose (bool) –
input_mapper (Callable[[Dict], Any] | None) –
revision_id (str | None) –
kwargs (Any) –
- Return type:
Dict[str, Any]
- batch_ingest_runs(create: Sequence[Run | RunLikeDict | Dict] | None = None, update: Sequence[Run | RunLikeDict | Dict] | None = None, *, pre_sampled: bool = False) None [source]#
Batch ingest/upsert multiple runs in the Langsmith system.
- Parameters:
create (Optional[Sequence[Union[ls_schemas.Run, RunLikeDict]]]) – A sequence of Run objects or equivalent dictionaries representing runs to be created / posted.
update (Optional[Sequence[Union[ls_schemas.Run, RunLikeDict]]]) – A sequence of Run objects or equivalent dictionaries representing runs that have already been created and should be updated / patched.
pre_sampled (bool, optional) – Whether the runs have already been subject to sampling, and therefore should not be sampled again. Defaults to False.
- Returns:
None
- Raises:
LangsmithAPIError – If there is an error in the API request.
- Return type:
None
Note
The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.
- clone_public_dataset(token_or_url: str, *, source_api_url: str | None = None, dataset_name: str | None = None) Dataset [source]#
Clone a public dataset to your own langsmith tenant.
This operation is idempotent. If you already have a dataset with the given name, this function will do nothing.
- Parameters:
token_or_url (str) – The token or URL of the public dataset to clone.
source_api_url (str | None) – The URL of the langsmith server where the data is hosted. Defaults to the API URL of your current client.
dataset_name (str) – The name of the dataset to create in your tenant. Defaults to the name of the public dataset.
- Return type:
Dataset
- create_annotation_queue(*, name: str, description: str | None = None, queue_id: UUID | str | None = None) AnnotationQueue [source]#
Create an annotation queue on the LangSmith API.
- Parameters:
name (str) – The name of the annotation queue.
description (str | None, optional) – The description of the annotation queue.
queue_id (UUID | str | None, optional) – The ID of the annotation queue.
- Returns:
- AnnotationQueue
The created annotation queue object.
- Return type:
AnnotationQueue
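Example: a small sketch that creates a queue and then adds runs to it with add_runs_to_annotation_queue. The queue name and run ID are illustrative, and the .id attribute on the returned AnnotationQueue is assumed from the schema.
>>> queue = client.create_annotation_queue(
...     name="Review hallucinations",
...     description="Runs flagged for manual review.",
... )
>>> run_ids = ["<run-uuid>"]  # placeholder: IDs of runs you want reviewed
>>> client.add_runs_to_annotation_queue(queue.id, run_ids=run_ids)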
- create_chat_example(messages: List[Mapping[str, Any] | BaseMessageLike], generations: Mapping[str, Any] | BaseMessageLike | None = None, dataset_id: UUID | str | None = None, dataset_name: str | None = None, created_at: datetime | None = None) Example [source]#
Add an example (row) to a Chat-type dataset.
- Parameters:
messages (List[Mapping[str, Any] | BaseMessageLike]) –
generations (Mapping[str, Any] | BaseMessageLike | None) –
dataset_id (UUID | str | None) –
dataset_name (str | None) –
created_at (datetime | None) –
- Return type:
Example
- create_commit(prompt_identifier: str, object: Any, *, parent_commit_hash: str | None = None) str [source]#
Create a commit for an existing prompt.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
object (Any) – The LangChain object to commit.
parent_commit_hash (Optional[str]) – The hash of the parent commit. Defaults to latest commit.
- Returns:
The url of the prompt commit.
- Return type:
str
- Raises:
HTTPError – If the server request fails.
ValueError – If the prompt does not exist.
- create_comparative_experiment(name: str, experiments: Sequence[UUID | str], *, reference_dataset: UUID | str | None = None, description: str | None = None, created_at: datetime | None = None, metadata: Dict[str, Any] | None = None, id: UUID | str | None = None) ComparativeExperiment [source]#
Create a comparative experiment on the LangSmith API.
These experiments compare 2 or more experiment results over a shared dataset.
- Parameters:
name (str) – The name of the comparative experiment.
experiments (Sequence[UUID | str]) – The IDs of the experiments to compare.
reference_dataset (UUID | str | None) – The ID of the dataset these experiments are compared on.
description (str | None) – The description of the comparative experiment.
created_at (datetime | None) – The creation time of the comparative experiment.
metadata (Dict[str, Any] | None) – Additional metadata for the comparative experiment.
id (UUID | str | None) –
- Returns:
The created comparative experiment object.
- Return type:
ComparativeExperiment
- create_dataset(dataset_name: str, *, description: str | None = None, data_type: DataType = DataType.kv, inputs_schema: Dict[str, Any] | None = None, outputs_schema: Dict[str, Any] | None = None, transformations: List[DatasetTransformation] | None = None, metadata: dict | None = None) Dataset [source]#
Create a dataset in the LangSmith API.
- Parameters:
dataset_name (str) – The name of the dataset.
description (Optional[str], default=None) – The description of the dataset.
data_type (ls_schemas.DataType, default=ls_schemas.DataType.kv) – The data type of the dataset.
inputs_schema (Optional[Dict[str, Any]], default=None) – The schema definition for the inputs of the dataset.
outputs_schema (Optional[Dict[str, Any]], default=None) – The schema definition for the outputs of the dataset.
transformations (Optional[List[ls_schemas.DatasetTransformation]], default=None) – A list of transformations to apply to the dataset.
metadata (Optional[dict], default=None) – Additional metadata to associate with the dataset.
- Returns:
The created dataset.
- Raises:
requests.HTTPError – If the request to create the dataset fails.
- Return type:
Dataset
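Example: a brief sketch of creating a dataset with an input schema. The dataset name and the JSON Schema shown are illustrative; a schema is optional.
>>> dataset = client.create_dataset(
...     "My QA Dataset",
...     description="Question/answer pairs for regression testing.",
...     inputs_schema={
...         "type": "object",
...         "properties": {"question": {"type": "string"}},
...         "required": ["question"],
...     },
... )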
- create_example(inputs: Mapping[str, Any], dataset_id: UUID | str | None = None, dataset_name: str | None = None, created_at: datetime | None = None, outputs: Mapping[str, Any] | None = None, metadata: Mapping[str, Any] | None = None, split: str | List[str] | None = None, example_id: UUID | str | None = None, source_run_id: UUID | str | None = None) Example [source]#
Create a dataset example in the LangSmith API.
Examples are rows in a dataset, containing the inputs and expected outputs (or other reference information) for a model or chain.
- Parameters:
inputs (Mapping[str, Any]) – The input values for the example.
dataset_id (UUID | str | None, default=None) – The ID of the dataset to create the example in.
dataset_name (str | None, default=None) – The name of the dataset to create the example in.
created_at (datetime | None, default=None) – The creation timestamp of the example.
outputs (Mapping[str, Any] | None, default=None) – The output values for the example.
metadata (Mapping[str, Any] | None, default=None) – The metadata for the example.
split (str | List[str] | None, default=None) – The splits for the example, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’.
example_id (UUID | str | None, default=None) – The ID of the example to create. If not provided, a new example will be created.
source_run_id (UUID | str | None, default=None) – The ID of the source run associated with this example.
- Returns:
The created example.
- Return type:
Example
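Example: a minimal sketch of adding a single row. The field names inside inputs and outputs are arbitrary and chosen for illustration.
>>> example = client.create_example(
...     inputs={"question": "What is 2 + 2?"},
...     outputs={"answer": "4"},
...     dataset_name="My QA Dataset",
...     split="train",
... )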
- create_example_from_run(run: Run, dataset_id: UUID | str | None = None, dataset_name: str | None = None, created_at: datetime | None = None) Example [source]#
Add an example (row) to a dataset from a run.
- create_examples(*, inputs: Sequence[Mapping[str, Any]], outputs: Sequence[Mapping[str, Any] | None] | None = None, metadata: Sequence[Mapping[str, Any] | None] | None = None, splits: Sequence[str | List[str] | None] | None = None, source_run_ids: Sequence[UUID | str | None] | None = None, ids: Sequence[UUID | str | None] | None = None, dataset_id: UUID | str | None = None, dataset_name: str | None = None, **kwargs: Any) None [source]#
Create examples in a dataset.
- Parameters:
inputs (Sequence[Mapping[str, Any]]) – The input values for the examples.
outputs (Optional[Sequence[Optional[Mapping[str, Any]]]], default=None) – The output values for the examples.
metadata (Optional[Sequence[Optional[Mapping[str, Any]]]], default=None) – The metadata for the examples.
splits (Optional[Sequence[Optional[str | List[str]]]], default=None) – The splits for the examples, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’.
source_run_ids (Optional[Sequence[Optional[ID_TYPE]]], default=None) – The IDs of the source runs associated with the examples.
ids (Optional[Sequence[ID_TYPE]], default=None) – The IDs of the examples.
dataset_id (Optional[ID_TYPE], default=None) – The ID of the dataset to create the examples in.
dataset_name (Optional[str], default=None) – The name of the dataset to create the examples in.
kwargs (Any) –
- Return type:
None
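Example: a sketch of bulk-creating rows, where inputs, outputs, and splits are parallel sequences. The values are illustrative.
>>> client.create_examples(
...     inputs=[{"question": "What is 2 + 2?"}, {"question": "Is the sky blue?"}],
...     outputs=[{"answer": "4"}, {"answer": "Yes"}],
...     splits=["train", "test"],
...     dataset_name="My QA Dataset",
... )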
- create_feedback(run_id: UUID | str | None, key: str, *, score: float | int | bool | None = None, value: str | dict | None = None, correction: dict | None = None, comment: str | None = None, source_info: Dict[str, Any] | None = None, feedback_source_type: FeedbackSourceType | str = FeedbackSourceType.API, source_run_id: UUID | str | None = None, feedback_id: UUID | str | None = None, feedback_config: FeedbackConfig | None = None, stop_after_attempt: int = 10, project_id: UUID | str | None = None, comparative_experiment_id: UUID | str | None = None, feedback_group_id: UUID | str | None = None, extra: Dict | None = None, trace_id: UUID | str | None = None, **kwargs: Any) Feedback [source]#
Create a feedback in the LangSmith API.
- Parameters:
run_id (str or UUID) – The ID of the run to provide feedback for. Either the run_id OR the project_id must be provided.
trace_id (Optional[ID_TYPE], default=None) – The trace ID of the run to provide feedback for. Enables batch ingestion. This is optional.
key (str) – The name of the metric or ‘aspect’ this feedback is about.
score (float or int or bool or None, default=None) – The score to rate this run on the metric or aspect.
value (float or int or bool or str or dict or None, default=None) – The display value or non-numeric value for this feedback.
correction (dict or None, default=None) – The proper ground truth for this run.
comment (str or None, default=None) – A comment about this feedback, such as a justification for the score or chain-of-thought trajectory for an LLM judge.
source_info (Dict[str, Any] or None, default=None) – Information about the source of this feedback.
feedback_source_type (FeedbackSourceType or str, default=FeedbackSourceType.API) – The type of feedback source, such as model (for model-generated feedback) or API.
source_run_id (str or UUID or None, default=None,) – The ID of the run that generated this feedback, if a “model” type.
feedback_id (str or UUID or None, default=None) – The ID of the feedback to create. If not provided, a random UUID will be generated.
feedback_config (langsmith.schemas.FeedbackConfig or None, default=None,) – The configuration specifying how to interpret feedback with this key. Examples include continuous (with min/max bounds), categorical, or freeform.
stop_after_attempt (int, default=10) – The number of times to retry the request before giving up.
project_id (str or UUID) – The ID of the project to provide feedback on. One - and only one - of this and run_id must be provided.
comparative_experiment_id (str or UUID) – If this feedback was logged as a part of a comparative experiment, this associates the feedback with that experiment.
feedback_group_id (str or UUID) – When logging preferences, ranking runs, or other comparative feedback, this is used to group feedback together.
extra (dict) – Metadata for the feedback.
kwargs (Any) –
- Return type:
Feedback
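Example: a sketch of logging a score against a run. The run ID is a placeholder for a real run UUID.
>>> run_id = "<run-uuid>"  # placeholder: the run being scored
>>> client.create_feedback(
...     run_id,
...     key="correctness",
...     score=1,
...     comment="Matched the reference answer.",
... )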
- create_feedback_from_token(token_or_url: str | UUID, score: float | int | bool | None = None, *, value: float | int | bool | str | dict | None = None, correction: dict | None = None, comment: str | None = None, metadata: dict | None = None) None [source]#
Create feedback from a presigned token or URL.
- Parameters:
token_or_url (Union[str, uuid.UUID]) – The token or URL from which to create feedback.
score (Union[float, int, bool, None], optional) – The score of the feedback. Defaults to None.
value (Union[float, int, bool, str, dict, None], optional) – The value of the feedback. Defaults to None.
correction (Union[dict, None], optional) – The correction of the feedback. Defaults to None.
comment (Union[str, None], optional) – The comment of the feedback. Defaults to None.
metadata (Optional[dict], optional) – Additional metadata for the feedback. Defaults to None.
- Raises:
ValueError – If the source API URL is invalid.
- Returns:
This method does not return anything.
- Return type:
None
- create_llm_example(prompt: str, generation: str | None = None, dataset_id: UUID | str | None = None, dataset_name: str | None = None, created_at: datetime | None = None) Example [source]#
Add an example (row) to an LLM-type dataset.
- Parameters:
prompt (str) –
generation (str | None) –
dataset_id (UUID | str | None) –
dataset_name (str | None) –
created_at (datetime | None) –
- Return type:
Example
- create_presigned_feedback_token(run_id: UUID | str, feedback_key: str, *, expiration: datetime | timedelta | None = None, feedback_config: FeedbackConfig | None = None, feedback_id: UUID | str | None = None) FeedbackIngestToken [source]#
Create a pre-signed URL to send feedback data to.
This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.
- Parameters:
run_id (UUID | str) –
feedback_key (str) –
expiration (datetime | timedelta | None) – The expiration time of the pre-signed URL. Either a datetime or a timedelta offset from now. Defaults to 3 hours.
feedback_config (FeedbackConfig | None) – FeedbackConfig or None. If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values.
feedback_id (UUID | str | None) – The ID of the feedback to create. If not provided, a new feedback will be created.
- Returns:
The pre-signed URL for uploading feedback data.
- Return type:
FeedbackIngestToken
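Example: a sketch of the browser-feedback flow: mint a token server-side, hand it to the browser, then redeem it with create_feedback_from_token. The run ID and feedback key are placeholders, and the url attribute on the returned FeedbackIngestToken is assumed from the schema.
>>> from datetime import timedelta
>>> token = client.create_presigned_feedback_token(
...     "<run-uuid>",  # placeholder run ID
...     "user_score",
...     expiration=timedelta(hours=1),
... )
>>> client.create_feedback_from_token(token.url, score=0.9)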
- create_presigned_feedback_tokens(run_id: UUID | str, feedback_keys: Sequence[str], *, expiration: datetime | timedelta | None = None, feedback_configs: Sequence[FeedbackConfig | None] | None = None) Sequence[FeedbackIngestToken] [source]#
Create a pre-signed URL to send feedback data to.
This is useful for giving browser-based clients a way to upload feedback data directly to LangSmith without accessing the API key.
- Parameters:
run_id (UUID | str) – The ID of the run.
feedback_keys (Sequence[str]) – The feedback keys to create pre-signed tokens for.
expiration (datetime | timedelta | None) – The expiration time of the pre-signed URLs. Either a datetime or a timedelta offset from now. Defaults to 3 hours.
feedback_configs (Sequence[FeedbackConfig | None] | None) – If creating a feedback_key for the first time, this defines how the metric should be interpreted, such as a continuous score (w/ optional bounds), or distribution over categorical values.
- Returns:
The pre-signed URL for uploading feedback data.
- Return type:
Sequence[FeedbackIngestToken]
- create_project(project_name: str, *, description: str | None = None, metadata: dict | None = None, upsert: bool = False, project_extra: dict | None = None, reference_dataset_id: UUID | str | None = None) TracerSession [source]#
Create a project on the LangSmith API.
- Parameters:
project_name (str) – The name of the project.
project_extra (dict or None, default=None) – Additional project information.
metadata (dict or None, default=None) – Additional metadata to associate with the project.
description (str or None, default=None) – The description of the project.
upsert (bool, default=False) – Whether to update the project if it already exists.
reference_dataset_id (UUID or None, default=None) – The ID of the reference dataset to associate with the project.
- Returns:
The created project.
- Return type:
TracerSession
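Example: a minimal sketch. The project name and metadata keys are illustrative.
>>> project = client.create_project(
...     "my-agent-staging",
...     description="Staging traces for the support agent.",
...     metadata={"git_sha": "abc123"},
... )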
- create_prompt(prompt_identifier: str, *, description: str | None = None, readme: str | None = None, tags: Sequence[str] | None = None, is_public: bool = False) Prompt [source]#
Create a new prompt.
Does not attach prompt object, just creates an empty prompt.
- Parameters:
prompt_identifier (str) – The name of the prompt.
description (Optional[str]) – A description of the prompt.
readme (Optional[str]) – A readme for the prompt.
tags (Optional[Sequence[str]]) – A list of tags for the prompt.
is_public (bool) – Whether the prompt should be public. Defaults to False.
- Returns:
The created prompt object.
- Return type:
ls_schemas.Prompt
- Raises:
ValueError – If the current tenant is not the owner.
HTTPError – If the server request fails.
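Example: a short sketch of registering an initially empty prompt. The identifier is illustrative; commits can then be added separately, for instance with create_commit or push_prompt.
>>> prompt = client.create_prompt(
...     "my-summarizer",
...     description="Summarizes support tickets.",
...     tags=["support", "summarization"],
...     is_public=False,
... )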
- create_run(name: str, inputs: Dict[str, Any], run_type: Literal['tool', 'chain', 'llm', 'retriever', 'embedding', 'prompt', 'parser'], *, project_name: str | None = None, revision_id: str | None = None, **kwargs: Any) None [source]#
Persist a run to the LangSmith API.
- Parameters:
name (str) – The name of the run.
inputs (Dict[str, Any]) – The input values for the run.
run_type (str) – The type of the run, such as tool, chain, llm, retriever, embedding, prompt, or parser.
revision_id (ID_TYPE or None, default=None) – The revision ID of the run.
**kwargs (Any) – Additional keyword arguments.
project_name (str | None) –
- Raises:
LangSmithUserError – If the API key is not provided when using the hosted service.
- Return type:
None
- delete_annotation_queue(queue_id: UUID | str) None [source]#
Delete an annotation queue with the specified queue ID.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue to delete.
- Return type:
None
- delete_dataset(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None) None [source]#
Delete a dataset from the LangSmith API.
- Parameters:
dataset_id (UUID or None, default=None) – The ID of the dataset to delete.
dataset_name (str or None, default=None) – The name of the dataset to delete.
- Return type:
None
- delete_example(example_id: UUID | str) None [source]#
Delete an example by ID.
- Parameters:
example_id (str or UUID) – The ID of the example to delete.
- Return type:
None
- delete_feedback(feedback_id: UUID | str) None [source]#
Delete a feedback by ID.
- Parameters:
feedback_id (str or UUID) – The ID of the feedback to delete.
- Return type:
None
- delete_project(*, project_name: str | None = None, project_id: str | None = None) None [source]#
Delete a project from LangSmith.
- Parameters:
project_name (str or None, default=None) – The name of the project to delete.
project_id (str or None, default=None) – The ID of the project to delete.
- Return type:
None
- delete_prompt(prompt_identifier: str) None [source]#
Delete a prompt.
- Parameters:
prompt_identifier (str) – The identifier of the prompt to delete.
- Returns:
True if the prompt was successfully deleted, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the current tenant is not the owner of the prompt.
- delete_run_from_annotation_queue(queue_id: UUID | str, *, run_id: UUID | str) None [source]#
Delete a run from an annotation queue with the specified queue ID and run ID.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue.
run_id (ID_TYPE) – The ID of the run to be removed from the annotation queue.
- Return type:
None
- diff_dataset_versions(dataset_id: UUID | str | None = None, *, dataset_name: str | None = None, from_version: str | datetime, to_version: str | datetime) DatasetDiffInfo [source]#
Get the difference between two versions of a dataset.
- Parameters:
dataset_id (str or None, default=None) – The ID of the dataset.
dataset_name (str or None, default=None) – The name of the dataset.
from_version (str or datetime.datetime) – The starting version for the diff.
to_version (str or datetime.datetime) – The ending version for the diff.
- Returns:
DatasetDiffInfo – The difference between the two versions of the dataset.
Examples
# Get the difference between two tagged versions of a dataset
from_version = "prod"
to_version = "dev"
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)
print(diff)
# Get the difference between two timestamped versions of a dataset
from_version = datetime.datetime(2024, 1, 1)
to_version = datetime.datetime(2024, 2, 1)
diff = client.diff_dataset_versions(
    dataset_name="my-dataset",
    from_version=from_version,
    to_version=to_version,
)
print(diff)
- Return type:
DatasetDiffInfo
- evaluate(target: TARGET_T | Runnable | EXPERIMENT_T | Tuple[EXPERIMENT_T, EXPERIMENT_T], /, data: DATA_T | None = None, evaluators: Sequence[EVALUATOR_T] | Sequence[COMPARATIVE_EVALUATOR_T] | None = None, summary_evaluators: Sequence[SUMMARY_EVALUATOR_T] | None = None, metadata: dict | None = None, experiment_prefix: str | None = None, description: str | None = None, max_concurrency: int | None = 0, num_repetitions: int = 1, blocking: bool = True, experiment: EXPERIMENT_T | None = None, upload_results: bool = True, **kwargs: Any) ExperimentResults | ComparativeExperimentResults [source]#
Evaluate a target system on a given dataset.
- Parameters:
target (TARGET_T | Runnable | EXPERIMENT_T | Tuple[EXPERIMENT_T, EXPERIMENT_T]) – The target system or experiment(s) to evaluate. Can be a function that takes a dict and returns a dict, a langchain Runnable, an existing experiment ID, or a two-tuple of experiment IDs.
data (DATA_T) – The dataset to evaluate on. Can be a dataset name, a list of examples, or a generator of examples.
evaluators (Sequence[EVALUATOR_T] | Sequence[COMPARATIVE_EVALUATOR_T] | None) – A list of evaluators to run on each example. The evaluator signature depends on the target type. Default to None.
summary_evaluators (Sequence[SUMMARY_EVALUATOR_T] | None) – A list of summary evaluators to run on the entire dataset. Should not be specified if comparing two existing experiments. Defaults to None.
metadata (dict | None) – Metadata to attach to the experiment. Defaults to None.
experiment_prefix (str | None) – A prefix to provide for your experiment name. Defaults to None.
description (str | None) – A free-form text description for the experiment.
max_concurrency (int | None) – The maximum number of concurrent evaluations to run. If None then no limit is set. If 0 then no concurrency. Defaults to 0.
blocking (bool) – Whether to block until the evaluation is complete. Defaults to True.
num_repetitions (int) – The number of times to run the evaluation. Each item in the dataset will be run and evaluated this many times. Defaults to 1.
experiment (schemas.TracerSession | None) – An existing experiment to extend. If provided, experiment_prefix is ignored. For advanced usage only. Should not be specified if target is an existing experiment or two-tuple of experiments.
load_nested (bool) – Whether to load all child runs for the experiment. Default is to only load the top-level root runs. Should only be specified when target is an existing experiment or two-tuple of experiments.
randomize_order (bool) – Whether to randomize the order of the outputs for each evaluation. Default is False. Should only be specified when target is a two-tuple of existing experiments.
upload_results (bool) –
kwargs (Any) –
- Returns:
ExperimentResults – If target is a function, Runnable, or existing experiment. ComparativeExperimentResults – If target is a two-tuple of existing experiments.
- Return type:
ExperimentResults | ComparativeExperimentResults
Examples
Prepare the dataset:
>>> from langsmith import Client
>>> client = Client()
>>> dataset = client.clone_public_dataset(
...     "https://smith.langchain.com/public/419dcab2-1d66-4b94-8901-0357ead390df/d"
... )
>>> dataset_name = "Evaluate Examples"
Basic usage:
>>> def accuracy(outputs: dict, reference_outputs: dict) -> dict:
...     # Row-level evaluator for accuracy.
...     pred = outputs["response"]
...     expected = reference_outputs["answer"]
...     return {"score": expected.lower() == pred.lower()}
>>> def precision(outputs: list[dict], reference_outputs: list[dict]) -> dict:
...     # Experiment-level evaluator for precision.
...     # TP / (TP + FP)
...     predictions = [out["response"].lower() for out in outputs]
...     expected = [ref["answer"].lower() for ref in reference_outputs]
...     # yes and no are the only possible answers
...     tp = sum([p == e for p, e in zip(predictions, expected) if p == "yes"])
...     fp = sum([p == "yes" and e == "no" for p, e in zip(predictions, expected)])
...     return {"score": tp / (tp + fp)}
>>> def predict(inputs: dict) -> dict:
...     # This can be any function or just an API call to your app.
...     return {"response": "Yes"}
>>> results = client.evaluate(
...     predict,
...     data=dataset_name,
...     evaluators=[accuracy],
...     summary_evaluators=[precision],
...     experiment_prefix="My Experiment",
...     description="Evaluating the accuracy of a simple prediction model.",
...     metadata={
...         "my-prompt-version": "abcd-1234",
...     },
... )
View the evaluation results for experiment:...
Evaluating over only a subset of the examples
>>> experiment_name = results.experiment_name
>>> examples = client.list_examples(dataset_name=dataset_name, limit=5)
>>> results = client.evaluate(
...     predict,
...     data=examples,
...     evaluators=[accuracy],
...     summary_evaluators=[precision],
...     experiment_prefix="My Experiment",
...     description="Just testing a subset synchronously.",
... )
View the evaluation results for experiment:...
Streaming each prediction to debug more easily and eagerly:
>>> results = client.evaluate(
...     predict,
...     data=dataset_name,
...     evaluators=[accuracy],
...     summary_evaluators=[precision],
...     description="I don't even have to block!",
...     blocking=False,
... )
View the evaluation results for experiment:...
>>> for i, result in enumerate(results):
...     pass
Using the evaluate API with an off-the-shelf LangChain evaluator:
>>> from langsmith.evaluation import LangChainStringEvaluator
>>> from langchain.chat_models import init_chat_model
>>> def prepare_criteria_data(run: Run, example: Example):
...     return {
...         "prediction": run.outputs["output"],
...         "reference": example.outputs["answer"],
...         "input": str(example.inputs),
...     }
>>> results = client.evaluate(
...     predict,
...     data=dataset_name,
...     evaluators=[
...         accuracy,
...         LangChainStringEvaluator("embedding_distance"),
...         LangChainStringEvaluator(
...             "labeled_criteria",
...             config={
...                 "criteria": {
...                     "usefulness": "The prediction is useful if it is correct"
...                     " and/or asks a useful followup question."
...                 },
...                 "llm": init_chat_model("gpt-4o"),
...             },
...             prepare_data=prepare_criteria_data,
...         ),
...     ],
...     description="Evaluating with off-the-shelf LangChain evaluators.",
...     summary_evaluators=[precision],
... )
View the evaluation results for experiment:...
Evaluating a LangChain object:
>>> from langchain_core.runnables import chain as as_runnable
>>> @as_runnable
... def nested_predict(inputs):
...     return {"response": "Yes"}
>>> @as_runnable
... def lc_predict(inputs):
...     return nested_predict.invoke(inputs)
>>> results = client.evaluate(
...     lc_predict,
...     data=dataset_name,
...     evaluators=[accuracy],
...     description="This time we're evaluating a LangChain object.",
...     summary_evaluators=[precision],
... )
View the evaluation results for experiment:...
New in version 0.2.0.
- evaluate_run(run: ls_schemas.Run | ls_schemas.RunBase | str | uuid.UUID, evaluator: ls_evaluator.RunEvaluator, *, source_info: Dict[str, Any] | None = None, reference_example: ls_schemas.Example | str | dict | uuid.UUID | None = None, load_child_runs: bool = False) ls_evaluator.EvaluationResult [source]#
Evaluate a run.
- Parameters:
run (Run or RunBase or str or UUID) – The run to evaluate.
evaluator (RunEvaluator) – The evaluator to use.
source_info (Dict[str, Any] or None, default=None) – Additional information about the source of the evaluation to log as feedback metadata.
reference_example (Example or str or dict or UUID or None, default=None) – The example to use as a reference for the evaluation. If not provided, the run’s reference example will be used.
load_child_runs (bool, default=False) – Whether to load child runs when resolving the run ID.
- Returns:
The feedback object created by the evaluation.
- Return type:
ls_evaluator.EvaluationResult
- get_prompt(prompt_identifier: str) Prompt | None [source]#
Get a specific prompt by its identifier.
- Parameters:
prompt_identifier (str) – The identifier of the prompt. The identifier should be in the format "prompt_name" or "owner/prompt_name".
- Returns:
The prompt object.
- Return type:
Optional[ls_schemas.Prompt]
- Raises:
requests.exceptions.HTTPError – If the prompt is not found or another error occurs.
- get_run_from_annotation_queue(queue_id: UUID | str, *, index: int) RunWithAnnotationQueueInfo [source]#
Get a run from an annotation queue at the specified index.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue.
index (int) – The index of the run to retrieve.
- Returns:
The run at the specified index.
- Return type:
ls_schemas.RunWithAnnotationQueueInfo
- Raises:
ls_utils.LangSmithNotFoundError – If the run is not found at the given index.
ls_utils.LangSmithError – For other API-related errors.
- get_run_stats(*, id: List[UUID | str] | None = None, trace: UUID | str | None = None, parent_run: UUID | str | None = None, run_type: str | None = None, project_names: List[str] | None = None, project_ids: List[UUID | str] | None = None, reference_example_ids: List[UUID | str] | None = None, start_time: str | None = None, end_time: str | None = None, error: bool | None = None, query: str | None = None, filter: str | None = None, trace_filter: str | None = None, tree_filter: str | None = None, is_root: bool | None = None, data_source_type: str | None = None) Dict[str, Any] [source]#
Get aggregate statistics over queried runs.
Takes in similar query parameters to list_runs and returns statistics based on the runs that match the query.
- Parameters:
id (Optional[List[ID_TYPE]]) – List of run IDs to filter by.
trace (Optional[ID_TYPE]) – Trace ID to filter by.
parent_run (Optional[ID_TYPE]) – Parent run ID to filter by.
run_type (Optional[str]) – Run type to filter by.
project_ids (Optional[List[ID_TYPE]]) – List of project (session) IDs to filter by.
reference_example_ids (Optional[List[ID_TYPE]]) – List of reference example IDs to filter by.
start_time (Optional[str]) – Start time to filter by.
end_time (Optional[str]) – End time to filter by.
error (Optional[bool]) – Filter by error status.
query (Optional[str]) – Query string to filter by.
filter (Optional[str]) – Filter string to apply.
trace_filter (Optional[str]) – Trace filter string to apply.
tree_filter (Optional[str]) – Tree filter string to apply.
is_root (Optional[bool]) – Filter by root run status.
data_source_type (Optional[str]) – Data source type to filter by.
project_names (List[str] | None) – List of project names to filter by.
- Returns:
A dictionary containing the run statistics.
- Return type:
Dict[str, Any]
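Example: a sketch of pulling aggregate statistics for root LLM runs in one project. The project name is a placeholder, and the exact keys in the returned dictionary depend on the API.
>>> stats = client.get_run_stats(
...     project_names=["My Project"],
...     run_type="llm",
...     is_root=True,
... )
>>> print(stats)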
- get_run_url(*, run: RunBase, project_name: str | None = None, project_id: UUID | str | None = None) str [source]#
Get the URL for a run.
Not recommended for use within your agent runtime. More for use interacting with runs after the fact for data analysis or ETL workloads.
- Parameters:
run (Run) – The run.
project_name (str or None, default=None) – The name of the project.
project_id (UUID or None, default=None) – The ID of the project.
- Returns:
The URL for the run.
- Return type:
str
- get_test_results(*, project_id: ID_TYPE | None = None, project_name: str | None = None) pd.DataFrame [source]#
Read the record-level information from an experiment into a Pandas DF.
Note: this will fetch whatever data exists in the DB. Results are not immediately available in the DB upon evaluation run completion.
- Returns:
A dataframe containing the test results.
- Parameters:
project_id (Optional[ID_TYPE]) –
project_name (Optional[str]) –
- Return type:
pd.DataFrame
- has_dataset(*, dataset_name: str | None = None, dataset_id: str | None = None) bool [source]#
Check whether a dataset exists in your tenant.
- Parameters:
dataset_name (str or None, default=None) – The name of the dataset to check.
dataset_id (str or None, default=None) – The ID of the dataset to check.
- Returns:
Whether the dataset exists.
- Return type:
bool
- has_project(project_name: str, *, project_id: str | None = None) bool [source]#
Check if a project exists.
- Parameters:
project_name (str) – The name of the project to check for.
project_id (str or None, default=None) – The ID of the project to check for.
- Returns:
Whether the project exists.
- Return type:
bool
- index_dataset(*, dataset_id: UUID | str, tag: str = 'latest', **kwargs: Any) None [source]#
Enable dataset indexing. Examples are indexed by their inputs.
This enables searching for similar examples by inputs with
client.similar_examples()
.- Parameters:
dataset_id (UUID) – The ID of the dataset to index.
tag (str, optional) – The version of the dataset to index. If ‘latest’ then any updates to the dataset (additions, updates, deletions of examples) will be reflected in the index.
kwargs (Any) –
- Returns:
None
- Raises:
requests.HTTPError –
- Return type:
None
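Example: a sketch of enabling indexing and then querying for similar rows with similar_examples. It assumes a dataset readable via read_dataset; newly added examples may take a short while to become searchable.
>>> dataset = client.read_dataset(dataset_name="My QA Dataset")
>>> client.index_dataset(dataset_id=dataset.id)
>>> similar = client.similar_examples(
...     {"question": "What is two plus two?"},
...     limit=3,
...     dataset_id=dataset.id,
... )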
- like_prompt(prompt_identifier: str) Dict[str, int] [source]#
Like a prompt.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
- Returns:
A dictionary with the key ‘likes’ and the count of likes as the value.
- Return type:
Dict[str, int]
- list_annotation_queues(*, queue_ids: List[UUID | str] | None = None, name: str | None = None, name_contains: str | None = None, limit: int | None = None) Iterator[AnnotationQueue] [source]#
List the annotation queues on the LangSmith API.
- Parameters:
queue_ids (List[UUID | str] | None, default=None) – The IDs of the queues to filter by.
name (str | None, default=None) – The name of the queue to filter by.
name_contains (str | None, default=None) – The substring that the queue name should contain.
limit (int | None, default=None) –
- Yields:
- AnnotationQueue
The annotation queues.
- Return type:
Iterator[AnnotationQueue]
- list_dataset_splits(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, as_of: datetime | str | None = None) List[str] [source]#
Get the splits for a dataset.
- Parameters:
dataset_id (ID_TYPE) – The ID of the dataset.
as_of (Optional[Union[str, datetime.datetime]], optional) – The version of the dataset to retrieve splits for. Can be a timestamp or a string tag. Defaults to “latest”.
dataset_name (str | None) –
- Returns:
The names of this dataset's splits.
- Return type:
List[str]
- list_dataset_versions(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, search: str | None = None, limit: int | None = None) Iterator[DatasetVersion] [source]#
List dataset versions.
- Parameters:
dataset_id (Optional[ID_TYPE]) – The ID of the dataset.
dataset_name (Optional[str]) – The name of the dataset.
search (Optional[str]) – The search query.
limit (Optional[int]) – The maximum number of versions to return.
- Returns:
An iterator of dataset versions.
- Return type:
Iterator[ls_schemas.DatasetVersion]
- list_datasets(*, dataset_ids: List[UUID | str] | None = None, data_type: str | None = None, dataset_name: str | None = None, dataset_name_contains: str | None = None, metadata: Dict[str, Any] | None = None, limit: int | None = None) Iterator[Dataset] [source]#
List the datasets on the LangSmith API.
- Yields:
Dataset – The datasets.
- Parameters:
dataset_ids (List[UUID | str] | None) –
data_type (str | None) –
dataset_name (str | None) –
dataset_name_contains (str | None) –
metadata (Dict[str, Any] | None) –
limit (int | None) –
- Return type:
Iterator[Dataset]
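For instance, a short sketch that iterates over datasets whose names contain a substring (the substring is illustrative):
from langsmith import Client

client = Client()
# Page through up to 10 datasets whose name contains "eval".
for dataset in client.list_datasets(dataset_name_contains="eval", limit=10):
    print(dataset.id, dataset.name)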
- list_examples(dataset_id: UUID | str | None = None, dataset_name: str | None = None, example_ids: Sequence[UUID | str] | None = None, as_of: datetime | str | None = None, splits: Sequence[str] | None = None, inline_s3_urls: bool = True, *, offset: int = 0, limit: int | None = None, metadata: dict | None = None, filter: str | None = None, include_attachments: bool = False, **kwargs: Any) Iterator[Example] [source]#
Retrieve the example rows of the specified dataset.
- Parameters:
dataset_id (UUID, optional) – The ID of the dataset to filter by. Defaults to None.
dataset_name (str, optional) – The name of the dataset to filter by. Defaults to None.
example_ids (List[UUID], optional) – The IDs of the examples to filter by. Defaults to None.
as_of (datetime, str, or None, default=None) – The dataset version tag OR timestamp to retrieve the examples as of. Response examples will only be those that were present at the time of the tagged (or timestamped) version.
splits (List[str], optional) – A list of dataset splits, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’. Returns examples only from the specified splits.
inline_s3_urls (bool, optional) – Whether to inline S3 URLs. Defaults to True.
offset (int) – The offset to start from. Defaults to 0.
limit (int, optional) – The maximum number of examples to return.
filter (str, optional) – A structured filter string to apply to the examples.
metadata (dict | None) –
include_attachments (bool) –
kwargs (Any) –
- Yields:
Example – The examples.
- Return type:
Iterator[Example]
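A minimal sketch that pages through examples from one split of a dataset (the dataset name and split are placeholders):
from langsmith import Client

client = Client()
# Fetch up to 50 examples from the "test" split of the dataset.
for example in client.list_examples(
    dataset_name="my-dataset", splits=["test"], limit=50
):
    print(example.id, example.inputs)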
- list_feedback(*, run_ids: Sequence[UUID | str] | None = None, feedback_key: Sequence[str] | None = None, feedback_source_type: Sequence[FeedbackSourceType] | None = None, limit: int | None = None, **kwargs: Any) Iterator[Feedback] [source]#
List the feedback objects on the LangSmith API.
- Parameters:
run_ids (List[str or UUID] or None, default=None) – The IDs of the runs to filter by.
feedback_key (List[str] or None, default=None) – The feedback key(s) to filter by, e.g. ‘correctness’. The query performs a union of all feedback keys.
feedback_source_type (List[FeedbackSourceType] or None, default=None) – The type of feedback source, such as model (for model-generated feedback) or API.
limit (int or None, default=None) – The maximum number of feedback objects to return.
**kwargs (Any) – Additional keyword arguments.
- Yields:
Feedback – The feedback objects.
- Return type:
Iterator[Feedback]
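For example, to collect the feedback attached to a set of runs (the run ID and feedback key are placeholders):
from langsmith import Client

client = Client()
run_ids = ["a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836"]  # placeholder run ID
# Iterate over "correctness" feedback recorded against these runs.
for fb in client.list_feedback(run_ids=run_ids, feedback_key=["correctness"]):
    print(fb.key, fb.score)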
- list_presigned_feedback_tokens(run_id: UUID | str, *, limit: int | None = None) Iterator[FeedbackIngestToken] [source]#
List the feedback ingest tokens for a run.
- Parameters:
run_id (UUID | str) – The ID of the run to filter by.
limit (int | None) – The maximum number of tokens to return.
- Yields:
- FeedbackIngestToken
The feedback ingest tokens.
- Return type:
Iterator[FeedbackIngestToken]
- list_projects(project_ids: List[UUID | str] | None = None, name: str | None = None, name_contains: str | None = None, reference_dataset_id: UUID | str | None = None, reference_dataset_name: str | None = None, reference_free: bool | None = None, limit: int | None = None, metadata: Dict[str, Any] | None = None) Iterator[TracerSession] [source]#
List projects from the LangSmith API.
- Parameters:
project_ids (Optional[List[ID_TYPE]], optional) – A list of project IDs to filter by, by default None
name (Optional[str], optional) – The name of the project to filter by, by default None
name_contains (Optional[str], optional) – A string to search for in the project name, by default None
reference_dataset_id (Optional[List[ID_TYPE]], optional) – A dataset ID to filter by, by default None
reference_dataset_name (Optional[str], optional) – The name of the reference dataset to filter by, by default None
reference_free (Optional[bool], optional) – Whether to filter for only projects not associated with a dataset.
limit (Optional[int], optional) – The maximum number of projects to return, by default None
metadata (Optional[Dict[str, Any]], optional) – Metadata to filter by.
- Yields:
TracerSession – The projects.
- Return type:
Iterator[TracerSession]
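An illustrative sketch that lists tracing projects matching a name substring (the substring is a placeholder):
from langsmith import Client

client = Client()
# Iterate over up to 5 projects whose name contains "prod".
for project in client.list_projects(name_contains="prod", limit=5):
    print(project.id, project.name)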
- list_prompt_commits(prompt_identifier: str, *, limit: int | None = None, offset: int = 0, include_model: bool = False) Iterator[ListedPromptCommit] [source]#
List commits for a given prompt.
- Parameters:
prompt_identifier (str) – The identifier of the prompt in the format ‘owner/repo_name’.
limit (Optional[int], optional) – The maximum number of commits to return. If None, returns all commits. Defaults to None.
offset (int, optional) – The number of commits to skip before starting to return results. Defaults to 0.
include_model (bool, optional) – Whether to include the model information in the commit data. Defaults to False.
- Returns:
An iterator of ListedPromptCommit objects representing the commits.
- Return type:
Iterator[ls_schemas.ListedPromptCommit]
- Yields:
ls_schemas.ListedPromptCommit – A ListedPromptCommit object for each commit.
Note
This method uses pagination to retrieve commits. It will make multiple API calls if necessary to retrieve all commits or up to the specified limit.
- list_prompts(*, limit: int = 100, offset: int = 0, is_public: bool | None = None, is_archived: bool | None = False, sort_field: PromptSortField = PromptSortField.updated_at, sort_direction: Literal['desc', 'asc'] = 'desc', query: str | None = None) ListPromptsResponse [source]#
List prompts with pagination.
- Parameters:
limit (int) – The maximum number of prompts to return. Defaults to 100.
offset (int) – The number of prompts to skip. Defaults to 0.
is_public (Optional[bool]) – Filter prompts by whether they are public.
is_archived (Optional[bool]) – Filter prompts by whether they are archived.
sort_field (ls_schemas.PromptSortField) – The field to sort by. Defaults to “updated_at”.
sort_direction (Literal["desc", "asc"]) – The order to sort by. Defaults to “desc”.
query (Optional[str]) – Filter prompts by a search query.
- Returns:
A response object containing the list of prompts.
- Return type:
ls_schemas.ListPromptsResponse
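A sketch of paging through public prompts matching a search query; the query string is illustrative, and the repos attribute and repo_handle field reflect an assumption about the ListPromptsResponse schema:
from langsmith import Client

client = Client()
response = client.list_prompts(limit=20, offset=0, is_public=True, query="rag")
# Assumed response shape: a list of prompt repos under `repos`.
for prompt in response.repos:
    print(prompt.repo_handle)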
- list_runs(*, project_id: UUID | str | Sequence[UUID | str] | None = None, project_name: str | Sequence[str] | None = None, run_type: str | None = None, trace_id: UUID | str | None = None, reference_example_id: UUID | str | None = None, query: str | None = None, filter: str | None = None, trace_filter: str | None = None, tree_filter: str | None = None, is_root: bool | None = None, parent_run_id: UUID | str | None = None, start_time: datetime | None = None, error: bool | None = None, run_ids: Sequence[UUID | str] | None = None, select: Sequence[str] | None = None, limit: int | None = None, **kwargs: Any) Iterator[Run] [source]#
List runs from the LangSmith API.
- Parameters:
project_id (UUID or None, default=None) – The ID(s) of the project to filter by.
project_name (str or None, default=None) – The name(s) of the project to filter by.
run_type (str or None, default=None) – The type of the runs to filter by.
trace_id (UUID or None, default=None) – The ID of the trace to filter by.
reference_example_id (UUID or None, default=None) – The ID of the reference example to filter by.
query (str or None, default=None) – The query string to filter by.
filter (str or None, default=None) – The filter string to filter by.
trace_filter (str or None, default=None) – Filter to apply to the ROOT run in the trace tree. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of the root run within a trace.
tree_filter (str or None, default=None) – Filter to apply to OTHER runs in the trace tree, including sibling and child runs. This is meant to be used in conjunction with the regular filter parameter to let you filter runs by attributes of any run within a trace.
is_root (bool or None, default=None) – Whether to filter by root runs.
parent_run_id (UUID or None, default=None) – The ID of the parent run to filter by.
start_time (datetime or None, default=None) – The start time to filter by.
error (bool or None, default=None) – Whether to filter by error status.
run_ids (List[str or UUID] or None, default=None) – The IDs of the runs to filter by.
limit (int or None, default=None) – The maximum number of runs to return.
**kwargs (Any) – Additional keyword arguments.
select (Sequence[str] or None, default=None) – The fields to select in the returned runs.
- Yields:
Run – The runs.
Examples:
# List all runs in a project
project_runs = client.list_runs(project_name="<your_project>")

# List LLM and Chat runs in the last 24 hours
todays_llm_runs = client.list_runs(
    project_name="<your_project>",
    start_time=datetime.now() - timedelta(days=1),
    run_type="llm",
)

# List root traces in a project
root_runs = client.list_runs(project_name="<your_project>", is_root=1)

# List runs without errors
correct_runs = client.list_runs(project_name="<your_project>", error=False)

# List runs and only return their inputs/outputs (to speed up the query)
input_output_runs = client.list_runs(
    project_name="<your_project>", select=["inputs", "outputs"]
)

# List runs by run ID
run_ids = [
    "a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836",
    "9398e6be-964f-4aa4-8ae9-ad78cd4b7074",
]
selected_runs = client.list_runs(run_ids=run_ids)

# List all "chain" type runs that took more than 10 seconds and had
# total_tokens greater than 5000
chain_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(eq(run_type, "chain"), gt(latency, 10), gt(total_tokens, 5000))',
)

# List all runs called "extractor" whose root of the trace was assigned
# feedback "user_score" score of 1
good_extractor_runs = client.list_runs(
    project_name="<your_project>",
    filter='eq(name, "extractor")',
    trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
)

# List all runs that started after a specific timestamp and either have
# "error" not equal to null or a "Correctness" feedback score equal to 0
complex_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(gt(start_time, "2023-07-15T12:34:56Z"), or(neq(error, null), and(eq(feedback_key, "Correctness"), eq(feedback_score, 0.0))))',
)

# List all runs where tags include "experimental" or "beta" and latency
# is greater than 2 seconds
tagged_runs = client.list_runs(
    project_name="<your_project>",
    filter='and(or(has(tags, "experimental"), has(tags, "beta")), gt(latency, 2))',
)
- Return type:
Iterator[Run]
Get shared examples.
- Parameters:
share_token (str) –
example_ids (List[UUID | str] | None) –
- Return type:
List[Example]
List shared projects.
- Parameters:
dataset_share_token (str) – The share token of the dataset.
project_ids (List[UUID | str] | None, optional) – List of project IDs to filter the results, by default None.
name (str | None, optional) – Name of the project to filter the results, by default None.
name_contains (str | None, optional) – Substring to search for in project names, by default None.
limit (int | None, optional) – The maximum number of projects to return.
- Yields:
TracerSessionResult – The shared projects.
- Return type:
Iterator[TracerSessionResult]
Get shared runs.
- Parameters:
share_token (UUID | str) –
run_ids (List[str] | None) –
- Return type:
Iterator[Run]
- multipart_ingest(create: Sequence[Run | RunLikeDict | Dict] | None = None, update: Sequence[Run | RunLikeDict | Dict] | None = None, *, pre_sampled: bool = False) None [source]#
Batch ingest/upsert multiple runs in the Langsmith system.
- Parameters:
create (Optional[Sequence[Union[ls_schemas.Run, RunLikeDict]]]) – A sequence of Run objects or equivalent dictionaries representing runs to be created / posted.
update (Optional[Sequence[Union[ls_schemas.Run, RunLikeDict]]]) – A sequence of Run objects or equivalent dictionaries representing runs that have already been created and should be updated / patched.
pre_sampled (bool, optional) – Whether the runs have already been subject to sampling, and therefore should not be sampled again. Defaults to False.
- Returns:
None
- Raises:
LangsmithAPIError – If there is an error in the API request.
- Return type:
None
Note
- The run objects MUST contain the dotted_order and trace_id fields to be accepted by the API.
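A heavily hedged sketch of posting a single pre-built root run as a dict; the dotted_order layout shown (a start-time prefix concatenated with the run ID) is an assumption about the expected format, not something this reference guarantees:
import uuid
from datetime import datetime, timezone

from langsmith import Client

client = Client()
run_id = uuid.uuid4()
start = datetime.now(timezone.utc)
run = {
    "id": str(run_id),
    "trace_id": str(run_id),  # root run: trace_id equals its own id
    # Assumed dotted_order layout for a root run: "<timestamp><run_id>".
    "dotted_order": f"{start.strftime('%Y%m%dT%H%M%S%fZ')}{run_id}",
    "name": "my-run",
    "run_type": "chain",
    "start_time": start,
    "inputs": {"question": "hi"},
}
client.multipart_ingest(create=[run])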
- pull_prompt(prompt_identifier: str, *, include_model: bool | None = False) Any [source]#
Pull a prompt and return it as a LangChain PromptTemplate.
This method requires langchain_core.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
include_model (bool | None) –
- Returns:
The prompt object in the specified format.
- Return type:
Any
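For example (requires langchain_core; the prompt identifier is a placeholder, and the pulled template is assumed to expect a "topic" input variable):
from langsmith import Client

client = Client()
prompt = client.pull_prompt("my-handle/my-prompt")
# The returned object is a LangChain prompt template and can be invoked directly.
messages = prompt.invoke({"topic": "tracing"})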
- pull_prompt_commit(prompt_identifier: str, *, include_model: bool | None = False) PromptCommit [source]#
Pull a prompt object from the LangSmith API.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
include_model (bool | None) –
- Returns:
The prompt object.
- Return type:
ls_schemas.PromptCommit
- Raises:
ValueError – If no commits are found for the prompt.
- push_prompt(prompt_identifier: str, *, object: Any | None = None, parent_commit_hash: str = 'latest', is_public: bool | None = None, description: str | None = None, readme: str | None = None, tags: Sequence[str] | None = None) str [source]#
Push a prompt to the LangSmith API.
Can be used to update prompt metadata or prompt content.
If the prompt does not exist, it will be created. If the prompt exists, it will be updated.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
object (Optional[Any]) – The LangChain object to push.
parent_commit_hash (str) – The parent commit hash. Defaults to “latest”.
is_public (Optional[bool]) – Whether the prompt should be public. If None (default), the current visibility status is maintained for existing prompts. For new prompts, None defaults to private. Set to True to make public, or False to make private.
description (Optional[str]) – A description of the prompt. Defaults to an empty string.
readme (Optional[str]) – A readme for the prompt. Defaults to an empty string.
tags (Optional[Sequence[str]]) – A list of tags for the prompt. Defaults to an empty list.
- Returns:
The URL of the prompt.
- Return type:
str
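A minimal sketch that creates or updates a prompt from a LangChain object (requires langchain_core; the identifier, description, and tags are placeholders):
from langchain_core.prompts import ChatPromptTemplate

from langsmith import Client

client = Client()
template = ChatPromptTemplate.from_messages([("user", "Tell me a joke about {topic}")])
url = client.push_prompt(
    "my-handle/joke-prompt",  # placeholder identifier in 'owner/repo_name' form
    object=template,
    description="Joke generator prompt",
    tags=["demo"],
)
print(url)  # URL of the pushed prompt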
- read_annotation_queue(queue_id: UUID | str) AnnotationQueue [source]#
Read an annotation queue with the specified queue ID.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue to read.
- Returns:
The annotation queue object.
- Return type:
ls_schemas.AnnotationQueue
- read_dataset(*, dataset_name: str | None = None, dataset_id: UUID | str | None = None) Dataset [source]#
Read a dataset from the LangSmith API.
- Parameters:
dataset_name (str or None, default=None) – The name of the dataset to read.
dataset_id (UUID or None, default=None) – The ID of the dataset to read.
- Returns:
Dataset – The dataset.
- Return type:
Dataset
- read_dataset_openai_finetuning(dataset_id: str | None = None, *, dataset_name: str | None = None) list [source]#
Download a dataset in OpenAI JSONL format and load it as a list of dicts.
- Parameters:
dataset_id (str) – The ID of the dataset to download.
dataset_name (str) – The name of the dataset to download.
- Returns:
list – The dataset loaded as a list of dicts.
- Return type:
list
Retrieve the shared schema of a dataset.
- Parameters:
dataset_id (Optional[ID_TYPE]) – The ID of the dataset. Either dataset_id or dataset_name must be given.
dataset_name (Optional[str]) – The name of the dataset. Either dataset_id or dataset_name must be given.
- Returns:
The shared schema of the dataset.
- Return type:
ls_schemas.DatasetShareSchema
- Raises:
ValueError – If neither dataset_id nor dataset_name is given.
- read_dataset_version(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, as_of: datetime | None = None, tag: str | None = None) DatasetVersion [source]#
Get dataset version by as_of or exact tag.
Use this to resolve the nearest version to a given timestamp or for a given tag.
- Parameters:
dataset_id (Optional[ID_TYPE]) – The ID of the dataset.
dataset_name (Optional[str]) – The name of the dataset.
as_of (Optional[datetime.datetime]) – The timestamp of the dataset to retrieve.
tag (Optional[str]) – The tag of the dataset to retrieve.
- Returns:
The dataset version.
- Return type:
ls_schemas.DatasetVersion
Examples:
# Get the latest version of a dataset
client.read_dataset_version(dataset_name="my-dataset", tag="latest")

# Get the version of a dataset <= a given timestamp
client.read_dataset_version(
    dataset_name="my-dataset",
    as_of=datetime.datetime(2024, 1, 1),
)

# Get the version of a dataset with a specific tag
client.read_dataset_version(dataset_name="my-dataset", tag="prod")
- read_example(example_id: UUID | str, *, as_of: datetime | None = None) Example [source]#
Read an example from the LangSmith API.
- Parameters:
example_id (UUID) – The ID of the example to read.
as_of (datetime | None) –
- Returns:
The example.
- Return type:
Example
- read_feedback(feedback_id: UUID | str) Feedback [source]#
Read a feedback from the LangSmith API.
- Parameters:
feedback_id (str or UUID) – The ID of the feedback to read.
- Returns:
Feedback – The feedback.
- Return type:
Feedback
- read_project(*, project_id: str | None = None, project_name: str | None = None, include_stats: bool = False) TracerSessionResult [source]#
Read a project from the LangSmith API.
- Parameters:
project_id (str or None, default=None) – The ID of the project to read.
project_name (str or None, default=None) – The name of the project to read. Note: Only one of project_id or project_name may be given.
include_stats (bool, default=False) – Whether to include a project’s aggregate statistics in the response.
- Returns:
TracerSessionResult – The project.
- Return type:
TracerSessionResult
- read_run(run_id: UUID | str, load_child_runs: bool = False) Run [source]#
Read a run from the LangSmith API.
- Parameters:
run_id (str or UUID) – The ID of the run to read.
load_child_runs (bool, default=False) – Whether to load nested child runs.
- Returns:
Run – The run.
- Return type:
Run
Retrieve the shared link for a specific run.
- Parameters:
run_id (ID_TYPE) – The ID of the run.
- Returns:
The shared link for the run, or None if the link is not available.
- Return type:
Optional[str]
Get shared datasets.
- Parameters:
share_token (str) –
- Return type:
Get shared runs.
- Parameters:
share_token (UUID | str) –
run_id (UUID | str | None) –
- Return type:
- request_with_retries(method: Literal['GET', 'POST', 'PUT', 'PATCH', 'DELETE'], pathname: str, *, request_kwargs: Mapping | None = None, stop_after_attempt: int = 1, retry_on: Sequence[Type[BaseException]] | None = None, to_ignore: Sequence[Type[BaseException]] | None = None, handle_response: Callable[[Response, int], Any] | None = None, _context: str = '', **kwargs: Any) Response [source]#
Send a request with retries.
- Parameters:
method (Literal['GET', 'POST', 'PUT', 'PATCH', 'DELETE']) – The HTTP request method.
pathname (str) – The pathname of the request URL. Will be appended to the API URL.
request_kwargs (Mapping) – Additional request parameters.
stop_after_attempt (int, default=1) – The number of attempts to make.
retry_on (Sequence[Type[BaseException]] or None, default=None) – The exceptions to retry on. In addition to: [LangSmithConnectionError, LangSmithAPIError].
to_ignore (Sequence[Type[BaseException]] or None, default=None) – The exceptions to ignore / pass on.
handle_response (Callable[[requests.Response, int], Any] or None, default=None) – A function to handle the response and return whether to continue retrying.
**kwargs (Any) – Additional keyword arguments to pass to the request.
- Returns:
Response – The response object.
- Raises:
LangSmithAPIError – If a server error occurs.
LangSmithUserError – If the request fails.
LangSmithConnectionError – If a connection error occurs.
LangSmithError – If the request fails.
_context (str) –
- Return type:
Response
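An illustrative low-level call; the "/datasets" pathname is an assumption about the API surface, and the point of the sketch is only the retry parameters:
from langsmith import Client

client = Client()
response = client.request_with_retries(
    "GET",
    "/datasets",  # appended to the configured API URL; path is illustrative
    stop_after_attempt=3,
)
print(response.status_code)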
Get share state for a run.
- Parameters:
run_id (UUID | str) –
- Return type:
bool
- run_on_dataset(dataset_name: str, llm_or_chain_factory: Any, *, evaluation: Any | None = None, concurrency_level: int = 5, project_name: str | None = None, project_metadata: Dict[str, Any] | None = None, dataset_version: datetime | str | None = None, verbose: bool = False, input_mapper: Callable[[Dict], Any] | None = None, revision_id: str | None = None, **kwargs: Any) Dict[str, Any] [source]#
Run the Chain or language model on a dataset.
Deprecated since version 0.1.0: This method is deprecated. Use langsmith.aevaluate() instead.
- Parameters:
dataset_name (str) –
llm_or_chain_factory (Any) –
evaluation (Any | None) –
concurrency_level (int) –
project_name (str | None) –
project_metadata (Dict[str, Any] | None) –
dataset_version (datetime | str | None) –
verbose (bool) –
input_mapper (Callable[[Dict], Any] | None) –
revision_id (str | None) –
kwargs (Any) –
- Return type:
Dict[str, Any]
Get a share link for a dataset.
- Parameters:
dataset_id (UUID | str | None) –
dataset_name (str | None) –
- Return type:
Get a share link for a run.
- Parameters:
run_id (UUID | str) –
share_id (UUID | str | None) –
- Return type:
str
- similar_examples(inputs: dict, /, *, limit: int, dataset_id: UUID | str, filter: str | None = None, **kwargs: Any) List[ExampleSearch] [source]#
Retrieve the dataset examples whose inputs best match the current inputs.
Note: Must have few-shot indexing enabled for the dataset. See client.index_dataset().
- Parameters:
inputs (dict) – The inputs to use as a search query. Must match the dataset input schema. Must be JSON serializable.
limit (int) – The maximum number of examples to return.
dataset_id (str or UUID) – The ID of the dataset to search over.
filter (str, optional) –
A filter string to apply to the search results. Uses the same syntax as the filter parameter in list_runs(). Only a subset of operations are supported. Defaults to None.
For example, you can use and(eq(metadata.some_tag, 'some_value'), neq(metadata.env, 'dev')) to filter only examples where some_tag has some_value, and the environment is not dev.
kwargs (Any) – Additional keyword args to pass as part of request body.
- Return type:
List[ExampleSearch]
Examples
from langsmith import Client

client = Client()
client.similar_examples(
    {"question": "When would i use the runnable generator"},
    limit=3,
    dataset_id="...",
)
[
    ExampleSearch( inputs={'question': 'How do I cache a Chat model? What caches can I use?'}, outputs={'answer': 'You can use LangChain\'s caching layer for Chat Models. This can save you money by reducing the number of API calls you make to the LLM provider, if you\'re often requesting the same completion multiple times, and speed up your application.\n\nfrom langchain.cache import InMemoryCache\nlangchain.llm_cache = InMemoryCache()\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict(\'Tell me a joke\')\n\nYou can also use SQLite Cache which uses a SQLite database:\n\nrm .langchain.db\n\nfrom langchain.cache import SQLiteCache\nlangchain.llm_cache = SQLiteCache(database_path=".langchain.db")\n\n# The first time, it is not yet in cache, so it should take longer\nllm.predict(\'Tell me a joke\') \n'}, metadata=None, id=UUID('b2ddd1c4-dff6-49ae-8544-f48e39053398'), dataset_id=UUID('01b6ce0f-bfb6-4f48-bbb8-f19272135d40') ),
    ExampleSearch( inputs={'question': "What's a runnable lambda?"}, outputs={'answer': "A runnable lambda is an object that implements LangChain's `Runnable` interface and runs a callbale (i.e., a function). Note the function must accept a single argument."}, metadata=None, id=UUID('f94104a7-2434-4ba7-8293-6a283f4860b4'), dataset_id=UUID('01b6ce0f-bfb6-4f48-bbb8-f19272135d40') ),
    ExampleSearch( inputs={'question': 'Show me how to use RecursiveURLLoader'}, outputs={'answer': 'The RecursiveURLLoader comes from the langchain.document_loaders.recursive_url_loader module. Here\'s an example of how to use it:\n\nfrom langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader\n\n# Create an instance of RecursiveUrlLoader with the URL you want to load\nloader = RecursiveUrlLoader(url="https://example.com")\n\n# Load all child links from the URL page\nchild_links = loader.load()\n\n# Print the child links\nfor link in child_links:\n print(link)\n\nMake sure to replace "https://example.com" with the actual URL you want to load. The load() method returns a list of child links found on the URL page. You can iterate over this list to access each child link.'}, metadata=None, id=UUID('0308ea70-a803-4181-a37d-39e95f138f8c'), dataset_id=UUID('01b6ce0f-bfb6-4f48-bbb8-f19272135d40') ),
]
- unlike_prompt(prompt_identifier: str) Dict[str, int] [source]#
Unlike a prompt.
- Parameters:
prompt_identifier (str) – The identifier of the prompt.
- Returns:
A dictionary with the key ‘likes’ and the count of likes as the value.
- Return type:
Dict[str, int]
Delete share link for a dataset.
- Parameters:
dataset_id (UUID | str) –
- Return type:
None
Delete share link for a run.
- Parameters:
run_id (UUID | str) –
- Return type:
None
- update_annotation_queue(queue_id: UUID | str, *, name: str, description: str | None = None) None [source]#
Update an annotation queue with the specified queue_id.
- Parameters:
queue_id (ID_TYPE) – The ID of the annotation queue to update.
name (str) – The new name for the annotation queue.
description (Optional[str], optional) – The new description for the annotation queue. Defaults to None.
- Return type:
None
- update_dataset_splits(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, split_name: str, example_ids: List[UUID | str], remove: bool = False) None [source]#
Update the splits for a dataset.
- Parameters:
dataset_id (ID_TYPE) – The ID of the dataset to update.
split_name (str) – The name of the split to update.
example_ids (List[ID_TYPE]) – The IDs of the examples to add to or remove from the split.
remove (bool, optional) – If True, remove the examples from the split. If False, add the examples to the split. Defaults to False.
dataset_name (str | None) –
- Returns:
None
- Return type:
None
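For example, to move a handful of examples into a 'validation' split and later remove them again (the dataset name and example ID are placeholders):
from langsmith import Client

client = Client()
example_ids = ["9398e6be-964f-4aa4-8ae9-ad78cd4b7074"]  # placeholder example ID
# Add the examples to the "validation" split.
client.update_dataset_splits(
    dataset_name="my-dataset", split_name="validation", example_ids=example_ids
)
# Remove the same examples from that split.
client.update_dataset_splits(
    dataset_name="my-dataset",
    split_name="validation",
    example_ids=example_ids,
    remove=True,
)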
- update_dataset_tag(*, dataset_id: UUID | str | None = None, dataset_name: str | None = None, as_of: datetime, tag: str) None [source]#
Update the tags of a dataset.
If the tag is already assigned to a different version of this dataset, the tag will be moved to the new version. The as_of parameter is used to determine which version of the dataset to apply the new tags to. It must be an exact version of the dataset to succeed. You can use the read_dataset_version method to find the exact version to apply the tags to.
- Parameters:
dataset_id (UUID) – The ID of the dataset to update.
as_of (datetime.datetime) – The timestamp of the dataset to apply the new tags to.
tag (str) – The new tag to apply to the dataset.
dataset_name (str | None) – The name of the dataset to update.
Examples:
dataset_name = "my-dataset"
# Get the version of a dataset <= a given timestamp
dataset_version = client.read_dataset_version(
    dataset_name=dataset_name, as_of=datetime.datetime(2024, 1, 1)
)
# Assign that version a new tag
client.update_dataset_tag(
    dataset_name="my-dataset",
    as_of=dataset_version.as_of,
    tag="prod",
)
- Return type:
None
- update_example(example_id: UUID | str, *, inputs: Dict[str, Any] | None = None, outputs: Mapping[str, Any] | None = None, metadata: Dict | None = None, split: str | List[str] | None = None, dataset_id: UUID | str | None = None, attachments_operations: AttachmentsOperations | None = None) Dict[str, Any] [source]#
Update a specific example.
- Parameters:
example_id (str or UUID) – The ID of the example to update.
inputs (Dict[str, Any] or None, default=None) – The input values to update.
outputs (Mapping[str, Any] or None, default=None) – The output values to update.
metadata (Dict or None, default=None) – The metadata to update.
split (str or List[str] or None, default=None) – The dataset split to update, such as ‘train’, ‘test’, or ‘validation’.
dataset_id (UUID or None, default=None) – The ID of the dataset to update.
- Returns:
Dict[str, Any] – The updated example.
attachments_operations (AttachmentsOperations | None) –
- Return type:
Dict[str, Any]
- update_examples(*, example_ids: Sequence[UUID | str], inputs: Sequence[Dict[str, Any] | None] | None = None, outputs: Sequence[Mapping[str, Any] | None] | None = None, metadata: Sequence[Dict | None] | None = None, splits: Sequence[str | List[str] | None] | None = None, dataset_ids: Sequence[UUID | str | None] | None = None, attachments_operations: Sequence[AttachmentsOperations | None] | None = None) Dict[str, Any] [source]#
Update multiple examples.
- Parameters:
example_ids (Sequence[ID_TYPE]) – The IDs of the examples to update.
inputs (Optional[Sequence[Optional[Dict[str, Any]]]], default=None) – The input values for the examples.
outputs (Optional[Sequence[Optional[Mapping[str, Any]]]], default=None) – The output values for the examples.
metadata (Optional[Sequence[Optional[Mapping[str, Any]]]], default=None) – The metadata for the examples.
splits (Optional[Sequence[Optional[str | List[str]]]], default=None) – The splits for the examples, which are divisions of your dataset such as ‘train’, ‘test’, or ‘validation’.
dataset_ids (Optional[Sequence[Optional[ID_TYPE]]], default=None) – The IDs of the datasets to move the examples to.
- Returns:
Dict[str, Any] – The response from the server (specifies the number of examples updated).
splits (Sequence[str | List[str] | None] | None) –
attachments_operations (Sequence[AttachmentsOperations | None] | None) –
- Return type:
Dict[str, Any]
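A short sketch that patches the outputs of two existing examples in place (the example IDs and output values are placeholders):
from langsmith import Client

client = Client()
resp = client.update_examples(
    example_ids=[
        "b2ddd1c4-dff6-49ae-8544-f48e39053398",
        "f94104a7-2434-4ba7-8293-6a283f4860b4",
    ],
    # One outputs dict per example ID, in the same order.
    outputs=[{"answer": "42"}, {"answer": "N/A"}],
)
print(resp)  # includes the number of examples updated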
- update_examples_multipart(*, dataset_id: UUID | str, updates: List[ExampleUpdateWithAttachments] | None = None) UpsertExamplesResponse [source]#
Upload examples.
- Parameters:
dataset_id (UUID | str) –
updates (List[ExampleUpdateWithAttachments] | None) –
- Return type:
UpsertExamplesResponse
- update_feedback(feedback_id: UUID | str, *, score: float | int | bool | None = None, value: float | int | bool | str | dict | None = None, correction: dict | None = None, comment: str | None = None) None [source]#
Update a feedback in the LangSmith API.
- Parameters:
feedback_id (str or UUID) – The ID of the feedback to update.
score (float or int or bool or None, default=None) – The score to update the feedback with.
value (float or int or bool or str or dict or None, default=None) – The value to update the feedback with.
correction (dict or None, default=None) – The correction to update the feedback with.
comment (str or None, default=None) – The comment to update the feedback with.
- Return type:
None
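For example, to correct a feedback score after manual review (the feedback ID is a placeholder):
from langsmith import Client

client = Client()
client.update_feedback(
    "00000000-0000-0000-0000-000000000000",  # placeholder feedback ID
    score=0.0,
    comment="Downgraded after manual review",
)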
- update_project(project_id: UUID | str, *, name: str | None = None, description: str | None = None, metadata: dict | None = None, project_extra: dict | None = None, end_time: datetime | None = None) TracerSession [source]#
Update a LangSmith project.
- Parameters:
project_id (UUID) – The ID of the project to update.
name (str or None, default=None) – The new name to give the project. This is only valid if the project has been assigned an end_time, meaning it has been completed/closed.
description (str or None, default=None) – The new description to give the project.
metadata (dict or None, default=None) –
project_extra (dict or None, default=None) – Additional project information.
- Returns:
TracerSession – The updated project.
end_time (datetime | None) –
- Return type:
TracerSession
- update_prompt(prompt_identifier: str, *, description: str | None = None, readme: str | None = None, tags: Sequence[str] | None = None, is_public: bool | None = None, is_archived: bool | None = None) Dict[str, Any] [source]#
Update a prompt’s metadata.
To update the content of a prompt, use push_prompt or create_commit instead.
- Parameters:
prompt_identifier (str) – The identifier of the prompt to update.
description (Optional[str]) – New description for the prompt.
readme (Optional[str]) – New readme for the prompt.
tags (Optional[Sequence[str]]) – New list of tags for the prompt.
is_public (Optional[bool]) – New public status for the prompt.
is_archived (Optional[bool]) – New archived status for the prompt.
- Returns:
The updated prompt data as returned by the server.
- Return type:
Dict[str, Any]
- Raises:
ValueError – If the prompt_identifier is empty.
HTTPError – If the server request fails.
- update_run(run_id: UUID | str, *, name: str | None = None, end_time: datetime | None = None, error: str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, events: Sequence[dict] | None = None, extra: Dict | None = None, tags: List[str] | None = None, attachments: Dict[str, Tuple[str, bytes] | Attachment] | None = None, **kwargs: Any) None [source]#
Update a run in the LangSmith API.
- Parameters:
run_id (str or UUID) – The ID of the run to update.
name (str or None, default=None) – The name of the run.
end_time (datetime or None) – The end time of the run.
error (str or None, default=None) – The error message of the run.
inputs (Dict or None, default=None) – The input values for the run.
outputs (Dict or None, default=None) – The output values for the run.
events (Sequence[dict] or None, default=None) – The events for the run.
extra (Dict or None, default=None) – The extra information for the run.
tags (List[str] or None, default=None) – The tags for the run.
attachments (dict[str, ls_schemas.Attachment] or None, default=None) – A dictionary of attachments to add to the run. The keys are the attachment names, and the values are Attachment objects containing the data and mime type.
**kwargs (Any) – Kwargs are ignored.
- Return type:
None
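A minimal sketch that closes out a run by recording its outputs and end time (the run ID and output values are placeholders):
from datetime import datetime, timezone

from langsmith import Client

client = Client()
client.update_run(
    "a36092d2-4ad5-4fb4-9c0d-0dba9a2ed836",  # placeholder run ID
    outputs={"answer": "All done"},
    end_time=datetime.now(timezone.utc),
)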
- upload_csv(csv_file: str | Tuple[str, BytesIO], input_keys: Sequence[str], output_keys: Sequence[str], *, name: str | None = None, description: str | None = None, data_type: DataType | None = DataType.kv) Dataset [source]#
Upload a CSV file to the LangSmith API.
- Parameters:
csv_file (str or Tuple[str, BytesIO]) – The CSV file to upload. If a string, it should be the path to the file. If a tuple, it should be a tuple containing the filename and a BytesIO object.
input_keys (Sequence[str]) – The input keys.
output_keys (Sequence[str]) – The output keys.
name (str or None, default=None) – The name of the dataset.
description (str or None, default=None) – The description of the dataset.
data_type (DataType or None, default=DataType.kv) – The data type of the dataset.
- Returns:
Dataset – The uploaded dataset.
- Raises:
ValueError – If the csv_file is not a string or tuple.
- Return type:
Dataset
- upload_dataframe(df: pd.DataFrame, name: str, input_keys: Sequence[str], output_keys: Sequence[str], *, description: str | None = None, data_type: ls_schemas.DataType | None = DataType.kv) ls_schemas.Dataset [source]#
Upload a dataframe as individual examples to the LangSmith API.
- Parameters:
df (pd.DataFrame) – The dataframe to upload.
name (str) – The name of the dataset.
input_keys (Sequence[str]) – The input keys.
output_keys (Sequence[str]) – The output keys.
description (str or None, default=None) – The description of the dataset.
data_type (DataType or None, default=DataType.kv) – The data type of the dataset.
- Returns:
Dataset – The uploaded dataset.
- Raises:
ValueError – If the csv_file is not a string or tuple.
- Return type:
ls_schemas.Dataset
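For example, uploading a small pandas DataFrame as a question/answer dataset (requires pandas; the dataset name, columns, and rows are illustrative):
import pandas as pd

from langsmith import Client

client = Client()
df = pd.DataFrame(
    {
        "question": ["What is LangSmith?", "What does tracing capture?"],
        "answer": ["An LLM observability platform.", "Inputs, outputs, and timing."],
    }
)
# Each row becomes one example; input_keys/output_keys select the columns.
dataset = client.upload_dataframe(
    df,
    name="qa-demo-dataset",
    input_keys=["question"],
    output_keys=["answer"],
    description="Tiny demo dataset",
)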
- upload_examples_multipart(*, dataset_id: UUID | str, uploads: List[ExampleUploadWithAttachments] | None = None) UpsertExamplesResponse [source]#
Upload examples.
- Parameters:
dataset_id (UUID | str) –
uploads (List[ExampleUploadWithAttachments] | None) –
- Return type:
UpsertExamplesResponse
- upsert_examples_multipart(*, upserts: List[ExampleUpsertWithAttachments] | None = None) UpsertExamplesResponse [source]#
Upsert examples.
Deprecated since version 0.1.0: This method is deprecated. Use langsmith.upload_examples_multipart() instead.
- Parameters:
upserts (List[ExampleUpsertWithAttachments] | None) –
- Return type:
UpsertExamplesResponse