# Utilizing Metadata & Tags

## Setup

Let's load our environment variables.

In [69]:
# Using a .env file
from dotenv import load_dotenv
load_dotenv(dotenv_path="../.env", override=True)

True

## Applying Metadata

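In Python, you can attach tags and metadata to a trace in two ways: statically, through the `tags` and `metadata` arguments of the `@traceable` decorator (the `trace` context manager accepts them as well), or at runtime, by writing to the current run tree returned by `ls.get_current_run_tree()`. Both are useful for querying, grouping, and aggregating your trace information.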

In [48]:
from langsmith import traceable, trace
import langsmith as ls
from openai import OpenAI
from typing import List
import nest_asyncio
from utils import get_vector_db_retriever

MODEL_PROVIDER = "openai"
MODEL_NAME = "gpt-4o-mini"
APP_VERSION = 1.0
RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""

openai_client = OpenAI()
nest_asyncio.apply()
retriever = get_vector_db_retriever()

"""
retrieve_documents
- Returns documents fetched from a vectorstore based on the user's question
"""
@traceable(
  run_type="chain",
  tags=["retriever-1.0"],
  metadata={"datasource": "docs.smith.langchain.com"}
)
def retrieve_documents(question: str):
    documents = retriever.invoke(question)
    quality_docs = [doc for doc in documents if len(doc.page_content) > 20]
    rt = ls.get_current_run_tree()
    if len(quality_docs) > 3:
        rt.metadata["data-availability"] = "high"
    else:
        rt.metadata["data-availability"] = "low"
    return documents


@traceable
def generate_response(question: str, documents):
    # NOTE: Our documents came in as a list of objects, but we just want to log a string
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)

    messages = [
        {
            "role": "system",
            "content": RAG_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    response = call_openai(messages)
    return response

"""
call_openai
- Returns the chat completion output from OpenAI
"""
@traceable(run_type="llm")
def call_openai(
    messages: List[dict], model: str = MODEL_NAME, temperature: float = 0.0
):  # returns the full chat completion object; the caller extracts the text
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response

"""
metadata_rag
- Calls `retrieve_documents` to fetch documents
- Calls `generate_response` to generate a response based on the fetched documents
- Returns the model response
"""
@traceable
def metadata_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response.choices[0].message.content
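
The data-availability rule inside `retrieve_documents` can be pulled out into a plain function, which makes the threshold easy to check without a vector store. This is a minimal sketch, assuming each document is represented only by its text; `label_data_availability`, `min_chars`, and `min_docs` are hypothetical names chosen for illustration and are not part of LangSmith.

```python
from typing import Dict, List


def label_data_availability(
    page_contents: List[str], min_chars: int = 20, min_docs: int = 3
) -> Dict[str, str]:
    """Return the metadata entry that retrieve_documents attaches to its run.

    A document counts as "quality" when its text is longer than min_chars.
    Availability is "high" only when more than min_docs documents qualify.
    """
    quality = [text for text in page_contents if len(text) > min_chars]
    level = "high" if len(quality) > min_docs else "low"
    return {"data-availability": level}


# Four long documents and one short one: four qualify, which is more than three.
docs = ["x" * 50] * 4 + ["short"]
print(label_data_availability(docs))  # {'data-availability': 'high'}
```

Inside the traced function, the returned dictionary could be merged into the run's metadata with `rt.metadata.update(...)`, since the notebook already treats `rt.metadata` as a dictionary.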


In [49]:
question = "How do I add metadata to my trace?"
ai_answer = metadata_rag(question)
print(ai_answer)

To add metadata to your trace, you can follow the instructions provided in the LangSmith documentation on adding metadata to traces. This typically involves attaching key-value pairs to your runs, which can be done through the LangSmith UI or API. For detailed steps, refer to the "Learn how to add metadata to your traces" section in the documentation.


In [50]:
question = "Nocturnal animals eat what?"
ai_answer = metadata_rag(question)
print(ai_answer)

Nocturnal animals typically eat a variety of foods depending on their species, including insects, small mammals, fruits, and plants. Their diet can vary widely, but many are predators or scavengers. Some nocturnal animals, like bats, may also feed on nectar or blood.


## Querying Traces

Once your traces carry metadata and tags, you can use them to query and organize your runs, either through LangSmith's SDK or through its API. Here we initialize a LangSmith client so that we can use the SDK.

In [28]:
from langsmith import Client
import os

client = Client()

project = os.getenv("LANGSMITH_PROJECT")

We'll also define a helper function that prints the first three runs returned by a query, truncating their inputs and outputs to 50 characters.

In [102]:
def print_trunc_runs(runs):
    try:
        for _ in range(3):
            run = next(runs)
            print(f"Run ID: {run.id}")
            print(f"Name: {run.name}")
            print(f"Run Type: {run.run_type}")
            print(f"Start Time: {run.start_time}")
            print(f"Inputs: {str(run.inputs)[:50]}...")
            print(f"Outputs: {str(run.outputs)[:50]}...")
            print("-" * 40)
    except StopIteration:
        pass
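
The helper above stops after three items by calling `next` and catching `StopIteration`. The same "take the first N items from a lazy iterator" pattern can be written with `itertools.islice`. The sketch below uses a plain generator as a stand-in for the iterator returned by `client.list_runs(...)`; `take` and `fake_runs` are hypothetical names used only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take(items: Iterable[T], limit: int = 3) -> Iterator[T]:
    """Yield at most `limit` items and leave the rest of the iterator untouched."""
    return islice(items, limit)


# Stand-in for a lazy iterator of runs.
fake_runs = (f"run-{i}" for i in range(10))
first_three = list(take(fake_runs))
print(first_three)  # ['run-0', 'run-1', 'run-2']
```

Because `islice` stops on its own when the source is exhausted, no explicit `StopIteration` handling is needed.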

#### Basic Querying

In [103]:
from datetime import datetime, timedelta

todays_successful_llm_runs = client.list_runs(
  project_name=project,
  start_time=datetime.now() - timedelta(days=1),
  run_type="llm",
  error=False
)

print_trunc_runs(todays_successful_llm_runs)

Run ID: 84eec6ea-3264-4a4d-a9de-87def0f661ab
Name: call_openai
Run Type: llm
Start Time: 2025-07-20 03:13:30.079547
Inputs: {'messages': [{'role': 'system', 'content': "You a...
Outputs: {'id': 'chatcmpl-BvEmQ0KgS6drawar0Nv6cTwMIN6sn', '...
----------------------------------------
Run ID: f518f8cc-5a6d-4ac6-a80f-8c5dfc3ef6f9
Name: call_openai
Run Type: llm
Start Time: 2025-07-20 03:13:27.093930
Inputs: {'messages': [{'role': 'system', 'content': "You a...
Outputs: {'id': 'chatcmpl-BvEmNLHwOGn74o3uHh04E4m2X3dhg', '...
----------------------------------------


#### Filter Queries

We'll use LangSmith's filter query language to find the runs whose data-availability metadata is "high". This should return only retrieve_documents calls, because that is the only run on which we set this metadata.

In [104]:
high_available_runs = client.list_runs(
  project_name=project,
  start_time=datetime.now() - timedelta(days=1),
  filter="and(eq(metadata_key, 'data-availability'), eq(metadata_value, 'high'))",
  error=False,
)

print_trunc_runs(high_available_runs)

Run ID: 5d0a7398-b56a-4a8a-8cb0-857f64946e81
Name: retrieve_documents
Run Type: chain
Start Time: 2025-07-20 03:13:26.725430
Inputs: {'question': 'How do I add metadata to my trace?'}...
Outputs: {'output': [{'metadata': {'id': '3272b2f7-e6b7-462...
----------------------------------------
Run ID: 11d54205-6a4e-462f-a3d7-d43bd63b1b16
Name: retrieve_documents
Run Type: chain
Start Time: 2025-07-20 01:22:11.839627
Inputs: {'question': 'How do I add metadata to my trace?'}...
Outputs: {'output': [{'metadata': {'id': '3272b2f7-e6b7-462...
----------------------------------------
Run ID: 2ae1fe9a-e1a5-407c-a4dc-01260de27777
Name: retrieve_documents
Run Type: chain
Start Time: 2025-07-20 01:21:48.966818
Inputs: {'question': 'How do I add metadata to my trace?'}...
Outputs: {'output': [{'metadata': {'id': '3272b2f7-e6b7-462...
----------------------------------------
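
The metadata filter used above follows a fixed shape, so it can be generated from a key and a value rather than typed by hand. This is a small sketch of that idea; `metadata_filter` is a hypothetical helper, not an SDK function, and it rejects single quotes instead of guessing at the filter language's escaping rules.

```python
def metadata_filter(key: str, value: str) -> str:
    """Build a filter string that matches runs whose metadata[key] equals value."""
    for part in (key, value):
        if "'" in part:
            # Escaping rules are not covered here, so refuse rather than guess.
            raise ValueError(f"single quotes are not supported: {part!r}")
    return f"and(eq(metadata_key, '{key}'), eq(metadata_value, '{value}'))"


high_filter = metadata_filter("data-availability", "high")
print(high_filter)
# and(eq(metadata_key, 'data-availability'), eq(metadata_value, 'high'))
```

The resulting string can be passed as the `filter` argument of `client.list_runs`, exactly like the literal string in the cell above.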


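We can also filter on tags, for example to find the runs produced by a particular retriever version. Note that the results include the nested VectorStoreRetriever runs as well as retrieve_documents, which indicates that child runs carry the tag set on their parent.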

In [105]:
retriever_v1_runs = client.list_runs(
  project_name=project,
  start_time=datetime.now() - timedelta(days=1),
  filter="has(tags, 'retriever-1.0')",
  error=False,
)

print_trunc_runs(retriever_v1_runs)

Run ID: 1761b20c-c2f7-44b7-b84f-7f18b0215e6f
Name: VectorStoreRetriever
Run Type: retriever
Start Time: 2025-07-20 03:13:29.654445
Inputs: {'query': 'Nocturnal animals eat what?'}...
Outputs: {'documents': [{'metadata': {'id': '6e3d78fa-b4b6-...
----------------------------------------
Run ID: cc19544a-1ccc-46d0-b3e0-c8c69a5d42c3
Name: retrieve_documents
Run Type: chain
Start Time: 2025-07-20 03:13:29.652924
Inputs: {'question': 'Nocturnal animals eat what?'}...
Outputs: {'output': [{'metadata': {'id': '6e3d78fa-b4b6-4a1...
----------------------------------------
Run ID: 2c175338-4a06-4adb-a7b2-3e60ef0e33a2
Name: VectorStoreRetriever
Run Type: retriever
Start Time: 2025-07-20 03:13:26.726982
Inputs: {'query': 'How do I add metadata to my trace?'}...
Outputs: {'documents': [{'metadata': {'id': '3272b2f7-e6b7-...
----------------------------------------


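Filter queries can also combine several criteria. The query below matches runs that carry the retriever-1.0 tag, have a latency greater than 0.2 seconds, and contain the text "Nocturnal".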

In [107]:
slow_retriever_runs = client.list_runs(
  project_name=project,
  filter="and(has(tags, 'retriever-1.0'), gt(latency, 0.2), search('Nocturnal'))",
  error=False,
)

print_trunc_runs(slow_retriever_runs)

Run ID: 1761b20c-c2f7-44b7-b84f-7f18b0215e6f
Name: VectorStoreRetriever
Run Type: retriever
Start Time: 2025-07-20 03:13:29.654445
Inputs: {'query': 'Nocturnal animals eat what?'}...
Outputs: {'documents': [{'metadata': {'id': '6e3d78fa-b4b6-...
----------------------------------------
Run ID: cc19544a-1ccc-46d0-b3e0-c8c69a5d42c3
Name: retrieve_documents
Run Type: chain
Start Time: 2025-07-20 03:13:29.652924
Inputs: {'question': 'Nocturnal animals eat what?'}...
Outputs: {'output': [{'metadata': {'id': '6e3d78fa-b4b6-4a1...
----------------------------------------
Run ID: 0f08e134-b69e-4226-bf11-b381c6b838d6
Name: VectorStoreRetriever
Run Type: retriever
Start Time: 2025-07-20 01:21:46.314481
Inputs: {'query': 'Nocturnal animals eat what?'}...
Outputs: {'documents': [{'metadata': {'id': '6e3d78fa-b4b6-...
----------------------------------------
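
When filters grow beyond two or three clauses, building them from small pieces can be easier to read than one long string. The sketch below composes the same filter string as the query above; `and_`, `has_tag`, `latency_gt`, and `search` are hypothetical helper names, not part of the LangSmith SDK, and they only do string formatting.

```python
def has_tag(tag: str) -> str:
    """Clause matching runs that carry the given tag."""
    return f"has(tags, '{tag}')"


def latency_gt(seconds: float) -> str:
    """Clause matching runs slower than the given number of seconds."""
    return f"gt(latency, {seconds})"


def search(text: str) -> str:
    """Clause matching runs that contain the given text."""
    return f"search('{text}')"


def and_(*clauses: str) -> str:
    """Join clauses with and(...); a single clause is returned unchanged."""
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return "and(" + ", ".join(clauses) + ")"


combined = and_(has_tag("retriever-1.0"), latency_gt(0.2), search("Nocturnal"))
print(combined)
# and(has(tags, 'retriever-1.0'), gt(latency, 0.2), search('Nocturnal'))
```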


For more complex queries, we can use tree and trace filters:

- A tree filter is satisfied if any run within the trace tree (children, siblings, etc.) meets the condition.
- A trace filter is satisfied if the root run of the trace tree (the top-level parent) meets the condition.

For example, let's query all runs named "call_openai" whose root run has a latency less than 2 seconds and whose trace contains a run with "high" "data-availability":

In [111]:
fast_high_availability_llm_calls = client.list_runs(
    project_name=project,
    filter='eq(name, "call_openai")',
    trace_filter='lt(latency, 2)',
    tree_filter="and(eq(metadata_key, 'data-availability'), eq(metadata_value, 'high'))"
)
print_trunc_runs(fast_high_availability_llm_calls)

Run ID: f518f8cc-5a6d-4ac6-a80f-8c5dfc3ef6f9
Name: call_openai
Run Type: llm
Start Time: 2025-07-20 03:13:27.093930
Inputs: {'messages': [{'role': 'system', 'content': "You a...
Outputs: {'id': 'chatcmpl-BvEmNLHwOGn74o3uHh04E4m2X3dhg', '...
----------------------------------------
Run ID: 6724b322-df70-4646-b64c-76d2e2904195
Name: call_openai
Run Type: chain
Start Time: 2025-07-20 01:22:12.063047
Inputs: {'messages': [{'role': 'system', 'content': "You a...
Outputs: {'id': 'chatcmpl-BvD2iUuqR4OKgi0klbjgAPOqI9AMv', '...
----------------------------------------
Run ID: 6a018638-32f8-451a-a2d6-c749acea02cb
Name: call_openai
Run Type: chain
Start Time: 2025-07-20 01:21:49.257544
Inputs: {'messages': [{'role': 'system', 'content': "You a...
Outputs: {'id': 'chatcmpl-BvD2LotIZZpslCNCAOavMQRCmLbCi', '...
----------------------------------------
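
To make the difference between the three filter arguments concrete, here is a local, in-memory sketch of the semantics described above: `filter` is checked against the run itself, `trace_filter` against the root run of its trace, and `tree_filter` against any run in the same trace. The `Run` dataclass and `select_runs` function are hypothetical and heavily simplified (each trace is assumed to have exactly one root); they only illustrate the logic and are not how LangSmith evaluates queries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Run:
    id: str
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    latency: float = 0.0
    metadata: Dict[str, str] = field(default_factory=dict)


Predicate = Callable[[Run], bool]


def select_runs(
    runs: List[Run], run_filter: Predicate, trace_filter: Predicate, tree_filter: Predicate
) -> List[Run]:
    """Keep runs that pass run_filter, whose root passes trace_filter,
    and whose trace contains at least one run passing tree_filter."""
    by_trace: Dict[str, List[Run]] = {}
    for run in runs:
        by_trace.setdefault(run.trace_id, []).append(run)

    selected = []
    for run in runs:
        tree = by_trace[run.trace_id]
        root = next(r for r in tree if r.parent_id is None)
        if run_filter(run) and trace_filter(root) and any(tree_filter(r) for r in tree):
            selected.append(run)
    return selected


high = {"data-availability": "high"}
low = {"data-availability": "low"}
runs = [
    # Trace A: fast root, high availability
    Run(id="a1", name="metadata_rag", trace_id="A", latency=1.5),
    Run(id="a2", name="retrieve_documents", trace_id="A", parent_id="a1", metadata=high),
    Run(id="a3", name="call_openai", trace_id="A", parent_id="a1", latency=1.0),
    # Trace B: slow root, high availability
    Run(id="b1", name="metadata_rag", trace_id="B", latency=3.0),
    Run(id="b2", name="retrieve_documents", trace_id="B", parent_id="b1", metadata=high),
    Run(id="b3", name="call_openai", trace_id="B", parent_id="b1", latency=1.0),
    # Trace C: fast root, low availability
    Run(id="c1", name="metadata_rag", trace_id="C", latency=1.0),
    Run(id="c2", name="retrieve_documents", trace_id="C", parent_id="c1", metadata=low),
    Run(id="c3", name="call_openai", trace_id="C", parent_id="c1", latency=0.5),
]

matches = select_runs(
    runs,
    run_filter=lambda r: r.name == "call_openai",
    trace_filter=lambda root: root.latency < 2,
    tree_filter=lambda r: r.metadata.get("data-availability") == "high",
)
print([r.id for r in matches])  # ['a3']
```

Only the `call_openai` run in trace A is returned: trace B fails the root latency condition, and trace C contains no run with "high" data-availability.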
