# LangSmith in the LangChain 1.0 Era: A Beginner's Guide

## Introduction

What exactly is LangSmith? How does it help you? And how has it evolved to support the new LangChain 1.0 and LangGraph 1.0 ecosystem?

This notebook will explain everything in simple terms, with clear examples and analogies.

### Think of it this way:

- **LangChain/LangGraph** = Your car (the application you build)
- **LangSmith** = Your car's dashboard + mechanic shop + test track

Just like a car dashboard shows you speed, fuel, and engine status, LangSmith shows you what's happening inside your LLM application. And just like a mechanic shop helps you fix problems, LangSmith helps you debug and improve your agents.

Let's dive in!

---

## Part 1: What is LangSmith?

### The Simple Answer

**LangSmith is a platform that helps you build better LLM applications by showing you what's happening, testing if it works, and making it easy to deploy.**

### The Three Pillars of LangSmith

LangSmith has three main superpowers:

1. **👁️ Observability ("See what's happening")** - Tracing and monitoring
2. **✅ Evaluation ("Test if it works")** - Quality assurance and testing
3. **🚀 Deployment ("Ship it to production")** - Hosting and scaling

Let's explore each one.

---

## Part 2: Pillar 1 - Observability (See What's Happening)

### The Problem

Imagine you built an agent that:
1. Searches the web
2. Reads a document
3. Generates an answer

But sometimes it gives wrong answers. **Why?**

- Is the search returning bad results?
- Is the LLM misunderstanding the document?
- Is the prompt confusing?

**Without LangSmith, you're debugging blind.** 🙈

### The Solution: Tracing

LangSmith **records every single step** your agent takes. This is called **tracing**.

Think of it like a flight recorder (black box) in an airplane - it records everything so you can understand what happened.

### What Gets Traced?

LangSmith records:
- 📝 **Every prompt** sent to the LLM
- 💬 **Every response** from the LLM
- 🔧 **Every tool call** (search, database query, etc.)
- 📊 **Results** from each tool
- ⏱️ **How long** each step took
- 💰 **How much it cost** (tokens used)
- ❌ **Any errors** that occurred
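To make the list above concrete, here is a minimal sketch of what "recording a step" means, using only the standard library. It is not how LangSmith is implemented; the `traced` decorator and the in-memory `TRACE` list are hypothetical names used for illustration.

```python
import time
from functools import wraps

TRACE = []  # hypothetical in-memory store; a real tracer sends records to a backend


def traced(step_name):
    """Record inputs, output, duration, and any error for each call of the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            record = {"step": step_name, "inputs": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                record["error"] = None
                return record["output"]
            except Exception as exc:
                record["output"] = None
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call leaves a record
                record["duration_s"] = time.perf_counter() - start
                TRACE.append(record)
        return wrapper
    return decorator


@traced("search_web")
def search_web(query: str) -> str:
    return f"Search results for: {query}"


search_web("LangChain")
print(TRACE[0]["step"], TRACE[0]["output"])
```

Each record holds the same kinds of fields the list describes: what went in, what came out, how long it took, and whether it failed.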

### Example: Simple Agent Trace

Let's see what a trace looks like:

In [None]:
# Setup: Enable LangSmith tracing
import os

# Set these environment variables to enable tracing
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-api-key-here"  # Get from smith.langchain.com
os.environ["LANGSMITH_PROJECT"] = "my-first-project"  # Optional: organize traces

# That's it! Now all your LangChain code will automatically trace to LangSmith

In [None]:
# Example: Simple agent that searches and answers
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool

# Define a simple search tool
@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Simulated search
    return f"Search results for: {query}\nFound: LangChain is a framework for LLM apps."

# Create an agent
model = init_chat_model("gpt-4o")
agent = create_agent(model, tools=[search_web])

# Run the agent - THIS WILL AUTOMATICALLY TRACE TO LANGSMITH!
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is LangChain?"}]
})

print(result["messages"][-1].content)

**What happens in LangSmith:**

When you run this code, LangSmith creates a **trace** that shows:

```
Run: Agent Execution
  ├─ Input: "What is LangChain?"
  ├─ LLM Call #1
  │   ├─ Prompt: "You are a helpful assistant... What is LangChain?"
  │   ├─ Response: Tool call to search_web("LangChain")
  │   ├─ Tokens: 150
  │   └─ Duration: 1.2s
  ├─ Tool Call: search_web
  │   ├─ Input: "LangChain"
  │   ├─ Output: "LangChain is a framework..."
  │   └─ Duration: 0.3s
  ├─ LLM Call #2
  │   ├─ Prompt: "You are a helpful assistant... [search results]"
  │   ├─ Response: "LangChain is a framework for building LLM applications..."
  │   ├─ Tokens: 200
  │   └─ Duration: 1.5s
  └─ Output: Final answer
  
Total Duration: 3.0s
Total Cost: $0.002
```

**You can see EVERYTHING!** This is incredibly powerful for debugging.
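As a rough illustration of how the totals in that trace come about, the sketch below sums the per-step numbers and picks out the slowest step. The dictionary layout is an assumption made for this example, not LangSmith's export format, and the tool call is assumed to use no LLM tokens.

```python
# Step data copied from the example trace; the tool call is assumed to use no tokens
steps = [
    {"name": "LLM Call #1", "duration_s": 1.2, "tokens": 150},
    {"name": "Tool Call: search_web", "duration_s": 0.3, "tokens": 0},
    {"name": "LLM Call #2", "duration_s": 1.5, "tokens": 200},
]


def summarize(steps):
    """Return total duration, total tokens, and the name of the slowest step."""
    total_duration = round(sum(s["duration_s"] for s in steps), 3)
    total_tokens = sum(s["tokens"] for s in steps)
    slowest = max(steps, key=lambda s: s["duration_s"])["name"]
    return {
        "total_duration_s": total_duration,
        "total_tokens": total_tokens,
        "slowest": slowest,
    }


summary = summarize(steps)
print(summary)
```

The slowest step is usually the first place to look when optimizing latency.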

### Advanced: Selective Tracing

Sometimes you don't want to trace EVERYTHING (especially in production). LangSmith lets you be selective:

In [None]:
from langsmith import tracing_context

# Trace only the calls made inside this block
with tracing_context(enabled=True, project_name="production-debug"):
    result = agent.invoke({"messages": [{"role": "user", "content": "Test query"}]})
    # This will be traced

# Explicitly disable tracing for this call. If LANGSMITH_TRACING="true" is set
# globally, calls outside any tracing_context are still traced.
with tracing_context(enabled=False):
    result = agent.invoke({"messages": [{"role": "user", "content": "Another query"}]})

### Metadata and Tags

You can add custom information to traces:

In [None]:
# Add metadata to help filter and analyze traces
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is LangChain?"}]},
    config={
        "metadata": {
            "user_id": "user123",
            "session_id": "session456",
            "environment": "production"
        },
        "tags": ["customer-support", "premium-user"]
    }
)

# Now you can filter traces by user, session, or tag in LangSmith!
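If many call sites attach metadata, a small helper keeps the keys consistent so filters behave predictably. The sketch below is one possible shape; `build_trace_config` and the allowed environment names are assumptions for illustration, not part of LangChain or LangSmith.

```python
def build_trace_config(user_id, session_id, environment, tags=()):
    """Build the dict passed as `config=` to `agent.invoke`."""
    # Hypothetical set of environments; adjust to your own deployment stages
    if environment not in {"development", "staging", "production"}:
        raise ValueError(f"unknown environment: {environment!r}")
    return {
        "metadata": {
            "user_id": user_id,
            "session_id": session_id,
            "environment": environment,
        },
        # Sort and de-duplicate so the same tags always look identical
        "tags": sorted(set(tags)),
    }


config = build_trace_config(
    "user123",
    "session456",
    "production",
    tags=["premium-user", "customer-support", "premium-user"],
)
print(config["tags"])
```

The returned dictionary has the same structure as the `config` argument in the cell above.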

### Key Benefits of Tracing

1. **🐛 Debug faster** - See exactly where things go wrong
2. **⚡ Optimize performance** - Find slow steps
3. **💰 Control costs** - See which LLM calls are expensive
4. **📊 Understand behavior** - See how your agent makes decisions
5. **🔍 Monitor production** - Catch issues before users complain

---

## Part 3: LangSmith Studio (the LangChain team now proposes Agent Builder as an alternative)

### What is Studio?

**LangSmith Studio is a FREE visual interface for developing and testing your agents locally.**

Think of it as:
- A **playground** where you can test your agent
- A **debugger** that shows you every step in real-time
- A **time machine** that lets you replay conversations

### Key Features

1. **👁️ Real-time visualization** - See your agent's execution as it happens
2. **🔄 Hot reloading** - Change your code and see updates instantly
3. **⏮️ Thread replay** - Re-run conversations from any point
4. **🔬 Step-by-step inspection** - Examine every prompt, tool call, and response
5. **📊 Metrics display** - See token counts, latency, and costs

### Setup Studio

In [None]:
# 1. Install the LangGraph CLI
# Run in terminal:
# pip install -U "langgraph-cli[inmem]"   (the inmem extra is needed for `langgraph dev`)

# 2. Create a langgraph.json config file
# In your project directory, create: langgraph.json
'''
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./my_agent.py:agent"
  },
  "env": ".env"
}
'''

# 3. Run the development server
# In terminal:
# langgraph dev

# 4. The server listens on http://127.0.0.1:2024 by default and opens Studio in your browser
# Now you have a visual interface to test your agent!
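A typo in `langgraph.json` is a common reason the dev server fails to start. The sketch below runs a few sanity checks on a config like the one above. The checks are assumptions chosen for illustration and are not the CLI's own validation rules.

```python
import json

CONFIG_TEXT = """
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./my_agent.py:agent"
  },
  "env": ".env"
}
"""


def check_config(text):
    """Return a list of problems found in a langgraph.json-style config."""
    config = json.loads(text)
    problems = []
    if not config.get("dependencies"):
        problems.append("'dependencies' is missing or empty")
    graphs = config.get("graphs") or {}
    if not graphs:
        problems.append("'graphs' is missing or empty")
    for name, target in graphs.items():
        # Each entry is expected to look like "path/to/file.py:variable"
        path, sep, variable = str(target).rpartition(":")
        if not sep or not path or not variable:
            problems.append(f"graph {name!r} should be 'file.py:variable', got {target!r}")
    return problems


print(check_config(CONFIG_TEXT))
```

An empty list means the basic shape looks right; it does not prove that the file or variable exists.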

### Using Studio

Once Studio is running, you can:

```
┌─────────────────────────────────────────┐
│  LangSmith Studio                       │
├─────────────────────────────────────────┤
│                                         │
│  Input: "What is LangChain?"            │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Step 1: Agent Decision            │  │
│  │ Prompt: "You are a helpful..."    │  │
│  │ Response: Call search_web tool    │  │
│  │ Tokens: 150 | Time: 1.2s          │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Step 2: Tool Execution            │  │
│  │ Tool: search_web                  │  │
│  │ Input: "LangChain"                │  │
│  │ Output: "LangChain is..."         │  │
│  │ Time: 0.3s                        │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Step 3: Final Answer              │  │
│  │ Prompt: "Based on search..."      │  │
│  │ Response: "LangChain is a..."     │  │
│  │ Tokens: 200 | Time: 1.5s          │  │
│  └───────────────────────────────────┘  │
│                                         │
│  Total Time: 3.0s | Total Cost: $0.002  │
└─────────────────────────────────────────┘
```

**You can click on each step to see:**
- The exact prompt sent to the LLM
- The full response
- All metadata
- Any errors

**And you can:**
- Re-run from any step
- Try different inputs
- Compare multiple runs

### Why Studio is Amazing for Beginners

When you're learning, Studio helps you:

1. **Understand how agents work** - See the decision-making process visually
2. **Debug without frustration** - No more `print()` debugging!
3. **Experiment quickly** - Test different prompts and see results immediately
4. **Learn from mistakes** - See exactly what went wrong

**It's like having a teacher show you step-by-step how your code works!**

---

## Part 4: Pillar 2 - Evaluation (Test If It Works)

### The Problem

You built an agent. It works on your test input. **But:**

- Does it work on 100 different inputs?
- Is it better than version 1?
- Will it work for your users?

**Testing manually is slow and unreliable.** You need automated evaluation.

### The Solution: Datasets and Evaluators

LangSmith helps you:
1. Create **test datasets** (collections of inputs and expected outputs)
2. Run your agent on the entire dataset
3. Automatically **evaluate** the quality of responses
4. Track improvements over time
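The four steps above can be sketched in plain Python before bringing in the LangSmith API. Everything here is a stand-in: `stub_agent` returns canned answers, and `run_experiment` is a hypothetical helper, not a LangSmith function.

```python
dataset = [
    {"inputs": {"question": "What is LangChain?"}, "outputs": {"answer": "framework"}},
    {"inputs": {"question": "What is LangGraph?"}, "outputs": {"answer": "stateful"}},
]


def stub_agent(inputs):
    """Stand-in for a real agent call; returns canned answers."""
    canned = {
        "What is LangChain?": "LangChain is a framework for building LLM applications.",
        "What is LangGraph?": "LangGraph is a graph library.",
    }
    return {"answer": canned[inputs["question"]]}


def contains_expected(prediction, reference):
    """Score 1.0 when the reference keyword appears in the prediction."""
    return 1.0 if reference["answer"].lower() in prediction["answer"].lower() else 0.0


def run_experiment(target, dataset, evaluator):
    """Run the target on every example and return per-example scores plus the mean."""
    scores = [evaluator(target(ex["inputs"]), ex["outputs"]) for ex in dataset]
    return {"scores": scores, "mean": sum(scores) / len(scores)}


report = run_experiment(stub_agent, dataset, contains_expected)
print(report)
```

Tracking `mean` across code changes is the simplest form of "improvements over time".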

### Creating a Dataset

In [None]:
from langsmith import Client

client = Client()

# Create a dataset of test cases
dataset = client.create_dataset(
    dataset_name="langchain-qa-test",
    description="Test cases for LangChain Q&A agent"
)

# Add examples
examples = [
    {
        "input": {"question": "What is LangChain?"},
        "expected_output": "LangChain is a framework for building LLM applications."
    },
    {
        "input": {"question": "What is LangGraph?"},
        "expected_output": "LangGraph is a library for building stateful, multi-agent applications."
    },
    {
        "input": {"question": "How do I install LangChain?"},
        "expected_output": "You can install LangChain using: pip install langchain"
    },
]

for example in examples:
    client.create_example(
        inputs=example["input"],
        outputs={"answer": example["expected_output"]},
        dataset_id=dataset.id
    )

### Running Evaluations

In [None]:
from langsmith.evaluation import evaluate

# Define how to run your agent
def run_agent(inputs):
    result = agent.invoke({
        "messages": [{"role": "user", "content": inputs["question"]}]
    })
    return {"answer": result["messages"][-1].content}

# Define how to evaluate responses
def evaluate_answer(run, example):
    """Check if the answer is correct."""
    predicted = run.outputs["answer"]
    expected = example.outputs["answer"]
    
    # Strict check: passes only if the full expected answer appears verbatim in the prediction
    score = 1.0 if expected.lower() in predicted.lower() else 0.0
    
    return {"key": "correctness", "score": score}

# Run evaluation on the entire dataset
results = evaluate(
    run_agent,
    data="langchain-qa-test",
    evaluators=[evaluate_answer],
    experiment_prefix="agent-v1"
)

# Results show:
# - How many passed/failed
# - Average score
# - Detailed results for each example
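A verbatim substring match fails as soon as the agent rephrases a correct answer. One softer alternative, sketched here as an assumption rather than a LangSmith built-in, scores the share of the expected answer's longer words that appear in the prediction. The function name `keyword_coverage` and the `min_len` cutoff are illustrative.

```python
import re


def _tokens(text: str) -> set:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_coverage(predicted: str, expected: str, min_len: int = 4) -> float:
    """Fraction of the expected answer's longer words that also appear in the prediction."""
    keywords = {word for word in _tokens(expected) if len(word) >= min_len}
    if not keywords:
        return 0.0
    return len(keywords & _tokens(predicted)) / len(keywords)


expected = "LangChain is a framework for building LLM applications."
predicted = "LangChain is an open-source framework used for building applications with LLMs."

print(keyword_coverage(predicted, expected))
```

To use it as an evaluator, return `{"key": "keyword_coverage", "score": keyword_coverage(...)}` from a function with the same `(run, example)` signature as `evaluate_answer`.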

### Advanced: LLM-as-Judge

Instead of writing rules, you can use an LLM to evaluate responses:

In [None]:
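# In LangChain 1.0 the legacy evaluators live in the langchain-classic package
# (pip install langchain-classic)
from langchain_classic.evaluation import load_evaluator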

# Use an LLM to judge if the answer is helpful
evaluator = load_evaluator(
    "labeled_criteria",
    criteria="helpfulness",
    llm=model
)

def llm_evaluate(run, example):
    result = evaluator.evaluate_strings(
        input=example.inputs["question"],
        prediction=run.outputs["answer"],
        reference=example.outputs["answer"]
    )
    return {
        "key": "helpfulness",
        "score": result["score"],
        "comment": result["reasoning"]
    }

# Now the LLM judges if responses are helpful!
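Under the hood, an LLM judge amounts to a grading prompt plus a parser for the reply. The sketch below shows that shape with a stub in place of a real model. The prompt wording, the 1 to 5 scale, and the `judge` helper are assumptions for illustration, not what `load_evaluator` does internally.

```python
import re

JUDGE_TEMPLATE = (
    "Rate how helpful the answer is on a scale from 1 to 5.\n"
    "Question: {question}\nReference: {reference}\nAnswer: {answer}\n"
    "Reply in the form 'Score: <number>' followed by one sentence of reasoning."
)


def judge(question, answer, reference, llm):
    """Ask `llm` (any callable str -> str) for a grade and parse it."""
    prompt = JUDGE_TEMPLATE.format(question=question, reference=reference, answer=answer)
    reply = llm(prompt)
    match = re.search(r"Score:\s*([1-5])\b", reply)
    if match is None:
        # Surface unparseable replies instead of guessing a score
        return {"key": "helpfulness", "score": None, "comment": reply}
    return {"key": "helpfulness", "score": int(match.group(1)), "comment": reply}


def fake_llm(prompt: str) -> str:
    """Stub judge so the sketch runs offline; a real judge would call a model."""
    return "Score: 4. The answer matches the reference but omits details."


result = judge(
    "What is LangChain?",
    "A framework for LLM apps.",
    "LangChain is a framework for building LLM applications.",
    fake_llm,
)
print(result["score"])
```

Returning `None` for an unparseable reply keeps bad judge output from silently skewing averages.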

### Comparing Versions

LangSmith lets you compare different versions of your agent:

```
Version 1 (old prompt):
  Correctness: 70%
  Helpfulness: 3.5/5
  Avg Response Time: 2.5s

Version 2 (new prompt):
  Correctness: 85%  ✅ Improved!
  Helpfulness: 4.2/5  ✅ Improved!
  Avg Response Time: 2.1s  ✅ Faster!
```

This helps you make **data-driven decisions** about which version is better.
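The same comparison can be automated. This sketch labels each metric as improved or regressed, taking into account that lower is better for response time. The metric names and the `compare` helper are assumptions made for this example.

```python
# Metrics copied from the comparison above
v1 = {"correctness": 0.70, "helpfulness": 3.5, "avg_response_s": 2.5}
v2 = {"correctness": 0.85, "helpfulness": 4.2, "avg_response_s": 2.1}

# Direction of "better" per metric: response time should go down
HIGHER_IS_BETTER = {"correctness": True, "helpfulness": True, "avg_response_s": False}


def compare(old, new):
    """Label each metric as improved, regressed, or unchanged."""
    verdicts = {}
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        delta = new[metric] - old[metric]
        if delta == 0:
            verdicts[metric] = "unchanged"
        elif (delta > 0) == higher_is_better:
            verdicts[metric] = "improved"
        else:
            verdicts[metric] = "regressed"
    return verdicts


print(compare(v1, v2))
```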

### Key Benefits of Evaluation

1. **🎯 Catch regressions** - Make sure updates don't break things
2. **📈 Track improvements** - See if your changes actually help
3. **🏆 Compare approaches** - Test different prompts, models, or architectures
4. **✅ Quality assurance** - Ensure consistent quality before deployment
5. **💡 Learn patterns** - Understand what works and what doesn't

---

## Part 5: Pillar 3 - Deployment (Ship It to Production)

### The Problem

Your agent works great on your laptop. Now you need to:
- Make it available 24/7
- Handle many users at once
- Keep it fast and reliable
- Monitor it in production

**Traditional hosting platforms aren't built for stateful agents.** They're designed for simple web apps.

### The Solution: LangSmith Deployments

LangSmith provides **managed hosting specifically designed for LangGraph agents**.

### Why Special Hosting for Agents?

Traditional web hosting:
```
Request → Process → Response
(Short-lived, stateless)
```

LangGraph agents:
```
Request → Think → Call Tool → Wait → Think → Call Tool → Response
(Long-running, stateful)
```

LangSmith handles:
- **State persistence** - Remember conversation history
- **Background execution** - Agents can take minutes to complete
- **Scaling** - Handle many concurrent users
- **Streaming** - Send partial results as they're generated

### Deploying Your Agent

In [None]:
# Deployment Process (Done through LangSmith UI):

# 1. Push your code to GitHub
# Your repository should have:
# - Your agent code
# - langgraph.json configuration
# - requirements.txt with dependencies

# 2. Connect GitHub to LangSmith
# Go to smith.langchain.com → Deployments → Connect GitHub

# 3. Select your repository and deploy
# LangSmith will:
# - Clone your repository
# - Install dependencies
# - Build your agent
# - Deploy to infrastructure
# (Takes ~15 minutes)

# 4. Get your deployment URL
# https://your-agent.langsmith.app

### Using Your Deployed Agent

In [None]:
# Option 1: Using the Python SDK
from langgraph_sdk import get_client

# Pass your LangSmith API key (placeholder below) to authenticate
client = get_client(url="https://your-agent.langsmith.app", api_key="your-api-key-here")

# Create a thread (conversation)
thread = await client.threads.create()

# Start a run on the thread and stream the results as they arrive.
# runs.stream() creates the run itself, so no separate runs.create() call is needed.
async for chunk in client.runs.stream(
    thread["thread_id"],
    "my_agent",  # assistant ID or graph name from langgraph.json
    input={"messages": [{"role": "user", "content": "What is LangChain?"}]},
    stream_mode="updates",
):
    print(chunk.event, chunk.data)

In [None]:
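# Option 2: Using REST API (from any language)
import requests

base_url = "https://your-agent.langsmith.app"
headers = {"x-api-key": "your-api-key-here"}  # LangSmith API key (placeholder)

# Create thread
response = requests.post(f"{base_url}/threads", headers=headers, json={})
response.raise_for_status()
thread_id = response.json()["thread_id"]

# Send message. This creates a background run, so the response
# contains the run's metadata rather than the final answer.
response = requests.post(
    f"{base_url}/threads/{thread_id}/runs",
    headers=headers,
    json={
        "assistant_id": "my_agent",
        "input": {
            "messages": [{"role": "user", "content": "What is LangChain?"}]
        }
    }
)
response.raise_for_status()

print(response.json())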

### Production Monitoring

Once deployed, LangSmith automatically:
- ✅ Traces all production requests
- ✅ Monitors performance and errors
- ✅ Tracks costs
- ✅ Shows usage analytics

You can:
- See which users are having problems
- Find slow or expensive requests
- Set up alerts for errors
- Roll back to previous versions
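An alert is essentially a threshold applied to a batch of recent traces. The sketch below shows one way such a rule could be written. The thresholds and the record fields (`error`, `latency_s`) are assumptions for this example, not LangSmith's alerting configuration.

```python
from statistics import quantiles


def check_alerts(traces, max_error_rate=0.05, max_p95_latency_s=10.0):
    """Return alert messages for a batch of trace records (thresholds are illustrative)."""
    alerts = []
    if not traces:
        return alerts
    error_rate = sum(1 for t in traces if t["error"]) / len(traces)
    if error_rate > max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    latencies = [t["latency_s"] for t in traces]
    if len(latencies) >= 2:
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        p95 = quantiles(latencies, n=20)[-1]
        if p95 > max_p95_latency_s:
            alerts.append(f"p95 latency {p95:.1f}s exceeds {max_p95_latency_s:.1f}s")
    return alerts


healthy = [{"error": False, "latency_s": 2.0} for _ in range(50)]
failing = healthy[:8] + [{"error": True, "latency_s": 2.0} for _ in range(2)]

print(check_alerts(healthy))
print(check_alerts(failing))
```

Using a percentile rather than the average keeps a few very slow requests from hiding behind many fast ones.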

### Key Benefits of Deployment

1. **🚀 Fast deployment** - From GitHub to production in 15 minutes
2. **📈 Auto-scaling** - Handles traffic spikes automatically
3. **💪 Built for agents** - Supports long-running, stateful workflows
4. **📊 Monitoring included** - Traces and metrics out of the box
5. **🔧 Easy updates** - Push to GitHub, automatic redeployment

---

## Part 6: How LangSmith Has Evolved

### The Old Days (Pre-2024)

**LangSmith 0.x:**
- Primarily a **tracing tool**
- Manual setup required
- Limited evaluation features
- No deployment capabilities
- Basic UI

**Development workflow:**
```
Code → Test locally → Deploy to separate platform → Hope it works
```

### The New Era (LangChain 1.0 & LangGraph 1.0)

**LangSmith in 2024-2025:**
- **Complete platform** for the full development lifecycle
- Automatic tracing for `create_agent`
- Rich evaluation framework
- Integrated deployment
- Studio for local development
- Production monitoring

**New workflow:**
```
Develop in Studio → Evaluate with datasets → Deploy with one click → Monitor in production
```

All in one platform!

### Key Evolution Milestones

#### 1. Studio Launch (2024)
- **Before:** Test agents by running code repeatedly
- **After:** Visual interface with hot reloading and replay

#### 2. Automatic Tracing for `create_agent`
- **Before:** Manual instrumentation required
- **After:** Just set environment variables, tracing works automatically

#### 3. Evaluation Framework
- **Before:** Write custom test scripts
- **After:** Built-in datasets, evaluators, and comparison tools

#### 4. LangSmith Deployments
- **Before:** Deploy to AWS/GCP/Azure manually
- **After:** Deploy from GitHub in 15 minutes

#### 5. Subgraph Tracing
- **Before:** Only see top-level agent traces
- **After:** See inside nested subgraphs and multi-agent systems

#### 6. Thread Management
- **Before:** Manage state manually
- **After:** Built-in thread and checkpoint management

### Integration with LangChain 1.0

LangSmith is now **deeply integrated** with LangChain 1.0:

```python
# LangChain 0.x
from langchain.callbacks import LangChainTracer
tracer = LangChainTracer()
# ... manual setup ...

# LangChain 1.0
os.environ["LANGSMITH_TRACING"] = "true"
# That's it! Everything traces automatically
```

No more callbacks, no more manual instrumentation!

### Integration with LangGraph 1.0

LangGraph 1.0 is **designed for LangSmith**:

- **Checkpointing** - LangSmith stores conversation state
- **Subgraph visibility** - See every level of multi-agent systems
- **Human-in-the-loop** - Built-in interrupt and resume
- **Studio support** - Visualize complex graphs
- **Deployment optimized** - LangGraph apps deploy seamlessly

```python
# LangGraph automatically works with LangSmith
from langgraph.graph import StateGraph
from langgraph.checkpoint.memory import MemorySaver

graph = StateGraph(...)
# ... build graph ...
graph = graph.compile(checkpointer=MemorySaver())

# Stream with subgraph visibility
for chunk in graph.stream(input, subgraphs=True):
    print(chunk)
    # All of this is traced in LangSmith!
```

---

## Part 7: Complete Workflow Example

Let's see how everything comes together in a real project:

### Phase 1: Development (Using Studio)

```bash
# Terminal
langgraph dev
# Serves the API at http://127.0.0.1:2024 by default and opens Studio in your browser
```

In Studio:
1. Test different prompts visually
2. See what works and what doesn't
3. Fix bugs by seeing exact execution
4. Hot reload changes instantly

### Phase 2: Evaluation (Creating Test Suite)

In [None]:
# Create test dataset
from langsmith import Client

client = Client()
dataset = client.create_dataset("production-test-suite")

# Add 100 test cases
test_cases = [
    {"question": "What is LangChain?", "expected": "..."},
    {"question": "How do I use tools?", "expected": "..."},
    # ... 98 more ...
]

for case in test_cases:
    client.create_example(
        inputs={"question": case["question"]},
        outputs={"answer": case["expected"]},
        dataset_id=dataset.id
    )

In [None]:
# Run evaluation
from langsmith.evaluation import evaluate

# Reuses run_agent, evaluate_answer, and llm_evaluate from Part 4
results = evaluate(
    run_agent,
    data="production-test-suite",
    evaluators=[evaluate_answer, llm_evaluate],
    experiment_prefix="pre-deployment"
)

# Aggregate the per-example feedback scores (to_pandas() requires pandas)
df = results.to_pandas()
print(f"Correctness: {df['feedback.correctness'].mean():.0%}")
print(f"Helpfulness: {df['feedback.helpfulness'].mean():.0%}")

# Only deploy if scores are good!
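"Only deploy if scores are good" can be turned into an explicit gate that a CI job runs after evaluation. The sketch below is one possible shape. `passes_gate` and the threshold values are hypothetical and should be replaced with your own quality bar.

```python
def passes_gate(scores, thresholds):
    """Return (ok, failures); failures lists each metric that is below its minimum."""
    failures = [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {minimum:.2f}"
        for metric, minimum in thresholds.items()
        if scores.get(metric, 0.0) < minimum
    ]
    return (not failures, failures)


# Hypothetical thresholds; pick values that match your own quality bar
thresholds = {"correctness": 0.90, "helpfulness": 0.80}

ok, failures = passes_gate({"correctness": 0.85, "helpfulness": 0.92}, thresholds)
print("deploy" if ok else f"blocked: {failures}")
```

A missing metric counts as 0.0 here, so a forgotten evaluator blocks the deploy instead of passing silently.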

### Phase 3: Deployment

In [None]:
# Push to GitHub
# git push origin main

# In LangSmith UI:
# 1. Go to Deployments
# 2. Select repository
# 3. Click Deploy
# 4. Wait 15 minutes
# 5. Get deployment URL

# Now your agent is live!

### Phase 4: Production Monitoring

In [None]:
# All production requests are automatically traced
# Go to LangSmith UI to see:

# Dashboard:
# - Total requests: 10,000
# - Success rate: 98%
# - Average latency: 2.3s
# - Total cost: $45.67

# Errors:
# - 200 requests failed
# - Click to see traces
# - Identify the problem
# - Fix and redeploy

# User feedback:
# - 95% positive
# - See specific issues
# - Improve based on feedback

### Phase 5: Continuous Improvement

In [None]:
# Use production traces to improve

# 1. Find problematic traces
# LangSmith UI: Filter by "Status: Error" or "Duration > 10s"

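# 2. Add them to test dataset
# Replace the placeholder IDs with run IDs (UUIDs) copied from the LangSmith UI
for run_id in ["trace-1", "trace-2", "trace-3"]:
    run = client.read_run(run_id)
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,  # review and correct these before relying on them
        dataset_id=dataset.id
    )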

# 3. Fix the issues
# Update prompts, add error handling, etc.

# 4. Re-evaluate
results = evaluate(..., experiment_prefix="v2")

# 5. Compare versions
# LangSmith UI: Compare v1 vs v2
# See: v2 is 10% better!

# 6. Deploy v2
# git push origin main

# 7. Repeat!
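The selection step in this loop (errors, or runs slower than 10 seconds) can be expressed as a small filter. The record layout and helper names below are assumptions for illustration; real trace data would come from LangSmith.

```python
def find_problem_traces(traces, max_duration_s=10.0):
    """Select traces that errored or ran longer than the duration limit."""
    return [
        t for t in traces
        if t["status"] == "error" or t["duration_s"] > max_duration_s
    ]


def to_example(trace, corrected_answer):
    """Turn a problematic trace into a dataset example with a human-reviewed answer."""
    return {"inputs": trace["inputs"], "outputs": {"answer": corrected_answer}}


# Hypothetical trace records
traces = [
    {"id": "t1", "status": "success", "duration_s": 2.1, "inputs": {"question": "What is LangChain?"}},
    {"id": "t2", "status": "error", "duration_s": 0.4, "inputs": {"question": "How do I use tools?"}},
    {"id": "t3", "status": "success", "duration_s": 14.0, "inputs": {"question": "Summarize this PDF"}},
]

problems = find_problem_traces(traces)
print([t["id"] for t in problems])
```

Supplying a reviewed answer in `to_example` matters because the output of a failed run is, by definition, not the answer you want to test against.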

---

## Part 8: LangSmith Pricing & Plans

### Free Tier (Great for Learning!)

**Includes:**
- ✅ Studio (unlimited local development)
- ✅ Tracing (up to 5,000 traces/month)
- ✅ Basic evaluation
- ✅ Small datasets

**Perfect for:**
- Learning LangChain and LangGraph
- Personal projects
- Prototypes
- Small side projects

### Paid Tiers (For Production)

**Developer ($39/month):**
- More traces
- Larger datasets
- Advanced evaluations
- Team features

**Team ($199/month):**
- Unlimited traces
- Deployments included
- Priority support
- SSO and security features

**Enterprise (Custom pricing):**
- Self-hosted option
- Custom SLAs
- Dedicated support
- Advanced compliance

---

## Part 9: Getting Started - Step by Step

### Step 1: Sign Up (Free)

1. Go to [smith.langchain.com](https://smith.langchain.com)
2. Click "Sign Up"
3. Create an account (free!)
4. Get your API key

### Step 2: Enable Tracing

In [None]:
# Add to your code or .env file
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_pt_..."  # From smith.langchain.com
os.environ["LANGSMITH_PROJECT"] = "my-first-project"

# That's it! Now run your LangChain code
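A missing or empty variable is the most common reason no traces show up. The sketch below checks for the two variables tracing depends on in this guide. `missing_settings` is a hypothetical helper, and it takes a plain mapping so it can be tried without touching your real environment.

```python
REQUIRED = ("LANGSMITH_TRACING", "LANGSMITH_API_KEY")


def missing_settings(environ):
    """Return the required variable names that are unset or empty in `environ`."""
    return [name for name in REQUIRED if not environ.get(name)]


# Pass os.environ in real code; a plain dict keeps this sketch self-contained
example_env = {"LANGSMITH_TRACING": "true", "LANGSMITH_PROJECT": "my-first-project"}

print(missing_settings(example_env))
```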

### Step 3: Run Some Code

In [None]:
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

model = init_chat_model("gpt-4o")
agent = create_agent(model, tools=[])

# This will be traced!
result = agent.invoke({
    "messages": [{"role": "user", "content": "Hello!"}]
})

print(result["messages"][-1].content)

### Step 4: View Your First Trace

1. Go to [smith.langchain.com](https://smith.langchain.com)
2. Click on your project
3. See your trace!
4. Click to explore

**You'll see:**
- The exact prompt
- The model's response
- How long it took
- How much it cost
- All metadata

### Step 5: Set Up Studio (Optional but Recommended)

In [None]:
# Install CLI
# pip install -U "langgraph-cli[inmem]"   (the inmem extra is needed for `langgraph dev`)

# Create config file: langgraph.json
'''
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./agent.py:agent"
  },
  "env": ".env"
}
'''

# Run
# langgraph dev

# The server runs at http://127.0.0.1:2024 by default and opens Studio in your browser

### Step 6: Experiment!

Now you have everything set up. Try:

1. **Build a simple agent**
2. **Watch traces** in LangSmith
3. **Test in Studio**
4. **Create a test dataset**
5. **Run evaluations**
6. **Improve based on results**

You're learning by doing!

---

## Part 10: Common Questions

### Q: Do I need LangSmith to use LangChain?

**A:** No! LangChain works fine without LangSmith. But LangSmith makes development much easier by showing you what's happening inside your agents.

### Q: Is LangSmith free?

**A:** Yes! The free tier is generous (5,000 traces/month, unlimited Studio use). Perfect for learning and small projects.

### Q: Can I use LangSmith with other frameworks?

**A:** Yes! LangSmith works with any Python code that calls LLMs. You can manually instrument non-LangChain code.

### Q: Does LangSmith store my data?

**A:** Yes, traces are stored in LangSmith. But you can:
- Control what gets traced
- Self-host LangSmith
- Use data retention policies

LangSmith is also SOC 2 Type 2, GDPR, and HIPAA compliant.

### Q: How do I debug without LangSmith?

**A:** You can use `print()` statements, but it's much harder. LangSmith shows you the complete picture with nice visualizations.

### Q: Can I deploy without LangSmith Deployments?

**A:** Yes! You can deploy to AWS, GCP, Azure, or any other platform. But LangSmith Deployments makes it much easier.

### Q: How is Studio different from Jupyter notebooks?

**A:** Jupyter is for running code. Studio is specifically for testing agents visually. It shows execution flow, lets you replay threads, and updates when you change code.

### Q: Can I use LangSmith for production monitoring?

**A:** Yes! That's one of its main uses. All production traces are automatically captured, and you can set up alerts, dashboards, and analytics.

---

## Part 11: Best Practices

### For Development

1. **Always use Studio** - See your agent's execution visually
2. **Use descriptive project names** - `my-chatbot-v1` not `test`
3. **Add metadata to traces** - User IDs, session IDs help debugging
4. **Tag your traces** - Makes filtering easier
5. **Review traces regularly** - Learn from your agent's behavior

### For Testing

1. **Start with a small dataset** - 10-20 examples to begin
2. **Add failing cases** - Every bug should become a test
3. **Use multiple evaluators** - Correctness, helpfulness, safety, etc.
4. **Compare versions** - Always A/B test improvements
5. **Automate evaluation** - Run tests before deployment

### For Production

1. **Sample traces in production** - Don't trace 100% (use 10-20%)
2. **Monitor key metrics** - Success rate, latency, cost
3. **Set up alerts** - Get notified of errors
4. **Use production traces for improvement** - Add edge cases to datasets
5. **Version your prompts** - Track what changed when
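One way to sample a fixed share of production requests is to hash a request ID, so the same request always gets the same decision. This is a sketch of the idea only, under the assumption that each request has a stable ID; `should_trace` is a hypothetical helper, not an SDK function.

```python
import hashlib


def should_trace(request_id: str, sample_rate: float = 0.1) -> bool:
    """Deterministically pick roughly `sample_rate` of requests by hashing the ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


sampled = sum(should_trace(f"req-{i}", 0.1) for i in range(10_000))
print(f"traced {sampled} of 10000 requests")
```

Hashing instead of calling a random number generator means retries of the same request are either all traced or all skipped, which keeps traces complete.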

### For Teams

1. **Use consistent project naming** - `team-name/agent-name/version`
2. **Document evaluators** - Explain what each one checks
3. **Share interesting traces** - LangSmith has sharing links
4. **Review traces together** - Great for learning
5. **Set quality standards** - "Must pass 90% of tests before deploy"

---

## Conclusion

### What We Learned

LangSmith is a **complete platform** for building LLM applications:

1. **👁️ Observability** - See what's happening with automatic tracing
2. **✅ Evaluation** - Test if it works with datasets and evaluators
3. **🚀 Deployment** - Ship to production in 15 minutes

### How It Has Evolved

**Old:** Just a tracing tool
**Now:** Complete development, testing, and deployment platform

**Old:** Manual setup required
**Now:** Automatic integration with LangChain 1.0 and LangGraph 1.0

**Old:** Limited to observability
**Now:** Studio, evaluation, deployment, monitoring - everything!

### Why Use LangSmith?

**For Learning:**
- See how agents actually work
- Debug problems visually
- Learn from mistakes

**For Building:**
- Develop faster with Studio
- Test thoroughly with evaluation
- Deploy easily with one click

**For Production:**
- Monitor everything
- Catch issues early
- Improve continuously

### Getting Started

1. Sign up at [smith.langchain.com](https://smith.langchain.com) (free!)
2. Set environment variables
3. Run your LangChain code
4. View traces in the UI
5. Install Studio for local development

That's it! You're ready to build better LLM applications with LangSmith.

Happy building! 🚀

---

## Additional Resources

### Official Documentation
- [LangSmith Documentation](https://docs.langchain.com/langsmith)
- [LangChain Observability Guide](https://docs.langchain.com/oss/python/langchain/observability)
- [LangChain Studio Guide](https://docs.langchain.com/oss/python/langchain/studio)
- [LangSmith Deployment Guide](https://docs.langchain.com/oss/python/langchain/deploy)


Remember: The best way to learn is by doing. Build something, trace it, test it, deploy it!