# 05. Agent Integration with LangGraph

This notebook demonstrates how to integrate TensorZero with LangGraph agents:
- Using TensorZero's OpenAI-compatible endpoint with LangChain
- Creating a ReAct agent with tools  
- Observing agent interactions in TensorZero UI
- Collecting feedback on agent performance

**Key Learning**: TensorZero provides an OpenAI-compatible API endpoint at `/openai/v1`, making it easy to use with any OpenAI-compatible client!
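
To make the endpoint concrete, here is a minimal sketch of the request an OpenAI-compatible client would send. It only builds the URL and JSON body and does not contact the gateway. The `/chat/completions` path follows the OpenAI API convention, and the helper name `build_chat_request` is made up for illustration; the base URL and model name are the ones used in this notebook.

```python
import json

BASE_URL = "http://localhost:3000/openai/v1"  # gateway address used in this notebook


def build_chat_request(model: str, user_message: str) -> tuple[str, str]:
    """Return (url, json_body) for an OpenAI-style chat completion request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    url = f"{BASE_URL}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, json.dumps(body)


url, body = build_chat_request(
    "tensorzero::model_name::openai::gpt-4o-mini", "Hello!"
)
print(url)
print(body)
```

Any client that lets you override the base URL (as `init_chat_model` does below) should produce an equivalent request.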

In [12]:
# Setup: Use TensorZero's OpenAI-Compatible Endpoint
import httpx
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from rich.console import Console
from rich.panel import Panel

console = Console()

# Initialize TensorZero chat model using OpenAI-compatible endpoint
# This is much simpler than a custom chat model!
llm = init_chat_model(
    # `tensorzero::model_name::...` calls a model directly; to route through a
    # configured function, use `tensorzero::function_name::<name>` instead.
    "tensorzero::model_name::openai::gpt-4o-mini",
    model_provider="openai",
    base_url="http://localhost:3000/openai/v1",  # TensorZero's OpenAI-compatible endpoint
    api_key="dummy",  # TensorZero ignores the API key
    http_client=httpx.Client()
)

print("✅ TensorZero Chat Model initialized")
print("🔧 Using OpenAI-compatible endpoint: http://localhost:3000/openai/v1")
print("🏷️  Model: openai::gpt-4o-mini (called directly via model_name)")

# Test basic chat
try:
    test_response = llm.invoke("Hello! Can you introduce yourself?")
    console.print(Panel(test_response.content, title="🤖 TensorZero Response", border_style="blue"))
except Exception as e:
    print(f"❌ Connection test failed: {e}")
    print("💡 Make sure TensorZero services are running with 'poe up'")

✅ TensorZero Chat Model initialized
🔧 Using OpenAI-compatible endpoint: http://localhost:3000/openai/v1
🏷️  Model: openai::gpt-4o-mini (called directly via model_name)


## 1. Define Tools for the Agent

Let's create some Python tools that our agent can use. These complement the TensorZero-configured tools.

In [13]:
# Define Python-based tools for our agent
@tool
def python_calculator(expression: str) -> str:
    """
    Evaluate mathematical expressions using Python's built-in calculator.
    
    Args:
        expression: A mathematical expression like '2 + 2' or 'sqrt(16)'
    """
    try:
        import math
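        # Restricted evaluation: builtins are removed and only whitelisted names
        # are exposed. This limits, but does not eliminate, the risks of eval().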
        allowed_names = {
            "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan,
            "log": math.log, "exp": math.exp, "pi": math.pi, "e": math.e,
            "abs": abs, "pow": pow, "min": min, "max": max
        }
        
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return f"Python Calculator: {expression} = {result}"
    except Exception as e:
        return f"Calculator Error: {str(e)}"

@tool
def current_time(timezone: str = "UTC") -> str:
    """
    Get the current time in a specified timezone.
    
    Args:
        timezone: An abbreviation (UTC, EST, PST, CST, MST) or an IANA name such as 'Asia/Tokyo'
    """
    try:
        from datetime import datetime
        import pytz
        
        utc_now = datetime.now(pytz.UTC)
        timezone_map = {
            "UTC": "UTC", "EST": "US/Eastern", "PST": "US/Pacific",
            "CST": "US/Central", "MST": "US/Mountain"
        }
        
        tz_name = timezone_map.get(timezone.upper(), timezone)
        tz = pytz.timezone(tz_name)
        local_time = utc_now.astimezone(tz)
        return f"Current time in {timezone}: {local_time.strftime('%Y-%m-%d %H:%M:%S %Z')}"
    except Exception as e:
        return f"Time Error: {str(e)}"

@tool
def text_analyzer(text: str) -> str:
    """
    Analyze text properties including word count and basic sentiment.
    
    Args:
        text: The text to analyze
    """
    try:
        words = text.split()
        word_count = len(words)
        char_count = len(text)
        
        # Simple sentiment analysis
        positive_words = ["good", "great", "excellent", "awesome", "fantastic", "wonderful"]
        negative_words = ["bad", "terrible", "awful", "horrible", "worst", "hate"]
        
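        # Strip surrounding punctuation so that "great!" still matches "great"
        normalized = [word.lower().strip(".,!?;:'\"()") for word in words]
        positive_count = sum(1 for word in normalized if word in positive_words)
        negative_count = sum(1 for word in normalized if word in negative_words)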
        
        sentiment = "neutral"
        if positive_count > negative_count:
            sentiment = "positive"
        elif negative_count > positive_count:
            sentiment = "negative"
        
        return f"""Text Analysis:
- Words: {word_count}
- Characters: {char_count}  
- Sentiment: {sentiment}
- Positive indicators: {positive_count}
- Negative indicators: {negative_count}"""
        
    except Exception as e:
        return f"Analysis Error: {str(e)}"

# List of tools for our agent
tools = [python_calculator, current_time, text_analyzer]

print("🔧 Python Tools Created:")
for tool_func in tools:
    print(f"  • {tool_func.name}: {tool_func.description.split('.')[0]}")
    
# Test a tool
test_result = python_calculator.invoke({"expression": "sqrt(144) + 5"})
console.print(Panel(test_result, title="🧪 Tool Test", border_style="green"))

🔧 Python Tools Created:
  • python_calculator: Evaluate mathematical expressions using Python's built-in calculator
  • current_time: Get the current time in a specified timezone
  • text_analyzer: Analyze text properties including word count and basic sentiment
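
Calling `eval` with emptied builtins narrows what an expression can reach, but it is not a sandbox. As one possible alternative, the sketch below walks the expression's syntax tree and accepts only whitelisted operators, functions, and constants. The name `safe_eval` and the exact whitelist are assumptions for illustration, and the sketch does not guard against expensive inputs such as very large exponents.

```python
import ast
import math
import operator

# Whitelisted operators and names; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "log": math.log, "exp": math.exp, "abs": abs}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression by walking its AST."""
    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_eval("sqrt(144) + 5"))  # 17.0
```

Attribute access, imports, and calls to anything outside `_FUNCS` raise `ValueError` instead of being executed.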


## 2. Create the LangGraph Agent

Now we'll create a ReAct agent using LangGraph that uses TensorZero as the backend.

In [15]:
# Create a ReAct agent using TensorZero + LangGraph
try:
    # Create the agent - LangGraph handles all the ReAct logic!
    agent = create_react_agent(
        llm,  # Our TensorZero chat model
        tools,  # Our Python tools
        prompt="""You are a helpful assistant powered by TensorZero. 

Available tools:
- python_calculator: Advanced mathematical calculations with math functions
- current_time: Get current time in different timezones  
- text_analyzer: Analyze text properties and sentiment

Important: TensorZero may also provide additional tools like calculator, get_weather, and search_tensorzero_docs through its configuration. Use the most appropriate tool for each task."""
    )
    
    print("✅ ReAct Agent created successfully!")
    print("🤖 Backend: TensorZero Gateway")
    print("🔧 Tools: Python tools + TensorZero configured tools")
    print("⚡ Ready for tool calling and reasoning!")
    
except Exception as e:
    print(f"❌ Agent creation failed: {e}")
    import traceback
    traceback.print_exc()

✅ ReAct Agent created successfully!
🤖 Backend: TensorZero Gateway
🔧 Tools: Python tools + TensorZero configured tools
⚡ Ready for tool calling and reasoning!
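
For intuition about what the prebuilt agent automates, here is a toy version of the ReAct loop in plain Python. It is not LangGraph's implementation: `fake_model` is a scripted stand-in for the LLM, messages are plain dicts, and `run_react_loop` is a hypothetical name. The point is only the control flow: call the model, run any requested tools, feed the results back, and stop when the model answers without requesting a tool.

```python
from typing import Callable


def fake_model(messages: list[dict]) -> dict:
    """Scripted stand-in for an LLM: request one tool call, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}
    return {"role": "assistant",
            "content": f"The answer is {tool_results[-1]['content']}.",
            "tool_calls": []}


def run_react_loop(model: Callable[[list[dict]], dict],
                   tools: dict[str, Callable[..., object]],
                   query: str, max_iterations: int = 5) -> list[dict]:
    """Alternate model and tool steps until the model stops calling tools."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iterations):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return messages  # final answer reached
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    raise RuntimeError("Iteration limit reached without a final answer")


history = run_react_loop(fake_model, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(history[-1]["content"])  # The answer is 5.
```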


## 3. Test the Agent

Let's test our agent with various tasks that require tool usage.

In [16]:
# Test function to run agent and display results nicely
def test_agent(query: str, max_iterations: int = 10):
    """Run the agent with a query and display results."""
    console.print(f"\n[bold blue]🔵 User:[/bold blue] {query}")
    console.print("[dim]" + "="*60 + "[/dim]")
    
    try:
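        # Run the agent. Each ReAct iteration takes two graph steps (model + tools),
        # so the recursion limit is derived from max_iterations.
        messages = [{"role": "user", "content": query}]
        result = agent.invoke(
            {"messages": messages},
            config={"recursion_limit": 2 * max_iterations + 1},
        )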
        
        if result and "messages" in result:
            for message in result["messages"]:
                # LangChain messages expose .type ('human', 'ai', 'tool');
                # fall back to the class name for other message objects
                msg_type = getattr(message, "type", None) or type(message).__name__.lower().replace("message", "")
                content = getattr(message, "content", "")
                tool_calls = getattr(message, "tool_calls", None)

                if msg_type == "human" and content:
                    console.print(f"[green]👤 Human:[/green] {content}")
                elif msg_type == "ai":
                    # Messages that only request tools often have empty content,
                    # so tool calls are shown independently of the text
                    if content:
                        console.print(f"[blue]🤖 Agent:[/blue] {content}")
                    if tool_calls:
                        console.print("[cyan]🔧 Tool Calls:[/cyan]")
                        for tool_call in tool_calls:
                            console.print(f"  • {tool_call['name']}: {tool_call['args']}")
                elif msg_type == "tool" and content:
                    console.print(f"[yellow]⚙️  Tool Result:[/yellow] {content}")
        else:
            console.print("[red]❌ No response received[/red]")
        
        return result
        
    except Exception as e:
        console.print(f"[bold red]❌ Error:[/bold red] {e}")
        import traceback
        traceback.print_exc()
        return None

# Test 1: Math calculation
print("🧮 Test 1: Mathematical Calculation")
result1 = test_agent("What is 234 multiplied by 567? Please calculate this for me.")

# Test 2: Time query  
print("\n🕐 Test 2: Time Query")
result2 = test_agent("What time is it in Tokyo right now?")

# Test 3: Text analysis
print("\n📊 Test 3: Text Analysis") 
result3 = test_agent("Can you analyze this text: 'This is an amazing product, I love it!'")

# Test 4: Multi-step task
print("\n🔄 Test 4: Multi-Step Task")
result4 = test_agent("Calculate the square root of 144, then tell me what time it is in EST.")

🧮 Test 1: Mathematical Calculation



🕐 Test 2: Time Query



📊 Test 3: Text Analysis



🔄 Test 4: Multi-Step Task


## 4. Agent Observability with TensorZero

All agent interactions that pass through the gateway are recorded by TensorZero. The cell below simulates attaching feedback to those interactions and lists what the TensorZero UI exposes.

In [17]:
# Simulate collecting feedback on agent performance
from tensorzero import TensorZeroGateway

# A TensorZero client is needed to submit real feedback (it is not called in this simulation)
tz_client = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

print("📊 Agent Observability Features")
print("=" * 40)

# In a real application, you would collect these inference IDs from the agent responses
# For this demo, we'll simulate the feedback collection process

def collect_agent_feedback(description: str, rating: float, helpful: bool, comment: str):
    """Simulate feedback collection."""
    try:
        # In a real scenario, you'd link this to actual inference IDs
        # For demo purposes, we'll just show the feedback collection process
        
        print(f"✅ Feedback simulated: {description}")
        print(f"   Rating: {rating}/1.0")
        print(f"   Helpful: {helpful}")
        print(f"   Comment: {comment}")
        print("   Status: Would be submitted to TensorZero")
        
    except Exception as e:
        print(f"❌ Feedback collection failed: {e}")

# Simulate feedback for our test cases
feedback_examples = [
    {
        "description": "Math calculation test",
        "rating": 0.95,
        "helpful": True,
        "comment": "Agent correctly used tools for mathematical calculations"
    },
    {
        "description": "Time query test", 
        "rating": 0.9,
        "helpful": True,
        "comment": "Provided accurate time information for requested timezone"
    },
    {
        "description": "Text analysis test",
        "rating": 0.85,
        "helpful": True,
        "comment": "Good text analysis with sentiment and basic statistics"
    },
    {
        "description": "Multi-step task test",
        "rating": 0.92,
        "helpful": True,
        "comment": "Successfully handled multiple tool calls in sequence"
    }
]

for feedback in feedback_examples:
    collect_agent_feedback(**feedback)

print("\n🌐 TensorZero UI: http://localhost:4000")
print("📈 In the TensorZero UI you can:")
print("   • View all agent conversations and tool calls")
print("   • Monitor performance metrics and costs")
print("   • Analyze tool usage patterns")
print("   • A/B test different agent configurations")
print("   • Set up alerts for performance issues")

print("\n🔍 Key Benefits of TensorZero for Agents:")
print("   ✅ Automatic observability - no custom logging needed")
print("   ✅ Multi-provider support - easy to switch models")
print("   ✅ Built-in experimentation - A/B testing made simple")
print("   ✅ Cost tracking - monitor LLM usage costs")
print("   ✅ Tool call tracing - see exactly how agents use tools")

📊 Agent Observability Features
✅ Feedback simulated: Math calculation test
   Rating: 0.95/1.0
   Helpful: True
   Comment: Agent correctly used tools for mathematical calculations
   Status: Would be submitted to TensorZero
✅ Feedback simulated: Time query test
   Rating: 0.9/1.0
   Helpful: True
   Comment: Provided accurate time information for requested timezone
   Status: Would be submitted to TensorZero
✅ Feedback simulated: Text analysis test
   Rating: 0.85/1.0
   Helpful: True
   Comment: Good text analysis with sentiment and basic statistics
   Status: Would be submitted to TensorZero
✅ Feedback simulated: Multi-step task test
   Rating: 0.92/1.0
   Helpful: True
   Comment: Successfully handled multiple tool calls in sequence
   Status: Would be submitted to TensorZero

🌐 TensorZero UI: http://localhost:4000
📈 In the TensorZero UI you can:
   • View all agent conversations and tool calls
   • Monitor performance metrics and costs
   • Analyze tool usage patterns
   • A/B tes
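
The cell above only prints the feedback. As a rough sketch of what a real submission might carry, the helper below builds a payload that ties one metric value to one inference. It assumes the gateway's feedback request takes `metric_name`, `value`, and `inference_id` fields; the metric name `agent_helpful` is hypothetical and would have to be declared in `tensorzero.toml`, and `build_feedback_payload` is an illustrative name, not part of the TensorZero client.

```python
import json
import uuid


def build_feedback_payload(inference_id: str, metric_name: str, value) -> dict:
    """Build the JSON body for a feedback request tied to one inference.

    Hypothetical helper: it validates the ID and returns the payload
    without sending it.
    """
    uuid.UUID(inference_id)  # raises ValueError if the ID is not a UUID
    return {
        "inference_id": inference_id,
        "metric_name": metric_name,
        "value": value,
    }


# Placeholder ID; a real one comes from the gateway's inference response.
example_id = str(uuid.uuid4())
payload = build_feedback_payload(example_id, "agent_helpful", True)
print(json.dumps(payload, indent=2))
```

With the `tz_client` created above, the same fields would presumably be passed as keyword arguments to its `feedback` method once real inference IDs have been captured from the agent's responses.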

## 5. Performance Analysis

Let's create a simple performance analysis of our agent.

In [18]:
import time
import pandas as pd
from typing import List, Dict, Any

def benchmark_agent(test_cases: List[Dict[str, Any]]) -> pd.DataFrame:
    """Simple benchmark of our TensorZero agent."""
    results = []
    
    console.print(f"🏃 [bold]Running Agent Benchmark[/bold] ({len(test_cases)} test cases)")
    console.print("=" * 50)
    
    for i, test_case in enumerate(test_cases, 1):
        query = test_case["query"]
        category = test_case["category"]
        expected_tools = test_case.get("expected_tools", [])
        
        console.print(f"\n🧪 [cyan]Test {i}/{len(test_cases)}: {category}[/cyan]")
        console.print(f"Query: {query[:60] + '...' if len(query) > 60 else query}")
        
        start_time = time.time()
        
        try:
            # Run the agent
            messages = [{"role": "user", "content": query}]
            result = agent.invoke({"messages": messages})
            
            end_time = time.time()
            duration = round(end_time - start_time, 2)
            
            # Count tool calls in the response
            tool_count = 0
            response_length = 0
            
            if result and "messages" in result:
                for message in result["messages"]:
                    if hasattr(message, 'tool_calls') and message.tool_calls:
                        tool_count += len(message.tool_calls)
                    if hasattr(message, 'content') and message.content:
                        response_length += len(str(message.content))
            
            results.append({
                "test_case": i,
                "category": category,
                "query": query[:50] + "..." if len(query) > 50 else query,
                "success": True,
                "duration_seconds": duration,
                "tool_calls": tool_count,
                "response_length": response_length
            })
            
            console.print(f"✅ [green]Success:[/green] {duration}s, {tool_count} tools, {response_length} chars")
            
        except Exception as e:
            end_time = time.time()
            duration = round(end_time - start_time, 2)
            
            results.append({
                "test_case": i,
                "category": category,
                "query": query[:50] + "..." if len(query) > 50 else query,
                "success": False,
                "duration_seconds": duration,
                "tool_calls": 0,
                "response_length": 0,
                "error": str(e)[:50]
            })
            
            console.print(f"❌ [red]Failed:[/red] {duration}s, Error: {str(e)[:50]}...")
    
    return pd.DataFrame(results)

# Define our test cases
test_cases = [
    {
        "category": "Math",
        "query": "What is the square root of 256?",
        "expected_tools": ["python_calculator"]
    },
    {
        "category": "Time",
        "query": "What time is it in PST?",
        "expected_tools": ["current_time"]
    },
    {
        "category": "Analysis",
        "query": "Analyze this text: 'The weather is absolutely wonderful today!'",
        "expected_tools": ["text_analyzer"]
    },
    {
        "category": "Multi-step",
        "query": "Calculate 15 * 23, then analyze the sentiment of 'great result'",
        "expected_tools": ["python_calculator", "text_analyzer"]
    },
    {
        "category": "Conversational",
        "query": "Tell me about the benefits of using TensorZero for LLM applications",
        "expected_tools": []  # No tools expected for this
    }
]

# Run the benchmark
benchmark_results = benchmark_agent(test_cases)

# Display results summary
console.print("\n📊 [bold]Benchmark Results Summary[/bold]")
console.print("=" * 40)

# Calculate summary statistics (std is NaN for categories with a single test case)
if not benchmark_results.empty:
    summary = benchmark_results.groupby('category').agg({
        'success': ['count', 'sum'],
        'duration_seconds': ['mean', 'std'],
        'tool_calls': 'mean',
        'response_length': 'mean'
    }).round(2)
    
    print(summary)
    
    # Overall statistics
    total_tests = len(benchmark_results)
    successful_tests = benchmark_results['success'].sum()
    avg_duration = benchmark_results['duration_seconds'].mean()
    avg_tools = benchmark_results['tool_calls'].mean()
    
    console.print("\n📈 [bold]Overall Performance:[/bold]")
    console.print(f"   • Success Rate: {successful_tests}/{total_tests} ({100*successful_tests/total_tests:.1f}%)")
    console.print(f"   • Average Duration: {avg_duration:.2f} seconds")
    console.print(f"   • Average Tool Calls: {avg_tools:.1f}")
    
else:
    console.print("[red]No benchmark results to display[/red]")

               success     duration_seconds     tool_calls response_length
                 count sum             mean std       mean            mean
category                                                                  
Analysis             1   1             2.26 NaN        1.0           436.0
Conversational       1   1             5.74 NaN        0.0          1769.0
Math                 1   1             1.87 NaN        1.0            95.0
Multi-step           1   1             3.27 NaN        2.0           439.0
Time                 1   1             2.04 NaN        1.0           121.0
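
The benchmark reads `expected_tools` from each test case but never compares it with the tools the agent actually called. One way to close that gap is sketched below. The helper names are hypothetical, and the stand-in messages only imitate the shape used in the benchmark loop (objects with an optional `tool_calls` list of dicts that carry a `name`).

```python
from types import SimpleNamespace


def called_tool_names(messages) -> list[str]:
    """Collect tool names from every message that carries tool calls."""
    names = []
    for message in messages:
        for call in getattr(message, "tool_calls", None) or []:
            names.append(call["name"])
    return names


def tools_match(expected: list[str], messages) -> bool:
    """True when the set of called tools equals the expected set."""
    return set(called_tool_names(messages)) == set(expected)


# Stand-in messages shaped like the agent's output (objects with .tool_calls).
fake_messages = [
    SimpleNamespace(content="What is the square root of 256?"),
    SimpleNamespace(content="", tool_calls=[
        {"name": "python_calculator", "args": {"expression": "sqrt(256)"}}
    ]),
    SimpleNamespace(content="Python Calculator: sqrt(256) = 16.0"),
    SimpleNamespace(content="The square root of 256 is 16.", tool_calls=[]),
]
print(tools_match(["python_calculator"], fake_messages))  # True
```

Storing the result of `tools_match(expected_tools, result["messages"])` as an extra column in the benchmark rows would distinguish runs that succeeded with the intended tools from runs that answered without them.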


## 6. Key Insights and Best Practices

Summary of what we learned about building agents with TensorZero.

In [ ]:
console.print(Panel("""
## 🎯 TensorZero Agent Integration - Key Learnings

### ✅ What Works Great:
• **OpenAI-Compatible Endpoint**: Use `/openai/v1` - no custom wrappers needed!
• **LangChain Integration**: `init_chat_model()` works perfectly with TensorZero
• **LangGraph Agents**: `create_react_agent()` works out-of-the-box
• **Tool Calling**: Both Python tools and TensorZero-configured tools work together
• **Automatic Observability**: All interactions tracked without extra code

### 🛠️ Architecture Patterns:
• **Model Setup**: `init_chat_model("tensorzero::model_name::openai::gpt-4o-mini", model_provider="openai", base_url="http://localhost:3000/openai/v1")`
• **Tool Definition**: Use `@tool` decorator for Python tools + TensorZero config for others
• **Agent Creation**: Standard LangGraph patterns work unchanged
• **Observability**: Leverage TensorZero's built-in tracking and UI

### ⚡ Performance Benefits:
• **Multi-Provider**: Easy switching between OpenAI, Anthropic, xAI, etc.
• **Cost Optimization**: Automatic routing and cost tracking
• **Experimentation**: A/B testing different models/prompts
• **Production Ready**: <1ms latency overhead, enterprise observability

### 🔧 Best Practices:
1. **Use TensorZero's OpenAI endpoint** - simplest integration path
2. **Configure tools in tensorzero.toml** for TensorZero-managed tools
3. **Define Python tools locally** for custom logic
4. **Leverage built-in observability** instead of custom logging
5. **Use variants** for easy model switching and A/B testing

### 🚀 Production Considerations:
• **Error Handling**: Robust fallback mechanisms for tool failures
• **Rate Limiting**: Manage API costs through TensorZero configuration
• **Security**: Validate tool inputs and sanitize outputs
• **Monitoring**: Set up alerts using TensorZero's observability features
• **Scaling**: Use TensorZero's load balancing and caching

---

**🌐 Next Steps**: 
• Explore TensorZero UI at http://localhost:4000
• Try different model variants in tensorzero.toml
• Set up feedback collection and optimization
• Build multi-agent systems using the same patterns
""", title="📚 Complete Guide to TensorZero Agents", border_style="green"))