# Evaluate and Trace LangGraph MCP Tool Selection

This notebook demonstrates how to use TruLens to trace and evaluate LangGraph applications that use Model Context Protocol (MCP) tools.

## Overview

We'll build a health research agent that uses MCP tools to query:
- PubMed for medical literature
- Clinical trials databases for trial information

TruLens will automatically trace all tool calls, showing:
- Which MCP tools are being called
- Input arguments and outputs
- Execution time and errors
- Full conversation flow in the dashboard

## Setup

First, let's configure our API keys and MCP server connection:


In [None]:
import os

# Configure API keys and MCP server connection
os.environ["OPENAI_API_KEY"] = "..."
os.environ["SNOWFLAKE_PAT"] = "ey..."
os.environ["SNOWFLAKE_MCP_SERVER_URL"] = (
    "https://<account-id>.snowflakecomputing.com/api/v2/databases/health_db/schemas/mcp/mcp-servers/health_mcp_server"
)
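
Before connecting, it can help to confirm that none of the variables above is unset or still holds a placeholder. The sketch below is one minimal way to check; `looks_like_placeholder` and `missing_env_vars` are hypothetical helpers (not part of TruLens or LangChain), and the placeholder patterns they look for are assumptions taken from the cell above.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "SNOWFLAKE_PAT", "SNOWFLAKE_MCP_SERVER_URL")


def looks_like_placeholder(value):
    """True for empty values and for the placeholder patterns used in this notebook."""
    value = value.strip()
    return not value or value.endswith("...") or "<account-id>" in value


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names whose values are unset or still look like placeholders."""
    env = os.environ if env is None else env
    return [name for name in names if looks_like_placeholder(env.get(name, ""))]


unset = missing_env_vars()
if unset:
    print("Set real values for:", ", ".join(unset))
```

If the list comes back empty, the remaining cells have the credentials they expect.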

## Create MCP Client and Get Tools

We'll use the `MultiServerMCPClient` from `langchain_mcp_adapters` to connect to the health research MCP server and retrieve available tools.

This step assumes you have already created an MCP server in Snowflake. If you haven't, follow this [quickstart](https://quickstarts.snowflake.com/guide/getting-started-with-snowflake-mcp-server/).


In [None]:
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langgraph.graph import START
from langgraph.graph import MessagesState
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition

client = MultiServerMCPClient({
    "health_research": {
        "transport": "streamable_http",
        "url": os.environ["SNOWFLAKE_MCP_SERVER_URL"],
        "headers": {"Authorization": f"Bearer {os.environ['SNOWFLAKE_PAT']}"},
    }
})
tools = await client.get_tools()
model = ChatOpenAI(model="gpt-4o")
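
To confirm which tools the server actually exposed, you can list their names and descriptions. The helper below is a small sketch, not part of any library: `describe_tools` is a hypothetical name, and it relies only on the `name` and `description` attributes that LangChain tools carry. The demo uses stand-in objects with made-up descriptions so it runs without a server.

```python
from types import SimpleNamespace


def describe_tools(tools):
    """Return one 'name: summary' line per tool, sorted by tool name."""
    lines = []
    for tool in sorted(tools, key=lambda t: t.name):
        # Keep only the first line of the description as a short summary.
        description_lines = (tool.description or "").strip().splitlines()
        summary = description_lines[0] if description_lines else "(no description)"
        lines.append(f"{tool.name}: {summary}")
    return lines


# Stand-in objects for illustration only; the descriptions are invented.
stand_ins = [
    SimpleNamespace(name="pubmed_search", description="Stand-in description.\nMore detail."),
    SimpleNamespace(name="clinical_trials_search", description=""),
]
print("\n".join(describe_tools(stand_ins)))
```

In this notebook, calling `describe_tools(tools)` after the cell above would report the tools returned by `client.get_tools()`.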

## Build the LangGraph Agent

Now we'll create a LangGraph application with:
1. **call_model** node - The LLM that decides which tools to use
2. **tools** node - Executes the selected MCP tools
3. **tools_condition** - Routes from the model to the tools node when the model requests a tool call, and ends the run otherwise

The graph will loop between the model and tools until the agent has enough information to answer the question.


In [None]:
# Define the call_model function
def call_model(state: MessagesState):
    response = model.bind_tools(tools).invoke(state["messages"])
    return {"messages": response}


# Create the StateGraph
builder = StateGraph(MessagesState)
builder.add_node(call_model)
builder.add_node(ToolNode(tools))
builder.add_edge(START, "call_model")
builder.add_conditional_edges(
    "call_model",
    tools_condition,
)
builder.add_edge("tools", "call_model")
graph = builder.compile()
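
The routing decision that drives this loop can be pictured with a simplified stand-in. This is a sketch of the behavior, not LangGraph's actual implementation: `route_after_model` is a hypothetical name, and the messages are plain stand-in objects rather than real LangChain messages.

```python
from types import SimpleNamespace

END = "__end__"  # same string value as langgraph.graph.END


def route_after_model(state):
    """Simplified stand-in for tools_condition: choose the next node."""
    last_message = state["messages"][-1]
    # A non-empty tool_calls list means the model asked for a tool to be run.
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return END


# Stand-in messages: one that requests a tool, one that is a final answer.
wants_tool = SimpleNamespace(tool_calls=[{"name": "pubmed_search", "args": {}}])
final_answer = SimpleNamespace(tool_calls=[])

print(route_after_model({"messages": [wants_tool]}))
print(route_after_model({"messages": [wants_tool, final_answer]}))
```

Because the `tools` node always hands control back to `call_model`, the loop ends only when the model replies without requesting another tool call.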

## Initialize TruLens Session

Set up TruLens to track and store traces. Calling `reset_database()` clears any previously stored records, so this run starts from an empty database.


In [None]:
from trulens.core import TruSession

session = TruSession()

session.reset_database()

## Create Tool Selection Evaluations

Next, define two feedback functions backed by an OpenAI provider: `tool_selection_with_cot_reasons` and `tool_calling_with_cot_reasons`. Both are applied with `Selector(trace_level=True)`, so each evaluation receives the full trace rather than a single span.

In [None]:
from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI as OpenAIProvider

provider = OpenAIProvider(model_engine="gpt-4.1")

f_tool_selection = Feedback(provider.tool_selection_with_cot_reasons).on({
    "trace": Selector(trace_level=True),
})

f_tool_calling = Feedback(provider.tool_calling_with_cot_reasons).on({
    "trace": Selector(trace_level=True),
})

## Record Agent Execution with TruLens

Wrap the LangGraph application with `TruGraph` to automatically instrument and trace all executions.

TruLens will capture:
- Each node execution in the graph
- MCP tool calls with their names (e.g., `pubmed_search`, `clinical_trials_search`)
- Input/output states at each step
- LLM generation calls
- Tool routing decisions

The trace will show the complete flow of the agent's reasoning and tool usage.


In [None]:
from trulens.apps.langgraph import TruGraph

tru_recorder = TruGraph(
    app=graph,
    app_name="health_research",
    app_version="base",
    feedbacks=[f_tool_selection, f_tool_calling],
)

with tru_recorder:
    response = await graph.ainvoke({
        "messages": "How do semaglutide and tirzepatide compare in published studies, and what head-to-head clinical trials are recruiting patients?"
    })

## View the Agent Response

Let's see what the agent found:


In [None]:
print("Agent Response:")
print("=" * 50)
print(response["messages"][-1].content)
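
Besides the final answer, the returned message list records which tools the model requested along the way. The sketch below counts those requests; `tool_call_counts` is a hypothetical helper, and it assumes each AI message exposes a `tool_calls` list of dicts with a `"name"` key, as LangChain's `AIMessage` does. The demo uses stand-in objects so it runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def tool_call_counts(messages):
    """Count how many times each tool was requested across a message list."""
    counts = Counter()
    for message in messages:
        # Human and tool messages have no tool_calls, so fall back to an empty list.
        for call in getattr(message, "tool_calls", None) or []:
            counts[call["name"]] += 1
    return dict(counts)


# Stand-in messages for illustration only.
demo_messages = [
    SimpleNamespace(content="question"),
    SimpleNamespace(tool_calls=[{"name": "pubmed_search"}, {"name": "clinical_trials_search"}]),
    SimpleNamespace(tool_calls=[{"name": "pubmed_search"}]),
    SimpleNamespace(tool_calls=[], content="final answer"),
]
print(tool_call_counts(demo_messages))
```

In this notebook, `tool_call_counts(response["messages"])` would give a quick cross-check against what the dashboard shows.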

## Launch TruLens Dashboard

Open the TruLens dashboard to explore the traces interactively.

In the dashboard you'll see:
- **Tree View**: Visual representation of the agent's execution flow showing:
  - Graph nodes (call_model, tools) and the tools_condition routing step
  - MCP tool calls with their actual names (pubmed_search, clinical_trials_search)
  - LLM generation calls
- **Timeline View**: Chronological execution timeline
- **MCP Tool Details**: For each tool call, you can see:
  - `mcp.tool_name`: The specific tool that was called
  - `mcp.input_arguments`: Arguments passed to the tool
  - `mcp.output_content`: Results returned by the tool
  - Execution time and any errors

Together, these views show how your agent uses MCP tools to answer questions.


In [None]:
from trulens.dashboard import run_dashboard

run_dashboard()