<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/mcp/langchain-agent-with-vectara-mcp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Self-Correcting Research Agent with Vectara MCP

This notebook demonstrates a **self-correcting research agent** that automatically detects and fixes its own hallucinations using Vectara's MCP tools:

- **HHEM** (Hughes Hallucination Evaluation Model) via `eval_factual_consistency` - evaluates factual accuracy
- **VHC** (Vectara Hallucination Corrector) via `correct_hallucinations` - fixes hallucinated content

The agent combines multiple tools (Unreliable Knowledge Base, Calculator) with HHEM/VHC to:
1. Research topics using a sparse knowledge base
2. Draft responses based on sources
3. **Automatically verify** its responses using HHEM
4. **Auto-correct** if the factual consistency score is low
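
The four steps above reduce to a short verify-then-correct control flow. The sketch below is illustrative only: `self_correct`, `toy_score`, and `toy_correct` are hypothetical names, and the toy scorer (exact sentence matching) merely stands in for HHEM and VHC, which this notebook reaches through the MCP tools. The 0.5 threshold mirrors the one used in the agent's system prompt later in the notebook.

```python
from typing import Callable


def self_correct(
    draft: str,
    source: str,
    score_fn: Callable[[str, str], float],
    correct_fn: Callable[[str, str], str],
    threshold: float = 0.5,
) -> dict:
    """Score a draft against its source; correct it only if the score is low."""
    score = score_fn(draft, source)
    if score < threshold:
        return {"text": correct_fn(draft, source), "score": score, "corrected": True}
    return {"text": draft, "score": score, "corrected": False}


def _sentences(text: str) -> list[str]:
    """Split on periods and drop empty fragments."""
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_score(draft: str, source: str) -> float:
    """Hypothetical stand-in for HHEM: share of draft sentences found verbatim in the source."""
    sentences = _sentences(draft)
    if not sentences:
        return 1.0
    return sum(1 for s in sentences if s in source) / len(sentences)


def toy_correct(draft: str, source: str) -> str:
    """Hypothetical stand-in for VHC: keep only sentences found verbatim in the source."""
    kept = [s for s in _sentences(draft) if s in source]
    return ". ".join(kept) + "." if kept else ""


source = (
    "Quantum computing uses qubits instead of classical bits. "
    "IBM and Google have built quantum computers."
)
draft = (
    "Quantum computing uses qubits instead of classical bits. "
    "It was invented in 1850. It runs on steam."
)

result = self_correct(draft, source, toy_score, toy_correct)
print(result["corrected"])  # True
print(result["text"])       # Quantum computing uses qubits instead of classical bits.
```

In the notebook itself the agent, not hand-written control flow, decides when to call each tool; the sketch only makes the intended branching explicit.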

## Prerequisites

1. Install packages: `pip install vectara-mcp langchain-mcp-adapters langgraph langchain-openai python-dotenv`
2. Get a Vectara API key from [console.vectara.com](https://console.vectara.com)
3. Get an OpenAI API key

## Installation

In [1]:
#!pip install --quiet langchain-mcp-adapters langgraph langchain-openai langchain-community vectara-mcp wikipedia python-dotenv

## Environment Setup

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

# Set your API keys here or via environment variables
os.environ["VECTARA_API_KEY"] = os.getenv("VECTARA_API_KEY", "<YOUR_VECTARA_API_KEY>")
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "<YOUR_OPENAI_API_KEY>")

if os.getenv("VECTARA_API_KEY", "").startswith("<"):
    raise EnvironmentError("Please set VECTARA_API_KEY")
if os.getenv("OPENAI_API_KEY", "").startswith("<"):
    raise EnvironmentError("Please set OPENAI_API_KEY")

print("Environment configured successfully")

Environment configured successfully


## Part 1: Connect to Vectara MCP Server

Connect to the Vectara MCP server to access HHEM and VHC tools for detecting and correcting hallucinations.

In [3]:
from langchain_openai import ChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
import sys

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Configure the Vectara MCP server connection
mcp_client = MultiServerMCPClient(
    {
        "vectara": {
            "command": sys.executable,
            "args": ["-m", "vectara_mcp", "--transport", "stdio"],
            "transport": "stdio",
            "env": {
                "VECTARA_API_KEY": os.environ["VECTARA_API_KEY"]
            }
        }
    }
)

In [4]:
# Load tools from the MCP server
tools = await mcp_client.get_tools()

# Initialize Vectara API key directly via MCP
setup_tool = next(t for t in tools if t.name == "setup_vectara_api_key")
await setup_tool.ainvoke({"api_key": os.environ["VECTARA_API_KEY"]})
print("Vectara API key initialized")

print("\nAvailable MCP tools:")
for tool in tools:
    print(f"  - {tool.name}")

Vectara API key initialized

Available MCP tools:
  - setup_vectara_api_key
  - clear_vectara_api_key
  - ask_vectara
  - search_vectara
  - correct_hallucinations
  - eval_factual_consistency


## Part 2: Add Research Tools

Beyond the Vectara MCP tools, we add:
- **Calculator** - for math operations  
- **Unreliable Knowledge Base** - returns sparse/limited information to demonstrate hallucination detection
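
The calculator cell in this notebook filters characters and then calls `eval()`. As a stricter alternative, the sketch below walks the parsed syntax tree and evaluates only numeric literals and the four basic operators. `safe_calculate` is a hypothetical name, not part of the notebook or of any library, and the set of allowed operators is an assumption chosen to match the notebook's allow-list.

```python
import ast
import operator

# Only these operators are evaluated; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> str:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_eval(tree))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


print(safe_calculate("2 + 2 * 3"))  # 8
print(safe_calculate("2 ** 8"))     # Error: unsupported expression
```

Because exponentiation is not in the operator table, inputs such as `9**9**9` are rejected rather than evaluated. Wrapping the function with `@tool` would expose it to the agent in the same way as the notebook's `calculate`.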

In [5]:
from langchain_core.tools import tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Example: calculate('2 + 2 * 3')"""
    try:
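        # Character allow-list before eval(): digits, + - * / . ( ) and spaces only.
        # Note: "**" still passes this filter, so very large powers are not blocked.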
        allowed = set('0123456789+-*/.() ')
        if all(c in allowed for c in expression):
            return str(eval(expression))
        return "Error: Only numeric expressions allowed"
    except Exception as e:
        return f"Error: {e}"

@tool
def unreliable_knowledge_base(topic: str) -> str:
    """Query an internal knowledge base for information on a topic. Returns limited information."""
    data = {
        "quantum computing": "Quantum computing uses qubits instead of classical bits. IBM and Google have built quantum computers.",
        "tesla": "Tesla Inc. makes electric vehicles. Elon Musk is the CEO. Headquarters in Austin, Texas.",
        "black holes": "Black holes are regions of spacetime with strong gravity. They were predicted by Einstein's theory.",
    }
    # Case-insensitive substring match: return the first entry whose key appears in the topic
    for key, value in data.items():
        if key in topic.lower():
            return value
    return "No information available on this topic."

# Filter out ask_vectara and search_vectara from MCP tools
excluded_tools = {'ask_vectara', 'search_vectara'}
filtered_mcp_tools = [t for t in tools if t.name not in excluded_tools]

# Combine all tools
all_tools = filtered_mcp_tools + [calculate, unreliable_knowledge_base]
print(f"Agent has {len(all_tools)} tools: {[t.name for t in all_tools]}")

Agent has 6 tools: ['setup_vectara_api_key', 'clear_vectara_api_key', 'correct_hallucinations', 'eval_factual_consistency', 'calculate', 'unreliable_knowledge_base']


## Part 3: Create Self-Correcting Research Agent

Create an agent with a system prompt that instructs it to:
1. Research using `unreliable_knowledge_base` (returns sparse information)
2. Draft a detailed response based on the sources
3. **Automatically verify** using HHEM
4. **Auto-correct** with VHC if the score is low

Since the knowledge base returns limited information, the LLM may elaborate beyond what's in the source, triggering hallucination detection.

In [6]:
SYSTEM_PROMPT = """You are a research assistant. You MUST follow this exact workflow for EVERY question:

STEP 1 - RESEARCH: Call unreliable_knowledge_base once to get source information.

STEP 2 - DRAFT: Write a detailed response to the user's question. You may elaborate beyond the source.

STEP 3 - VERIFY (REQUIRED): You MUST call eval_factual_consistency with:
  - generated_text: your draft response from Step 2
  - documents: the source text from Step 1
This returns an HHEM_score between 0 and 1.

STEP 4 - CORRECT (if needed): If HHEM_score < 0.5, you MUST call correct_hallucinations with:
  - generated_text: your draft response
  - documents: the source text
  - query: the user's original question

STEP 5 - RESPOND: Return the final response (corrected if Step 4 was used).
If correction was applied, note that at the end of your response - along with the original HHEM_score.

IMPORTANT: You MUST complete Steps 1-3 for every query. Never skip the verification step.
"""

# Create the self-correcting agent with all tools
agent = create_react_agent(
    model=llm,
    tools=all_tools,
    prompt=SYSTEM_PROMPT
)

print("Self-correcting agent created with system prompt")

Self-correcting agent created with system prompt


/var/folders/y0/qd7p5ft96k7br3ztvfwfbsqc0000gn/T/ipykernel_51392/3703883124.py:24: LangGraphDeprecatedSinceV10: create_react_agent has been moved to `langchain.agents`. Please update your import to `from langchain.agents import create_agent`. Deprecated in LangGraph V1.0 to be removed in V2.0.
  agent = create_react_agent(


## Part 4: Research Examples with Auto-Verification

The agent researches topics, drafts responses, and automatically verifies/corrects them using HHEM and VHC.
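
Since the system prompt only instructs the model to verify, it can be useful to confirm after a run that the verification tool was actually called. The sketch below is one way to do that, assuming each message exposes an optional `tool_calls` list of dicts with a `name` key, as the display loop in the next cell also assumes. `tool_call_sequence` and `verification_ran` are hypothetical helper names, and the `SimpleNamespace` objects are stand-ins for real messages so the example runs without an agent.

```python
from types import SimpleNamespace


def tool_call_sequence(messages) -> list[str]:
    """Return the names of all tool calls, in the order the agent issued them."""
    names = []
    for msg in messages:
        # Messages without tool calls (user or tool messages) contribute nothing.
        for call in getattr(msg, "tool_calls", None) or []:
            names.append(call["name"])
    return names


def verification_ran(messages) -> bool:
    """True if the agent called eval_factual_consistency at least once."""
    return "eval_factual_consistency" in tool_call_sequence(messages)


# Stand-in messages mimicking the shape of an agent run.
fake_messages = [
    SimpleNamespace(content="user question"),
    SimpleNamespace(content="", tool_calls=[{"name": "unreliable_knowledge_base", "args": {}}]),
    SimpleNamespace(content="source text", name="unreliable_knowledge_base"),
    SimpleNamespace(content="", tool_calls=[{"name": "eval_factual_consistency", "args": {}}]),
]

print(tool_call_sequence(fake_messages))
# ['unreliable_knowledge_base', 'eval_factual_consistency']
print(verification_ran(fake_messages))  # True
```

With a real run, the same helpers would be applied to `result["messages"]`.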

In [7]:
# Example: Query that forces LLM to elaborate beyond sparse source
# The unreliable_knowledge_base only has 2 sentences on quantum computing,
# but we ask for detailed history, inventors, milestones - forcing potential hallucination
query = "Give me a detailed explanation of quantum computing, including its history, who invented it, the key milestones, and how it compares to classical computing."
result = await agent.ainvoke({"messages": [("user", query)]})

# Display the full workflow showing each tool call
print("=" * 60)
print("AGENT WORKFLOW - Showing all steps")
print("=" * 60)

for i, msg in enumerate(result["messages"]):
    msg_type = type(msg).__name__
    print(f"\n[Step {i}] {msg_type}")
    print("-" * 40)

    if msg_type == "HumanMessage":
        print(f"User Query: {msg.content}")
    elif msg_type == "AIMessage":
        if msg.tool_calls:
            for tc in msg.tool_calls:
                print(f"Tool Call: {tc['name']}")
                print(f"Args: {tc['args']}")
        elif msg.content:
            print(f"Final Response:\n{msg.content}")
    elif msg_type == "ToolMessage":
        print(f"Tool: {msg.name}")
        print(f"Result: {msg.content}")

print("\n" + "=" * 60)

AGENT WORKFLOW - Showing all steps

[Step 0] HumanMessage
----------------------------------------
User Query: Give me a detailed explanation of quantum computing, including its history, who invented it, the key milestones, and how it compares to classical computing.

[Step 1] AIMessage
----------------------------------------
Tool Call: unreliable_knowledge_base
Args: {'topic': 'quantum computing history and comparison to classical computing'}

[Step 2] ToolMessage
----------------------------------------
Tool: unreliable_knowledge_base
Result: Quantum computing uses qubits instead of classical bits. IBM and Google have built quantum computers.

[Step 3] AIMessage
----------------------------------------
Tool Call: eval_factual_consistency
Args: {'generated_text': "Quantum computing is a revolutionary field of computing that leverages the principles of quantum mechanics to process information. Unlike classical computing, which uses bits as the smallest unit of data (representing eithe

## Cleanup

In [8]:
# Close the MCP client connection (cleanup subprocess)
try:
    await mcp_client.close()
    print("MCP client closed.")
except AttributeError:
    # Some versions of the client do not expose an explicit close() method
    print("MCP client session completed.")

MCP client session completed.
