# LangChain + Tavily Integration Cookbook

The `langchain-tavily` package exposes Tavily's search, extract, map, crawl, and research APIs as native LangChain tools that can be passed directly to LangChain agents and LangGraph workflows.

By the end of this cookbook, you'll know how to:
- Use TavilySearch for real-time web search
- Use TavilyExtract to get full page content from URLs
- Use TavilyMap to discover site structure and internal links
- Use TavilyCrawl for deep website crawling
- Use TavilyResearch for comprehensive research tasks
- Build a LangGraph agent that uses the Tavily search, extract, and crawl tools

---

## Getting Started

1. **Sign up** for Tavily at [app.tavily.com](https://app.tavily.com/home/) to get your API key.
2. **Copy your API key** from your Tavily account dashboard.
3. **Run the cells below** to set up your environment.

In [None]:
%pip install -qU langchain langchain-tavily langchain-openai langgraph python-dotenv

In [2]:
import os
import getpass
from dotenv import load_dotenv

load_dotenv()

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY:\n")

---

## 1. TavilySearch - Web Search Tool

TavilySearch executes search queries using Tavily's Search API. It returns relevant web results with titles, URLs, content snippets, and relevance scores.


In [None]:
from langchain_tavily import TavilySearch

# Initialize the search tool with explicit basic settings
search = TavilySearch(
    max_results=5,
    topic="general",
    search_depth="basic",
)

In [None]:
# Basic search invocation
result = search.invoke({"query": "What are the latest developments in AI agents?"})

print(f"Query: {result['query']}")
print(f"Response time: {result['response_time']}s")
print(f"Number of results: {len(result['results'])}")
print("\n" + "="*60 + "\n")

for i, r in enumerate(result["results"], 1):
    print(f"Result {i}:")
    print(f"  Title: {r['title']}")
    print(f"  URL: {r['url']}")
    print(f"  Score: {r['score']:.4f}")
    print(f"  Content: {r['content'][:200]}...")
    print()
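
The `score` field printed above can also drive simple post-processing before results reach a prompt. The sketch below assumes only the response shape used in this cell (a dict with a `results` list whose items carry `url` and `score`). The helper `filter_results`, the `0.5` threshold, and the sample data are hypothetical and chosen for illustration; they are not part of `langchain-tavily`.

```python
def filter_results(result, min_score=0.5):
    """Keep results scoring at least min_score, one per URL, best first."""
    seen = set()
    kept = []
    # Sort by score so the highest-scoring copy of a duplicated URL wins.
    for r in sorted(result["results"], key=lambda r: r["score"], reverse=True):
        if r["score"] < min_score or r["url"] in seen:
            continue
        seen.add(r["url"])
        kept.append(r)
    return kept


# Hypothetical sample shaped like the response printed above.
sample = {
    "results": [
        {"title": "A", "url": "https://example.com/a", "score": 0.91, "content": "..."},
        {"title": "B", "url": "https://example.com/b", "score": 0.32, "content": "..."},
        {"title": "A (dup)", "url": "https://example.com/a", "score": 0.75, "content": "..."},
    ]
}

top = filter_results(sample)
print([r["title"] for r in top])
```

In the notebook, `filter_results(result)` could be applied to the search response directly; a suitable threshold depends on the query and is worth tuning by inspection.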

### News Search with Time Range

In [None]:
# Search for recent news
news_search = TavilySearch(
    max_results=5,
    topic="news",
    time_range="week",
)

news_result = news_search.invoke({"query": "OpenAI announcements"})

for i, r in enumerate(news_result["results"], 1):
    print(f"{i}. {r['title']}")
    print(f"   {r['url']}\n")

### Domain-Filtered Search

In [None]:
# Search only specific domains
domain_search = TavilySearch(
    max_results=5,
    include_domains=["arxiv.org", "openai.com", "anthropic.com"],
)

domain_result = domain_search.invoke({"query": "large language model research"})

for i, r in enumerate(domain_result["results"], 1):
    print(f"{i}. {r['title']}")
    print(f"   {r['url']}\n")
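
A quick sanity check is to confirm that every returned URL falls under one of the domains passed to `include_domains`. The sketch below uses only the standard library. The helper name `in_allowed_domains` is hypothetical, the sample URLs are placeholders, and treating subdomains such as `www.anthropic.com` as matches is an assumption about the intended behavior rather than documented Tavily semantics.

```python
from urllib.parse import urlparse


def in_allowed_domains(url, allowed):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


allowed = ["arxiv.org", "openai.com", "anthropic.com"]

# Placeholder URLs standing in for the "url" fields of domain_result["results"].
urls = [
    "https://arxiv.org/abs/0000.00000",
    "https://www.anthropic.com/research",
    "https://notarxiv.org/paper",
]

off_domain = [u for u in urls if not in_allowed_domains(u, allowed)]
print(off_domain)
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as `notarxiv.org`.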

---

## 2. TavilyExtract - Content Extraction Tool

TavilyExtract retrieves the full content from one or more URLs. It's useful when you need the complete text of a webpage rather than just search snippets.

In [7]:
from langchain_tavily import TavilyExtract

extract = TavilyExtract(
    extract_depth="basic",
    include_images=False,
)

In [None]:
# Extract content from a single URL
extract_result = extract.invoke({
    "urls": ["https://en.wikipedia.org/wiki/Artificial_intelligence"]
})

print(f"Response time: {extract_result['response_time']}s")
print(f"Number of results: {len(extract_result['results'])}")
print(f"Failed results: {len(extract_result['failed_results'])}")
print("\n" + "="*60 + "\n")

for r in extract_result["results"]:
    print(f"URL: {r['url']}")
    print(f"Content length: {len(r['raw_content'])} characters")
    print(f"Preview: {r['raw_content'][:500]}...")

In [None]:
# Extract from multiple URLs at once (up to 20)
multi_extract = extract.invoke({
    "urls": [
        "https://docs.tavily.com",
        "https://python.langchain.com/docs/introduction/"
    ]
})

for r in multi_extract["results"]:
    print(f"URL: {r['url']}")
    print(f"Content length: {len(r['raw_content'])} characters\n")
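
When a URL list is longer than the 20-URL limit noted above, it can be split into batches and the responses merged. This is a minimal sketch: `batched`, `extract_all`, and `fake_invoke` are hypothetical helpers, and the merge assumes each response carries the `results` and `failed_results` lists used earlier in this section. A stand-in function replaces the real tool so the example runs offline.

```python
def batched(urls, size=20):
    """Yield consecutive slices of urls with at most `size` items each."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def extract_all(invoke, urls, size=20):
    """Call invoke once per batch and merge results and failures."""
    merged = {"results": [], "failed_results": []}
    for batch in batched(urls, size):
        response = invoke({"urls": batch})
        merged["results"].extend(response.get("results", []))
        merged["failed_results"].extend(response.get("failed_results", []))
    return merged


def fake_invoke(payload):
    """Offline stand-in for extract.invoke that echoes each URL back."""
    return {
        "results": [{"url": u, "raw_content": "..."} for u in payload["urls"]],
        "failed_results": [],
    }


# Hypothetical list of 45 URLs, split into batches of 20, 20, and 5.
urls = [f"https://example.com/page/{i}" for i in range(45)]
print([len(b) for b in batched(urls)])

merged = extract_all(fake_invoke, urls)
print(len(merged["results"]), len(merged["failed_results"]))
```

In the notebook, passing `extract.invoke` in place of `fake_invoke` would send each batch to Tavily.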

---

## 3. TavilyMap - Site Structure Discovery

TavilyMap discovers all internal links from a base URL without extracting content. It's useful for understanding site structure before crawling.

In [10]:
from langchain_tavily import TavilyMap

map_tool = TavilyMap()

In [None]:
# Map a documentation site
map_result = map_tool.invoke({
    "url": "www.tavily.com",
    "instructions": "Find all API documentation and integration pages"
})

print(f"Base URL: {map_result['base_url']}")
print(f"Response time: {map_result['response_time']}s")
print(f"URLs discovered: {len(map_result['results'])}")
print("\nDiscovered URLs:")

for url in map_result["results"][:15]:
    print(f"  - {url}")

if len(map_result["results"]) > 15:
    print(f"  ... and {len(map_result['results']) - 15} more")
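
A flat list of URLs is easier to reason about once it is grouped by section. The sketch below counts URLs by their first path segment using only the standard library. It assumes the mapped results are absolute URLs; `section_counts` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def section_counts(urls):
    """Count URLs by their first path segment ('/' for the site root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = "/" + segments[0] if segments else "/"
        counts[section] += 1
    return counts


# Placeholder URLs standing in for map_result["results"].
urls = [
    "https://example.com/",
    "https://example.com/docs/api",
    "https://example.com/docs/sdk",
    "https://example.com/blog/post-1",
]

counts = section_counts(urls)
for section, n in counts.most_common():
    print(f"{section}: {n}")
```

The counts give a rough picture of where a site's content lives, which can help when writing the `instructions` for a later crawl.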

---

## 4. TavilyCrawl - Deep Website Crawling

TavilyCrawl combines mapping and extraction to crawl entire websites and retrieve content from multiple pages. Use natural language instructions to guide what content to focus on.

In [16]:
from langchain_tavily import TavilyCrawl

crawl = TavilyCrawl()

In [None]:
# Crawl documentation for specific content
crawl_result = crawl.invoke({
    "url": "www.tavily.com",
    "instructions": "Extract API reference and code examples",
})

print(f"Base URL: {crawl_result['base_url']}")
print(f"Response time: {crawl_result['response_time']}s")
print(f"Pages crawled: {len(crawl_result['results'])}")
print("\n" + "="*60 + "\n")

for i, page in enumerate(crawl_result["results"][:3], 1):
    print(f"Page {i}: {page['url']}")
    print(f"Content preview: {page['raw_content'][:300]}...")
    print()
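
Crawled pages are often concatenated into a single context string for an LLM. The following sketch assumes each page is a dict with the `url` and `raw_content` keys used above. `build_context`, the per-page character budget, and the sample pages are hypothetical choices for illustration.

```python
def build_context(pages, max_chars_per_page=1000):
    """Join crawled pages into one string, truncating each page's text."""
    parts = []
    for page in pages:
        text = (page.get("raw_content") or "").strip()
        if not text:
            continue  # skip pages with no extracted text
        parts.append(f"Source: {page['url']}\n{text[:max_chars_per_page]}")
    return "\n\n".join(parts)


# Placeholder pages standing in for crawl_result["results"].
pages = [
    {"url": "https://example.com/docs/a", "raw_content": "Alpha " * 500},
    {"url": "https://example.com/docs/empty", "raw_content": None},
    {"url": "https://example.com/docs/b", "raw_content": "Beta page text."},
]

context = build_context(pages, max_chars_per_page=200)
print(context)
```

Truncating by characters is a crude budget; a token-based limit would track model context windows more closely.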

---

## 5. TavilyResearch - Comprehensive Research

TavilyResearch creates comprehensive research reports on complex topics. It automatically searches, synthesizes, and cites sources.

### Key Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | str | "auto" | "mini", "pro", or "auto" |
| `citation_format` | str | "numbered" | "numbered", "mla", "apa", "chicago" |
| `stream` | bool | False | Stream results as generated |

In [19]:
from langchain_tavily import TavilyResearch, TavilyGetResearch
import time

research = TavilyResearch(
    model="mini",
    citation_format="numbered",
)

get_research = TavilyGetResearch()

In [None]:
# Start a research task
research_task = research.invoke({
    "input": "What are the key differences between LangChain and LlamaIndex for building RAG applications?"
})

print("Research task created:")
print(f"  Request ID: {research_task['request_id']}")
print(f"  Status: {research_task['status']}")
print(f"  Model: {research_task['model']}")

In [None]:
# Poll for results
request_id = research_task["request_id"]

while True:
    result = get_research.invoke({"request_id": request_id})
    status = result["status"]
    print(f"Status: {status}")
    
    if status == "completed":
        break
    elif status == "failed":
        print("Research failed")
        break
    
    time.sleep(2)
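
The loop above polls every two seconds with no upper bound, so a task that never reaches `completed` or `failed` would block the notebook indefinitely. The sketch below adds a timeout and exponential backoff. `wait_for_research` and its parameters are hypothetical, as are the intermediate status names `pending` and `in_progress`; the only assumption about the real API is that the response has a `status` key that eventually becomes `completed` or `failed`, as in the cell above.

```python
import time


def wait_for_research(fetch, timeout=300.0, interval=2.0, max_interval=30.0):
    """Call fetch() until status is 'completed' or 'failed', or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result["status"] in ("completed", "failed"):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"research still {result['status']!r} after {timeout}s"
            )
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff


# Stand-in for get_research.invoke so the sketch runs offline.
statuses = iter(["pending", "in_progress", "completed"])
final = wait_for_research(lambda: {"status": next(statuses)}, interval=0.01)
print(final["status"])
```

In the notebook, the call might look like `wait_for_research(lambda: get_research.invoke({"request_id": request_id}))`.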

In [None]:
from IPython.display import display, Markdown

if result["status"] == "completed":
    display(Markdown(result["content"]))
    
    print("\n" + "="*60)
    print("Sources:")
    for i, source in enumerate(result.get("sources", []), 1):
        print(f"  [{i}] {source['title']}")
        print(f"      {source['url']}")

---

## 6. Building a LangGraph Agent with Tavily Tools

Now let's combine TavilySearch, TavilyExtract, and TavilyCrawl into a research agent built on LangGraph.

In [40]:
import datetime

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# Initialize the LLM
llm = ChatOpenAI(model="gpt-5-mini-2025-08-07", temperature=0)

In [41]:
# Set up Tavily tools for the agent
search_tool = TavilySearch(
    max_results=10,
    search_depth="advanced",
    include_raw_content=True,
)

extract_tool = TavilyExtract(
    extract_depth="advanced",
)

crawl_tool = TavilyCrawl()

tools = [search_tool, extract_tool, crawl_tool]

In [46]:
# Create system prompt
today = datetime.datetime.now().strftime("%B %d, %Y")

system_prompt = f"""You are an expert research assistant with access to powerful web tools.

Today's date: {today}

Your available tools:

1. **TavilySearch**: Search the web for information
   - Use for finding recent news, articles, and general information
   - Returns ranked results with titles, URLs, and content snippets

2. **TavilyExtract**: Extract full content from URLs
   - Use when you need the complete text of a specific webpage
   - Can process up to 20 URLs at once

3. **TavilyCrawl**: Crawl websites for deep content
   - Use for exploring documentation sites or gathering comprehensive information
   - Provide clear instructions about what content to focus on

Research guidelines:
- Start with a search to identify relevant sources
- Use extract to get full content from promising URLs
- Use crawl for documentation sites or when you need comprehensive coverage
- Always cite your sources with URLs
- Synthesize information from multiple sources
- Be thorough but concise in your responses
"""

In [47]:
# Create the agent
agent = create_agent(
    model=llm,
    tools=tools,
    system_prompt=system_prompt,
)

In [48]:
# Test the agent
response = agent.invoke({
    "messages": [{"role": "user", "content": "What are the main features of LangGraph and how does it compare to other agent frameworks?"}]
})

In [None]:
# Display the final response
final_message = response["messages"][-1]
display(Markdown(final_message.content))

In [None]:
# View tool execution flow
print("Tool Execution Flow")
print("=" * 50)

for msg in response["messages"]:
    if hasattr(msg, "tool_calls") and msg.tool_calls:
        for tc in msg.tool_calls:
            print(f"Tool: {tc['name']}")
            args = tc.get("args", {})
            for key, value in args.items():
                if isinstance(value, str) and len(value) > 80:
                    value = value[:77] + "..."
                print(f"  {key}: {value}")
            print()
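
For a compact summary of the same trace, the tool calls can be tallied by name. The sketch below relies only on the `tool_calls` attribute read in the loop above. `count_tool_calls` is a hypothetical helper, and the stand-in messages and the tool names `tavily_search` and `tavily_extract` are placeholders so the example runs without an agent.

```python
from collections import Counter
from types import SimpleNamespace


def count_tool_calls(messages):
    """Tally tool calls by tool name across a list of messages."""
    counts = Counter()
    for msg in messages:
        # Messages without tool calls may lack the attribute or hold an empty list.
        for tc in getattr(msg, "tool_calls", None) or []:
            counts[tc["name"]] += 1
    return counts


# Stand-in messages; in the notebook, pass response["messages"] instead.
messages = [
    SimpleNamespace(content="question"),
    SimpleNamespace(content="", tool_calls=[{"name": "tavily_search", "args": {}}]),
    SimpleNamespace(
        content="",
        tool_calls=[
            {"name": "tavily_search", "args": {}},
            {"name": "tavily_extract", "args": {}},
        ],
    ),
    SimpleNamespace(content="answer", tool_calls=[]),
]

tool_counts = count_tool_calls(messages)
for name, n in tool_counts.most_common():
    print(f"{name}: {n}")
```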