# Research Agent with Tavily Search, Map, and Extract

In this tutorial, you'll learn how to build a research agent that can search the web, map websites, and reason over live web data.

You will use the official [Tavily-LangChain integration](https://www.tavily.com/integrations/langchain), which lets the agent autonomously set tool parameters, such as the Map tool's `instructions` or `categories`, based on the context and user instructions.

By the end of this lesson, you'll know how to:
- Seamlessly connect OpenAI foundation models to the web for up-to-date research
- Build a ReAct-style web agent with LangGraph
- Dynamically configure search, extract, and map parameters with the Tavily-LangChain integration

## Getting Started

Follow these steps to set up:

1. **Sign up** for Tavily at [app.tavily.com](https://app.tavily.com/home/) to get your API key.

2. **Sign up** for OpenAI to get your API key. Feel free to substitute any other LLM provider.

3. **Copy your API keys** from your Tavily and OpenAI account dashboards.

4. **Paste your API keys** into the cell below and execute the cell.

In [2]:
# To export your API keys into a .env file, run the following cell (replace with your actual keys):
!echo "TAVILY_API_KEY=<your-tavily-api-key>" >> .env
!echo "OPENAI_API_KEY=<your-openai-api-key>" >> .env

Install dependencies in the cell below.

In [None]:
%pip install -U tavily-python langchain-openai langchain langchain-tavily langgraph python-dotenv --quiet

### Setting Up Your Tavily API Client

The code below will instantiate the Tavily client with your API key.

In [4]:
import os
import getpass
from dotenv import load_dotenv
from tavily import TavilyClient

# Load environment variables from .env file
load_dotenv()

# Prompt the user to securely input the API key if not already set in the environment
if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

# Initialize the Tavily API client using the loaded or provided API key
tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

Let's define the following modular tools with the Tavily-LangChain integration:
1. **Search** the web for relevant information

2. **Extract** content from specific web pages

3. **Map** entire websites

The Map endpoint provides a sitemap of all links on a website, while crawling downloads the full content of each page. Sitemapping is more efficient when you just need the URL structure, as it avoids downloading and processing full page content. This makes it ideal for agents that need to understand site organization while staying within LLM context limits.
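
To make the "map first, read later" idea concrete, here is a minimal sketch of what an agent (or you) might do with a sitemap before fetching any page content: keep only the URLs of interest and group them into batches. The response shape (a `results` key holding a flat list of URL strings), the helper names `select_urls` and `batch_urls`, and the sample URLs are assumptions made for illustration; the batch size of 20 mirrors the Extract limit quoted in the system prompt later in this tutorial.

```python
from urllib.parse import urlparse

# Hypothetical Map-style response. The "results" key holding a flat list of
# URL strings is an assumption made for this sketch.
sample_map_response = {
    "base_url": "https://example.com",
    "results": [
        "https://example.com/",
        "https://example.com/blog/post-a",
        "https://example.com/blog/post-b",
        "https://example.com/pricing",
        "https://example.com/blog/post-a",  # duplicate entry
    ],
}

# Assumption: mirrors the 20-URL limit mentioned in the system prompt.
MAX_URLS_PER_EXTRACT = 20


def select_urls(map_response: dict, path_prefix: str) -> list[str]:
    """Return unique URLs whose path starts with path_prefix, in original order."""
    seen: set[str] = set()
    selected: list[str] = []
    for url in map_response.get("results", []):
        if url in seen:
            continue
        seen.add(url)
        if urlparse(url).path.startswith(path_prefix):
            selected.append(url)
    return selected


def batch_urls(urls: list[str], batch_size: int = MAX_URLS_PER_EXTRACT) -> list[list[str]]:
    """Split urls into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


blog_urls = select_urls(sample_map_response, "/blog/")
batches = batch_urls(blog_urls)
print(blog_urls)
print(batches)
```

Only the selected URLs would then be passed on for content extraction, which is what keeps the amount of text entering the LLM context small.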

In [None]:
# Define the set of web tools our agent will use to interact with the Tavily API.
from langchain_tavily import TavilySearch
from langchain_tavily import TavilyExtract
from langchain_tavily import TavilyMap

# Define the LangChain search tool
search = TavilySearch(max_results=10, topic="general")

# Define the LangChain extract tool
extract = TavilyExtract(extract_depth="advanced", format="markdown")

# Define the LangChain map tool
# Note: the name `map` shadows Python's built-in map() for the rest of the notebook
map = TavilyMap()

Now let's set up OpenAI's o3 model to power our agent. If you prefer a different LLM provider, you can easily plug in any LangChain Chat Model.

In [31]:
from langchain_openai import ChatOpenAI

# instantiate the model
o3 = ChatOpenAI(model="o3-2025-04-16")

## Web Agent Setup

Next, we'll build a Web Agent powered by Tavily, which consists of three main components: the language model, a set of web tools, and a system prompt. The language model (o3) serves as the agent's "brain," while the web tools (Search, Map, and Extract) allow the agent to interact with and gather information from the internet. The system prompt guides the agent's behavior, explaining how and when to use each tool to accomplish its research goals.

You are encouraged to experiment with the system prompt or try different language models (like swapping between gpt-4.1 and o3) to change the agent's style or personality, or to optimize its performance for specific use cases.

In [None]:
import datetime

today = datetime.datetime.today().strftime("%A, %B %d, %Y")
PROMPT = f"""    
        You are a research assistant created by the company Tavily. 
        Your mission is to conduct comprehensive, thorough, accurate, and up-to-date research, grounding your findings in credible web data.
        
        Today's Date: {today}

        Guidelines:
        - Your responses must be formatted nicely in markdown format. 
        - You must always provide web source citations for every claim you make.
        - Do not ask the user clarifying questions; use the tools as needed to complete the task.

        You have access to the following tools: Web Search, Web Map, and Web Extract.

        Tavily Web Search
        - Retrieve relevant web pages from the public internet based on a search query.
        - Provide a search query to receive semantically ranked results, each containing the title, URL, and a content snippet.

        Tavily Web Map
        - Explore a website's structure by generating a sitemap.
        - Given a starting URL, find all the nested links.
        - Useful for deep information discovery from a single source.
        - You must be certain that the input URL is a valid URL.
        - The URLs returned by the Map tool can later be scraped with the Extract tool.

        Tavily Web Extract
        - Extract/Scrape the full content from specific web pages, given a URL or a list of URLs.
        - You can extract the full content of up to 20 URLs for efficient processing. Use this feature to enhance efficiency.

        Use the following format:

        Question: the input question you must answer
        Thought: you should always think about what to do
        Action: the action to take, should be one of Web Search, Web Map, or Web Extract
        Action Input: the input to the action
        Observation: the result of the action
        ... (this Thought/Action/Action Input/Observation can repeat N times)
        Thought: I now know the final answer
        Final Answer: the final answer to the original input question

        ---

        You will now receive a message from the user:

        """

This agent uses a pre-built LangGraph ReAct implementation, as illustrated in the diagram below. The ReAct framework enables the agent to reason about which actions to take, use the web tools in sequence, and iterate as needed until it completes its research task. The system prompt is especially important: it instructs the agent on how to use the tools together so that its responses are thorough, accurate, and well-sourced.

<img src="../assets/crawl/web-agent.svg" alt="Agent" width="500"/>
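
To show the reason, act, observe cycle in isolation, here is a minimal sketch of a ReAct-style loop that runs without any API keys. It is not LangGraph's implementation: `scripted_model`, `fake_search`, `fake_extract`, and `run_react_loop` are hypothetical stand-ins, and the "model" simply follows a fixed script instead of calling an LLM.

```python
from typing import Callable


# Stub tools standing in for Search and Extract; they return canned strings.
def fake_search(query: str) -> str:
    return f"search results for '{query}'"


def fake_extract(url: str) -> str:
    return f"page content of {url}"


TOOLS: dict[str, Callable[[str], str]] = {
    "search": fake_search,
    "extract": fake_extract,
}


def scripted_model(history: list[str]) -> dict:
    """Hypothetical stand-in for an LLM: choose the next step from the number of observations so far."""
    observations = sum(1 for entry in history if entry.startswith("Observation:"))
    if observations == 0:
        return {"action": "search", "input": "Tavily blog"}
    if observations == 1:
        return {"action": "extract", "input": "https://example.com/blog"}
    return {"final_answer": f"Answer based on {observations} observations."}


def run_react_loop(question: str, max_steps: int = 5) -> tuple[str, list[str]]:
    """Alternate between model decisions and tool calls until a final answer or the step limit."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = scripted_model(history)
        if "final_answer" in decision:
            return decision["final_answer"], history
        tool = TOOLS[decision["action"]]
        observation = tool(decision["input"])
        history.append(f"Action: {decision['action']}({decision['input']!r})")
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached.", history


answer, trace = run_react_loop("What are the latest blog posts?")
print("\n".join(trace))
print(answer)
```

The step limit is a design choice worth keeping in any real agent loop, since it bounds cost if the model never decides it is done.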


In [None]:
from langgraph.prebuilt import create_react_agent

# Create the web agent
web_agent = create_react_agent(model=o3, tools=[search, map, extract], prompt=PROMPT)

### Test Your Tavily Web Agent

Now we'll run the agent and see how it uses the different web tools.

In [None]:
from langchain_core.messages import HumanMessage

# Test the web agent
inputs = {
    "messages": [HumanMessage(content="find all of the latest Tavily blog posts")]
}

# Stream the web agent's response
for s in web_agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()

Examine the agent's intermediate steps printed above, including how it chooses and configures different tool parameters. Then, display the agent's final answer in markdown format.

In [None]:
from IPython.display import Markdown

Markdown(message.content)
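
If the streamed output is long, it can help to condense the intermediate steps into a list of tool names and arguments. The sketch below assumes each AI message exposes a `tool_calls` list of dictionaries with `name` and `args` keys, as LangChain AI messages do; the helper `summarize_tool_calls`, the stand-in messages, and the tool name in the demo data are illustrative only.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def summarize_tool_calls(messages: Iterable[Any]) -> list[tuple[str, dict]]:
    """Collect (tool name, arguments) pairs from messages that carry a `tool_calls` list."""
    calls: list[tuple[str, dict]] = []
    for msg in messages:
        # Messages without tool calls (human, tool, final answer) are skipped.
        for call in getattr(msg, "tool_calls", None) or []:
            calls.append((call["name"], call["args"]))
    return calls


# Stand-in messages so the sketch runs without an API key.
demo_messages = [
    SimpleNamespace(content="find posts", tool_calls=None),
    SimpleNamespace(
        content="",
        tool_calls=[{"name": "tavily_map", "args": {"url": "https://example.com"}, "id": "1"}],
    ),
    SimpleNamespace(content="done"),
]

tool_summary = summarize_tool_calls(demo_messages)
print(tool_summary)
```

In this notebook, the same helper could be applied to `s["messages"]` from the last streamed state to see which parameters the agent chose for each tool.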

Notice how the agent combines Tavily's tools (search, map, and extract) to complete the task end-to-end.

## Conclusion
 
In this tutorial, you learned how to:
- Set up Tavily web tools (search, extract, map) with LangChain
- Build an intelligent web research agent using LangGraph's `create_react_agent`
- Design effective system prompts for autonomous web research
 
You now have a fully functional web research agent that autonomously combines search, extraction, and site mapping to complete complex research objectives.
