# Tavily Hybrid Research

In this notebook, we'll walk you through hybrid research: combining the Tavily Research endpoint with retrieval over your internal data (RAG).

Tavily searches the web to deliver in-depth, real-time research to you and your agents. You may also want to combine that with your internal data, passing it to the researcher as context so that it knows what to focus on and coverage is maximized.


## The plan

Specifically, we will generate queries from the research task, run a vector search over internal data, and feed the results into the research prompt.

1. Generate queries
2. Vector Search on internal data
3. Generate research task for Tavily Research Endpoint
4. Tavily Research Call with internal data as input
5. Write the final report from the Tavily report and the RAG results
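
The five steps above can be read as a simple function chain. The sketch below is only an outline of that flow: `hybrid_research` and all of its callable parameters are hypothetical names introduced here for illustration, not part of Tavily or LangChain. The cells that follow implement each step concretely.

```python
from typing import Callable


def hybrid_research(
    task: str,
    generate_queries: Callable[[str], list[str]],
    search_internal: Callable[[list[str]], str],
    draft_web_task: Callable[[str, str], str],
    run_web_research: Callable[[str], str],
    synthesize: Callable[[str, str, str], str],
) -> str:
    """Chain the five plan steps; every callable is a hypothetical stand-in."""
    queries = generate_queries(task)            # 1. generate queries
    internal = search_internal(queries)         # 2. vector search on internal data
    web_task = draft_web_task(task, internal)   # 3. research task for the web agent
    web_report = run_web_research(web_task)     # 4. web research with internal context
    return synthesize(task, internal, web_report)  # 5. final report
```

Keeping each step behind its own callable makes it easy to swap the internal retriever or the LLM without touching the rest of the flow.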


In [None]:
%pip install -q tavily-python langchain

Before running the next cell, **set your own LLM model and API keys** (e.g. `MODEL_OF_CHOICE`, `LLM_API_KEY`, `TAVILY_API_KEY`).


In [None]:
import getpass
import os
import time
import httpx
from IPython.display import display, Markdown
from tavily import TavilyClient
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

# YOU NEED TO CHOOSE THE MODEL YOU WANT AND SET THE API KEY
LLM_API_KEY = os.getenv("LLM_API_KEY")
llm = init_chat_model("MODEL_OF_CHOICE")

tavily_client = TavilyClient(api_key=TAVILY_API_KEY)
headers = {"Authorization": f"Bearer {TAVILY_API_KEY}"}

url = "https://api.tavily.com/research/"

Here you need to plug in **your own internal RAG logic**.

Replace the placeholder `internal_RAG_research` implementation below with code that queries your internal data (e.g. vector DB, warehouse, or docs) and returns the most relevant findings for the generated queries.
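
If you do not have a retriever wired up yet, a toy stand-in can help you test the rest of the notebook end to end. The sketch below is an assumption-heavy placeholder, not a real vector search: it ranks a few made-up documents by bag-of-words cosine similarity. `toy_internal_rag`, `DOCS`, and the sample texts are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Made-up sample documents; replace with your own internal data.
DOCS = [
    "Usage notes: teams mostly enable meeting notes for recurring calls.",
    "Feedback ticket: users want clearer speaker labels in generated notes.",
    "R&D memo: early prototypes summarized transcripts with an extractive model.",
]


def _vectorize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def toy_internal_rag(queries: list[str], docs: list[str] = DOCS, top_k: int = 2) -> str:
    """Return the top_k matching docs per query as a deduplicated bullet list."""
    doc_vecs = [_vectorize(d) for d in docs]
    hits: list[str] = []
    for query in queries:
        qv = _vectorize(query)
        scores = [_cosine(qv, dv) for dv in doc_vecs]
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:top_k]:
            # Skip documents with no term overlap and ones already collected.
            if scores[i] > 0 and docs[i] not in hits:
                hits.append(docs[i])
    return "\n".join(f"- {h}" for h in hits)
```

The function takes a plain list of strings, so with the structured output used in the next cell you would pass `queries.queries`. A production version would swap the scoring for embeddings and a vector store.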


In [None]:
class Queries(BaseModel):
    queries: list[str] = Field(description="A list of queries to be used for the research")

# Here we just want you to implement your own internal RAG research.
def internal_RAG_research(queries):
    # Implement your own internal RAG research
    return "Internal RAG results"

structured_llm = llm.with_structured_output(Queries)
task_description = "I launched a new AI Meeting Notes feature, and now I want to evaluate customer feedback and what competitor products are out there."
# Generate subqueries to get full coverage of internal data
# Note the f-string: without it, {task_description} is sent to the model literally.
queries = structured_llm.invoke(f"Here is my research task: {task_description}. Break down the task into a list of 5 queries to be used for RAG on my internal data. I have internal data about usage, feedback, and initial R&D on the feature.")
# `queries` is a Queries object; pass the underlying list of strings.
internal_rag_results = internal_RAG_research(queries.queries)

In [None]:
# Generate a web research task
web_research_task = llm.invoke(
    f"""
Using the user's research task and the internal RAG findings below, generate ONE concise research task for a web-research agent.

User Research Task:
{task_description}

Internal Findings:
{internal_rag_results}

Your goal:
Identify what information is still missing or requires external validation, and describe exactly what the web-research agent should investigate.

The research task should:
- Focus only on what needs external research.
- Specify key topics, entities, or sources to look into (e.g., reviews, news, competitors, documentation, benchmarks, financials, regulations, etc.).
- Not restate internal findings.
- Be a single clear paragraph or bullet list that an autonomous agent can execute.

Output ONLY the final research task.
    """
)

In [None]:
# llm.invoke returns a message object; pass its text content as the research input.
result = tavily_client.research(input=web_research_task.content, model="pro")
request_id = result["id"]

research_report = ""
sources = []

# Poll every 10 seconds
while True:
    response = httpx.get(url + request_id, headers=headers)
    response_json = response.json()

    status = response_json["status"]
    if status == "completed":
        research_report = response_json["content"]
        sources = response_json["sources"]
        break

    if status == "failed":
        raise RuntimeError(f"Research failed: {response_json['error']}")

    print(f"Status: {status}... polling again in 10 seconds")
    time.sleep(10)
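
The loop above polls forever if the job never reaches `completed` or `failed`. One possible hardening, sketched below, is to wrap the polling in a helper with a timeout. `poll_until_done` and its parameters are hypothetical names introduced here; the only assumption carried over from the cell above is that the status payload is a dict with a `status` key.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until it reports completion, failure, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "completed":
            return payload
        if status == "failed":
            raise RuntimeError(f"Research failed: {payload.get('error', 'unknown error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Research still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

With the names from the cell above, this could be called as `poll_until_done(lambda: httpx.get(url + request_id, headers=headers).json())`. Injecting `sleep` keeps the helper testable without real waiting.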

In [None]:
final_synthesis_prompt = f"""Your research task was: {task_description}.

Here are the internal RAG results:
{internal_rag_results}

Here is the Tavily research report:
{research_report}

Take these results and generate a final report to accomplish the research task.

Cite sources in-text, keeping the citations from the web report, and cite internal data wherever you use it.

Output the final report as a markdown document.
"""

final_report = llm.invoke(final_synthesis_prompt)
print("Research Complete!")
# llm.invoke returns a message object; Markdown needs its text content.
report = Markdown(final_report.content)
display(report)

In [None]:
sources
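
The raw `sources` value can be hard to scan. The sketch below renders it as a numbered markdown list. It assumes each source is a dict with a `url` key and an optional `title` key; that shape is an assumption, so check it against the actual response before relying on it. `sources_to_markdown` is a hypothetical helper name.

```python
def sources_to_markdown(sources: list[dict]) -> str:
    """Render sources as a numbered markdown list, skipping duplicate or missing URLs."""
    lines: list[str] = []
    seen: set[str] = set()
    for src in sources:
        url = src.get("url", "")  # assumed key
        if not url or url in seen:
            continue
        seen.add(url)
        title = src.get("title") or url  # assumed key; fall back to the URL
        lines.append(f"{len(lines) + 1}. [{title}]({url})")
    return "\n".join(lines)
```

In the notebook this could be shown with `display(Markdown(sources_to_markdown(sources)))`.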