# Research API: Hybrid Research

Combine Tavily's web research with your internal data for comprehensive coverage.

**What you'll learn:**
- Generate sub-queries for internal RAG
- Create a web research task from internal findings
- Synthesize internal + external data into a final report

## Setup

In [None]:
%pip install -U tavily-python langchain-openai --quiet

In [None]:
import os
import getpass
import time

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY:\n")

In [None]:
from tavily import TavilyClient
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from IPython.display import display, Markdown

client = TavilyClient()
llm = ChatOpenAI(model="gpt-4o-mini")

## Step 1: Query Internal Data

Generate sub-queries from the task and run them against your internal RAG system.

> **Note:** Replace `internal_rag_search()` with your own implementation.

In [None]:
class Queries(BaseModel):
    queries: list[str] = Field(description="Sub-queries for internal RAG")

def internal_rag_search(queries: list[str]) -> str:
    """Replace with your internal RAG implementation."""
    # Example: query vector DB, data warehouse, or internal docs
    return "Internal RAG results: [placeholder - implement your own]"

task = "I launched a new AI Meeting Notes feature. Evaluate customer feedback and competitor products."

# Generate sub-queries for internal data
queries = llm.with_structured_output(Queries).invoke(
    f"Break down this task into 5 queries for internal RAG (usage, feedback, R&D data): {task}"
)

internal_results = internal_rag_search(queries.queries)
print(f"Generated queries: {queries.queries}")
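
To make the placeholder concrete, here is a minimal sketch of what an `internal_rag_search()` replacement could look like. It is a toy keyword-overlap ranker over an in-memory list, not a real retrieval system. The names `tokenize`, `keyword_rag_search`, and `SAMPLE_DOCS` are hypothetical, and the sample documents are invented placeholder text.

```python
import re

# Invented placeholder documents; swap in your vector DB or document store.
SAMPLE_DOCS = [
    "Support ticket: customers ask for speaker labels in meeting notes.",
    "R&D note: summarization latency rises on long meeting recordings.",
    "Billing FAQ: how to export invoices.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_rag_search(queries: list[str], docs: list[str], top_k: int = 2) -> str:
    """Return up to top_k docs per query, ranked by number of shared words."""
    sections = []
    for query in queries:
        q_tokens = tokenize(query)
        scored = [(len(q_tokens & tokenize(doc)), doc) for doc in docs]
        # Keep only docs sharing at least one word, best matches first.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        hits = [doc for score, doc in ranked if score > 0]
        body = "\n".join(f"- {doc}" for doc in hits[:top_k]) or "- (no matches)"
        sections.append(f"Query: {query}\n{body}")
    return "\n\n".join(sections)
```

Calling `keyword_rag_search(queries.queries, SAMPLE_DOCS)` returns one text block per sub-query, which can be passed on as `internal_results`. A production system would likely use embeddings rather than word overlap.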

## Step 2: Generate Web Research Task

Use the internal findings to identify gaps that require external research, then have the LLM write a web research task that targets them.

In [None]:
web_task = llm.invoke(f"""
Given the user's research task and internal findings, generate a web research task.

User Task: {task}
Internal Findings: {internal_results}

Focus on what needs external validation: competitors, reviews, benchmarks, news.
Output ONLY the research task.
""")

print(f"Web research task: {web_task.content}")
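
Real internal findings can be much longer than the placeholder string, so it may help to build the prompt in a small function that caps their length. The sketch below reuses the prompt text from the cell above. The function name `build_web_task_prompt`, the `max_chars` parameter, and its default of 4000 are assumptions for illustration, not Tavily or LangChain limits.

```python
def build_web_task_prompt(task: str, internal_findings: str, max_chars: int = 4000) -> str:
    """Build the gap-finding prompt, truncating overly long internal findings."""
    findings = internal_findings.strip()
    if len(findings) > max_chars:
        # Mark the cut so the LLM knows the findings are incomplete.
        findings = findings[:max_chars].rstrip() + "\n[truncated]"
    return (
        "Given the user's research task and internal findings, generate a web research task.\n\n"
        f"User Task: {task}\n"
        f"Internal Findings: {findings}\n\n"
        "Focus on what needs external validation: competitors, reviews, benchmarks, news.\n"
        "Output ONLY the research task."
    )
```

The result can be passed straight to the model, for example `llm.invoke(build_web_task_prompt(task, internal_results))`.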

## Step 3: Run Tavily Research

Start the research task with `client.research()`, then poll `client.get_research()` every 10 seconds until the status is `completed` or `failed`.

In [None]:
result = client.research(input=web_task.content, model="pro")
request_id = result["request_id"]

response = client.get_research(request_id)

while response["status"] not in ["completed", "failed"]:
    print(f"Status: {response['status']}... polling again in 10 seconds")
    time.sleep(10)
    response = client.get_research(request_id)

if response["status"] == "failed":
    raise RuntimeError(f"Research failed: {response.get('error', 'Unknown error')}")

web_report = response["content"]
sources = response["sources"]
print("Web research complete!")
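
The loop above polls indefinitely if a task never reaches a terminal status. A minimal sketch of the same polling logic with a deadline follows. The helper `poll_until_done` and its `interval` and `timeout` parameters are hypothetical; it takes any zero-argument callable that returns a dict with a `status` key, as `client.get_research(request_id)` does in the cell above.

```python
import time
from typing import Any, Callable

def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 10.0,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Call fetch() until the status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        response = fetch()
        if response["status"] == "completed":
            return response
        if response["status"] == "failed":
            raise RuntimeError(f"Research failed: {response.get('error', 'Unknown error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Research still '{response['status']}' after {timeout} seconds")
        time.sleep(interval)
```

Usage would look like `response = poll_until_done(lambda: client.get_research(request_id))`. The 600-second default is an arbitrary choice; tune it to how long your research tasks typically run.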

## Step 4: Synthesize Final Report

Combine internal and external findings.

In [None]:
final_report = llm.invoke(f"""
Research task: {task}

Internal findings:
{internal_results}

Web research:
{web_report}

Generate a final report combining both sources. Cite sources appropriately.
Output as markdown.
""")

print("Final report complete!")
display(Markdown(final_report.content))

In [None]:
sources
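
The raw `sources` value can be hard to read next to a markdown report. The sketch below renders it as a numbered list of links. It assumes each source is a dict with optional `title` and `url` keys, which is an assumption about the response shape; inspect your own `sources` output and adjust the keys. The function name `format_sources` is hypothetical.

```python
def format_sources(sources: list[dict]) -> str:
    """Render sources as a numbered markdown list, linking when a URL exists."""
    lines = []
    for i, source in enumerate(sources, start=1):
        url = source.get("url", "")
        # Fall back to the URL, then to a generic label, when no title is present.
        title = source.get("title") or url or "Untitled source"
        lines.append(f"{i}. [{title}]({url})" if url else f"{i}. {title}")
    return "\n".join(lines)
```

In the notebook, `display(Markdown(format_sources(sources)))` would show the list below the final report.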

## Next Steps

- See [Query Refinement](./query_refinement.ipynb) for interactive query refinement
- See [Streaming](./streaming.ipynb) for real-time progress updates