Open Deep Research is an open-source assistant that automates research and produces customizable reports on any topic. You can tailor the research and writing process by choosing the models, prompts, report structure, and search tools.
Ensure you have API keys set for your desired search tools and models.
Available search tools:
- Tavily API - General web search
- Perplexity API - General web search
- Exa API - Powerful neural search for web content
- ArXiv - Academic papers in physics, mathematics, computer science, and more
- PubMed - Biomedical literature from MEDLINE, life science journals, and online books
- Linkup API - General web search
- DuckDuckGo API - General web search
- Google Search API/Scraper - Create a custom search engine here and get an API key here
Open Deep Research uses a planner LLM for report planning and a writer LLM for report writing:
- You can select any model that is integrated with the `init_chat_model()` API - see the full list of supported integrations here.
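For example, a planner or writer model can be instantiated directly through `init_chat_model()`; a minimal sketch, assuming the corresponding provider API keys are set (the specific model names below are just illustrations):

```python
from langchain.chat_models import init_chat_model

# Any provider/model pair supported by init_chat_model() can serve as planner or writer.
planner_llm = init_chat_model(model="claude-3-7-sonnet-latest", model_provider="anthropic")
writer_llm = init_chat_model(model="gpt-4o", model_provider="openai")
```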
Install the package:

```bash
pip install open-deep-research
```
As mentioned above, ensure API keys for LLMs and search tools are set:
```bash
export TAVILY_API_KEY=<your_tavily_api_key>
export ANTHROPIC_API_KEY=<your_anthropic_api_key>
```
See `src/open_deep_research/graph.ipynb` for example usage in a Jupyter notebook:
Compile the graph:
```python
from langgraph.checkpoint.memory import MemorySaver
from open_deep_research.graph import builder

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)
```
Run the graph with a desired topic and configuration:
```python
import uuid

thread = {"configurable": {"thread_id": str(uuid.uuid4()),
                           "search_api": "tavily",
                           "planner_provider": "anthropic",
                           "planner_model": "claude-3-7-sonnet-latest",
                           "writer_provider": "anthropic",
                           "writer_model": "claude-3-5-sonnet-latest",
                           "max_search_depth": 1,
                           }}

topic = "Overview of the AI inference market with focus on Fireworks, Together.ai, Groq"

async for event in graph.astream({"topic": topic}, thread, stream_mode="updates"):
    print(event)
```
The graph will stop once the report plan is generated, and you can pass feedback to update it:
```python
from langgraph.types import Command

async for event in graph.astream(Command(resume="Include a revenue estimate (ARR) in the sections"), thread, stream_mode="updates"):
    print(event)
```
When you are satisfied with the report plan, you can pass `True` to proceed to report generation:
```python
async for event in graph.astream(Command(resume=True), thread, stream_mode="updates"):
    print(event)
```
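Once generation finishes, the completed report can be read back from the checkpointer state; a minimal sketch, assuming the graph stores it under a `final_report` state key (check the notebook for the exact key name):

```python
# Retrieve the final state from the in-memory checkpointer and print the report markdown.
final_state = graph.get_state(thread)
report = final_state.values.get("final_report")
print(report)
```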
Clone the repository:
```bash
git clone https://github.com/langchain-ai/open_deep_research.git
cd open_deep_research
```
Copy the example environment file and edit it with your API keys (e.g., the API keys for the default selections are shown below):

```bash
cp .env.example .env
```
Set whichever API keys are needed for your chosen models and search tools. Here are examples for several of the model and tool integrations available:
```bash
export TAVILY_API_KEY=<your_tavily_api_key>
export ANTHROPIC_API_KEY=<your_anthropic_api_key>
export OPENAI_API_KEY=<your_openai_api_key>
export PERPLEXITY_API_KEY=<your_perplexity_api_key>
export EXA_API_KEY=<your_exa_api_key>
export PUBMED_API_KEY=<your_pubmed_api_key>
export PUBMED_EMAIL=<your_email@example.com>
export LINKUP_API_KEY=<your_linkup_api_key>
export GOOGLE_API_KEY=<your_google_api_key>
export GOOGLE_CX=<your_google_custom_search_engine_id>
```
Launch the assistant with the LangGraph server locally, which will open in your browser:
Using the uv package manager:

```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev
```

Or using pip:

```bash
# Install dependencies
pip install -e .
pip install -U "langgraph-cli[inmem]"

# Start the LangGraph server
langgraph dev
```
Use this to open the Studio UI:
- API: http://127.0.0.1:2024
- Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- API Docs: http://127.0.0.1:2024/docs
(1) Provide a `Topic` and hit `Submit`:

(2) This will generate a report plan and present it to the user for review.
(3) We can pass a string (`"..."`) with feedback to regenerate the plan based on that feedback.

(4) Or, we can just pass `true` to accept the plan.

(5) Once accepted, the report sections will be generated.

The report is produced as markdown.

You can customize the research assistant's behavior through several parameters:
- `report_structure`: Define a custom structure for your report (defaults to a standard research report format)
- `number_of_queries`: Number of search queries to generate per section (default: 2)
- `max_search_depth`: Maximum number of reflection and search iterations (default: 2)
- `planner_provider`: Model provider for the planning phase (default: "anthropic", but can be any provider from the integrations supported by `init_chat_model`, as listed here)
- `planner_model`: Specific model for planning (default: "claude-3-7-sonnet-latest")
- `writer_provider`: Model provider for the writing phase (default: "anthropic", but can be any provider from the integrations supported by `init_chat_model`, as listed here)
- `writer_model`: Model for writing the report (default: "claude-3-5-sonnet-latest")
- `search_api`: API to use for web searches (default: "tavily", options include "perplexity", "exa", "arxiv", "pubmed", "linkup")
These configurations allow you to fine-tune the research process based on your needs, from adjusting the depth of research to selecting specific AI models for different phases of report generation.
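For instance, a custom `report_structure` can be supplied alongside the other options in the `configurable` dict; a sketch, where the structure text itself is purely illustrative:

```python
import uuid

# Illustrative outline; the default structure shipped with the package may differ.
custom_structure = """
1. Introduction
2. One main body section per subtopic
3. Conclusion with a summary table
"""

thread = {"configurable": {"thread_id": str(uuid.uuid4()),
                           "report_structure": custom_structure,
                           "number_of_queries": 3,
                           "max_search_depth": 2,
                           "search_api": "tavily",
                           }}
```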
Not all search APIs support additional configuration parameters. Here are the ones that do:
- Exa: `max_characters`, `num_results`, `include_domains`, `exclude_domains`, `subpages`
  - Note: `include_domains` and `exclude_domains` cannot be used together
  - Particularly useful when you need to narrow your research to specific trusted sources, ensure information accuracy, or when your research requires using specified domains (e.g., academic journals, government sites)
  - Provides AI-generated summaries tailored to your specific query, making it easier to extract relevant information from search results
- ArXiv: `load_max_docs`, `get_full_documents`, `load_all_available_meta`
- PubMed: `top_k_results`, `email`, `api_key`, `doc_content_chars_max`
- Linkup: `depth`
Example with Exa configuration:
thread = {"configurable": {"thread_id": str(uuid.uuid4()),
"search_api": "exa",
"search_api_config": {
"num_results": 5,
"include_domains": ["nature.com", "sciencedirect.com"]
},
# Other configuration...
}}
(1) You can pass any planner and writer models that are integrated with the `init_chat_model()` API. See the full list of supported integrations here.

(2) The planner and writer models need to support structured outputs: check whether structured outputs are supported by the model you are using here (see the sketch after these notes).

(3) With Groq, there are token per minute (TPM) limits if you are on the `on_demand` service tier:
- The `on_demand` service tier has a limit of `6000 TPM`
- You will want a paid plan for section writing with Groq models

(4) `deepseek-R1` is not strong at function calling, which the assistant uses to generate structured outputs for report sections and report section grading. See example traces here.
- Consider providers that are strong at function calling, such as OpenAI, Anthropic, and certain OSS models like Groq's `llama-3.3-70b-versatile`.
- If you see the following error, it is likely due to the model not being able to produce structured outputs (see trace):

```
groq.APIError: Failed to call a function. Please adjust your prompt. See 'failed_generation' for more details.
```
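As a quick way to sanity-check note (2), you can try binding a small schema to a candidate model before using it as planner or writer; a minimal sketch (the `Section` schema here is just an illustration, not the schema used internally by the assistant):

```python
from pydantic import BaseModel
from langchain.chat_models import init_chat_model

class Section(BaseModel):
    name: str
    description: str

llm = init_chat_model(model="claude-3-5-sonnet-latest", model_provider="anthropic")
structured_llm = llm.with_structured_output(Section)

# If the provider/model supports structured outputs, this returns a Section instance.
print(structured_llm.invoke("Propose one section for a report on vector databases."))
```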
- Plan and Execute - Open Deep Research follows a plan-and-execute workflow that separates planning from research, allowing for human-in-the-loop approval of a report plan before the more time-consuming research phase. By default, it uses a reasoning model to plan the report sections. During this phase, it uses web search to gather general information about the report topic to help in planning the sections. It also accepts a report structure from the user to guide the report sections, as well as human feedback on the report plan.
- Research and Write - Each section of the report is written in parallel. The research assistant uses web search via the Tavily API, Perplexity, Exa, ArXiv, PubMed, or Linkup to gather information about each section topic. It reflects on each report section and suggests follow-up questions for web search. This "depth" of research proceeds for as many iterations as the user wants. Any final sections, such as the introduction and conclusion, are written after the main body of the report, which helps ensure the report is cohesive and coherent. The planner determines main body versus final sections during the planning phase.
- Managing different types - Open Deep Research is built on LangGraph, which has native support for configuration management using assistants. The report `structure` is a field in the graph configuration, which allows users to create different assistants for different types of reports.
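As a sketch of how that can look against a locally running LangGraph server (the graph name `open_deep_research` and the use of the `langgraph_sdk` client here are assumptions; adjust them to your deployment):

```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

# Create one assistant per report type; they share the graph but differ in configuration.
# Run inside an async context (e.g., a Jupyter notebook).
blog_assistant = await client.assistants.create(
    graph_id="open_deep_research",  # assumed graph name; check your langgraph.json
    config={"configurable": {"report_structure": "Intro, three body sections, conclusion"}},
)
```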
Follow the quickstart to start the LangGraph server locally.
You can easily deploy to LangGraph Platform.