[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dynamiq/dynamiq-research-workflow.ipynb)

# Research and Report Generation using Weaviate and Dynamiq

This notebook shows how to use the Dynamiq library with Weaviate to automate a research task: searching the web, storing the retrieved information in Weaviate, and generating a structured report from the stored data.

### Installation  

First, ensure you have the `dynamiq` library installed:  

```python
!pip install dynamiq
```  

### Setting Up Weaviate  

Before diving in, configure your Weaviate instance and set up the necessary environment variables:  

- `WEAVIATE_URL` – The URL of your Weaviate instance;
- `WEAVIATE_API_KEY` – Your API key for Weaviate authentication;
- `OPENAI_API_KEY` – Your OpenAI API key;
- `TAVILY_API_KEY` – Your API key for [Tavily](https://tavily.com/), the web search tool used in this notebook.

```python
import os
os.environ["WEAVIATE_URL"] = "https://your-weaviate-instance.com"
os.environ["WEAVIATE_API_KEY"] = "your-api-key"
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["TAVILY_API_KEY"] = "your-api-key"
```
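
Before running the workflows, it can help to fail fast when a variable is missing. The helper below is a small sketch and not part of Dynamiq; the name `missing_env_vars` is made up for this example.

```python
import os

REQUIRED_VARS = ["WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```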

In [5]:
from dynamiq import Workflow
from dynamiq.connections import Weaviate, Tavily, OpenAI as OpenAIConnection
from dynamiq.prompts import Message, Prompt
from dynamiq.nodes.llms import OpenAI
from dynamiq.nodes.agents import ReActAgent
from dynamiq.nodes.node import InputTransformer, NodeDependency
from dynamiq.nodes.operators import Map
from dynamiq.nodes.tools import Python, TavilyTool
from dynamiq.nodes.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from dynamiq.nodes.writers import WeaviateDocumentWriter
from dynamiq.nodes.retrievers import WeaviateDocumentRetriever, VectorStoreRetriever

## Step 0: Define a Topic for Research

In [110]:
USER_QUERY = "How big is Nvidia?"

## Step 1: Conduct research and store data in Weaviate

The research workflow is designed to search for information based on a user query, refine the results, and store them in Weaviate for future retrieval.

The steps include:
1. **Generate several search queries** to maximize search coverage.
2. **Execute searches with Tavily** using the generated queries.  
3. **Extract and structure search results** into documents.  
4. **Embed documents** for efficient vector-based retrieval.  
5. **Store processed data** in Weaviate for seamless long-term access.
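
As a rough illustration of step 3, the sketch below flattens nested per-query search results into a flat list of documents. It uses plain dicts in place of `dynamiq.types.Document`, and the sample payload is made up; it only mirrors the nesting that the workflow code reads (`content` → `raw_response` → `results`). Skipping repeated URLs is an extra assumption that the workflow itself does not make.

```python
def flatten_results(query_results):
    """Flatten per-query search results into {"content", "metadata"} dicts."""
    documents = []
    seen_urls = set()
    for query_result in query_results:
        for hit in query_result["content"]["raw_response"]["results"]:
            if hit["url"] in seen_urls:
                continue  # assumption: keep only the first hit per URL
            seen_urls.add(hit["url"])
            documents.append({"content": hit["content"], "metadata": {"url": hit["url"]}})
    return documents


# Made-up sample payload, for illustration only.
sample = [
    {"content": {"raw_response": {"results": [
        {"content": "Text A", "url": "https://example.com/a"},
        {"content": "Text B", "url": "https://example.com/b"},
    ]}}},
    {"content": {"raw_response": {"results": [
        {"content": "Text A again", "url": "https://example.com/a"},
    ]}}},
]

docs = flatten_results(sample)
print([doc["metadata"]["url"] for doc in docs])
```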

In [64]:
def create_research_workflow():
    """Creates a research workflow that searches for information and stores it in Weaviate."""

    # Node 1: Generate several search queries based on task
    generate_queries = OpenAI(
        name="Generate Sub-Queries",
        id="generate_queries",
        model="gpt-4o-mini",
        connection=OpenAIConnection(),
        prompt=Prompt(messages=[Message(role="user", content="""
You are a skilled research assistant generating web search queries to gather objective and comprehensive information on the following task:

"{{query}}"

Generate 5 diverse and precise search queries that account for:
- Relevant keywords and concepts from the task
- Recent events, updates, or contextual details that may impact the search
- Different angles or perspectives to ensure a well-rounded understanding

Your response must be in **LIST format** containing only the search queries, like this:

["query_1", "query_2", "query_3", "query_4", "query_5"]
""")]),
        input_transformer=InputTransformer(selector={"query": "$.input"})
    )

    # Node 2: Parse the generated queries
    parse_queries = Python(
        name="Parse Sub-Queries",
        id="parse_queries",
        code="""
def run(input_data):
    # Trim whitespace first so a trailing newline does not leave "]" attached to the last query.
    queries = input_data["queries"].strip().strip('[]').split(',')
    return [{"query": q.strip().strip('"').strip("'")} for q in queries]
""",
        input_transformer=InputTransformer(selector={"queries": f"${[generate_queries.id]}.output.content"}),
        depends=[NodeDependency(generate_queries)],
    )

    # Node 3: Search the web using generated queries
    search_queries = Map(
        name="Search Sub-Queries",
        id="search_queries",
        node=TavilyTool(name="Search Query", connection=Tavily()),
        input_transformer=InputTransformer(selector={"input": f"${[parse_queries.id]}.output.content"}),
        depends=[NodeDependency(parse_queries)],
    )

    # Node 4: Convert search results into documents
    convert_to_documents = Python(
        name="Convert to Documents",
        id="convert_to_documents",
        code="""
def run(input_data):
    from dynamiq.types import Document
    documents = []
    for query_result in input_data['content']:
        for el in query_result['content']['raw_response']['results']:
            doc = Document(content=el['content'], metadata={"url": el['url']})
            documents.append(doc)
    return {'documents': documents}
""",
        input_transformer=InputTransformer(selector={"content": f"${[search_queries.id]}.output.output"}),
        depends=[NodeDependency(search_queries)],
    )

    # Node 5: Embed the documents for retrieval
    embed_documents = OpenAIDocumentEmbedder(
        name="Embed Documents",
        id="embed_documents",
        connection=OpenAIConnection(),
        input_transformer=InputTransformer(
            selector={"documents": f"${[convert_to_documents.id]}.output.content.documents"}),
        depends=[NodeDependency(convert_to_documents)],
    )

    # Node 6: Store the embedded documents in Weaviate
    store_documents = WeaviateDocumentWriter(
        name="Store Documents",
        id="store_documents",
        connection=Weaviate(),
        index_name="Default",
        create_if_not_exist=True,
        input_transformer=InputTransformer(selector={"documents": f"${[embed_documents.id]}.output.documents"}),
        depends=[NodeDependency(embed_documents)],
    )

    # Create and return the workflow
    workflow = Workflow()
    for node in [
        generate_queries,
        parse_queries,
        search_queries,
        convert_to_documents,
        embed_documents,
        store_documents,
    ]:
        workflow.flow.add_nodes(node)

    return workflow
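
The selectors in this function rely on a compact f-string idiom: `{[node.id]}` formats a one-element list, so the result is a path with the node id in bracket notation. The snippet below only demonstrates the string formatting in plain Python, using a stand-in `node_id` variable; how Dynamiq resolves the path at run time is not shown here.

```python
node_id = "generate_queries"

# "$" is a literal character; {[node_id]} formats the list [node_id] using its repr.
selector = f"${[node_id]}.output.content"
print(selector)  # $['generate_queries'].output.content
```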

In [65]:
research_workflow = create_research_workflow()
research_workflow.run(input_data={"input": USER_QUERY})

INFO:dynamiq.utils.logger:Workflow 0411b1e0-e72f-4244-9268-bfe7431cddb7: execution started.
INFO:dynamiq.utils.logger:Flow d4ac5a94-5605-49ac-b795-55717354d6cf: execution started.
INFO:dynamiq.utils.logger:Node Generate Sub-Queries - generate_queries: execution started.
INFO:dynamiq.utils.logger:Node Generate Sub-Queries - generate_queries: execution succeeded in 1.3s.
INFO:dynamiq.utils.logger:Node Parse Sub-Queries - parse_queries: execution started.
INFO:dynamiq.utils.logger:Tool Parse Sub-Queries - parse_queries: started with INPUT DATA:
{'queries': '["Nvidia company size revenue market cap 2023", "Nvidia employee count and growth trends 2023", "Nvidia global presence and operations overview", "Nvidia recent acquisitions and their impact on company size", "Nvidia comparison with competitors in the semiconductor industry 2023"]'}
INFO:dynamiq.utils.logger:Tool Parse Sub-Queries - parse_queries: finished with RESULT:
[{'query': 'Nvidia company size revenue market cap 2023'}, {'query'
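
The log above shows the LLM returning the queries as a list literal inside a string. Splitting that string on commas works for this run but would break a query that itself contains a comma. A sturdier parser could try JSON first and fall back to the naive split, as in the standalone sketch below. The function name `parse_queries` is chosen for illustration, and whether `json` can be imported inside a Dynamiq `Python` node depends on that node's sandbox, so treat this as plain Python rather than a drop-in replacement.

```python
import json


def parse_queries(raw):
    """Parse an LLM reply such as '["a", "b"]' into a list of {"query": ...} dicts."""
    text = raw.strip()
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        # Fallback for non-JSON replies (e.g. single-quoted items): naive comma split.
        items = text.strip("[]").split(",")
    if not isinstance(items, list):
        items = [items]
    cleaned = [str(item).strip().strip("\"'") for item in items]
    return [{"query": q} for q in cleaned if q]


print(parse_queries('["Nvidia revenue, market cap 2023", "Nvidia employee count"]'))
```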

## Step 2: Generate report from stored data

This workflow retrieves the most relevant documents from Weaviate and uses an LLM agent to generate a structured markdown report.
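
To make "most relevant" concrete, the sketch below ranks documents by cosine similarity between a query vector and each document's embedding, then keeps the top `k`. This is a brute-force illustration with made-up two-dimensional vectors. It is not how Weaviate is implemented, and the metric Weaviate actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=5):
    """Return the k documents whose embeddings are most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, doc["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings, for illustration only.
docs = [
    {"content": "doc A", "embedding": [1.0, 0.0]},
    {"content": "doc B", "embedding": [0.0, 1.0]},
    {"content": "doc C", "embedding": [0.7, 0.7]},
]
print([doc["content"] for doc in top_k([1.0, 0.1], docs, k=2)])
```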

In [114]:
def create_report_workflow():
    """Creates a report-generation workflow using retrieved documents from Weaviate."""

    text_embedder = OpenAITextEmbedder()
    retriever = WeaviateDocumentRetriever(index_name="Default", top_k=5)

    retriever_tool = VectorStoreRetriever(
        text_embedder=text_embedder,
        document_retriever=retriever,
        is_optimized_for_agents=True
    )

    # Agent to generate the report
    report_agent = ReActAgent(
        id="report_writer",
        llm=OpenAI(model='gpt-4o'),
        tools=[retriever_tool],
        role="Create a detailed markdown report based on retrieved information.\n"
             "Ensure that all special characters are properly escaped.\n"
             "Cite sources in APA style.",
    )

    # Create and return the workflow
    workflow = Workflow()
    workflow.flow.add_nodes(report_agent)
    return workflow

In [112]:
report_workflow = create_report_workflow()
result = report_workflow.run(input_data={"input": USER_QUERY})

INFO:dynamiq.utils.logger:Workflow 9779d062-26d5-4d5f-b4d5-1bbcc9b097b2: execution started.
INFO:dynamiq.utils.logger:Flow 91577226-796c-4801-9423-06edbbbb891f: execution started.
INFO:dynamiq.utils.logger:Node React Agent - report_writer: execution started.
INFO:dynamiq.utils.logger:Agent React Agent - report_writer: started with input {'input': 'How big is Nvidia?', 'images': None, 'files': None, 'user_id': None, 'session_id': None, 'metadata': {}, 'tool_params': ToolParams(global_params={}, by_name_params={}, by_id_params={})}
INFO:dynamiq.utils.logger:Node LLM - bd65e33c-0c2b-459a-bb1d-4a055e303e51: execution started.
INFO:dynamiq.utils.logger:Node LLM - bd65e33c-0c2b-459a-bb1d-4a055e303e51: execution succeeded in 1.6s.
INFO:dynamiq.utils.logger:Agent React Agent - report_writer: Loop 1, reasoning:
Thought: To provide a comprehensive answer about Nvidia's size, I should consider various aspects such as market capitalization, revenue, employee count, and global presence. I will retr

## Results

In [113]:
from IPython.display import display, Markdown

final_report = result.output["report_writer"].get("output", {}).get("content")

display(Markdown(final_report))

### Nvidia Corporation: Company Size Overview

**Market Capitalization:**
As of March 4, 2025, Nvidia's market capitalization is approximately $3.19 trillion, making it the second most valuable company globally by market cap (Macrotrends, 2025; CompaniesMarketCap, 2025).

**Revenue:**
Nvidia reported a revenue of $60.922 billion, highlighting its significant financial performance in the semiconductor industry (Macrotrends, 2025).

**Industry Position:**
Nvidia is a leading player in the computer and technology sector, specifically within the semiconductor industry. It is renowned for its innovations in visual computing technologies and the development of the graphics processing unit (GPU). The company's focus has expanded from PC graphics to artificial intelligence (AI) solutions, supporting high-performance computing, gaming, and virtual reality platforms (Macrotrends, 2025).

**Global Presence:**
Nvidia operates on a global scale, contributing to various multi-billion-dollar markets such as robotics and self-driving vehicles. It is a dominant force in data centers, professional visualization, and gaming markets, where it competes with companies like Intel and Advanced Micro Devices (Macrotrends, 2025).

**Company Background:**
Founded in January 1993 by Jen-Hsun Huang, Curtis Priem, and Chris Malachowsky, Nvidia operates on a fabless model, meaning it designs but does not manufacture its own chips. This strategic approach has allowed Nvidia to focus on innovation and market expansion (CompaniesMarketCap, 2025).

### References
- Macrotrends. (2025). Nvidia Market Cap. Retrieved from https://www.macrotrends.net/stocks/charts/NVDA/nvidia/market-cap
- CompaniesMarketCap. (2025). Nvidia Market Capitalization. Retrieved from https://companiesmarketcap.com/nvidia/marketcap/