[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dynamiq/dynamiq-getting-started.ipynb)

# Getting Started with Weaviate and Dynamiq

This notebook demonstrates how to integrate the Weaviate vector database with the Dynamiq library. We’ll cover four essential topics:

1. **Writing Documents to Weaviate**  

2. **Retrieving Documents from Weaviate**  

3. **Using Weaviate as a Tool in an AI Agent**  

4. **Managing the Weaviate Vector Store**  

For more information, visit the [Dynamiq GitHub repository](https://github.com/dynamiq-ai/dynamiq).

## Installation  

First, ensure you have the `dynamiq` library installed:  

```python
!pip install dynamiq
```  

## Setting Up Weaviate  

Before diving in, configure your Weaviate instance and set up the necessary environment variables:  

- `WEAVIATE_URL`: the URL of your Weaviate instance.
- `WEAVIATE_API_KEY`: your API key for authenticating with Weaviate.
- `OPENAI_API_KEY`: your OpenAI API key, which this tutorial needs for the embedding and LLM calls.

```python
import os
os.environ["WEAVIATE_URL"] = "https://your-weaviate-instance.com"
os.environ["WEAVIATE_API_KEY"] = "your-api-key"
os.environ["OPENAI_API_KEY"] = "your-api-key"
```
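A missing or empty variable may only surface later as a connection or authentication error, so it can help to check the environment up front. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here, not part of Dynamiq.

```python
import os

# The three variables this notebook relies on.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.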

## Writing Documents to Weaviate

Before we can retrieve data, we need to store documents in Weaviate. The following workflow embeds the documents with an OpenAI embedder and writes them to the `Default` index.

### Define the Writer Workflow

In [None]:
from dynamiq import Workflow
from dynamiq.types import Document
from dynamiq.nodes import InputTransformer
from dynamiq.nodes.node import NodeDependency
from dynamiq.nodes.embedders import OpenAIDocumentEmbedder
from dynamiq.nodes.writers import WeaviateDocumentWriter

def define_writer_workflow() -> Workflow:
    """Defines a workflow for embedding and writing documents to Weaviate."""
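    # Embeds each document's content with an OpenAI embedding model.
    docs_embedder = OpenAIDocumentEmbedder()

    # Writes the embedded documents to the "Default" index, creating it if needed.
    writer_node = WeaviateDocumentWriter(
        index_name="Default",
        create_if_not_exist=True,
        # Run only after the embedder has finished.
        depends=[
            NodeDependency(docs_embedder),
        ],
        # Feed the embedder's output documents into the writer's "documents" input.
        input_transformer=InputTransformer(
            selector={
                "documents": f"${[docs_embedder.id]}.output.documents",
            },
        ),
    )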

    wf = Workflow()
    wf.flow.add_nodes(docs_embedder)
    wf.flow.add_nodes(writer_node)
    return wf
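
The `selector` value in the writer node is an ordinary Python f-string: the `$` is literal, and `{[docs_embedder.id]}` formats a one-element list, which produces a bracketed, quoted key. The sketch below shows the resulting JSONPath-style string for a made-up node ID (`"embedder-1"` is a placeholder; the real IDs are UUIDs like those shown in the logs).

```python
# Placeholder ID for illustration; each Dynamiq node has its own ID.
node_id = "embedder-1"

# Same f-string shape as in the workflow definition.
selector = f"${[node_id]}.output.documents"

print(selector)  # $['embedder-1'].output.documents
```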

### Add Documents

In [None]:
documents = [
    Document(content="London is the capital of Great Britain.", metadata={"country": "England", "topic": "Geography"}),
    Document(content="Ottawa is the capital of Canada.", metadata={"country": "Canada", "topic": "Geography"}),
    Document(content="An adjective is a word that describes or defines a noun or noun phrase.", metadata={"topic": "English language"}),
    Document(content="A verb is a word that describes an action, state, or occurrence", metadata={"topic": "English language"}),
]

wf = define_writer_workflow()
result = wf.run(input_data={"documents": documents})

INFO:dynamiq.utils.logger:Workflow b37f284b-e22e-4137-9b6b-9d810988053f: execution started.
INFO:dynamiq.utils.logger:Flow 2e936e8f-d342-4510-b1d2-d44718063fc7: execution started.
INFO:dynamiq.utils.logger:Node OpenAIDocumentEmbedder - 3bdf31f4-08c2-4545-ab17-69cf49f7c72d: execution started.
INFO:dynamiq.utils.logger:Node OpenAIDocumentEmbedder - 3bdf31f4-08c2-4545-ab17-69cf49f7c72d: execution succeeded in 544ms.
INFO:dynamiq.utils.logger:Node WeaviateDocumentWriter - 3134f474-8bd5-4993-b8c2-4d3b3b4aeea9: execution started.
INFO:dynamiq.utils.logger:Node WeaviateDocumentWriter - 3134f474-8bd5-4993-b8c2-4d3b3b4aeea9: execution succeeded in 310ms.
INFO:dynamiq.utils.logger:Flow 2e936e8f-d342-4510-b1d2-d44718063fc7: execution succeeded in 869ms.
INFO:dynamiq.utils.logger:Workflow b37f284b-e22e-4137-9b6b-9d810988053f: execution succeeded in 872ms.


In [None]:
print(f'Result status: {result.status}')
print(f'Number of upserted documents: {result.output[wf.flow.nodes[-1].id].get("output", {}).get("upserted_count")}')

Result status: RunnableStatus.SUCCESS
Number of upserted documents: 4
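
The chained `.get(...)` calls in the lookup above avoid a `KeyError` when a node produced no output. If the same pattern is needed in several places, it could be wrapped in a small helper. This is only a sketch: `node_output` is a hypothetical function, and `mock_output` is a hand-written stand-in whose shape is inferred from the lookup above rather than taken from Dynamiq's documentation.

```python
def node_output(workflow_output, node_id, key, default=None):
    """Look up one key in a node's "output" dict; return default if any level is missing."""
    return workflow_output.get(node_id, {}).get("output", {}).get(key, default)


# Hand-written stand-in for result.output (assumed shape, for illustration only).
mock_output = {"writer-node": {"output": {"upserted_count": 4}}}

print(node_output(mock_output, "writer-node", "upserted_count"))   # 4
print(node_output(mock_output, "unknown-node", "upserted_count"))  # None
```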


## Retrieving Documents from Weaviate

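Once documents are stored, we can retrieve the most relevant ones for a query. The workflow below embeds the query text, searches the `Default` index with the resulting vector, and restricts the results to documents whose `country` metadata is `England` or `Canada`.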
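
Conceptually, vector retrieval ranks stored vectors by their similarity to the query vector and returns the top results. The sketch below illustrates the idea with hand-picked 2-D vectors and cosine similarity. The vectors, the `rank_top_k` helper, and the choice of cosine similarity are assumptions made for illustration; the metric Weaviate actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec, docs, k):
    """Return the k (text, vector) pairs most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return ranked[:k]


# Toy vectors chosen by hand; real embeddings have far more dimensions.
toy_docs = [
    ("London is the capital of Great Britain.", [0.9, 0.1]),
    ("Ottawa is the capital of Canada.", [0.1, 0.9]),
]
toy_query = [0.2, 0.8]  # stands in for the embedding of "Where is Ottawa?"

best = rank_top_k(toy_query, toy_docs, k=1)
print(best[0][0])  # Ottawa is the capital of Canada.
```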

### Define the Retriever Workflow

In [None]:
from dynamiq import Workflow
from dynamiq.nodes import InputTransformer
from dynamiq.nodes.node import NodeDependency
from dynamiq.nodes.embedders import OpenAITextEmbedder
from dynamiq.nodes.retrievers import WeaviateDocumentRetriever


def define_retriever_workflow() -> Workflow:
    """Defines a workflow for embedding a query and retrieving documents from Weaviate."""
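    # Embeds the query text with an OpenAI embedding model.
    text_embedder = OpenAITextEmbedder()

    retriever_node = WeaviateDocumentRetriever(
        index_name="Default",
        # Run only after the query has been embedded.
        depends=[
            NodeDependency(text_embedder),
        ],
        input_transformer=InputTransformer(
            selector={
                # Query vector produced by the embedder node.
                "embedding": f"${[text_embedder.id]}.output.embedding",
                # Number of results, taken from the workflow input.
                "top_k": "$.max_number_retrieves",
            },
        ),
        # Only return documents whose "country" metadata is England or Canada.
        filters={
            "operator": "OR",
            "conditions": [
                {"field": "country", "operator": "==", "value": "England"},
                {"field": "country", "operator": "==", "value": "Canada"},
            ],
        },
    )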

    wf = Workflow()
    wf.flow.add_nodes(text_embedder)
    wf.flow.add_nodes(retriever_node)
    return wf
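
To make the meaning of the `filters` dictionary concrete, the following sketch evaluates the same filter shape against plain metadata dictionaries in pure Python. It is an illustration of the intended semantics, not how Dynamiq or Weaviate evaluate filters internally; `matches` is a hypothetical helper, and the `AND` branch is an assumption added for symmetry.

```python
def matches(filters, metadata):
    """Evaluate a nested filter of the shape used above against one metadata dict."""
    if "conditions" in filters:
        results = [matches(cond, metadata) for cond in filters["conditions"]]
        if filters["operator"] == "OR":
            return any(results)
        if filters["operator"] == "AND":  # assumed counterpart of "OR"
            return all(results)
        raise ValueError(f"Unsupported logical operator: {filters['operator']}")
    if filters["operator"] == "==":
        # In this sketch, a document without the field never matches.
        field = filters["field"]
        return field in metadata and metadata[field] == filters["value"]
    raise ValueError(f"Unsupported comparison operator: {filters['operator']}")


filters = {
    "operator": "OR",
    "conditions": [
        {"field": "country", "operator": "==", "value": "England"},
        {"field": "country", "operator": "==", "value": "Canada"},
    ],
}

# Metadata of the documents written earlier in this notebook.
sample_metadata = [
    {"country": "England", "topic": "Geography"},
    {"country": "Canada", "topic": "Geography"},
    {"topic": "English language"},
]

print([matches(filters, m) for m in sample_metadata])  # [True, True, False]
```

Under this reading, the two English-language documents are excluded from retrieval because they carry no `country` field.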

### Query the Database

In [None]:
wf = define_retriever_workflow()
result = wf.run(input_data={"query": "Where is Ottawa?", "max_number_retrieves": 1})

INFO:dynamiq.utils.logger:Workflow 24089359-0174-4a16-b412-03cda9912a8c: execution started.
INFO:dynamiq.utils.logger:Flow 9e986524-203c-4938-aa54-f0b6b9ea8595: execution started.
INFO:dynamiq.utils.logger:Node OpenAITextEmbedder - 3993cad5-0527-44b5-8c5f-a2b768d266c2: execution started.
INFO:dynamiq.utils.logger:Node OpenAITextEmbedder - 3993cad5-0527-44b5-8c5f-a2b768d266c2: execution succeeded in 286ms.
INFO:dynamiq.utils.logger:Node WeaviateDocumentRetriever - b26603fa-cf70-4549-96fe-367c9fdf5f48: execution started.
INFO:dynamiq.utils.logger:Node WeaviateDocumentRetriever - b26603fa-cf70-4549-96fe-367c9fdf5f48: execution succeeded in 262ms.
INFO:dynamiq.utils.logger:Flow 9e986524-203c-4938-aa54-f0b6b9ea8595: execution succeeded in 608ms.
INFO:dynamiq.utils.logger:Workflow 24089359-0174-4a16-b412-03cda9912a8c: execution succeeded in 610ms.


In [None]:
retrieved_docs = result.output[wf.flow.nodes[-1].id].get("output", {}).get("documents")
print(f'Number of retrieved documents: {len(retrieved_docs)}')
print(f'Content of the most relevant document: {retrieved_docs[0]["content"]}')

Number of retrieved documents: 1
Content of the most relevant document: Ottawa is the capital of Canada.


## Using the Retriever as a Tool in an LLM Agent

We can expose Weaviate to an AI assistant by wrapping the retriever in a `VectorStoreRetriever` tool and passing that tool to Dynamiq’s `ReActAgent`.

### Define the Agent Workflow

In [None]:
from dynamiq import Workflow
from dynamiq.nodes.embedders import OpenAITextEmbedder
from dynamiq.nodes.retrievers import WeaviateDocumentRetriever, VectorStoreRetriever
from dynamiq.nodes.llms.openai import OpenAI
from dynamiq.nodes.agents.react import ReActAgent


def define_agent_workflow() -> Workflow:
    """Defines a workflow that integrates a Weaviate retriever as a tool within an AI agent."""
    text_embedder = OpenAITextEmbedder()

    retriever_node = WeaviateDocumentRetriever(
        index_name="Default",
        top_k=5
    )

    retriever_node_as_tool = VectorStoreRetriever(
        text_embedder=text_embedder,
        document_retriever=retriever_node,
        is_optimized_for_agents=True,
    )

    agent = ReActAgent(
        llm=OpenAI(model='gpt-4o'),
        tools=[retriever_node_as_tool],
        role="AI assistant with access to custom database",
    )

    wf = Workflow()
    wf.flow.add_nodes(agent)
    return wf

### Run the AI Agent

In [None]:
wf = define_agent_workflow()
result = wf.run(input_data={"input": "Which countries are mentioned in the database?"})

INFO:dynamiq.utils.logger:Workflow 9dc534f5-761b-4c72-b76d-c2b3ef88f48d: execution started.
INFO:dynamiq.utils.logger:Flow f6508eec-086d-488d-8da8-7885b262a367: execution started.
INFO:dynamiq.utils.logger:Node React Agent - 13e772c1-5629-405a-82fa-fac776b85415: execution started.
INFO:dynamiq.utils.logger:Agent React Agent - 13e772c1-5629-405a-82fa-fac776b85415: started with input {'input': 'Which countries are mentioned in the database?', 'images': None, 'files': None, 'user_id': None, 'session_id': None, 'metadata': {}, 'tool_params': ToolParams(global_params={}, by_name_params={}, by_id_params={})}
INFO:dynamiq.utils.logger:Node LLM - bd019481-76b8-43e4-b9e5-9079bec5448e: execution started.
INFO:dynamiq.utils.logger:Node LLM - bd019481-76b8-43e4-b9e5-9079bec5448e: execution succeeded in 1.1s.
INFO:dynamiq.utils.logger:Agent React Agent - 13e772c1-5629-405a-82fa-fac776b85415: Loop 1, reasoning:
Thought: To find out which countries are mentioned in the database, I need to perform a q

In [None]:
print(result.output[wf.flow.nodes[-1].id].get('output', {}).get('content'))

The countries mentioned in the database are Canada and Great Britain.


## Managing the Weaviate Vector Store

You can additionally manage Weaviate directly through Dynamiq's `WeaviateVectorStore` class. Below are a few basic operations you can perform, including counting, listing, and deleting documents.

In [None]:
from dynamiq.storages.vector import WeaviateVectorStore

vector_store = WeaviateVectorStore(index_name="Default")
print(f"Number of documents in vector store: {vector_store.count_documents()}")

print('\nDocuments:')
list_documents = vector_store.list_documents(include_embeddings=False)
for idx, doc in enumerate(list_documents):
    print(f'{idx}. {doc.content}')

Number of documents in vector store: 4

Documents:
0. An adjective is a word that describes or defines a noun or noun phrase.
1. Ottawa is the capital of Canada.
2. London is the capital of Great Britain.
3. A verb is a word that describes an action, state, or occurrence


In [None]:
print("Deleting documents where the metadata field 'country' is set to 'England'...")
vector_store.delete_documents_by_filters({"field": "country", "operator": "==", "value": "England"})
print(f"Number of documents in vector store: {vector_store.count_documents()}")

print("\nDeleting all documents from the vector store...")
vector_store.delete_documents(delete_all=True)
print(f"Number of documents in vector store: {vector_store.count_documents()}")

Deleting documents where the metadata field 'country' is set to 'England'...
Number of documents in vector store: 3

Deleting all documents from the vector store...
Number of documents in vector store: 0
