[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/llamaindex/agents/llama-index-weaviate-assistant-agent.ipynb)

# Agent vs No Agent

by Tuana Celik [🦋 Bsky](https://bsky.app/profile/tuana.dev), [LI](https://www.linkedin.com/in/tuanacelik/), [X](https://x.com/tuanacelik)

This recipe walks you through the difference between naive RAG and an agent that has RAG tools.

In this example notebook, we are using 2 Weaviate collections:

1. **Weaviate Docs:** This collection contains the technical documentation that you can find on weaviate.io. We've already created embeddings for them using `embed-multilingual-v3.0` by Cohere.
2. **GitHub Issues:** A collection that contains some of the GitHub issues and pull requests from the Weaviate Verba repository.

**To replicate the behaviour, you may choose to create and use 2 of your own Weaviate collections.**
If you do, don't forget to change the RAG tools accordingly.
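The settings that change when you swap in your own collections are spread across several cells of this notebook. The sketch below collects them in one place, assuming a plain dataclass. `CollectionConfig`, `COLLECTIONS` and `reference_url` are hypothetical names, not part of Weaviate or LlamaIndex, and the values are the ones this notebook uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionConfig:
    """Hypothetical bundle of the per-collection settings the RAG tools depend on."""

    index_name: str        # Weaviate collection name (starts with a capital letter)
    text_key: str          # property holding the text that is retrieved
    url_property: str      # property used to build each reference link
    embed_model_name: str  # must match the model used to embed the collection
    url_prefix: str = ""   # prepended to relative URLs, e.g. "https://weaviate.io"


# Values taken from this notebook; replace them with your own collections.
COLLECTIONS = {
    "weaviate_docs": CollectionConfig(
        index_name="PageChunk",
        text_key="content",
        url_property="url",
        embed_model_name="embed-multilingual-v3.0",  # Cohere
        url_prefix="https://weaviate.io",
    ),
    "weaviate_github_issues": CollectionConfig(
        index_name="Example_verba_github_issues",
        text_key="issue_content",
        url_property="issue_url",
        embed_model_name="text-embedding-ada-002",  # OpenAI
    ),
}


def reference_url(config: CollectionConfig, properties: dict) -> str:
    """Build the link shown next to each retrieved object."""
    return config.url_prefix + properties[config.url_property]


print(reference_url(COLLECTIONS["weaviate_docs"], {"url": "/developers/weaviate"}))
```

Keeping the embedding model name next to the collection makes it harder to query a collection with a different model than the one its vectors were created with.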

We will see how giving an agent RAG over these 2 collections, in the form of tools, changes the way we can interact with them.

### First: Installations & Imports

In [None]:
!pip install weaviate-client python-dotenv llama-index llama-index-vector-stores-weaviate llama-index-embeddings-openai llama-index-embeddings-cohere

In [None]:
import os
import weaviate
from weaviate.classes.init import Auth
from dotenv import load_dotenv


from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import BaseSynthesizer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.cohere import CohereEmbedding


load_dotenv()

headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}

## Discover the Collections
1. GitHub Issues Collection

In [235]:
weaviate_issues_url = os.environ["WEAVIATE_ISSUES_URL"]
weaviate_issues_api_key = os.environ["WEAVIATE_ISSUES_KEY"]

issues_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_issues_url,
    auth_credentials=Auth.api_key(weaviate_issues_api_key),
    headers=headers
)

issues = issues_client.collections.get(name="example_verba_github_issues")

In [215]:
next(issues.iterator()).properties

{'issue_id': 2306015626.0,
 'issue_content': 'Adds OLLAMA_EMBED_MODEL environment variable\r\nSet this to an ollama model that supports\r\nembeddings like mxbai-embed-large\r\nSolves #171\r',
 'issue_url': 'https://github.com/weaviate/Verba/pull/178',
 'issue_labels': [],
 'issue_comments': 1.0,
 'issue_created_at': datetime.datetime(2024, 5, 20, 13, 33, 38, tzinfo=datetime.timezone.utc),
 'issue_title': 'Adds OLLAMA_EMBED_MODEL env variable',
 'issue_author': 'kjeldahl',
 'issue_updated_at': datetime.datetime(2024, 5, 27, 11, 25, 17, tzinfo=datetime.timezone.utc),
 'issue_state': 'closed'}

2. Weaviate Documentation

In [236]:
weaviate_url = os.environ["WEAVIATE_DOCS_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
    headers=headers
)

docs = client.collections.get(name="PageChunk")

In [217]:
next(docs.iterator()).properties

{'content': 'The list of metrics that are obtainable through Weaviate\'s metric system is constantly being expanded. The complete list is in the prometheus.go source code file. This page describes some noteworthy metrics and their uses. Typically metrics are quite granular, as they can always be aggregated later on. For example if the granularity is "shard", you could aggregate all "shard" metrics of the same "class" to obtain a class metrics, or aggregate all metrics to obtain the metric for the entire Weaviate instance. | Metric | Description | Labels | Type | | --- | --- | --- | --- | | batch_durations_ms | Duration of a single batch operation in ms. The operation label further defines what operation as part of the batch (e.g. object, inverted, vector) is being used. Granularity is a shard of a class.  | operation, class_name, shard_name | Histogram | | batch_delete_durations_ms | Duration of a batch delete in ms. The operation label further defines what operation as part of the bat

## Query the Resources

Let's say we're trying to find out something about Weaviate, for example: "Can I use Ollama to generate answers?"

Observations:
- This question probably makes sense to ask the `docs` collection 👇

In [218]:
response = docs.query.near_text(
    query="Can I use Ollama to generate answers?",
    return_properties=["content", "url"],
    limit=2)

for doc in response.objects:
    print("===============================")
    print(doc.properties["content"])
    print("https://weaviate.io"+doc.properties["url"])

Finally, you can use Ollama’s generate() method to generate a response from the augmented prompt template. # Generate a response combining the prompt and data we retrieved in step 2 output = ollama.generate(   model = "llama2",   prompt = prompt_template, )  print(output['response'])  Llamas are members of the camelid family, which means they are closely related to other animals in the same family, including: 1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are found in the Andean region and are known for their soft, woolly coats. 2. Camels: Camels are large, even-toed ungulates that are closely related to llamas and vicuñas. They are found in hot, dry climates around the world and are known for their ability to go without water for long periods of time. 3. Guanacos: Guanacos are large, wild animals that are related to llamas and vicuñas. They are found in the Andean region and are known for their distinctive long necks and legs. 4. Llama-like creatures: There 
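The cell above builds links by concatenating `"https://weaviate.io"` with the `url` property, which assumes every `url` is a site-relative path starting with `/`. A slightly more defensive sketch uses the standard library's `urllib.parse.urljoin`; `absolute_url` is a hypothetical helper, not part of the Weaviate client.

```python
from urllib.parse import urljoin

BASE_URL = "https://weaviate.io"


def absolute_url(path_or_url: str, base: str = BASE_URL) -> str:
    """Resolve a `url` property against the site root.

    urljoin handles paths with or without a leading slash and returns
    already-absolute URLs unchanged.
    """
    return urljoin(base.rstrip("/") + "/", path_or_url)


print(absolute_url("/developers/weaviate"))
```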

Now assume we want to find out whether there have been any reports of certain issues, for example: "Has anyone reported weaviate issues about Ollama?"

Observations:
- This question probably makes sense to ask the `issues` collection 👇

In [219]:
response = issues.query.near_text(
    query="Has anyone reported weaviate issues about Ollama?",
    return_properties=["issue_content", "issue_url"],
    target_vector="issue_content",
    limit=2)

for issue in response.objects:
    print("===============================")
    print(issue.properties["issue_content"])
    print(issue.properties["issue_url"])

![500 Internal Server Error](https://github.com/weaviate/Verba/assets/72214141/f97581dd-f4c5-4c13-be54-3ff7f3926735)
I using ollama docker and found this issues :(
https://github.com/weaviate/Verba/issues/134
## Description
Hey everyone,
I just cloned Verba locally and set the environment variables. I want to use Ollama for Embedding and Generation (using Llama3) but I cannot see where to choose the Ollama generator model from the settings after running my Verba instance.
Ollama is running at http://localhost:11434
Did anyone have the same problem? Please let me know what I missed.
Thanks!
## Is this a bug or a feature?
Bug
https://github.com/weaviate/Verba/issues/156
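The issues query passes `target_vector="issue_content"`, which tells Weaviate which named vector to search when a collection stores more than one per object. The toy sketch below illustrates the idea in pure Python, under several assumptions: a second named vector called `issue_title` is hypothetical, hand-made 2-D vectors replace real embeddings, and the query vector is given directly (with `near_text`, Weaviate embeds the query text first). Weaviate itself searches an approximate index (HNSW by default) rather than ranking every object like this.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy objects, each with two hypothetical named vectors.
objects = [
    {"id": "A", "vectors": {"issue_content": [1.0, 0.0], "issue_title": [0.0, 1.0]}},
    {"id": "B", "vectors": {"issue_content": [0.0, 1.0], "issue_title": [1.0, 0.0]}},
]


def near_vector(query, target_vector, limit=2):
    """Rank objects by similarity to `query`, using only the chosen named vector."""
    ranked = sorted(
        objects,
        key=lambda obj: cosine(query, obj["vectors"][target_vector]),
        reverse=True,
    )
    return [obj["id"] for obj in ranked[:limit]]


print(near_vector([1.0, 0.1], "issue_content", limit=1))
print(near_vector([1.0, 0.1], "issue_title", limit=1))
```

The same query returns different objects depending on which named vector it targets, which is why `target_vector` matters for collections with several vectors.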


## Do RAG on Resources

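Let's try the question: "How can I use generative models with Ollama and Weaviate, and are there any known issues about this feature?" With naive RAG, no single query engine can answer both halves, so below we build one engine per collection and ask each the part of the question that belongs to it.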
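Without an agent, splitting the question and routing each part is our job. A minimal sketch of that manual routing, where `docs_engine` and `issues_engine` are hypothetical stubs standing in for the real query engines built in the next cells, and the sub-questions roughly mirror the ones used there.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the two query engines built below.
def docs_engine(question: str) -> str:
    return f"[docs answer to: {question}]"


def issues_engine(question: str) -> str:
    return f"[issues answer to: {question}]"


ENGINES: Dict[str, Callable[[str], str]] = {"docs": docs_engine, "issues": issues_engine}

# Without an agent, *we* decide how to split the question and where each part goes.
SUB_QUESTIONS: List[Tuple[str, str]] = [
    ("docs", "How can I use generative models with Ollama and Weaviate?"),
    ("issues", "Are there any known open issues about using Ollama models?"),
]


def naive_multi_source(sub_questions: List[Tuple[str, str]]) -> str:
    """Send each hand-written sub-question to its hand-picked engine."""
    return "\n".join(ENGINES[name](question) for name, question in sub_questions)


print(naive_multi_source(SUB_QUESTIONS))
```

The decomposition and the routing table are fixed in code; the agent at the end of this notebook makes both decisions itself.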

In [None]:
docs_vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="PageChunk", text_key="content"
)

docs_retriever = VectorStoreIndex.from_vector_store(vector_store=docs_vector_store).as_retriever(
    similarity_top_k=10,
    embed_model=CohereEmbedding(model_name="embed-multilingual-v3.0", api_key=os.environ['COHERE_APIKEY'])
)

class WeaviateDocsRAG(CustomQueryEngine):
    """Naive RAG over the Weaviate docs that answers with source URLs."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer  # passed in, but not used by custom_query
    llm: OpenAI
    docs_qa_with_references_prompt: PromptTemplate = PromptTemplate(
"""Below is the relevant content, followed by the URL that it is referenced from
---------------------
{docs_and_url}
---------------------
Given the context information and not prior knowledge, answer the query.
Provide the reference(s) that the answer is generated from.
Query: {query_str}
Answer: """
    )

    def custom_query(self, query_str: str):
        # Retrieve the most similar chunks from the PageChunk collection
        nodes = self.retriever.retrieve(query_str)

        # Pair every chunk with its absolute weaviate.io URL so the LLM can cite it
        context_and_references_str = ""
        for node in nodes:
            content = node.node.get_content()
            reference = "https://weaviate.io" + node.node.metadata["properties"]["url"]
            context_and_references_str += f"\nContent: {content}\nURL:{reference}"

        # Fill in the prompt and call the LLM directly
        response = self.llm.complete(
            self.docs_qa_with_references_prompt.format(
                docs_and_url=context_and_references_str, query_str=query_str
            )
        )
        return str(response)

synthesizer = get_response_synthesizer(response_mode="compact")
llm = OpenAI(model="gpt-4o-mini")

query_engine = WeaviateDocsRAG(
    retriever=docs_retriever,
    response_synthesizer=synthesizer,
    llm=llm,
)
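Because `custom_query` calls `self.llm.complete` directly, the filled-in prompt is the only thing the model sees. The dependency-free sketch below reproduces that string assembly with plain `str.format`, so you can inspect it without calling OpenAI. `DOCS_PROMPT` is a close copy of the prompt above, the chunk dicts are hypothetical stand-ins for retrieved nodes, and `build_docs_prompt` is not part of LlamaIndex.

```python
DOCS_PROMPT = (
    "Below is the relevant content, followed by the URL that it is referenced from\n"
    "---------------------\n"
    "{docs_and_url}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Provide the reference(s) that the answer is generated from.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_docs_prompt(chunks, query_str, base_url="https://weaviate.io"):
    """Mirror WeaviateDocsRAG.custom_query: one Content/URL pair per retrieved chunk."""
    context = ""
    for chunk in chunks:
        context += f"\nContent: {chunk['content']}\nURL:{base_url}{chunk['url']}"
    return DOCS_PROMPT.format(docs_and_url=context, query_str=query_str)


# Hypothetical retrieved chunk, for illustration only.
print(build_docs_prompt([{"content": "Use the generative-ollama module.", "url": "/x"}], "Can I use Ollama?"))
```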

In [221]:
response = query_engine.query("How can I use generative models with Ollama and Weaviate?")

print(str(response))

To use generative models with Ollama in Weaviate, you need to follow these steps:

1. **Set Up a Locally Hosted Weaviate Instance**: Ensure that you have a locally hosted Weaviate instance, as the integration requires hosting your own Ollama models. You can find guidance on configuring Weaviate with Ollama models on the relevant integration page.

2. **Configure Weaviate with the Ollama Generative AI Integration**: Your Weaviate instance must be configured with the `generative-ollama` module. This integration is not available for Weaviate Cloud (WCD) serverless instances, as it requires a locally running Ollama instance. For self-hosted users, check the cluster metadata to verify if the module is enabled and follow the guide to enable it.

3. **Access the Ollama Endpoint**: Ensure that your Weaviate instance can access the Ollama endpoint. If you are using Docker, specify the Ollama endpoint using the `host.docker.internal` alias to access the host machine from within the container.

4

In [238]:
issues_vector_store = WeaviateVectorStore(
    weaviate_client=issues_client, index_name="Example_verba_github_issues", text_key="issue_content"
)

issues_retriever = VectorStoreIndex.from_vector_store(vector_store=issues_vector_store).as_retriever(
    similarity_top_k=10,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002", api_key=os.environ['OPENAI_APIKEY'])
)

class WeaviateIssuesRAG(CustomQueryEngine):
    """Naive RAG over the Verba GitHub issues that answers with issue URLs and status."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer  # passed in, but not used by custom_query
    llm: OpenAI
    issues_prompt: PromptTemplate = PromptTemplate(
"""Below are the relevant GitHub issues, each followed by its URL and status
---------------------
{issues}
---------------------
Given the information in the issues and not prior knowledge, answer the query.
Provide the reference(s) that the answer is generated from.
Query: {query_str}
Answer: """
    )

    def custom_query(self, query_str: str):
        # Retrieve the most similar issues
        nodes = self.retriever.retrieve(query_str)

        # Include URL and open/closed state so the LLM can report on issue status
        context_and_references_str = ""
        for node in nodes:
            content = node.node.get_content()
            reference = node.node.metadata["properties"]["issue_url"]
            status = node.node.metadata["properties"]["issue_state"]
            context_and_references_str += f"\nContent: {content}\nURL:{reference}\nSTATUS:{status}"

        # Fill in the prompt and call the LLM directly
        response = self.llm.complete(
            self.issues_prompt.format(issues=context_and_references_str, query_str=query_str)
        )
        return str(response)

synthesizer = get_response_synthesizer(response_mode="compact")
llm = OpenAI(model="gpt-4o-mini")

issues_query_engine = WeaviateIssuesRAG(
    retriever=issues_retriever,
    response_synthesizer=synthesizer,
    llm=llm,
)

In [223]:
response = issues_query_engine.query("Are there any known open issues about using Ollama models?")

print(str(response))

Based on the provided information, there are no open issues regarding the use of Ollama models. All the listed issues and pull requests related to Ollama have a status of "closed." 

References:
- https://github.com/weaviate/Verba/issues/81 (closed)
- https://github.com/weaviate/Verba/issues/19 (closed)
- https://github.com/weaviate/Verba/issues/156 (closed)
- https://github.com/weaviate/Verba/issues/209 (closed)
- https://github.com/weaviate/Verba/issues/218 (closed)
- https://github.com/weaviate/Verba/issues/161 (closed)
- https://github.com/weaviate/Verba/issues/12 (closed)
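The answer above relies on the LLM reading every `STATUS` line correctly. Since `issue_state` is already in each node's metadata, a cheap deterministic cross-check is possible. A minimal sketch with hypothetical sample records; in the notebook you could build the input from `node.node.metadata["properties"]` for each node returned by `issues_retriever.retrieve(...)`, as `custom_query` does. `state_counts` and `open_issue_urls` are hypothetical helpers.

```python
from collections import Counter


def state_counts(retrieved):
    """Count issue states across the retrieved issues' properties."""
    return Counter(props["issue_state"] for props in retrieved)


def open_issue_urls(retrieved):
    """Return the URLs of retrieved issues that are still open."""
    return [props["issue_url"] for props in retrieved if props["issue_state"] == "open"]


# Hypothetical records, shaped like the issue properties shown earlier.
sample = [
    {"issue_url": "url-1", "issue_state": "closed"},
    {"issue_url": "url-2", "issue_state": "open"},
    {"issue_url": "url-3", "issue_state": "closed"},
]

print(state_counts(sample), open_issue_urls(sample))
```

If you would rather not rely on the LLM for this at all, the v4 Python client can also filter on the server, e.g. by passing `filters=Filter.by_property("issue_state").equal("open")` (with `Filter` imported from `weaviate.classes.query`) to `issues.query.near_text(...)`.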


## Create a Weaviate Assistant Agent

In [239]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAgent

query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="weaviate_docs",
            description="Technical documentation for Weaviate useful for answering general questions about weaviate.",
        ),
    ),
    QueryEngineTool(
        query_engine=issues_query_engine,
        metadata=ToolMetadata(
            name="weaviate_github_issues",
            description="A list of GitHub issues and pull requests for weaviate. Useful to refer to ongoing, known issues or upcoming features",
        ),
    ),
]

agent = OpenAIAgent.from_tools(query_engine_tools,
                               max_function_calls=10, 
                               llm=OpenAI(model="gpt-4o-mini"), verbose=True)
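The agent never sees the query engines themselves, only each tool's name, description and argument schema, so the `description` strings do the routing work. The sketch below shows roughly what is sent to the OpenAI API, assuming the single `input: str` argument that the trace in the next cell shows (`{"input": ...}`). The exact JSON LlamaIndex generates may include extra fields, and `to_function_schema` is a hypothetical helper.

```python
import json


def to_function_schema(name: str, description: str) -> dict:
    """Approximate OpenAI tool spec for a tool with one string argument, `input`."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],
            },
        },
    }


# Names and descriptions copied from the QueryEngineTool definitions above.
tools = [
    to_function_schema(
        "weaviate_docs",
        "Technical documentation for Weaviate useful for answering general questions about weaviate.",
    ),
    to_function_schema(
        "weaviate_github_issues",
        "A list of GitHub issues and pull requests for weaviate. "
        "Useful to refer to ongoing, known issues or upcoming features",
    ),
]

print(json.dumps(tools[0], indent=2))
```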


In [242]:
response = agent.chat("Tell me how I can use Ollama models and let me know if there are any issues I should know of.")

Added user message to memory: Tell me how I can use Ollama models and let me know if there are any issues I should know of.
=== Calling Function ===
Calling function: weaviate_docs with args: {"input": "How to use Ollama models?"}
Got output: To use Ollama models, you need to follow these steps:

1. **Install Ollama**: First, download and install Ollama for your operating system. This will set up a web server on your machine for inference through an API.

2. **Pull Models**: Open a terminal and use the `ollama pull <model-name>` command to download the desired models. For example, you can pull the Llama 3 model with the command:
   ```
   ollama pull llama3:latest
   ```
   You can also pull embedding models, such as:
   ```
   ollama pull snowflake-arctic-embed
   ```

3. **Run Models**: Once the models are downloaded, you can run them using the command:
   ```
   ollama run <model-name>
   ```

4. **Integrate with Weaviate**: If you are using Weaviate, you can configure it to use Oll
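The trace above shows the agent choosing `weaviate_docs` and passing `{"input": ...}` as JSON arguments. The simplified, dependency-free sketch below shows the kind of loop behind that behaviour and how `max_function_calls` bounds it. It is not OpenAIAgent's actual implementation: `scripted_planner` is a hypothetical stand-in for the LLM, the tools are stubs, and only the first call appears in the (truncated) trace; the second is hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional


def agent_loop(
    plan_next_call: Callable[[List[str]], Optional[dict]],
    tools: Dict[str, Callable[[str], str]],
    max_function_calls: int = 10,
) -> List[str]:
    """Minimal function-calling loop (illustrative only).

    `plan_next_call` stands in for the LLM: given the tool outputs so far, it
    returns the next call as {"name": ..., "arguments": "<json string>"}, or
    None once it is ready to write the final answer.
    """
    observations: List[str] = []
    for _ in range(max_function_calls):  # hard cap on the number of tool calls
        call = plan_next_call(observations)
        if call is None:
            break
        args = json.loads(call["arguments"])  # e.g. '{"input": "How to use Ollama models?"}'
        observations.append(tools[call["name"]](args["input"]))
    return observations


# Hypothetical scripted "LLM": docs first, then issues, then stop.
script = [
    {"name": "weaviate_docs", "arguments": json.dumps({"input": "How to use Ollama models?"})},
    {"name": "weaviate_github_issues", "arguments": json.dumps({"input": "Known issues with Ollama models?"})},
]


def scripted_planner(observations: List[str]) -> Optional[dict]:
    return script[len(observations)] if len(observations) < len(script) else None


# Stub tools that echo their input instead of querying Weaviate.
stub_tools = {
    "weaviate_docs": lambda q: f"docs:{q}",
    "weaviate_github_issues": lambda q: f"issues:{q}",
}

print(agent_loop(scripted_planner, stub_tools))
# ['docs:How to use Ollama models?', 'issues:Known issues with Ollama models?']
```

In the real agent, each tool output is appended to the chat history and the LLM is called again, which is how it can decide to consult the issues tool after reading the docs answer.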

In [243]:
print(response.response)

### How to Use Ollama Models

1. **Install Ollama**: Download and install Ollama for your operating system. This will set up a web server on your machine for inference through an API.

2. **Pull Models**: Open a terminal and use the `ollama pull <model-name>` command to download the desired models. For example, to pull the Llama 3 model, use:
   ```bash
   ollama pull llama3:latest
   ```
   You can also pull embedding models, such as:
   ```bash
   ollama pull snowflake-arctic-embed
   ```

3. **Run Models**: Once the models are downloaded, you can run them using the command:
   ```bash
   ollama run <model-name>
   ```

4. **Integrate with Weaviate**: If you are using Weaviate, configure it to use Ollama's generative or embedding models for retrieval-augmented generation (RAG) or vectorization. This involves setting up a Weaviate collection or vector index to utilize the models via your local Ollama instance.

5. **Docker Setup**: If you want to run Ollama and Weaviate locally, you m

In [244]:
issues_client.close()
client.close()