[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/weaviate-features/model-providers/meta/rag_llama_3_ollama.ipynb)

# Local RAG with Ollama and Weaviate
## Using Weaviate integration

This example shows how to use the `text2vec-ollama` vectorizer module as well as the `generative-ollama` module in Weaviate.

## Setup 
1. Download and install Ollama for your operating system: https://ollama.com/download
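2. Install the Weaviate Python client with `pip install -U weaviate-client`. Because Weaviate calls Ollama through its `text2vec-ollama` and `generative-ollama` modules, the `ollama` Python library (`pip install ollama`) is optional here. Ollama also offers a REST API and a JavaScript library.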

In [None]:
#!pip install -U weaviate-client

3. Pull the LLM and the [embedding model](https://ollama.com/blog/embedding-models) you want to use:

In [None]:
#!ollama pull llama3
#!ollama pull all-minilm # alternative embedding model: mxbai-embed-large

4. Optional: Test that the embedding model works. Ollama must be running for this call to succeed (for example, after `ollama run llama3`).

In [1]:
!curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm","prompt": "Llamas are members of the camelid family"}'

{"embedding":[-0.37209033966064453,0.41945910453796387,-0.42962121963500977,0.301665335893631,-0.47567272186279297,-0.006124582141637802,-0.18393747508525848,-0.4657520353794098,0.024121297523379326,0.41036325693130493,0.2584318220615387,-0.3501233458518982,0.3749215006828308,-0.17598895728588104,0.0711904764175415,0.07458031177520752,-0.01956084743142128,0.0819740742444992,-0.11168335378170013,-0.09002675116062164,-0.14640718698501587,-0.003168143332004547,0.3027119040489197,0.036542121320962906,-0.38922950625419617,0.6812086701393127,0.022584855556488037,-0.14138051867485046,0.007853759452700615,0.056560762226581573,-0.3321329951286316,-0.17216357588768005,0.3234858512878418,-0.07221385836601257,-0.25651711225509644,0.26626133918762207,0.2913508415222168,0.4247307777404785,0.531809389591217,0.2818898856639862,0.3869876265525818,0.10163667798042297,0.44477325677871704,-0.4057876169681549,0.18915584683418274,-0.46048250794410706,-0.1599886119365692,0.4838547110557556,0.3746705055236816
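The same embeddings call can be issued from Python. The following is a minimal sketch using only the standard library, assuming Ollama listens on `http://localhost:11434` as in the curl command above; the helper names `build_embedding_request` and `fetch_embedding` are hypothetical and introduced only for illustration. Only the request is built here, so the snippet runs even without an Ollama server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: same host/port as the curl call


def build_embedding_request(text: str, model: str = "all-minilm") -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str, model: str = "all-minilm") -> list:
    """Send the request and return the vector; requires a running Ollama server."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.loads(resp.read())["embedding"]


# Building the request does not touch the network.
req = build_embedding_request("Llamas are members of the camelid family")
print(req.get_method(), req.full_url)
```

With Ollama running, `fetch_embedding("Llamas are members of the camelid family")` should return the list of floats shown under `"embedding"` in the output above.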

5. Optional: Test that the LLM (generative model) works

In [2]:
!curl http://localhost:11434/api/generate -d '{"model": "llama3","prompt":"What is a vector database?", "stream": false }'

{"model":"llama3","created_at":"2024-05-30T23:18:23.059448Z","response":"A vector database, also known as a vector store or vector index, is a type of database that specializes in storing and querying high-dimensional vectors. These vectors are typically dense numerical arrays used to represent objects, such as images, documents, or audio files.\n\nVector databases are designed to efficiently store and retrieve large numbers of vectors, often with millions or billions of dimensions (features). They provide fast query capabilities, such as:\n\n1. **Nearest Neighbor Search**: Find the most similar vector(s) to a given query vector.\n2. **Similarity Search**: Retrieve all vectors that have a similarity score above a certain threshold.\n3. **Range Search**: Return all vectors within a certain distance or range from a query vector.\n\nVector databases are commonly used in various applications, such as:\n\n1. **Computer Vision**: Object recognition, image search, and facial recognition.\n2. 
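The curl command sets `"stream": false` so that the answer arrives as one JSON object. Without it, Ollama streams a sequence of newline-delimited JSON objects, each carrying a fragment in `"response"`. The sketch below shows one way such a stream could be reassembled; `join_streamed_response` is a hypothetical helper, and the `sample` lines are made-up stand-ins, not real model output.

```python
import json


def join_streamed_response(ndjson: str) -> str:
    """Concatenate the "response" fragments of a newline-delimited JSON stream."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
    return "".join(parts)


# Made-up chunks shaped like a streamed /api/generate reply (illustration only).
sample = "\n".join([
    '{"model": "llama3", "response": "A vector", "done": false}',
    '{"model": "llama3", "response": " database", "done": false}',
    '{"model": "llama3", "response": "", "done": true}',
])

text = join_streamed_response(sample)
print(text)
```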

In [3]:
import weaviate
client = weaviate.connect_to_embedded(
    environment_variables={"ENABLE_MODULES": "text2vec-ollama,generative-ollama"},
    version="1.25.1"
)

Started /Users/dudanogueira/.cache/weaviate-embedded: process ID 37293


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-05-30T20:18:25-03:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-05-30T20:18:25-03:00"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-05-30T20:18:25-03:00"}
{"level":"info","msg":"open cluster service","servers":{"Embedded_at_8079":52783},"time":"2024-05-30T20:18:25-03:00"}
{"address":"192.168.28.99:52784","level":"info","msg":"starting cloud rpc server ...","time":"2024-05-30T20:18:25-03:00"}
{"level":"info","msg":"starting raft sub-system ...","time":"2024-05-30T20:18:25-03:00"}
{"address":"192.168.28.99:52783","level":"info","msg":"tcp transport","tcpMa

{"action":"bootstrap","level":"info","msg":"node reporting ready, node has probably recovered cluster from raft config. Exiting bootstrap process","time":"2024-05-30T20:18:28-03:00"}
{"action":"telemetry_push","level":"info","msg":"telemetry started","payload":"\u0026{MachineID:db56bdb3-f923-4dc6-ad66-9cfb547dd405 Type:INIT Version:1.25.1 Modules:generative-ollama,text2vec-ollama NumObjects:0 OS:darwin Arch:arm64}","time":"2024-05-30T20:18:28-03:00"}


Here we can check the meta information for our Embedded instance.

In [4]:
client.get_meta()

{'hostname': 'http://127.0.0.1:8079',
 'modules': {'generative-ollama': {'documentationHref': 'https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion',
   'name': 'Generative Search - Ollama'},
  'text2vec-ollama': {'documentationHref': 'https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings',
   'name': 'Ollama Module'}},
 'version': '1.25.1'}
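The metadata above can also be checked programmatically before creating a collection. This is a minimal sketch, assuming the dictionary returned by `client.get_meta()` has the shape shown in the output; `REQUIRED_MODULES` and `missing_modules` are hypothetical names introduced here, and `meta` is a trimmed copy of that output rather than a live call.

```python
REQUIRED_MODULES = {"text2vec-ollama", "generative-ollama"}


def missing_modules(meta: dict, required=REQUIRED_MODULES) -> set:
    """Return the required module names that are absent from the meta dictionary."""
    return set(required) - set(meta.get("modules", {}))


# Trimmed copy of the get_meta() output shown above.
meta = {
    "hostname": "http://127.0.0.1:8079",
    "modules": {
        "generative-ollama": {"name": "Generative Search - Ollama"},
        "text2vec-ollama": {"name": "Ollama Module"},
    },
    "version": "1.25.1",
}

missing = missing_modules(meta)
print("missing modules:", missing or "none")
```

In the notebook, `missing_modules(client.get_meta())` would perform the same check against the running Embedded instance.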

## NOTE
Below we use `http://localhost:11434` to call the Ollama models.

Because we are using **Weaviate Embedded instead of Docker**, and we assume your Ollama installation is on the host, Weaviate should call Ollama at `http://localhost:11434`.

If you are **running Weaviate as a Docker container**, the `api_endpoint` must be `http://host.docker.internal:11434`.
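The endpoint choice described in this note can be captured in a small helper so that the same notebook works in both setups. This is only a sketch: `ollama_endpoint` and its `weaviate_in_docker` flag are hypothetical names, and the port assumes Ollama's default of 11434 as used throughout this example.

```python
def ollama_endpoint(weaviate_in_docker: bool, port: int = 11434) -> str:
    """Return the URL Weaviate should use to reach Ollama running on the host."""
    # From inside a Docker container, "localhost" is the container itself,
    # so the host is addressed as host.docker.internal instead.
    host = "host.docker.internal" if weaviate_in_docker else "localhost"
    return f"http://{host}:{port}"


embedded_url = ollama_endpoint(weaviate_in_docker=False)
docker_url = ollama_endpoint(weaviate_in_docker=True)
print(embedded_url)
print(docker_url)
```

The returned string could then be passed as `api_endpoint` to both the vectorizer and the generative configuration.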

In [5]:
from weaviate import classes as wvc
client.collections.delete("OllamaCollection")
# Let's create the collection, specifying our base URL accordingly
collection = client.collections.create(
    "OllamaCollection",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(
        api_endpoint="http://localhost:11434",
        model="all-minilm"
    ),
    generative_config=wvc.config.Configure.Generative.ollama(
        api_endpoint="http://localhost:11434",
        model="llama3"
    )
)

{"level":"info","msg":"Created shard ollamacollection_wLgUWG5NS3P7 in 1.306083ms","time":"2024-05-30T20:18:34-03:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-05-30T20:18:34-03:00","took":54333}


In [6]:
# Let's check our collection
print(collection.config.get().vectorizer_config)
print(collection.config.get().generative_config)

_VectorizerConfig(vectorizer=<Vectorizers.TEXT2VEC_OLLAMA: 'text2vec-ollama'>, model={'apiEndpoint': 'http://localhost:11434', 'model': 'all-minilm'}, vectorize_collection_name=True)
_GenerativeConfig(generative=<GenerativeSearches.OLLAMA: 'generative-ollama'>, model={'apiEndpoint': 'http://localhost:11434', 'model': 'llama3'})


## Step 1: Add data

In [7]:
documents = [
  "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
  "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
  "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
  "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
  "Llamas are vegetarians and have very efficient digestive systems",
  "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

In [8]:
# Store each document in the collection; Weaviate vectorizes it through text2vec-ollama
with collection.batch.dynamic() as batch:
    for d in documents:
        batch.add_object(
            properties={"text": d},
        )

In [9]:
# List the stored objects; no generation is needed here, so the query API is enough
result = collection.query.fetch_objects()
for obj in result.objects:
    print(obj.properties)

{'text': 'Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall'}
{'text': 'Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands'}
{'text': 'Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight'}
{'text': 'Llamas are vegetarians and have very efficient digestive systems'}
{'text': 'Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old'}
{'text': "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels"}


## Step 2: Retrieve
Next, add the code to retrieve the most relevant document given an example prompt:

In [10]:
# Now we retrieve our data
results = collection.query.near_text(
    query="What animals are llamas related to?",
    limit=1
)
print(results.objects[0].properties)

{'text': "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels"}
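Conceptually, `near_text` embeds the query with the collection's vectorizer and returns the stored objects whose vectors are closest to it. The toy sketch below illustrates that idea with exact cosine similarity in plain Python. It is not how Weaviate is implemented (Weaviate uses an approximate HNSW index), and the three-dimensional vectors and labels are invented for illustration, not real `all-minilm` embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, limit=1):
    """Return the labels of the `limit` documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:limit]]


# Invented toy vectors standing in for document embeddings.
toy_docs = {
    "camelid family": [0.9, 0.1, 0.0],
    "domestication": [0.1, 0.9, 0.1],
    "lifespan": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stand-in for the embedded question

top = nearest(toy_query, toy_docs, limit=1)
print(top)
```

The `limit` argument plays the same role as `limit=1` in the `near_text` call above: it caps how many of the closest objects are returned.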


## Step 3: Generate
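Lastly, combine the prompt with the retrieved documents to generate an answer. `generate.near_text` runs the same retrieval and then passes the results to the LLM together with the `grouped_task` prompt: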

In [11]:
results = collection.generate.near_text(
    query="What animals are llamas related to?",
    limit=5,
    grouped_task="Answer the question: What animals are llamas related to?"
)
print(results.generated)

According to the text, llamas are related to vicuñas and camels, as they are members of the camelid family.


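Note that the prompt in the next cell is a Python f-string: `{query}` is filled in from the Python variable, while `{{text}}` uses doubled braces as an escape, so the literal `{text}` (the name of our property) stays in the prompt.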

In [14]:
query="When Lamas were first domesticated and how long do they live?"
results = collection.generate.near_text(
    query=query,
    limit=2,
    grouped_task=f"Answer the question: {query}? only using the given context in {{text}}"
)
print(results.generated)

Based on the given context, here are the answers:

1. When Lamas were first domesticated: They were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands.
2. How long do they live?: Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old.
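The brace escaping used in the prompt above can be checked in isolation, without calling Weaviate. A minimal sketch: the variable names mirror the cell above, but the query string is reused from the earlier retrieval step purely as an example.

```python
query = "What animals are llamas related to?"

# {query} is interpolated by Python; {{text}} is an escaped pair of braces
# that comes out as the literal characters {text}.
grouped_task = f"Answer the question: {query} only using the given context in {{text}}"

print(grouped_task)
```

Printing the string before sending it is a cheap way to confirm that the placeholder survived and that the question was inserted where expected.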


In [15]:
for obj in results.objects:
    print(obj.properties)

{'text': 'Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old'}
{'text': 'Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands'}
