# Local RAG with Ollama and Weaviate

This example is based on a post on the Ollama blog titled "[Embedding models](https://ollama.com/blog/embedding-models)".

## Setup 
1. Download and install Ollama for your operating system: https://ollama.com/download
2. Install the Ollama Python library, which generates vector embeddings from the model, with `pip install ollama`, and the Weaviate Python client with `pip install -U weaviate-client`. (A REST API and a JavaScript library are also available for Ollama.)

In [1]:
# pip install ollama
# pip install -U weaviate-client

3. Pull the LLM (`llama2`) and an [embedding model](https://ollama.com/blog/embedding-models) (`all-minilm` here; `mxbai-embed-large` is an alternative)

In [2]:
# ollama pull llama2
# ollama pull all-minilm # mxbai-embed-large

4. Optional: Test that the model responds, either from the terminal with `ollama run llama2` or from Python as below

In [3]:
import ollama
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])


The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is known as Rayleigh scattering.

As a result of this scattering, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance. The blue color is most visible in the morning and evening when the sun is low on the horizon because the light has to travel through more of the atmosphere to reach our eyes, allowing more time for the blue light to be scattered.

It's worth noting that the blue color of the sky can vary depending on a number of factors, including the amount of dust and water vapor in the atmosphere, which can absorb or scatter certain wavelengths of light. For example, during sunrise and sunset, when the sun is low on the horizon, th

In [4]:
ollama.embeddings(
  model="all-minilm",
  prompt="Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels"
)

{'embedding': [-0.021702591329813004,
  0.15009357035160065,
  -0.3400360345840454,
  0.09298147261142731,
  -0.39127838611602783,
  0.06238943338394165,
  -0.33130955696105957,
  -0.2786891460418701,
  0.09664438664913177,
  0.13602189719676971,
  0.4002634286880493,
  -0.41067466139793396,
  0.21043244004249573,
  0.031082946807146072,
  0.06574380397796631,
  -0.1371842920780182,
  -0.08826376497745514,
  0.023083731532096863,
  -0.2947360873222351,
  0.28975099325180054,
  0.07283814996480942,
  0.01543290726840496,
  0.2452796846628189,
  0.14312325417995453,
  -0.2776225209236145,
  0.5459028482437134,
  -0.09798812866210938,
  0.02923489920794964,
  0.02753671631217003,
  0.15834477543830872,
  -0.29225167632102966,
  0.003948435187339783,
  0.014535926282405853,
  -0.03330737352371216,
  -0.2809745669364929,
  0.23154978454113007,
  0.23709622025489807,
  0.612492024898529,
  0.40364035964012146,
  0.3277810215950012,
  0.09551140666007996,
  -0.007477905601263046,
  0.33145445

## Step 1: Generate embeddings

In [5]:
documents = [
  "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
  "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
  "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
  "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
  "Llamas are vegetarians and have very efficient digestive systems",
  "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

In [6]:
import weaviate
import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

client = weaviate.connect_to_embedded()

print(client.is_ready())

Started /Users/leonie/.cache/weaviate-embedded: process ID 32850


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-04-09T13:30:52+02:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-04-09T13:30:52+02:00"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-04-09T13:30:52+02:00"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50050","time":"2024-04-09T13:30:52+02:00"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-04-09T13:30:52+02:00"}


True


In [7]:
collection_name = "docs"

if client.collections.exists(collection_name):
    client.collections.delete(collection_name)

collection = client.collections.create(
    collection_name,
    properties=[
        Property(name="text", data_type=DataType.TEXT),
    ],
)

{"level":"info","msg":"Created shard docs_SN8loOvzlYv7 in 1.157084ms","time":"2024-04-09T13:30:52+02:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-04-09T13:30:52+02:00","took":38083}


In [8]:
import ollama

# embed each document with Ollama and store it, together with its vector, in Weaviate
with collection.batch.dynamic() as batch:
    for d in documents:
        response = ollama.embeddings(model="all-minilm", prompt=d)
        embedding = response["embedding"]
        batch.add_object(
            properties={"text": d},
            vector=embedding,
        )

{"level":"info","msg":"Completed loading shard myexampleindex_XGMjGqT60mbO in 2.642125ms","time":"2024-04-09T13:30:52+02:00"}
{"level":"info","msg":"Completed loading shard llamaindex_dWivqPiChdO8 in 5.01475ms","time":"2024-04-09T13:30:52+02:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-04-09T13:30:52+02:00","took":225208}
{"level":"info","msg":"Completed loading shard mycontent_oiMgIfNpvwWZ in 706.541µs","time":"2024-04-09T13:30:52+02:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-04-09T13:30:52+02:00","took":33000}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-04-09T13:30:52+02:00","took":8005667}
{"level":"info","msg":"Completed loading shard llamaindex_filter_fKXpSjF

In [9]:
collection.query.fetch_objects(limit=1, include_vector=True)

QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('56f66958-dbee-469a-a170-ebfc1b65172e'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'text': 'Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight'}, references=None, vector={'default': [0.20390161871910095, 0.23614734411239624, -0.38450300693511963, 0.41218075156211853, -0.3761504888534546, -0.13563679158687592, 0.05481676757335663, -0.08999389410018921, -0.39202630519866943, 0.5020553469657898, 0.33837682008743286, -0.6673769950866699, 0.1679535210132599, 0.3123660385608673, 0.025210224092006683, 0.210839182138443, 0.3430388867855072, -0.213998943567276, -0.30820319056510925, 0.4516986906528473, 0.21469347178936005, -0.06823334842920303, -0.0848790630698204, 0.07997660338878632, -0.3046180009841919, 0.23734384775161743, -0.40219777822494507, -0.0084506832063198

## Step 2: Retrieve
Next, add the code to retrieve the most relevant document given an example prompt:

In [10]:
# an example prompt
prompt = "What animals are llamas related to?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
  prompt=prompt,
  model="all-minilm"
)

results = collection.query.near_vector(
  near_vector=response["embedding"],
  limit=1
)

data = results.objects[0].properties['text']
print(data)

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
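
To make the retrieval step less of a black box, here is a minimal pure-Python sketch of what a nearest-neighbour query does conceptually: score every stored vector against the query vector and keep the best matches. It rests on several assumptions. It uses cosine similarity as the metric (which, to my knowledge, is Weaviate's default distance unless configured otherwise), made-up three-dimensional vectors instead of real `all-minilm` embeddings, and an exhaustive scan rather than Weaviate's HNSW index. The names `cosine_similarity` and `top_k` are hypothetical helpers, not part of either library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, items, k=1):
    """Return the k (text, score) pairs most similar to query_vector.

    items is a list of (text, vector) pairs.
    """
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in items]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only; real embeddings have hundreds of dimensions.
toy_items = [
    ("related to vicuñas and camels", [0.9, 0.1, 0.0]),
    ("weigh between 280 and 450 pounds", [0.1, 0.9, 0.1]),
    ("live to be about 20 years old", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

best = top_k(toy_query, toy_items, k=1)
print(best[0][0])
```

With these toy vectors the query points mostly along the first axis, so the first item scores highest. A vector database returns the same kind of ranking, but uses an approximate index to avoid scanning every stored vector.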


## Step 3: Generate
Lastly, use the prompt and the document retrieved in the previous step to generate an answer!

In [11]:
# generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
  model="llama2",
  prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])


Llamas are members of the camelid family, which means they are closely related to other animals in the same family, including:

1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are found in the Andean region and are known for their soft, woolly coats.
2. Camels: Camels are large, even-toed ungulates that are closely related to llamas and vicuñas. They are found in hot, dry climates around the world and are known for their ability to go without water for long periods of time.
3. Guanacos: Guanacos are large, wild animals that are related to llamas and vicuñas. They are found in the Andean region and are known for their distinctive long necks and legs.
4. Llama-like creatures: There are also other animals that are sometimes referred to as "llamas," such as the lama-like creatures found in China, which are actually a different species altogether. These creatures are not closely related to vicuñas or camels, but are sometimes referred to as "llamas" due to their p
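
The prompt in this step interpolates a single retrieved document. If the retrieval query were changed to return several documents (for example with `limit=3`), the context would need to be assembled before it is passed to `ollama.generate`. The following is a minimal sketch of one way to do that, not a prescribed approach: `build_rag_prompt` is a hypothetical helper, and its wording simply mirrors the f-string used in the cell above.

```python
def build_rag_prompt(question, context_docs):
    """Combine retrieved documents and the user question into one prompt string."""
    if not context_docs:
        # Nothing was retrieved: fall back to the bare question.
        return question
    # One bullet per retrieved document keeps the sources visually separate.
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Using this data:\n{context}\nRespond to this prompt: {question}"


# Example documents taken from the list defined in Step 1.
example_docs = [
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
]
rag_prompt = build_rag_prompt("What do llamas eat?", example_docs)
print(rag_prompt)
```

The resulting string could then be passed as `prompt=` to `ollama.generate` in the same way as the single-document prompt.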