[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/weaviate-features/model-providers/meta/rag_llama_2_ollama.ipynb)

# Local RAG with Gemma 3 270M and Weaviate

This notebook walks you through building a local RAG pipeline using `Gemma 3 270M` with Ollama and Weaviate.  

## Setup 
1. Download and install Ollama for your operating system: https://ollama.com/download
2. Install the Ollama Python library, which this notebook uses to generate embeddings and to call the model, with `pip install ollama`. (A REST API and a JavaScript library are also available.)

In [None]:
!pip install ollama -q
!pip install -U weaviate-client -q

3. Pull the LLM and the [embedding model](https://ollama.com/blog/embedding-models):

In [None]:
!ollama pull gemma3:270m
!ollama pull snowflake-arctic-embed

4. Test the connection to `snowflake-arctic-embed` and `gemma3:270m`:

In [6]:
import ollama

ollama.embeddings(model="snowflake-arctic-embed", 
                  prompt= "Vector databases store and search high-dimensional embeddings, making it possible to find meaningfully similar information in milliseconds.")

EmbeddingsResponse(embedding=[0.13748186826705933, -0.516970157623291, -0.49420779943466187, 0.04724853113293648, 0.26531967520713806, 0.42371881008148193, 0.26317495107650757, 0.8286728262901306, 0.13512548804283142, -0.3822863698005676, 0.14286038279533386, 0.3222828209400177, 0.3947080373764038, 0.8898833990097046, 0.29464420676231384, 0.05239919573068619, 0.4732877314090729, -0.039116669446229935, -0.4511941969394684, -0.2753143906593323, 0.5590512752532959, 0.6304900050163269, 0.7312970161437988, -0.08470149338245392, 0.1440456509590149, -0.12175796926021576, -0.03627774864435196, -0.5892104506492615, -0.7946105003356934, 0.47210487723350525, -0.4833589494228363, -0.464603453874588, 0.006660632789134979, 0.48332729935646057, 0.5913648009300232, 0.04839450120925903, -0.7773759365081787, -0.8896509408950806, 0.4875617027282715, -0.38727307319641113, -0.31987324357032776, -0.07857022434473038, 0.010622784495353699, 1.625440001487732, -0.3959883153438568, 0.9545913934707642, 0.3035925
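The response holds a single list of floats, and printing it in full is rarely informative. As a rough sketch, a small helper can report its size and norm instead. `summarize_embedding` is a hypothetical helper name, and the short toy vector stands in for the real `response["embedding"]` so the example runs without an Ollama server.

```python
import math


def summarize_embedding(vector, preview=3):
    """Return the dimension, L2 norm, and first few values of an embedding."""
    norm = math.sqrt(sum(x * x for x in vector))
    return {
        "dimensions": len(vector),
        "l2_norm": round(norm, 4),
        "preview": [round(x, 4) for x in vector[:preview]],
    }


# Toy stand-in for response["embedding"]; real embeddings are much longer.
toy_embedding = [3.0, 4.0, 0.0, 0.0]
summary = summarize_embedding(toy_embedding)
print(summary)
```

Every embedding produced by the same model should report the same `dimensions` value, which makes this a quick sanity check before importing data.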

In [7]:
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3:270m', messages=[
  {
    'role': 'user',
    'content': 'How are you?',
  },
])
print(response.message.content)

I am doing well, thank you for asking! How are you today?



## Step 1: Connect to Weaviate

**Connect to a running Weaviate instance (choose one option)**
1. [Weaviate Cloud (WCD)](https://console.weaviate.cloud/), where you can create a free 14-day sandbox
2. [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded)
3. [Local deployment](https://weaviate.io/developers/weaviate/installation/docker-compose#starter-docker-compose-file)
4. [Other options](https://weaviate.io/developers/weaviate/installation)

Option 1: Weaviate Cloud

In [None]:
import os

import weaviate
import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

# Read the connection settings from environment variables
WCD_URL = os.environ["WEAVIATE_URL"]  # your Weaviate cluster URL
WCD_AUTH_KEY = os.environ["WEAVIATE_AUTH"]  # your cluster API key

# Weaviate Cloud Deployment
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WCD_URL,
    auth_credentials=weaviate.auth.AuthApiKey(WCD_AUTH_KEY),
)

print(client.is_ready())

True
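If either environment variable is unset, the cell above fails with a bare `KeyError`. A minimal sketch of a friendlier check, assuming the same two variable names; `read_weaviate_settings` is a hypothetical helper, and the example values are placeholders rather than real credentials.

```python
import os


def read_weaviate_settings(env=os.environ):
    """Return (url, api_key), or raise an error naming the missing variables."""
    required = ("WEAVIATE_URL", "WEAVIATE_AUTH")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["WEAVIATE_URL"], env["WEAVIATE_AUTH"]


# Demonstrated with a plain dict so the sketch runs without real credentials.
example_env = {
    "WEAVIATE_URL": "https://example.weaviate.cloud",  # placeholder URL
    "WEAVIATE_AUTH": "placeholder-key",  # placeholder key
}
url, key = read_weaviate_settings(example_env)
print(url)
```

Called with no argument, the helper reads from `os.environ`, so it could replace the two lookups in the connection cell.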


Option 2: Weaviate Embedded

In [None]:
# client = weaviate.connect_to_embedded()

# print(client.is_ready())

Option 3: Local Weaviate instance

In [None]:
# client = weaviate.connect_to_local()

# print(client.is_ready())

## Step 2: Define Weaviate collection

In [10]:
import weaviate.classes.config as wc

# Note: in practice, you shouldn't rerun this cell, as it deletes your data
# in "vector_db_facts", and then you need to re-import it again.

# Delete the collection if it already exists
if client.collections.exists("vector_db_facts"):
    client.collections.delete("vector_db_facts")

client.collections.create(
    name="vector_db_facts",

    properties=[ # defining properties (data schema) is optional
        wc.Property(name="text", data_type=wc.DataType.TEXT), 
    ]
)

print("Successfully created collection: vector_db_facts.")

Successfully created collection: vector_db_facts.


## Step 3: Import your data

In [1]:
documents = [
    "Vector databases excel at finding semantically similar items by comparing high-dimensional embeddings instead of exact keyword matches.",
    "They are essential for powering Retrieval-Augmented Generation (RAG) pipelines, enabling LLMs to pull in relevant facts before answering.",
    "Unlike traditional databases, vector databases store data as dense numerical vectors rather than rows of text or numbers.",
    "Modern vector databases support hybrid search, combining semantic search with keyword filters for more accurate results.",
    "They often include features like approximate nearest neighbor (ANN) indexing to make similarity search extremely fast at scale.",
    "Integrating a vector database with an LLM allows context retrieval from millions of documents in milliseconds.",
    "Some vector databases, like Weaviate, offer built-in modules for embedding generation, filtering, and real-time updates."
]

In [8]:
collection = client.collections.get("vector_db_facts")

# Embed each sentence with Ollama and store the text together with its vector
with collection.batch.fixed_size(batch_size=5) as batch:
    for d in documents:
        response = ollama.embeddings(model="snowflake-arctic-embed", prompt=d)
        batch.add_object(
            properties={"text": d},
            vector=response["embedding"],
        )

# Report any objects that failed to import
if collection.batch.failed_objects:
    print(f"Failed to import {len(collection.batch.failed_objects)} objects")
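With `batch_size=5` and seven documents, the objects are sent in two groups. The sketch below only illustrates that grouping idea in plain Python. It is not the Weaviate client's implementation, and `fixed_size_batches` is a hypothetical helper name.

```python
from itertools import islice


def fixed_size_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Seven placeholder items, mirroring the seven documents in this notebook.
sizes = [len(batch) for batch in fixed_size_batches(range(7), 5)]
print(sizes)  # [5, 2]
```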

In [9]:
collection.query.fetch_objects(limit=1, include_vector=True)

QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('016df96e-d72a-48e5-8f2c-c915ab5cac6b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'text': 'They are essential for powering Retrieval-Augmented Generation (RAG) pipelines, enabling LLMs to pull in relevant facts before answering.'}, references=None, vector={'default': [0.8140762448310852, -0.5702143907546997, -0.8066444396972656, 0.3504329323768616, 0.18915557861328125, 0.17941194772720337, 0.23056617379188538, -0.30798184871673584, 0.20502254366874695, -0.6825118660926819, 0.9177469611167908, -0.2981626093387604, 0.6611047983169556, 0.40573132038116455, 0.5498625040054321, -0.06456507742404938, 1.159651279449463, -0.36653849482536316, 0.38673311471939087, 0.059227339923381805, 0.5635879635810852, 0.07160080224275589, 0.2358045130968094, -0.4472408890724182, 0.32264775037765503, 0.19398479163646698,

## Step 4: Retrieve
Retrieve the most relevant document for an example prompt.

In [10]:
# an example prompt
prompt = "What is a vector database and how is it different from a traditional database?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
  prompt=prompt,
  model="snowflake-arctic-embed"
)

results = collection.query.near_vector(near_vector=response["embedding"],
                             limit=1)

data = results.objects[0].properties['text']
print(data)

Unlike traditional databases, vector databases store data as dense numerical vectors rather than rows of text or numbers.
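Conceptually, `near_vector` ranks stored vectors by their distance to the query vector. The sketch below shows that idea as a brute-force search over toy two-dimensional vectors. It assumes cosine distance, which is Weaviate's default metric to the best of our knowledge. A real deployment uses an approximate nearest neighbor index rather than scanning every vector, and the function names here are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, vectors, limit=1):
    """Return the indices of the `limit` vectors closest to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_distance(query, vectors[i]),
    )
    return ranked[:limit]


# Toy vectors standing in for stored embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [0.9, 0.1]
top = nearest(toy_query, toy_vectors, limit=2)
print(top)  # [0, 2]
```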


## Step 5: Generate
We'll now use the prompt and the document retrieved in the previous step to generate an answer.
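If the retrieval step is run with a `limit` greater than 1, several passages need to be folded into one prompt. One possible way to do this is sketched below. `build_rag_prompt` is a hypothetical helper, and the instruction wording is an assumption rather than a format the model requires.

```python
def build_rag_prompt(question, passages):
    """Combine numbered context passages and a question into one prompt string."""
    context = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Two passages taken from the documents list used earlier in this notebook.
example_passages = [
    "Unlike traditional databases, vector databases store data as dense numerical vectors rather than rows of text or numbers.",
    "Modern vector databases support hybrid search, combining semantic search with keyword filters for more accurate results.",
]
rag_prompt = build_rag_prompt("What is a vector database?", example_passages)
print(rag_prompt)
```

The resulting string can be passed as the `prompt` argument of `ollama.generate` in place of the single-document f-string.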

In [16]:
# generate a response combining the prompt and the data we retrieved in Step 4
output = ollama.generate(
  model="gemma3:270m",
  prompt=f"Using this data: {data}. Give a detailed answer to the prompt: {prompt}"
)

print(output['response'])

Okay, let's break down the concept of a vector database and compare it to a traditional database.

**What is a Vector Database?**

A vector database is a database system that stores data in a **dense, ordered, and contiguous sequence** of numbers.  Instead of storing data in rows of text or numbers, vector databases store data as **vectors**.  Think of it like a collection of individual objects, each with its own unique characteristics and properties.

**Key Characteristics of a Vector Database:**

*   **Dense:**  The data is organized in a way that allows for efficient storage and retrieval of information.
*   **Ordered:** The data is typically stored in a specific order, often in a specific order (e.g., in a list, a sequence, or a specific order).
*   **Contiguous:**  Data points are stored in a contiguous sequence, meaning they are linked together in a way that makes it easy to find and retrieve information.
*   **Numerical:**  The data is typically represented as numerical values, 