# 🧠 Building a Retrieval-Augmented Generation (RAG) System with Milvus

This repository demonstrates how to build a **Retrieval-Augmented Generation (RAG)** pipeline using **Milvus**, a popular open-source vector database.  

## What you’ll learn
- How to create and use embeddings for unstructured text.
- How to store and query vectors efficiently with Milvus.
- How to connect a Large Language Model (LLM) with Milvus to build a RAG system.
- How to run simple queries and generate answers grounded in retrieved context.

## Why RAG?
LLMs are powerful, but they can **hallucinate**, and their knowledge stops at a training cutoff date.  
RAG addresses both problems by combining:
- **Retrieval** → pull relevant facts from an external knowledge base.  
- **Generation** → let the LLM generate natural, contextual answers.  
---

> ⚡️ By the end, you’ll have a working RAG pipeline that you can adapt for use cases like product search, customer support, or knowledge management.


## 🔧 Installation

Install the required dependencies:  
- `pymilvus` for interacting with Milvus  
- `openai` for embeddings and LLMs  
- `requests` and `tqdm` as supporting libraries for dataset handling and progress reporting

In [None]:
# Install necessary packages for Milvus and OpenAI

! pip install --upgrade pymilvus openai requests tqdm
! pip install 'pymilvus[milvus_lite]'

### Setup

Set your OpenAI API key as an environment variable so the OpenAI client can pick it up.

In [2]:
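import os
import openai

# Paste your OpenAI API key between the quotes. Skip this line if the key is
# already exported in your shell, since assigning "" would overwrite it.
os.environ["OPENAI_API_KEY"] = ""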

### Prepare the data

We use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge base for our RAG pipeline. They are a good data source for a simple example.

The files are included in this repository at `data/milvus_docs`.

We load all Markdown files from that folder.

For each document, we simply split the content on "# ", which roughly separates the file into question-answer pairs.

In [1]:
from glob import glob

text_lines = []

for file_path in glob("data/milvus_docs/*.md"):
    with open(file_path, "r") as file:
        file_text = file.read()

    text_lines += file_text.split("# ")
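
To see what splitting on `"# "` produces in practice, here is a small offline sketch. The `sample` string is a made-up stand-in for one FAQ file, and the cleanup step at the end is an optional assumption rather than part of the notebook's pipeline.

```python
# Hypothetical stand-in for the contents of one FAQ markdown file.
sample = (
    "# FAQ\n\n"
    "#### What is Milvus?\n\nA vector database.\n\n"
    "#### Is it open source?\n\nYes.\n"
)

# Same split as in the notebook: every "# " ends a chunk, including the
# last "# " of a "#### " heading, so chunks can keep a trailing "###".
chunks = sample.split("# ")

# Optional cleanup (an assumption): trim whitespace and leftover "#"
# characters, then drop chunks that end up empty.
cleaned = [chunk.strip().rstrip("#").strip() for chunk in chunks]
cleaned = [chunk for chunk in cleaned if chunk]

for chunk in cleaned:
    print(repr(chunk))
```

The raw split keeps an empty first chunk and trailing `###` markers, which is why some retrieved passages later in the notebook end with `###`.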

### Prepare the Embedding Model

We initialize the OpenAI client to prepare the embedding model.

In [3]:
from openai import OpenAI

openai_client = OpenAI()

## 🧮 Creating Embeddings

We use OpenAI’s embedding model to convert each text/document into a high-dimensional vector.  
These vectors capture semantic meaning and make similarity search possible.

Define a function that generates text embeddings with the OpenAI client. We use the [text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) model as an example.

In [4]:
def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )

Generate a test embedding and print its dimension and first few elements.

In [5]:
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

1536
[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]
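
To make "similarity search" concrete, the sketch below computes cosine similarity by hand. The three-dimensional vectors are invented for illustration; real embeddings from the model above have 1536 dimensions, and Milvus performs this comparison for us at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (made up for illustration).
query = [1.0, 0.0, 1.0]
similar = [0.9, 0.1, 1.1]
unrelated = [-1.0, 1.0, 0.0]

print(cosine_similarity(query, similar))    # close to 1
print(cosine_similarity(query, unrelated))  # negative
```

Vectors pointing in nearly the same direction score close to 1, which is the property the search step relies on.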


## 🔗 Connecting to Milvus

Milvus will act as our **vector database**, storing and searching embeddings efficiently.

## 🗂️ Creating a Collection in Milvus

A collection in Milvus is like a table in a relational database.  

In [8]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"


> About the `uri` argument of `MilvusClient`:
> - Setting `uri` to a local file, e.g. `./milvus.db`, is the most convenient option, because it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in that file.
> - If you have a large amount of data, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In that setup, use the server URI, e.g. `http://localhost:19530`, as your `uri`.
> - Alternatively, Element offers a cloud-hosted version of Milvus (note: this option is not available in the Sandbox environment).
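
One way to switch between these options without editing the notebook is to read the URI from the environment. This is only a sketch: the `MILVUS_URI` variable name and the `resolve_milvus_uri` helper are assumptions made for illustration, not something Milvus or `pymilvus` defines.

```python
import os


def resolve_milvus_uri(env=os.environ, default="./milvus_demo.db"):
    """Return the URI from the hypothetical MILVUS_URI variable, else a local file."""
    return env.get("MILVUS_URI") or default


# With nothing set, fall back to the local Milvus Lite file.
print(resolve_milvus_uri({}))
# With the variable set, use the server address instead.
print(resolve_milvus_uri({"MILVUS_URI": "http://localhost:19530"}))

# The result would then be passed on as:
#   milvus_client = MilvusClient(uri=resolve_milvus_uri())
```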

Check if the collection already exists and drop it if it does.

In [9]:
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with specified parameters.

If we don't specify any field information, Milvus automatically creates a default `id` field for the primary key and a `vector` field to store the vector data. A reserved JSON field stores any fields that are not defined in the schema, along with their values.

In [11]:
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="COSINE",  # Cosine similarity
)

### Generate embeddings and insert data into Milvus
Iterate through the text lines, create embeddings, and then insert the data into Milvus.
This builds our searchable knowledge base.


In [12]:
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)

Creating embeddings: 100%|██████████| 72/72 [00:40<00:00,  1.78it/s]


{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}
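
The loop above makes one embedding request per chunk. A possible refinement is to embed chunks in batches. The sketch below keeps the row format used above (`id`, `vector`, `text`) but takes the embedding function as a parameter so it runs offline. The `batched`, `build_rows`, and `fake_embed_batch` names and the batch size are illustrative assumptions.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_rows(lines, embed_batch, batch_size=32):
    """Build Milvus rows, calling `embed_batch` once per batch of texts."""
    rows = []
    for batch in batched(lines, batch_size):
        vectors = embed_batch(batch)  # one vector per input text, same order
        for text, vector in zip(batch, vectors):
            rows.append({"id": len(rows), "vector": vector, "text": text})
    return rows


def fake_embed_batch(texts):
    """Offline stand-in for a real embedding call (not a real model)."""
    return [[float(len(text)), 0.0] for text in texts]


rows = build_rows(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print([row["id"] for row in rows])
```

With the real client, `embed_batch` could wrap `openai_client.embeddings.create(input=batch, model="text-embedding-3-small")` and read one `.embedding` from each item in `.data`. Check the current API limits on batch size before relying on this.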

## Build RAG

### Step 1 - Retrieve data for a query

Let's specify a frequent question about Milvus.

In [13]:
question = "How is data stored in milvus?"


## 🔍 Querying Milvus

To answer user questions, we:  
1. Convert the query into an embedding  
2. Search Milvus for the top-k similar vectors  
3. Retrieve the most relevant documents  

In [14]:
# 🔍 Perform similarity search in Milvus to retrieve top-3 matches

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  # Use the `emb_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "COSINE", "params": {}},  # Cosine similarity, matching the collection's metric
    output_fields=["text"],  # Return the text field
)

Let's take a look at the search results for the query.


In [15]:
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

[
    [
        " Where does Milvus store data?\n\nMilvus deals with two types of data, inserted data and metadata. \n\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\n\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\n\n###",
        0.7826632261276245
    ],
    [
        "How does Milvus handle vector data types and precision?\n\nMilvus supports Binary, Float
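
With the `COSINE` metric, a higher `distance` value means a closer match, so weak hits can be dropped before they reach the prompt. The sketch below works on a mocked result with the same shape as `search_res[0]` above. The `filter_hits` helper, the mock data, and the `0.5` threshold are assumptions chosen for illustration.

```python
def filter_hits(hits, min_score):
    """Keep (text, score) pairs whose similarity score is at least `min_score`."""
    return [
        (hit["entity"]["text"], hit["distance"])
        for hit in hits
        if hit["distance"] >= min_score
    ]


# Mocked search result: one list of hits per query vector.
mock_search_res = [[
    {"id": 3, "distance": 0.78, "entity": {"text": "Where does Milvus store data?"}},
    {"id": 9, "distance": 0.61, "entity": {"text": "How does Milvus handle vector data types?"}},
    {"id": 4, "distance": 0.32, "entity": {"text": "An unrelated chunk"}},
]]

kept = filter_hits(mock_search_res[0], min_score=0.5)
print(kept)
```

A suitable threshold depends on the embedding model and the data, so treat any fixed value as a starting point to tune.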


### Step 2 - Augment original user query with additional context

Convert the retrieved documents into a string format.

In [18]:
context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
print(context)

 Where does Milvus store data?

Milvus deals with two types of data, inserted data and metadata. 

Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).

Metadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.

###
How does Milvus handle vector data types and precision?

Milvus supports Binary, Float32, Float16, and BFloat16 vector types.

- Binary vectors: Store binary data a

Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus.

In [19]:
# 🤖 Combine retrieved context with LLM to generate final answer


SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

# Final augmented query to be sent to the LLM
print(USER_PROMPT)


Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
 Where does Milvus store data?

Milvus deals with two types of data, inserted data and metadata. 

Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).

Metadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.

###
How does Milvus hand
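
Because `USER_PROMPT` is an f-string, it is filled in once, when the cell runs. To reuse the same template for other questions, the assembly can be wrapped in a function. This is a sketch: `build_user_prompt` and its `max_chars` cap are hypothetical additions, and the cap is a crude character limit rather than a token count.

```python
PROMPT_TEMPLATE = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""


def build_user_prompt(chunks, question, max_chars=4000):
    """Join retrieved chunks, cap the context length, and fill the template."""
    context = "\n".join(chunks)[:max_chars]
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_user_prompt(
    ["Milvus stores inserted data in object storage.", "Metadata is stored in etcd."],
    "How is data stored in milvus?",
)
print(prompt)
```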

### Step 3 - Generate response using a Large Language Model
Use an OpenAI chat model (`gpt-4o-mini` here) to generate a response based on the prompts.

In [None]:
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)

Milvus stores two types of data: inserted data and metadata. 

Inserted data, which includes vector data, scalar data, and collection-specific schema, is stored in persistent storage as incremental logs. Milvus supports multiple object storage backends for this purpose, including MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage.

Metadata is generated within Milvus and stored in etcd, with each Milvus module having its own specific metadata.
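
The three steps (retrieve, augment, generate) can be tied together in one function. The sketch below passes retrieval and generation in as callables so that it runs without Milvus or an API key. `answer_question` and the two `fake_*` stand-ins are illustrative assumptions, and the prompt is a shortened form of the one used above.

```python
def answer_question(question, retrieve, generate, top_k=3):
    """Retrieve context chunks, build an augmented prompt, and generate an answer."""
    chunks = retrieve(question, top_k)
    context = "\n".join(chunks)
    prompt = f"<context>\n{context}\n</context>\n<question>\n{question}\n</question>"
    return generate(prompt)


def fake_retrieve(question, top_k):
    """Stand-in for the Milvus search step: returns the first `top_k` chunks."""
    corpus = [
        "Milvus stores inserted data in object storage.",
        "Metadata is stored in etcd.",
        "An unrelated chunk.",
    ]
    return corpus[:top_k]


def fake_generate(prompt):
    """Stand-in for the chat completion call: echoes the first context line."""
    return "ANSWER using: " + prompt.splitlines()[1]


print(answer_question("Where is metadata stored?", fake_retrieve, fake_generate, top_k=2))
```

In the notebook, `retrieve` would wrap the `milvus_client.search` call and `generate` would wrap `openai_client.chat.completions.create`.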


## ✅ Summary

In this notebook, we built a simple RAG pipeline:
- Created embeddings for the FAQ text and stored them in Milvus  
- Queried Milvus for relevant context  
- Used an LLM to generate grounded answers  

This foundation can be extended to real-world use cases like:
- Product search  
- Customer support  
- Internal knowledge management  
