# LlamaIndex + Qdrant RAG Pipeline

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/thierrypdamiba/qdrant-etl-cookbook/blob/main/notebooks/agents/llamaindex_qdrant.ipynb)

Build a RAG pipeline using LlamaIndex's document loaders, chunking, and query engine backed by Qdrant.

**Requirements:** Set the `OPENAI_API_KEY` environment variable. LlamaIndex uses OpenAI by default for both the embedding model and the LLM.
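
A missing key otherwise only surfaces when the first embedding call is made, so it can help to fail fast before running the cells below. This is a minimal sketch; the helper name `require_env` is hypothetical and not part of LlamaIndex or Qdrant.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this notebook.")
    return value


# Example usage (hypothetical helper, not a LlamaIndex API):
# require_env("OPENAI_API_KEY")
```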

In [None]:
!pip install -q llama-index llama-index-vector-stores-qdrant qdrant-client

In [None]:
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

In [None]:
client = QdrantClient(":memory:")

vector_store = QdrantVectorStore(
    client=client,
    collection_name="llamaindex_demo",
)

In [None]:
# Create documents (replace with your own data source)
documents = [
    Document(text="Qdrant is a vector similarity search engine and database. It provides a production-ready service with a convenient API to store, search, and manage points with payloads."),
    Document(text="HNSW (Hierarchical Navigable Small World) is the primary index used in Qdrant. Key parameters are m (connections per node) and ef_construct (search depth during build)."),
    Document(text="Quantization reduces memory footprint. Scalar quantization (int8) gives 4x reduction. Binary quantization gives 32x reduction but works best with high-dimensional vectors."),
    Document(text="Multi-tenancy in Qdrant uses payload-based filtering. Create a keyword index on tenant_id for fast isolation between tenants."),
    Document(text="Hybrid search in Qdrant combines dense vector similarity with sparse BM25 keyword matching for better retrieval quality."),
]
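
The 4x and 32x figures in the quantization document above follow from the number of bits stored per dimension. Here is a rough sketch of that arithmetic, assuming float32 originals and counting only the raw vectors (no HNSW graph, no payloads). The vector count and dimensionality are illustrative values, not taken from the source.

```python
def vector_memory_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw storage for the vectors alone, in bytes."""
    return num_vectors * dim * bits_per_dim // 8


NUM_VECTORS = 1_000_000  # illustrative assumption
DIM = 1536               # illustrative assumption

float32_bytes = vector_memory_bytes(NUM_VECTORS, DIM, 32)  # original vectors
int8_bytes = vector_memory_bytes(NUM_VECTORS, DIM, 8)      # scalar quantization
binary_bytes = vector_memory_bytes(NUM_VECTORS, DIM, 1)    # binary quantization

print(f"float32: {float32_bytes / 1e9:.2f} GB")
print(f"int8:    {int8_bytes / 1e9:.2f} GB ({float32_bytes // int8_bytes}x smaller)")
print(f"binary:  {binary_bytes / 1e9:.2f} GB ({float32_bytes // binary_bytes}x smaller)")
```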

In [None]:
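# Build index from documents. Pass the vector store via a StorageContext;
# a bare `vector_store=` keyword does not attach it, and the default in-memory store is used.
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)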
print(f"Indexed {len(documents)} documents")
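
Before embedding, `from_documents` splits each document into smaller nodes (chunks). The sketch below illustrates the general idea with an overlapping word window. It is a simplified stand-in, not LlamaIndex's own splitter, which works on sentences and tokens; the function name and parameters are hypothetical.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning
    a chunk boundary is not lost.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = "Qdrant is a vector similarity search engine and database"
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The documents in this notebook are short enough that each one most likely ends up as a single node, so chunking only becomes visible with longer inputs.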

In [None]:
# Query the index
query_engine = index.as_query_engine(similarity_top_k=3)

response = query_engine.query("How can I reduce memory usage in Qdrant?")
print("Answer:", response)
print("\nSources:")
for node in response.source_nodes:
    print(f"  Score: {node.score:.4f} | {node.text[:80]}...")
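
`similarity_top_k=3` asks the retriever for the three highest-scoring nodes for the query embedding. The sketch below shows that ranking step as a brute-force search over toy 2-dimensional vectors, assuming cosine similarity as the metric. A real Qdrant collection would use an approximate HNSW search instead of scanning every vector, and the vectors here are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (illustrative only; real embeddings have far more dimensions)
toy_vectors = {
    "quantization": [0.9, 0.1],
    "hnsw": [0.5, 0.5],
    "multi_tenancy": [0.1, 0.9],
}
results = top_k([1.0, 0.0], toy_vectors, k=2)
for name, score in results:
    print(f"{name}: {score:.4f}")
```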

In [None]:
# Chat mode with memory
chat_engine = index.as_chat_engine(chat_mode="condense_question")

response1 = chat_engine.chat("What indexing does Qdrant use?")
print("Q1:", response1)

response2 = chat_engine.chat("What are the key parameters for it?")
print("\nQ2 (follow-up):", response2)
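
In `condense_question` mode, the follow-up ("What are the key parameters for it?") is first rewritten into a standalone question using the chat history, and that rewritten question is then sent to the query engine. The sketch below shows roughly what such a condensing prompt could look like. The function name and prompt wording are hypothetical and are not LlamaIndex's actual template.

```python
def build_condense_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Format chat history and a follow-up into a prompt asking an LLM for a standalone question."""
    lines = [f"{role}: {message}" for role, message in history]
    return (
        "Given the conversation below, rewrite the follow-up as a standalone question.\n\n"
        "Conversation:\n" + "\n".join(lines) + "\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )


history = [
    ("user", "What indexing does Qdrant use?"),
    ("assistant", "Qdrant uses HNSW as its primary index."),
]
prompt = build_condense_prompt(history, "What are the key parameters for it?")
print(prompt)
```

An LLM given this prompt would be expected to return something like a question about the key parameters of HNSW, which can then be retrieved against without needing the earlier turns.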