# Day 13: Vector Databases

Your embeddings live in memory.

Restart your script and they're gone.

**Vector databases** persist embeddings and make search fast at scale.
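
Here is a minimal standard-library sketch of what "persist" means at its most basic: write the vectors to a file, then reload them as if the script had restarted. The file name `embeddings.json` and the toy 3-dimensional vectors are illustrative assumptions, not real Gemini output. This covers persistence only. Every search would still have to load and scan every vector, and that is the part a vector database's index takes over.

```python
import json
import os
import tempfile

# Toy 3-d vectors standing in for real embeddings (illustrative values only)
store = {
    "doc_0": [0.1, 0.9, 0.0],
    "doc_1": [0.8, 0.2, 0.1],
}

# Hypothetical file location; a temp dir keeps the example self-contained
path = os.path.join(tempfile.mkdtemp(), "embeddings.json")

# "Persist": write ids and vectors to disk
with open(path, "w") as f:
    json.dump(store, f)

# Simulate a restart: drop the in-memory copy, then reload from disk
del store
with open(path) as f:
    reloaded = json.load(f)

print(sorted(reloaded))  # ['doc_0', 'doc_1']
```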

```python
%pip install chromadb google-genai python-dotenv
```

## Setup

```python
import os

import chromadb
from dotenv import load_dotenv
from google import genai

# Load GEMINI_API_KEY from the .env file one directory up
load_dotenv(dotenv_path='../.env')
API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=API_KEY)
```

## Create a Vector Database

```python
# Create a persistent database (saved to disk under ./chroma_db)
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Create a collection (like a table); an existing one with this name is reused
collection = chroma_client.get_or_create_collection(name="my_documents")

print(f"✅ Collection created: {collection.name}")
```

```
✅ Collection created: my_documents
```

## Add Documents

```python
documents = [
    "Python is a popular programming language for data science.",
    "JavaScript powers most websites and web applications.",
    "Machine learning models learn patterns from data.",
    "Docker containers package applications for deployment.",
    "Neural networks are inspired by the human brain."
]

# Generate one embedding per document with Gemini
embeddings = []
for doc in documents:
    response = client.models.embed_content(model="gemini-embedding-001", contents=doc)
    embeddings.append(response.embeddings[0].values)

# Add to ChromaDB: ids[i] labels documents[i] and embeddings[i]
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

print(f"✅ Added {len(documents)} documents to the database")
```

```
✅ Added 5 documents to the database
```
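
`collection.add` takes three parallel lists, and `ids` is what makes the "table" analogy work: one row per ID. The toy class below is hypothetical and does not reproduce ChromaDB's internals. It refuses IDs that are already stored so the contract is visible. That matters here because the database is persistent: re-running the cell above sends `doc_0` to `doc_4` again. If you want a re-run to overwrite existing rows, ChromaDB's `collection.upsert(...)` accepts the same arguments; check how your ChromaDB version treats `add` with an existing ID.

```python
class ToyCollection:
    """Hypothetical stand-in for a collection: one row per id, like a table."""

    def __init__(self):
        self.rows = {}  # id -> (document, embedding)

    def _check(self, ids, documents, embeddings):
        # The three lists are parallel: ids[i] labels documents[i] and embeddings[i]
        if not (len(ids) == len(documents) == len(embeddings)):
            raise ValueError("ids, documents and embeddings must have equal lengths")
        if len(set(ids)) != len(ids):
            raise ValueError("duplicate id within one batch")

    def add(self, ids, documents, embeddings):
        self._check(ids, documents, embeddings)
        existing = [i for i in ids if i in self.rows]
        if existing:
            # This toy refuses to overwrite; upsert below does overwrite
            raise ValueError(f"ids already stored: {existing}")
        for i, doc, emb in zip(ids, documents, embeddings):
            self.rows[i] = (doc, emb)

    def upsert(self, ids, documents, embeddings):
        self._check(ids, documents, embeddings)
        # Insert new ids and overwrite existing ones
        for i, doc, emb in zip(ids, documents, embeddings):
            self.rows[i] = (doc, emb)


col = ToyCollection()
col.add(
    ids=["doc_0", "doc_1"],
    documents=["Python is a popular programming language for data science.",
               "JavaScript powers most websites and web applications."],
    embeddings=[[0.1, 0.2], [0.3, 0.4]],
)

try:
    col.add(ids=["doc_0"], documents=["Python, again"], embeddings=[[0.5, 0.6]])
except ValueError as err:
    print(err)  # ids already stored: ['doc_0']

col.upsert(ids=["doc_0"], documents=["Python, again"], embeddings=[[0.5, 0.6]])
print(col.rows["doc_0"][0])  # Python, again
```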


## Search the Database

```python
query = "How do AI systems learn?"

# Embed the query with the same model used for the documents
query_embedding = client.models.embed_content(
    model="gemini-embedding-001",
    contents=query
).embeddings[0].values

# Search: ChromaDB computes the distances for you (squared L2 by default)
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=2
)

# results['documents'] holds one list per query embedding; [0] is our only query
print(f"🔎 Query: '{query}'\n")
print("📄 Results:")
for doc in results['documents'][0]:
    print(f"  • {doc}")
```

```
🔎 Query: 'How do AI systems learn?'

📄 Results:
  • Machine learning models learn patterns from data.
  • Neural networks are inspired by the human brain.
```
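
"ChromaDB handles the similarity calculation" means comparing the query vector against the stored vectors. The sketch below does that exhaustively in plain Python, with hypothetical helper names and toy 3-d vectors. It mirrors the nested shape of ChromaDB's `query` results (one inner list per query embedding, hence `results['documents'][0]`) and ranks by squared L2, ChromaDB's default distance. For unit-length vectors, ‖a − b‖² = 2 − 2·cos(a, b), so ranking by squared L2 gives the same order as ranking by cosine similarity. Whether your Gemini vectors are already unit-length may depend on the output size, so normalize them if in doubt.

```python
import math


def squared_l2(a, b):
    # Sum of squared coordinate differences (no square root)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v):
    # Scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def brute_force_query(ids, embeddings, query_embeddings, n_results):
    """Hypothetical exact top-k search; returns nested lists, one per query."""
    out = {"ids": [], "distances": []}
    for q in query_embeddings:
        # Score every stored vector, then keep the n_results closest
        scored = sorted((squared_l2(q, e), i) for i, e in zip(ids, embeddings))
        top = scored[:n_results]
        out["ids"].append([i for _, i in top])
        out["distances"].append([d for d, _ in top])
    return out


ids = ["doc_0", "doc_1", "doc_2"]
embeddings = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])]
query = normalize([0.9, 0.1, 0.0])

results = brute_force_query(ids, embeddings, [query], n_results=2)
print(results["ids"][0])  # ['doc_0', 'doc_1']
```

This full scan is exact but costs one distance computation per stored vector for every query, which is what an index avoids at scale.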


## The Key Difference

**Without a vector DB:**
```python
embeddings = []  # Lives in memory
# Script restarts → embeddings gone
# 1M documents â†’ out of memory
```

**With a vector DB:**
```python
collection.add(...)  # Saved to disk
# Script restarts → data persists
# 1M documents → approximate nearest-neighbour index (HNSW in ChromaDB)
```
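
The "out of memory" line is easy to check with arithmetic. This sketch assumes 1 million documents and 3072-dimensional vectors, which we take to be the default output size of `gemini-embedding-001` (verify with `len(embeddings[0])` in your own run). It compares packed float32 storage with the plain Python list of floats used in this notebook; the list estimate is CPython-specific and ignores list headers and over-allocation.

```python
import struct
import sys

# Assumptions: 3072 dimensions per vector, 1M documents
DIM = 3072
N_DOCS = 1_000_000

# Packed float32 storage: 4 bytes per value
float32_bytes = N_DOCS * DIM * struct.calcsize("f")

# Python lists: each slot is a pointer to a separate float object
per_value = struct.calcsize("P") + sys.getsizeof(1.0)
python_list_bytes = N_DOCS * DIM * per_value

print(f"float32 array: {float32_bytes / 1e9:.1f} GB")  # float32 array: 12.3 GB
print(f"Python lists:  {python_list_bytes / 1e9:.1f} GB")
```

On 64-bit CPython each list slot costs an 8-byte pointer plus a 24-byte float object, roughly 98 GB in total versus about 12 GB packed.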

## Key Takeaways

1. **Vector DBs** persist embeddings to disk
2. They handle **similarity search** for you
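3. They scale to **millions of documents** by searching an index instead of scanning every vector
4. Popular options: ChromaDB, Pinecone, Weaviate, and FAISS (a similarity-search library rather than a full database)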
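
Scaling to millions of documents mostly comes down to an index that lets a query skip most stored vectors. ChromaDB builds an HNSW graph for this; HNSW is too long for a short example, so the sketch below swaps in a simpler approximate technique, random-hyperplane locality-sensitive hashing (LSH). Vectors pointing in similar directions tend to fall on the same side of random hyperplanes and therefore share a bucket, so a query only scans its own bucket. `LSHIndex` and the toy data are hypothetical; practical LSH setups usually add several hash tables or probe neighbouring buckets to recover neighbours that hashed elsewhere.

```python
import math
import random


def normalize(v):
    # Scale to unit length so squared L2 and cosine give the same ranking
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class LSHIndex:
    """Hypothetical toy index: random-hyperplane LSH buckets plus exact reranking."""

    def __init__(self, dim, n_planes=4, seed=0):
        rng = random.Random(seed)
        # Each hyperplane through the origin is defined by a random normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}  # bucket key -> list of (id, vector)

    def _key(self, vec):
        # One bit per hyperplane: which side of it the vector lies on
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in self.planes)

    def add(self, id_, vec):
        self.buckets.setdefault(self._key(vec), []).append((id_, vec))

    def query(self, vec, k=1):
        # Scan only the query's own bucket: far fewer comparisons than a full scan,
        # but approximate, because a true neighbour in another bucket is missed
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates, key=lambda item: squared_l2(vec, item[1]))
        return [id_ for id_, _ in ranked[:k]]


rng = random.Random(42)
vectors = {f"doc_{i}": normalize([rng.gauss(0.0, 1.0) for _ in range(8)]) for i in range(1000)}

index = LSHIndex(dim=8, n_planes=4)
for id_, vec in vectors.items():
    index.add(id_, vec)

print(index.query(vectors["doc_7"], k=1))  # ['doc_7']
print(max(len(b) for b in index.buckets.values()), "vectors in the largest bucket, of", len(vectors))
```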

---

**Next:** Day 14: Advanced Chunking Strategies