# Day 8: Understanding Embeddings

So far, we've been sending text to LLMs and getting text back.

But what if we need to **compare** texts? Or find **similar** documents?

That's where **embeddings** come in.

## What is an Embedding?

An embedding converts text into a **vector**: a list of numbers.

```
"Hello World" → [0.012, -0.034, 0.056, ..., 0.089]  (3072 numbers)
```

These numbers capture the **meaning** of the text, not just the words.

Two sentences with similar meaning will have similar vectors.
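
To make "similar vectors" concrete, here is a minimal sketch with made-up 3-dimensional vectors. The numbers are invented for illustration and do not come from any model, and plain Euclidean distance is used only as one simple stand-in for "how far apart"; other measures exist.

```python
import math

# Made-up 3-dimensional vectors, for illustration only.
# Real embeddings come from the model and have far more dimensions.
toy_vectors = {
    "Hello World": [0.9, 0.1, 0.0],
    "Hi there, how are you?": [0.8, 0.2, 0.1],
    "Machine learning is fascinating": [0.1, 0.9, 0.3],
}


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


greeting_gap = euclidean_distance(
    toy_vectors["Hello World"], toy_vectors["Hi there, how are you?"]
)
topic_gap = euclidean_distance(
    toy_vectors["Hello World"], toy_vectors["Machine learning is fascinating"]
)

# The two greetings sit closer together than a greeting and an ML sentence.
print(f"greeting vs greeting: {greeting_gap:.3f}")
print(f"greeting vs ML topic: {topic_gap:.3f}")
```

In this toy setup the two greetings end up much closer to each other than either is to the machine learning sentence, which is the behavior we hope real embeddings show.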

## Setup

In [None]:
from google import genai
import os
from dotenv import load_dotenv

load_dotenv(dotenv_path='../.env')
API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=API_KEY)

## Generate an Embedding

In [None]:
text = "Hello World"

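# Ask the embedding model to turn the text into a vector
response = client.models.embed_content(
    model="gemini-embedding-001",
    contents=text
)

# The response holds a list of embeddings; we sent one text, so take the first
embedding = response.embeddings[0].values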

print(f"Input: '{text}'")
print(f"Vector dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

## What Do These Numbers Mean?

Each number represents a **feature** of the text's meaning.

- The model learned these features during training
- Individual numbers don't have human-readable labels
- But combined, they form a unique "fingerprint" of meaning

Think of it like GPS coordinates:
- `(37.7749, -122.4194)` doesn't tell you "San Francisco"
- But similar coordinates mean nearby locations
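
The GPS analogy can be sketched in a few lines. The San Francisco pair is the one from the list above; the Oakland and New York pairs are approximate values added here for illustration. The gap is measured directly in degrees, which is not a real-world distance but is enough to show that nearby places have nearby numbers.

```python
import math

# (latitude, longitude) pairs. San Francisco is from the text;
# Oakland and New York are approximate values added for this sketch.
san_francisco = (37.7749, -122.4194)
oakland = (37.8044, -122.2712)
new_york = (40.7128, -74.0060)


def coordinate_gap(a, b):
    """Gap between two coordinate pairs in degrees (not kilometers)."""
    return math.dist(a, b)


near = coordinate_gap(san_francisco, oakland)
far = coordinate_gap(san_francisco, new_york)

# Nearby cities produce a small gap; distant cities produce a large one.
print(f"San Francisco vs Oakland:  {near:.2f} degrees")
print(f"San Francisco vs New York: {far:.2f} degrees")
```

Embeddings work the same way, only with thousands of coordinates instead of two, and with "location" standing for meaning.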

## Embedding Multiple Texts

In [None]:
texts = [
    "Hello World",
    "Hi there, how are you?",
    "Machine learning is fascinating",
    "Deep learning uses neural networks"
]

embeddings = []
for text in texts:
    response = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text
    )
    embeddings.append(response.embeddings[0].values)
    print(f"✅ Generated embedding for: '{text}'")

print(f"\n📊 Total embeddings generated: {len(embeddings)}")
print(f"📐 Each embedding has {len(embeddings[0])} dimensions")
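
The loop above makes one request per text. As an alternative sketch, `embed_content` also accepts a list for `contents`, so several texts can go out in a single call. The helper name `embed_texts` is hypothetical, and the code assumes the response returns one embedding per input in the same order; the API may also limit how many texts fit in one request, so check the documentation before sending large batches.

```python
def embed_texts(client, texts, model="gemini-embedding-001"):
    """Embed several texts with a single embed_content call.

    `embed_texts` is a hypothetical helper, not part of the SDK.
    Assumes `client` is a google-genai Client like the one created in Setup,
    and that the response lists one embedding per input, in input order.
    """
    response = client.models.embed_content(model=model, contents=list(texts))
    # Pull the raw list of floats out of each embedding object
    return [item.values for item in response.embeddings]
```

With the `client` and `texts` from the cells above, `embeddings = embed_texts(client, texts)` should give the same kind of list that the loop builds.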

## Key Takeaways

1. **Embeddings** convert text to vectors (lists of numbers)
2. `gemini-embedding-001` returns vectors with **3072 dimensions** by default
3. Each dimension captures some aspect of **meaning**
4. Similar texts will have similar vectors

---

**Next:** Day 9: Comparing texts with cosine similarity