# 3D Vector Similarity Metrics

Interactive 3D visualizations of three fundamental vector similarity measures, plus a comparison that applies all three at once:
- **Euclidean Distance**: Straight-line distance between vectors
- **Inner Product**: Dot product and vector projections
- **Cosine Similarity**: Cosine of the angle between vectors
- **Multi-vector Comparison**: Query vs multiple candidates with all metrics


## 🎯 What You'll Learn

This notebook visualizes similarity metrics in 3D space to show how vector databases compare and rank vectors. Understanding these metrics is crucial for:

- **Vector Databases**: Choosing the right similarity measure for your use case
- **Machine Learning**: Interpreting embeddings and feature vectors
- **Information Retrieval**: Ranking search results by relevance
- **Recommendation Systems**: Finding similar items or users

## 📊 Metric Overview

| Metric | Formula | Range | Use Case | Interpretation |
|--------|---------|-------|----------|----------------|
| **Euclidean Distance** | `√(Σ(xᵢ-yᵢ)²)` | [0, ∞) | Spatial proximity | Smaller = more similar |
| **Inner Product** | `Σ(xᵢ×yᵢ)` | (-∞, ∞) | Recommendations | Higher = better match |
| **Cosine Similarity** | `(x·y)/(‖x‖×‖y‖)` | [-1, 1] | Text similarity | Closer to 1 = more similar |

## 🔧 Prerequisites

- Basic understanding of vectors and linear algebra
- Familiarity with Python (NumPy basics helpful)
- Interest in machine learning and information retrieval

## 📈 Learning Objectives

By the end of this notebook, you'll understand:
1. How each similarity metric works geometrically
2. When to use each metric for different applications
3. How vector databases rank and retrieve similar vectors
4. Practical implementation examples for real-world use cases

📖 For more context, see the full blog post at: [wiphoo.dev](https://go.wiphoo.dev/nhL42L)

## Setup: Import Libraries

In [1]:
%pip install numpy plotly ipywidgets

Looking in indexes: https://pypi.org/simple, https://packagecloud.io/github/git-lfs/pypi/simple
Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, clear_output

## Define Example Vectors

In [3]:
# Query vector and candidate vectors for demonstration
v1 = np.array([3.0, 1.5, 2.0])  # Query vector (red)
v2 = np.array([1.0, 2.5, 0.5])  # Candidate 1: Similar direction (blue)
v3 = np.array([-2.0, 1.0, 1.5]) # Candidate 2: Roughly perpendicular, slightly opposing (green)
v4 = np.array([0.5, -1.0, 3.0]) # Candidate 3: Different direction (orange)

vectors = [v1, v2, v3, v4]
vector_names = ["Query (v₁)", "Candidate 1 (v₂)", "Candidate 2 (v₃)", "Candidate 3 (v₄)"]
vector_colors = ["red", "blue", "green", "orange"]

print("Example Vectors:")
for vec, name in zip(vectors, vector_names):
    print(f"{name}: {vec}")

Example Vectors:
Query (v₁): [3.  1.5 2. ]
Candidate 1 (v₂): [1.  2.5 0.5]
Candidate 2 (v₃): [-2.   1.   1.5]
Candidate 3 (v₄): [ 0.5 -1.   3. ]


## Core Similarity Functions

In [4]:
def euclidean_distance(a, b):
    """Calculate straight-line distance between two vectors"""
    return np.linalg.norm(a - b)

def inner_product(a, b):
    """Calculate dot product of two vectors"""
    return np.dot(a, b)

def cosine_similarity(a, b):
    """Calculate the cosine of the angle between two vectors.

    Returns 0.0 by convention if either vector has zero length.
    """
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return np.dot(a, b) / (norm_a * norm_b)

def calculate_metrics(vec1, vec2):
    """Calculate all three similarity metrics between two vectors"""
    return {
        'euclidean_dist': euclidean_distance(vec1, vec2),
        'dot_product': inner_product(vec1, vec2),
        'cosine_sim': cosine_similarity(vec1, vec2),
        'norm1': np.linalg.norm(vec1),
        'norm2': np.linalg.norm(vec2)
    }

## Calculate Similarity Scores

In [5]:
# Calculate metrics for query vector (v1) vs all candidates
metrics_v1 = {}
for i, vec in enumerate(vectors[1:], 2):  # v2, v3, v4
    metrics_v1[f'v1_v{i}'] = calculate_metrics(v1, vec)

print("Similarity Scores (Query v₁ vs Candidates):")
for pair, metrics in metrics_v1.items():
    v_num = pair[-1]
    print(f"v₁ vs v{v_num}:")
    print(f"  Euclidean Distance: {metrics['euclidean_dist']:.3f}")
    print(f"  Inner Product:      {metrics['dot_product']:.3f}")
    print(f"  Cosine Similarity:  {metrics['cosine_sim']:.3f}")
    print()

Similarity Scores (Query v₁ vs Candidates):
v₁ vs v2:
  Euclidean Distance: 2.693
  Inner Product:      7.750
  Cosine Similarity:  0.725

v₁ vs v3:
  Euclidean Distance: 5.050
  Inner Product:      -1.500
  Cosine Similarity:  -0.143

v₁ vs v4:
  Euclidean Distance: 3.674
  Inner Product:      6.000
  Cosine Similarity:  0.480
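
The three scores above are not independent: the squared Euclidean distance can be rebuilt from dot products alone, since ‖a − b‖² = ‖a‖² + ‖b‖² − 2(a·b). The sketch below checks this on the notebook's example vectors. The helper name `squared_distance_from_dot` is an assumption introduced for illustration and is not part of the notebook.

```python
import numpy as np

# Same example vectors as in the notebook.
v1 = np.array([3.0, 1.5, 2.0])
candidates = {
    "v2": np.array([1.0, 2.5, 0.5]),
    "v3": np.array([-2.0, 1.0, 1.5]),
    "v4": np.array([0.5, -1.0, 3.0]),
}

def squared_distance_from_dot(a, b):
    """Hypothetical helper: squared Euclidean distance from dot products only."""
    return np.dot(a, a) + np.dot(b, b) - 2.0 * np.dot(a, b)

for name, vec in candidates.items():
    direct = np.linalg.norm(v1 - vec) ** 2          # squared distance, computed directly
    via_dot = squared_distance_from_dot(v1, vec)    # same quantity from the identity
    print(f"v1 vs {name}: direct={direct:.3f}, via dot products={via_dot:.3f}")
```

This is one reason the three metrics tend to agree when all vectors have similar lengths, and diverge when lengths vary widely.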



## Plotting Helper Functions

In [6]:
def make_layout(title):
    """Create consistent 3D plot layout"""
    return go.Layout(
        title=title,
        scene=dict(
            xaxis=dict(title="x", gridcolor="lightgray", linecolor="black", linewidth=2),
            yaxis=dict(title="y", gridcolor="lightgray", linecolor="black", linewidth=2),
            zaxis=dict(title="z", gridcolor="lightgray", linecolor="black", linewidth=2),
            aspectmode="cube"
        ),
        showlegend=True,
        paper_bgcolor="white",
        plot_bgcolor="white"
    )

def vector_trace(v, name, color=None, show_text=True):
    """Create a 3D vector trace from origin to point v"""
    mode = "lines+markers+text" if show_text else "lines+markers"
    return go.Scatter3d(
        x=[0, v[0]], y=[0, v[1]], z=[0, v[2]],
        mode=mode,
        line=dict(width=6, color=color),
        marker=dict(size=4, color=color),
        text=[None, name],
        textposition="top center",
        name=name,
        opacity=1.0
    )

def tip_to_tip_trace(p, q, dash="dot", color="green", name="‖a − b‖"):
    """Create a line trace between two points p and q"""
    return go.Scatter3d(
        x=[p[0], q[0]], y=[p[1], q[1]], z=[p[2], q[2]],
        mode="lines+text",
        line=dict(width=4, dash=dash, color=color),
        text=[None, f"{name} = {np.linalg.norm(p-q):.3f}"],
        textposition="middle right",
        name=name,
        opacity=1.0
    )

def text3d(x, y, z, text, color="black"):
    """Create 3D text annotation at position (x,y,z)"""
    return go.Scatter3d(
        x=[x], y=[y], z=[z],
        mode="text",
        text=[text],
        textfont=dict(color=color),
        name="label",
        opacity=1.0
    )

def projection_of_b_on_a(a, b):
    """Calculate projection of vector b onto vector a"""
    na2 = float(np.dot(a, a))
    coeff = float(np.dot(a, b) / na2) if na2 > 0 else 0.0
    return coeff * a, coeff

## 1. Euclidean Distance Visualization

In [7]:
# Euclidean Distance: Straight-line distance between vectors
fig = go.Figure(layout=make_layout("Euclidean Distance: Straight-line distance between vectors"))
fig.add_trace(vector_trace(v1, "Query (v₁)", color="red"))
fig.add_trace(vector_trace(v2, "Candidate 1 (v₂)", color="blue"))
fig.add_trace(tip_to_tip_trace(v1, v2, dash="dot", color="green", name="‖v₁ − v₂‖"))
fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()

## 2. Inner Product Visualization

In [8]:
# Inner Product: Vector projection and perpendicular component
fig = go.Figure(layout=make_layout("Inner Product: Vector projection and perpendicular component"))
fig.add_trace(vector_trace(v1, "Query (v₁)", color="red"))
fig.add_trace(vector_trace(v2, "Candidate 1 (v₂)", color="blue"))

# Calculate and show projection
proj, coeff = projection_of_b_on_a(v1, v2)
fig.add_trace(vector_trace(proj, "proj_{v₁}(v₂)", color="green"))

# Show perpendicular component
perp_component = v2 - proj
if np.linalg.norm(perp_component) > 1e-10:
    fig.add_trace(vector_trace(perp_component, "⊥ component", color="orange"))
    
    # Draw right angle indicator
    right_angle_size = 0.3
    proj_norm = proj / np.linalg.norm(proj) if np.linalg.norm(proj) > 0 else np.array([1, 0, 0])
    perp_norm = perp_component / np.linalg.norm(perp_component) if np.linalg.norm(perp_component) > 0 else np.array([0, 1, 0])

    corner = proj
    ra1 = corner + right_angle_size * proj_norm
    ra2 = corner + right_angle_size * perp_norm
    ra_far = ra1 + right_angle_size * perp_norm  # marker vertex diagonally opposite the corner

    # Trace ra1 → ra_far → ra2 so the marker closes a small square with the corner
    fig.add_trace(go.Scatter3d(
        x=[ra1[0], ra_far[0], ra2[0]],
        y=[ra1[1], ra_far[1], ra2[1]],
        z=[ra1[2], ra_far[2], ra2[2]],
        mode="lines", line=dict(color="green", width=2, dash="dot"), showlegend=False
    ))

# Add labels
inner_product_val = float(np.dot(v1, v2))
fig.add_trace(text3d(0, 0, -0.5, f"v₁·v₂ = {inner_product_val:.3f}"))

if np.linalg.norm(proj) > 0:
    label_pos = proj + 0.2 * (proj / np.linalg.norm(proj))
    fig.add_trace(text3d(*label_pos, f"proj_{{v₁}}(v₂) = {coeff:.3f}·v₁"))

fig.add_trace(text3d(0, 0, -1.0, "v₂ = proj_{v₁}(v₂) + ⊥ component"))
fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()
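
The decomposition drawn in the figure can also be checked numerically. The following is a minimal sketch, assuming the same `v1` and `v2` values as above (renamed `a` and `b` here): it splits `b` into a component along `a` and a perpendicular remainder, then confirms that the remainder is orthogonal to `a`.

```python
import numpy as np

a = np.array([3.0, 1.5, 2.0])   # v1 from the notebook
b = np.array([1.0, 2.5, 0.5])   # v2 from the notebook

dot_ab = float(np.dot(a, b))
coeff = dot_ab / float(np.dot(a, a))   # scalar multiple of a
proj = coeff * a                        # component of b along a
perp = b - proj                         # component of b perpendicular to a

# Signed length of the projection: (a·b) / ‖a‖
scalar_projection = dot_ab / float(np.linalg.norm(a))

print(f"a·b               = {dot_ab:.3f}")
print(f"coefficient       = {coeff:.3f}")
print(f"scalar projection = {scalar_projection:.3f}")
print(f"perp·a            = {np.dot(perp, a):.1e}")
print(f"proj + perp == b  : {np.allclose(proj + perp, b)}")
```

The inner product is therefore the signed length of the projection scaled by ‖a‖, which is why longer vectors score higher under this metric even when the direction is unchanged.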

## 3. Cosine Similarity Visualization

In [9]:
# Cosine Similarity: Angle between normalized vectors
fig = go.Figure(layout=make_layout("Cosine Similarity: Angle between normalized vectors"))

# Normalize vectors to unit length
na = float(np.linalg.norm(v1))
nb = float(np.linalg.norm(v2))
ua = v1/na if na > 0 else v1  # Unit vector a
ub = v2/nb if nb > 0 else v2  # Unit vector b

fig.add_trace(vector_trace(ua, "v̂₁ (unit)", color="red"))
fig.add_trace(vector_trace(ub, "v̂₂ (unit)", color="blue"))

# Angle between the unit vectors, computed up front so it is defined in both branches below
cos_theta = float(np.clip(np.dot(ua, ub), -1.0, 1.0))
theta = float(np.arccos(cos_theta))

# Create angle arc
u = ua
w_raw = ub - np.dot(ub, u) * u
nw = np.linalg.norm(w_raw)

if nw > 1e-12:
    w = w_raw / nw
    t_vals = np.linspace(0, theta, 120)
    arc = np.outer(np.cos(t_vals), u) + np.outer(np.sin(t_vals), w)
    fig.add_trace(go.Scatter3d(
        x=arc[:,0], y=arc[:,1], z=arc[:,2], 
        mode="lines", line=dict(width=6, color="green", dash="dot")
    ))
    label_point = arc[int(len(t_vals)*0.6)]
else:
    # Collinear vectors: there is no arc to draw, so place the label at the tip of v̂₁
    label_point = ua

fig.add_trace(text3d(
    label_point[0], label_point[1], label_point[2], 
    f"θ = {np.degrees(theta):.1f}°, cosθ = {cos_theta:.3f}"
))
fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()
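
Normalizing first and then taking a dot product gives the same number as the cosine formula, which is why the figure works with unit vectors. The sketch below illustrates this, and also shows that rescaling a vector leaves the cosine unchanged. The `unit` helper is a hypothetical name introduced here and assumes a non-zero input.

```python
import numpy as np

def unit(v):
    """Hypothetical helper: scale v to length 1 (v must be non-zero)."""
    return v / np.linalg.norm(v)

v1 = np.array([3.0, 1.5, 2.0])
v2 = np.array([1.0, 2.5, 0.5])

# Cosine from the textbook formula and from a dot product of unit vectors
cos_from_formula = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
cos_from_units = np.dot(unit(v1), unit(v2))

# Recover the angle; clip guards against rounding slightly outside [-1, 1]
theta_deg = np.degrees(np.arccos(np.clip(cos_from_units, -1.0, 1.0)))

# Rescaling either vector leaves the cosine unchanged
cos_rescaled = np.dot(unit(v1), unit(10.0 * v2))

print(f"formula: {cos_from_formula:.3f}, unit vectors: {cos_from_units:.3f}")
print(f"angle: {theta_deg:.1f} degrees")
print(f"after scaling v2 by 10: {cos_rescaled:.3f}")
```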

## 4. Multi-Vector Comparison: Vector Database Simulation

This visualization simulates how **vector databases** work by comparing one query vector against multiple candidates. This is the core operation in systems like Pinecone, Weaviate, Milvus, and Qdrant.

### How Vector Databases Work:

1. **Indexing**: Store high-dimensional vectors (embeddings) in specialized data structures
2. **Querying**: Convert search query to vector using same embedding model
3. **Similarity Search**: Find k-nearest neighbors using chosen distance metric
4. **Ranking**: Return most similar vectors with their similarity scores
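
Steps 3 and 4 can be sketched as an exact, brute-force search over a small in-memory array. This is only an illustration under stated assumptions: the function name `top_k` and its parameters are hypothetical, and real vector databases typically replace the full scan with an approximate index.

```python
import numpy as np

def top_k(query, candidates, k=2, metric="cosine"):
    """Brute-force k-nearest-neighbor search; returns (index, score) pairs, best first."""
    if metric == "euclidean":
        scores = np.linalg.norm(candidates - query, axis=1)
        order = np.argsort(scores)             # smaller distance is better
    elif metric == "inner_product":
        scores = candidates @ query
        order = np.argsort(-scores)            # larger score is better
    elif metric == "cosine":
        # Assumes neither the query nor any candidate is a zero vector
        norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
        scores = (candidates @ query) / norms
        order = np.argsort(-scores)            # larger score is better
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [(int(i), float(scores[i])) for i in order[:k]]

# The notebook's query and candidates, stacked as rows of a small "database"
query = np.array([3.0, 1.5, 2.0])
database = np.array([
    [1.0, 2.5, 0.5],
    [-2.0, 1.0, 1.5],
    [0.5, -1.0, 3.0],
])

for metric in ("euclidean", "inner_product", "cosine"):
    print(metric, top_k(query, database, k=2, metric=metric))
```

A full scan costs time proportional to the number of stored vectors, which is acceptable for a demo but is the cost that approximate indexes are designed to avoid.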

### Visualization Elements:
- **🔴 Red**: Query vector (what the user is searching for)
- **🔵 Blue/Green/Orange**: Candidate vectors (database entries to search through)
- **Dotted lines**: Euclidean distances between query and candidates
- **Solid lines**: Inner product projections (recommendation-style scoring)
- **Dash-dot arcs**: Cosine similarity angles (directional similarity)

### Real-World Applications:
- **Semantic Search**: Find documents with similar meaning
- **Image Search**: Locate visually similar images
- **Product Recommendations**: Suggest items with similar features
- **Anomaly Detection**: Identify unusual patterns in data streams

### Performance Considerations:
- **Index Types**: HNSW, IVF, PQ for approximate nearest neighbor search
- **Distance Metrics**: Choose based on embedding type and use case
- **Scalability**: Handle millions of vectors with sub-second query times
- **Accuracy vs Speed**: Balance between exact and approximate results

In [10]:
# Compare query vector against all candidates with all metrics
fig = go.Figure(layout=make_layout("Multi-Vector Comparison: Query vs All Candidates"))

# Add all vectors
for i, (vec, name, color) in enumerate(zip(vectors, vector_names, vector_colors)):
    fig.add_trace(vector_trace(vec, name, color=color))

# Add metric visualizations for each candidate
for i, (vec, name, color) in enumerate(zip(vectors[1:], vector_names[1:], vector_colors[1:]), 2):
    metrics = metrics_v1[f'v1_v{i}']
    
    # Euclidean distance (dotted line)
    # tip_to_tip_trace appends " = <distance>" to the label itself,
    # so the name must not repeat the value
    fig.add_trace(tip_to_tip_trace(
        v1, vec, dash="dot", color=color,
        name=f"EU: ‖v₁ − v{i}‖"
    ))
    
    # Inner product projection (solid line)
    proj, coeff = projection_of_b_on_a(v1, vec)
    if np.linalg.norm(proj) > 0.1:
        fig.add_trace(go.Scatter3d(
            x=[0, proj[0]], y=[0, proj[1]], z=[0, proj[2]],
            mode="lines",
            line=dict(color=color, width=3, dash="solid"),
            name=f"IP: proj of v{i} onto v₁ = {coeff:.3f}·v₁",
            showlegend=True,
            opacity=1.0
        ))
    
    # Cosine similarity arc
    na = float(np.linalg.norm(v1))
    nb = float(np.linalg.norm(vec))
    if na > 0 and nb > 0:
        ua = v1 / na
        ub = vec / nb
        cos_sim = metrics['cosine_sim']
        theta = np.arccos(np.clip(cos_sim, -1.0, 1.0))
        
        if 0.2 < theta < np.pi - 0.2:
            arc_radius = 0.8
            cross_prod = np.cross(ua, ub)
            cross_norm = np.linalg.norm(cross_prod)
            
            if cross_norm > 1e-10:
                u = ua  # already unit length
                w = cross_prod / cross_norm
                v = np.cross(w, u)
                
                t_vals = np.linspace(0, theta, 25)
                arc_points = []
                
                for t in t_vals:
                    cos_t, sin_t = np.cos(t), np.sin(t)
                    point = arc_radius * (cos_t * u + sin_t * v)
                    arc_points.append(point)
                
                arc_points = np.array(arc_points)
                fig.add_trace(go.Scatter3d(
                    x=arc_points[:, 0], y=arc_points[:, 1], z=arc_points[:, 2],
                    mode="lines",
                    line=dict(color=color, width=2, dash="dashdot"),
                    name=f"COS: θ = {np.degrees(theta):.1f}°",
                    showlegend=True,
                    opacity=1.0
                ))

# Add metric summary
summary_text = "Similarity Scores (v₁ vs candidates):<br>"
for i in range(2, 5):
    m = metrics_v1[f'v1_v{i}']
    summary_text += f"v{i}: EU={m['euclidean_dist']:.3f}, IP={m['dot_product']:.3f}, COS={m['cosine_sim']:.3f}<br>"

fig.add_trace(text3d(-3, -3, -3, summary_text, color="black"))
fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()

print("Multi-Vector Comparison Summary:")
print("• Red: Query vector (v₁)")
print("• Blue/Green/Orange: Candidate vectors (v₂/v₃/v₄)")
print("• Dotted lines: Euclidean distances")
print("• Solid lines: Inner product projections")
print("• Dash-dot arcs: Cosine similarity angles")
print("\nDetailed Scores:")
for i in range(2, 5):
    m = metrics_v1[f'v1_v{i}']
    print(f"v₁ vs v{i}: EU={m['euclidean_dist']:.3f}, IP={m['dot_product']:.3f}, COS={m['cosine_sim']:.3f}")

Multi-Vector Comparison Summary:
• Red: Query vector (v₁)
• Blue/Green/Orange: Candidate vectors (v₂/v₃/v₄)
• Dotted lines: Euclidean distances
• Solid lines: Inner product projections
• Dash-dot arcs: Cosine similarity angles

Detailed Scores:
v₁ vs v2: EU=2.693, IP=7.750, COS=0.725
v₁ vs v3: EU=5.050, IP=-1.500, COS=-0.143
v₁ vs v4: EU=3.674, IP=6.000, COS=0.480



## 🎯 Key Takeaways

### **Choosing the Right Metric**

| Scenario | Recommended Metric | Why? | Example |
|----------|-------------------|------|---------|
| **Text/Document Search** | Cosine Similarity | Direction matters, magnitude varies | Semantic search engines |
| **Image/Visual Search** | Euclidean Distance | Spatial proximity in feature space | Computer vision applications |
| **User Recommendations** | Inner Product | Both direction and magnitude matter | Collaborative filtering |
| **Geographic Data** | Euclidean Distance | Physical distances | Location-based services |
| **Anomaly Detection** | Euclidean Distance | Outlier identification | Fraud detection, monitoring |

### **Metric Properties Summary**

#### **Euclidean Distance (L2)**
- ✅ **Intuitive**: Physical distance in space
- ✅ **Non-negative**: Always ≥ 0, and 0 only for identical vectors
- ✅ **Scale-sensitive**: Magnitude affects similarity
- ❌ **Curse of dimensionality**: Distances concentrate in high dimensions, so near and far neighbors become harder to tell apart

#### **Inner Product (IP)**
- ✅ **Efficient**: Simple dot product computation
- ✅ **Magnitude-aware**: Considers both direction and scale
- ✅ **ML-friendly**: Used in neural networks and attention mechanisms
- ❌ **Unbounded**: Can be negative, harder to interpret

#### **Cosine Similarity**
- ✅ **Scale-invariant**: Works with varying magnitudes
- ✅ **Bounded**: Easy to interpret [-1, 1] range
- ✅ **Angle-based**: Intuitive geometric interpretation
- ❌ **Ignores magnitude**: May miss important scale information

### **Vector Database Best Practices**

1. **Choose metric based on embedding type**:
   - Normalized embeddings → Cosine Similarity (on unit vectors, inner product and Euclidean distance produce the same ranking)
   - Raw features → Euclidean Distance or Inner Product

2. **Consider your use case**:
   - Search quality vs speed trade-offs
   - Exact vs approximate nearest neighbors
   - Real-time requirements

3. **Monitor and tune**:
   - Query performance and latency
   - Result quality and relevance
   - Index maintenance and updates
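
For the first practice above, it helps to see why the choice of metric matters less once embeddings are normalized: on unit vectors, ‖a − b‖² = 2 − 2·cos(a, b), so all three metrics rank candidates identically. The sketch below checks this on random data. The arrays are hypothetical stand-ins for real embeddings, and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real embeddings: 5 candidates and 1 query, 8 dimensions each
embeddings = rng.normal(size=(5, 8))
query = rng.normal(size=8)

# L2-normalize every vector to unit length
unit_embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)

# On unit vectors the inner product is the cosine similarity
cosine = unit_embeddings @ unit_query
sq_dist = np.sum((unit_embeddings - unit_query) ** 2, axis=1)

identity_holds = np.allclose(sq_dist, 2.0 - 2.0 * cosine)
same_ranking = np.array_equal(np.argsort(-cosine), np.argsort(sq_dist))

print(f"‖a − b‖² == 2 − 2·cos : {identity_holds}")
print(f"same ranking           : {same_ranking}")
```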

### **Further Reading**
- [Vector Databases: A Comprehensive Guide](https://go.wiphoo.dev/vector-dbs)
- [Embeddings and Similarity Metrics](https://go.wiphoo.dev/embeddings)
- [Practical Vector Search](https://go.wiphoo.dev/vector-search)

### **Try It Yourself**
Experiment with different vectors and see how the metrics behave:
- Try vectors of different magnitudes
- Test orthogonal vectors (cosine similarity = 0)
- Compare opposite vectors (cosine similarity = -1)
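
As a starting point, the three experiments above could look like the following sketch. The vector `a`, the `experiments` dictionary, and the `metrics` helper are assumptions chosen for illustration; swap in any non-zero vectors.

```python
import numpy as np

def metrics(a, b):
    """Return (Euclidean distance, inner product, cosine similarity) for non-zero a, b."""
    dot = float(np.dot(a, b))
    return (
        float(np.linalg.norm(a - b)),
        dot,
        dot / float(np.linalg.norm(a) * np.linalg.norm(b)),
    )

a = np.array([1.0, 2.0, 2.0])   # length 3

experiments = {
    "same direction, 3x magnitude": 3.0 * a,
    "orthogonal": np.array([2.0, -1.0, 0.0]),   # a·b = 2 - 2 + 0 = 0
    "opposite": -a,
}

for label, b in experiments.items():
    eu, ip, cos = metrics(a, b)
    print(f"{label:30s} EU={eu:.3f}  IP={ip:.3f}  COS={cos:.3f}")
```

Note how the scaled copy and the opposite vector sit at the same Euclidean distance from `a` while their cosine similarities are at opposite ends of the range.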

---

*This notebook provides interactive visualizations to build intuition about vector similarity metrics. For production vector database implementations, consider using specialized libraries like FAISS, Annoy, or managed services like Pinecone and Weaviate.*