# Linear Algebra for ML - Lecture 2: The Dot Product - The Heart of Machine Learning

Welcome to the second lecture in our comprehensive series on Linear Algebra for Machine Learning. This lecture focuses on the dot product, arguably the most fundamental operation in machine learning.

## Learning Objectives
- Master both algebraic and geometric interpretations of the dot product
- Understand vector projections and their applications
- Learn to use cosine similarity in ML applications
- Build a single neuron from scratch
- Understand how matrix multiplication relates to dot products

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
import time

# Set random seed for reproducibility
np.random.seed(42)

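# Set plotting style (Matplotlib 3.6 renamed its 'seaborn' style to 'seaborn-v0_8',
# so plt.style.use('seaborn') fails on newer versions; use seaborn's own theme instead)
sns.set_theme()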
%matplotlib inline

# Section 1: The Dual Nature of the Dot Product

The dot product has two equivalent interpretations:

1. **Algebraically**: Sum of element-wise products
   - For vectors a = [a₁, a₂, ..., aₙ] and b = [b₁, b₂, ..., bₙ]
   - a · b = a₁b₁ + a₂b₂ + ... + aₙbₙ

2. **Geometrically**: Product of lengths and cosine of angle
   - a · b = ||a|| ||b|| cos(θ)
   - Where ||a|| is the length of vector a and θ is the angle between vectors
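As a quick sanity check that the two definitions agree, the sketch below computes a · b both ways for a pair of 2D vectors. The angle θ is obtained independently from each vector's direction with `np.arctan2`, so the geometric result does not reuse the dot product. The vectors `a` and `b` are arbitrary illustrative choices.

```python
import numpy as np

# Two arbitrary 2D vectors chosen for illustration
a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

# Algebraic definition: sum of element-wise products
algebraic = np.sum(a * b)

# Geometric definition: product of lengths times cosine of the angle.
# Each vector's direction is measured from the x-axis with arctan2,
# so theta is found without using the dot product itself.
theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

print(f"Algebraic: {algebraic:.4f}")
print(f"Geometric: {geometric:.4f}")
print(f"Angle between a and b: {np.degrees(theta):.1f} degrees")
```

This angle trick only works in 2D; in higher dimensions the angle is usually defined *through* the dot product, which is why the geometric form is most often used to recover θ rather than to compute a · b.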

Let's explore both interpretations using a practical example from movie recommendations:

In [None]:
# Create movie rating vectors (ratings for 4 movies by different users)
user_A = np.array([5, 4, 1, 1])  # Likes first two movies
user_B = np.array([4, 5, 2, 1])  # Similar taste to A
user_C = np.array([1, 2, 5, 4])  # Different taste

# Calculate dot products
dot_AB = np.dot(user_A, user_B)
dot_AC = np.dot(user_A, user_C)

print("Movie Ratings:")
print(f"User A: {user_A}")
print(f"User B: {user_B}")
print(f"User C: {user_C}\n")

# Show step-by-step dot product calculation
print("Dot Product A·B calculation:")
for i, (a, b) in enumerate(zip(user_A, user_B)):
    print(f"Movie {i+1}: {a} * {b} = {a*b}")
print(f"Sum (A·B) = {dot_AB}\n")

print("Dot Product A·C calculation:")
for i, (a, c) in enumerate(zip(user_A, user_C)):
    print(f"Movie {i+1}: {a} * {c} = {a*c}")
print(f"Sum (A·C) = {dot_AC}")

# Interpretation
print("\nInterpretation:")
print("- A·B is larger: A and B give their high ratings to the same movies")
print("- A·C is smaller: A's favorite movies are the ones C rates lowest")
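One caveat: the raw dot product also grows with the size of the ratings, not just with their pattern. In the sketch below, `user_D` is a hypothetical user who rates every movie 5 and so expresses no preference at all, yet their dot product with user A (55) exceeds user B's (43). Dividing by the vector lengths, which is what cosine similarity in Section 3 does, restores the expected ranking.

```python
import numpy as np

user_A = np.array([5, 4, 1, 1])
user_B = np.array([4, 5, 2, 1])
user_D = np.array([5, 5, 5, 5])  # hypothetical user who rates everything 5

def cosine(u, v):
    # Dot product normalized by both vector lengths
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(f"A·B = {np.dot(user_A, user_B)}, A·D = {np.dot(user_A, user_D)}")
print(f"cos(A, B) = {cosine(user_A, user_B):.4f}, cos(A, D) = {cosine(user_A, user_D):.4f}")
```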

# Section 2: Geometric Interpretation and Vector Projection

The geometric formula for the dot product:

$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)$

This interpretation leads to several important insights (for vectors of fixed length):
1. When vectors point in the same direction (θ = 0°), cos(θ) = 1 and the dot product reaches its maximum, ||a|| ||b||
2. When vectors are perpendicular (θ = 90°), cos(θ) = 0 and the dot product is zero
3. When vectors point in opposite directions (θ = 180°), cos(θ) = -1 and the dot product reaches its minimum, -||a|| ||b||
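For unit vectors both lengths are 1, so the dot product reduces to cos(θ). The sketch below fixes one unit vector on the x-axis, rotates a second unit vector through several angles, and prints their dot product, which makes the three cases visible: positive below 90°, zero at 90°, negative beyond it.

```python
import numpy as np

reference = np.array([1.0, 0.0])  # unit vector along the x-axis

for degrees in [0, 45, 90, 135, 180]:
    theta = np.radians(degrees)
    rotated = np.array([np.cos(theta), np.sin(theta)])  # unit vector at angle theta
    dot = np.dot(reference, rotated)
    # The 4-decimal format hides floating-point noise (about 6e-17 at 90 degrees)
    print(f"{degrees:>3} degrees: dot = {dot:+.4f}")
```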

Let's implement functions to explore these geometric properties:

In [None]:
def vector_length(v):
    """Calculate the length (magnitude) of a vector"""
    return np.sqrt(np.sum(v**2))

def angle_between(v1, v2):
    """Calculate the angle between two vectors in degrees"""
    dot_product = np.dot(v1, v2)
    lengths_product = vector_length(v1) * vector_length(v2)
    cos_theta = dot_product / lengths_product
    # Ensure numerical stability
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def project_vector(v, u):
    """Project vector v onto vector u"""
    # Formula: proj_u(v) = (v·u / ||u||²) * u
    u_norm_squared = np.dot(u, u)
    scaling = np.dot(v, u) / u_norm_squared
    return scaling * u

# Example vectors
v1 = np.array([1, 0])  # Vector along x-axis
v2 = np.array([1, 1])  # Vector at 45 degrees
v3 = np.array([0, 1])  # Vector along y-axis
v4 = np.array([-1, 0])  # Vector in opposite direction to v1

# Calculate angles
angles = {
    "v1 and v2": angle_between(v1, v2),
    "v1 and v3": angle_between(v1, v3),
    "v1 and v4": angle_between(v1, v4)
}

print("Angles between vectors:")
for pair, angle in angles.items():
    print(f"{pair}: {angle:.1f} degrees")
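The `project_vector` helper above is defined but not used in that cell. A minimal usage sketch: projecting v onto u splits v into a component along u plus a residual, and the residual should be orthogonal to u, meaning its dot product with u is zero. The helper is redefined here so the block runs on its own, and the vectors are illustrative.

```python
import numpy as np

def project_vector(v, u):
    """Project vector v onto vector u: (v·u / ||u||²) * u"""
    return (np.dot(v, u) / np.dot(u, u)) * u

v = np.array([2.0, 3.0])
u = np.array([1.0, 1.0])

parallel = project_vector(v, u)   # component of v along u
residual = v - parallel           # what remains after removing that component

print("Projection of v onto u:", parallel)
print("Residual:", residual)
print("Residual · u:", np.dot(residual, u))
```

The zero dot product between the residual and u is what makes this the *orthogonal* projection: it is the point on the line through u closest to v.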

In [None]:
# Visualize vectors and angles
def plot_vectors_and_angles():
    plt.figure(figsize=(10, 10))
    
    # Plot vectors
    vectors = {
        'v1': v1,
        'v2': v2,
        'v3': v3,
        'v4': v4
    }
    
    colors = ['blue', 'red', 'green', 'purple']
    
    # Plot unit circle for reference
    circle = plt.Circle((0, 0), 1, fill=False, color='gray', linestyle='--')
    plt.gca().add_artist(circle)
    
    # Plot vectors
    for (name, v), color in zip(vectors.items(), colors):
        plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1,
                  color=color, label=name)
    
    # Add grid and labels
    plt.grid(True)
    plt.axhline(y=0, color='k', linestyle=':')
    plt.axvline(x=0, color='k', linestyle=':')
    plt.xlim(-1.5, 1.5)
    plt.ylim(-1.5, 1.5)
    plt.gca().set_aspect('equal')  # pyplot has no aspect() function; set it on the current Axes
    plt.legend()
    plt.title('Vector Relationships')
    plt.show()

plot_vectors_and_angles()

# Section 3: Cosine Similarity in Machine Learning

One of the most common applications of the dot product in ML is cosine similarity:

$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$

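This measure is useful because:
1. It depends only on the angle between the vectors, not their lengths: scaling either vector by a positive constant leaves it unchanged
2. Its range is [-1, 1], which makes values easy to compare
3. It equals 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for opposite directions; for vectors with no negative entries, such as word counts, it lies in [0, 1]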
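These properties are easy to check numerically. In the sketch below, scaling a vector by a positive constant leaves the cosine similarity at 1, reversing it gives -1, and two perpendicular vectors give 0 regardless of their lengths. The test vectors are illustrative.

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Dot product divided by the product of the vector lengths
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

a = np.array([1.0, 2.0, 3.0])

same_direction = cosine_similarity(a, 10 * a)   # scaled copy of a
opposite = cosine_similarity(a, -a)             # reversed copy of a
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 5.0]))

print(f"a vs 10a:         {same_direction:.4f}")
print(f"a vs -a:          {opposite:.4f}")
print(f"x-axis vs y-axis: {orthogonal:.4f}")
```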

Let's implement a document similarity system using this concept:

In [None]:
def vectorize_text(text, vocab=None):
    """Convert text to bag-of-words vector"""
    words = text.lower().split()
    if vocab is None:
        vocab = sorted(set(words))
    vector = np.zeros(len(vocab))
    for word in words:
        if word in vocab:
            vector[vocab.index(word)] += 1
    return vector, vocab

def cosine_similarity(v1, v2):
    """Calculate cosine similarity between two vectors"""
    dot_product = np.dot(v1, v2)
    norm1 = np.linalg.norm(v1)
    norm2 = np.linalg.norm(v2)
    return dot_product / (norm1 * norm2)

# Example documents
documents = [
    "I love deep learning",
    "I love linear algebra",
    "I love machine learning"
]

query = "I love learning"

# Create vocabulary from all documents and query
all_texts = documents + [query]
_, vocab = vectorize_text(" ".join(all_texts))

# Vectorize documents and query
doc_vectors = [vectorize_text(doc, vocab)[0] for doc in documents]
query_vector, _ = vectorize_text(query, vocab)

# Calculate similarities
similarities = [cosine_similarity(query_vector, doc_vector) 
               for doc_vector in doc_vectors]

# Print results
print("Query:", query)
print("\nDocument Similarities:")
for doc, sim in zip(documents, similarities):
    print(f"Document: '{doc}'")
    print(f"Similarity: {sim:.4f}\n")
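With many documents, the per-document loop can be replaced by a single matrix-vector product: stack the document vectors as rows of a matrix, divide each row and the query by their lengths, and every row-query dot product is then a cosine similarity. The sketch rebuilds the same bag-of-words vectors with a simplified helper (`bag_of_words` is a name introduced here, not part of the cell above) so that it runs on its own.

```python
import numpy as np

documents = ["I love deep learning", "I love linear algebra", "I love machine learning"]
query = "I love learning"

# Shared vocabulary over all texts, as in the cell above
vocab = sorted(set(" ".join(documents + [query]).lower().split()))

def bag_of_words(text):
    # Count how often each vocabulary word appears in the text
    words = text.lower().split()
    return np.array([words.count(word) for word in vocab], dtype=float)

D = np.array([bag_of_words(doc) for doc in documents])  # one row per document
q = bag_of_words(query)

# Normalize each row and the query to unit length; D_unit @ q_unit then holds all cosines
D_unit = D / np.linalg.norm(D, axis=1, keepdims=True)
q_unit = q / np.linalg.norm(q)
similarities_vec = D_unit @ q_unit

for doc, sim in zip(documents, similarities_vec):
    print(f"{sim:.4f}  {doc}")
```

Normalizing the document matrix once and reusing it for every query is the usual design choice when the same collection is searched repeatedly.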

# Section 4: The Dot Product in Neural Networks

At its core, a standard artificial neuron computes a dot product between its input vector and weight vector, adds a bias term, and applies an activation function:

$\text{output} = \text{activation}(\mathbf{w} \cdot \mathbf{x} + b)$

Let's implement a simple neuron from scratch:

In [None]:
class Neuron:
    def __init__(self, num_inputs):
        """Initialize weights with He initialization and the bias to zero"""
        self.weights = np.random.randn(num_inputs) * np.sqrt(2.0/num_inputs)
        self.bias = 0
        
    def forward(self, inputs):
        """Compute dot product and add bias"""
        return np.dot(self.weights, inputs) + self.bias
    
    def activate(self, inputs):
        """Apply ReLU activation function"""
        return max(0, self.forward(inputs))

# Create a neuron with 3 inputs
neuron = Neuron(3)

# Test with some example inputs
example_inputs = [
    np.array([0.5, -0.2, 0.1]),
    np.array([1.0, 1.0, 1.0]),
    np.array([-1.0, -1.0, -1.0])
]

print("Neuron weights:", neuron.weights)
print("Neuron bias:", neuron.bias)
print("\nTesting different inputs:")
for x in example_inputs:
    print(f"\nInput: {x}")
    print(f"Output before activation: {neuron.forward(x):.4f}")
    print(f"Output after ReLU: {neuron.activate(x):.4f}")

# Visualize decision boundary for 2D inputs
def plot_neuron_decision_boundary():
    # Create a simplified 2D neuron
    neuron_2d = Neuron(2)
    
    # Create a grid of points
    x = np.linspace(-2, 2, 100)
    y = np.linspace(-2, 2, 100)
    X, Y = np.meshgrid(x, y)
    
    # Calculate the neuron output w·x + b for every grid point at once:
    # stacking X and Y gives shape (100, 100, 2), and @ takes the dot
    # product of each 2D point with the weight vector
    points = np.stack([X, Y], axis=-1)
    Z = points @ neuron_2d.weights + neuron_2d.bias
    
    # Plot
    plt.figure(figsize=(10, 8))
    plt.contour(X, Y, Z, levels=[0], colors='k')  # Decision boundary
    plt.contourf(X, Y, Z, alpha=0.4)
    plt.colorbar(label='Neuron Output')
    
    # Plot the weight vector
    plt.quiver(0, 0, neuron_2d.weights[0], neuron_2d.weights[1], 
              angles='xy', scale_units='xy', scale=1, color='r', 
              label='Weight Vector')
    
    plt.grid(True)
    plt.gca().set_aspect('equal')  # equal axis scaling so angles are drawn faithfully
    plt.xlabel('Input 1')
    plt.ylabel('Input 2')
    plt.title('2D Neuron Decision Boundary')
    plt.legend()
    plt.show()

plot_neuron_decision_boundary()
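Geometrically, the weight vector is always perpendicular to the decision boundary. This follows from the dot product: any two points p and q on the boundary satisfy w·p + b = 0 and w·q + b = 0, so subtracting gives w·(p - q) = 0, which means every direction along the boundary is orthogonal to w. The sketch below checks this for a hypothetical neuron with w = [2, 1] and b = -1; these values are chosen for illustration and are not the randomly initialized weights of the `Neuron` class.

```python
import numpy as np

# Hypothetical 2D neuron parameters (illustrative values)
w = np.array([2.0, 1.0])
b = -1.0

# Two points on the boundary w·x + b = 0, i.e. 2*x1 + x2 = 1
p = np.array([0.0, 1.0])
q = np.array([0.5, 0.0])

print("w·p + b   =", np.dot(w, p) + b)    # zero: p lies on the boundary
print("w·q + b   =", np.dot(w, q) + b)    # zero: q lies on the boundary
print("w·(p - q) =", np.dot(w, p - q))    # zero: the boundary direction is orthogonal to w

# A point on the side w points toward gives a positive pre-activation
r = np.array([1.0, 1.0])
print("w·r + b   =", np.dot(w, r) + b)
```

The sign of w·x + b tells which side of the boundary x lies on; ReLU then passes the positive side through and zeroes the other.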

# Section 5: Matrix Multiplication as Dot Products

Matrix multiplication can be viewed as a series of dot products. When we multiply a matrix A by a vector x:
- Each row of A performs a dot product with x
- The result is a vector where each element is one of these dot products

Let's visualize this connection:

In [None]:
def visualize_matrix_vector_product():
    # Create a 2x3 matrix and a 3D vector
    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    x = np.array([0.5, 1.0, -0.5])
    
    # Compute the matrix-vector product
    result = A @ x
    
    print("Matrix A:")
    print(A)
    print("\nVector x:", x)
    print("\nMatrix-vector product (A @ x):", result)
    
    # Show the dot product calculations
    print("\nStep-by-step dot products:")
    for i, row in enumerate(A):
        print(f"Row {i+1} · x = ", end="")
        for j, (a, b) in enumerate(zip(row, x)):
            print(f"{a}*{b}", end="")
            if j < len(x) - 1:
                print(" + ", end="")
        print(f" = {np.dot(row, x)}")

# Demonstrate the relationship
visualize_matrix_vector_product()

# Now let's visualize how matrix multiplication transforms space
def plot_matrix_transformation():
    # Create a simple 2x2 matrix
    A = np.array([[2, 1],
                  [1, 2]])
    
    # Create a grid of points
    x = np.linspace(-2, 2, 10)
    y = np.linspace(-2, 2, 10)
    X, Y = np.meshgrid(x, y)
    points = np.column_stack((X.flatten(), Y.flatten()))
    
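    # Transform points: applying A to a point p gives A @ p; with points stored
    # as rows, that is points @ A.T (A is symmetric here, so A.T equals A)
    transformed = points @ A.T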
    
    # Plot
    plt.figure(figsize=(15, 6))
    
    # Original points
    plt.subplot(121)
    plt.scatter(points[:, 0], points[:, 1], alpha=0.5)
    plt.axhline(y=0, color='k', linestyle=':')
    plt.axvline(x=0, color='k', linestyle=':')
    plt.grid(True)
    plt.title('Original Points')
    plt.axis('equal')
    
    # Transformed points
    plt.subplot(122)
    plt.scatter(transformed[:, 0], transformed[:, 1], alpha=0.5)
    plt.axhline(y=0, color='k', linestyle=':')
    plt.axvline(x=0, color='k', linestyle=':')
    plt.grid(True)
    plt.title('After Matrix Transformation')
    plt.axis('equal')
    
    plt.show()

plot_matrix_transformation()
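The same row-times-vector idea extends to matrix-matrix products: entry (i, j) of C = A B is the dot product of row i of A with column j of B. The sketch below builds C with an explicit double loop of dot products and compares it with `A @ B`; the matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [2, -1]])            # shape (3, 2)

# Build C entry by entry: C[i, j] = (row i of A) · (column j of B)
C = np.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = np.dot(A[i, :], B[:, j])

print("Loop of dot products:\n", C)
print("A @ B:\n", A @ B)
print("Match:", np.allclose(C, A @ B))
```

The loop is only for illustration; `A @ B` computes the same dot products in optimized compiled code.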

# Exercises

### Exercise 1: Manual Dot Product
Given two 5D vectors, calculate their dot product by hand, then verify with NumPy:
```python
a = [1, 2, -1, 3, 0]
b = [0, 1, 2, -1, 3]
```

In [None]:
import numpy as np

# Define vectors
a = np.array([1, 2, -1, 3, 0])
b = np.array([0, 1, 2, -1, 3])

# Calculate dot product using NumPy
numpy_dot = np.dot(a, b)
print(f"NumPy dot product: {numpy_dot}")

# Now it's your turn to calculate by hand!
# The result should be: (1×0) + (2×1) + (-1×2) + (3×-1) + (0×3) = ?

### Exercise 2: Cosine Similarity
Calculate the cosine similarity between two document vectors:

```python
doc1 = [2, 1, 0, 2, 0, 1, 1]  # Frequencies of words in document 1
doc2 = [1, 1, 1, 0, 1, 0, 1]  # Frequencies of words in document 2
```

Hint: Remember that cosine similarity is the dot product divided by the product of the magnitudes:
cos(θ) = (a·b)/(||a||×||b||)

In [None]:
import numpy as np

# Define document vectors
doc1 = np.array([2, 1, 0, 2, 0, 1, 1])
doc2 = np.array([1, 1, 1, 0, 1, 0, 1])

# TODO: Calculate dot product of doc1 and doc2
dot_product = None

# TODO: Calculate magnitudes of both vectors
magnitude_doc1 = None
magnitude_doc2 = None

# TODO: Calculate cosine similarity
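cosine_sim = None  # named cosine_sim so it does not overwrite the cosine_similarity() function

# Verify with the same formula written directly in NumPy (NumPy has no built-in cosine similarity)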
numpy_cosine = np.dot(doc1, doc2) / (np.linalg.norm(doc1) * np.linalg.norm(doc2))
print(f"NumPy cosine similarity: {numpy_cosine:.4f}")

# Your result should match this value!

### Exercise 3: Neural Network Weight Update
In a simple neural network, we use dot products to compute the weighted sum of inputs. Given:
- Input vector: x = [0.5, -0.2, 0.1]
- Current weights: w = [0.4, 0.3, 0.6]
- Learning rate: α = 0.1
- Target output: y = 0.8
- Actual output: ŷ = sigmoid(w·x)

1. Calculate the actual output ŷ using the sigmoid function
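2. Calculate the error signal y - ŷ and the squared error E = (y - ŷ)²
3. Calculate the gradient step: (y - ŷ) × sigmoid'(w·x) × x. This is the negative gradient of ½(y - ŷ)² with respect to the weights, so moving along it reduces the error
4. Update the weights using: w_new = w + α × gradient

Note: The sigmoid function is provided: sigmoid(x) = 1/(1 + e^(-x)), together with its derivative sigmoid'(x) = sigmoid(x)(1 - sigmoid(x))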

In [None]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Given values
x = np.array([0.5, -0.2, 0.1])
w = np.array([0.4, 0.3, 0.6])
learning_rate = 0.1
target = 0.8

# TODO: Calculate the weighted sum (dot product)
weighted_sum = None

# TODO: Calculate actual output using sigmoid
output = None

# TODO: Calculate the error signal (target - output), not squared
error = None

# TODO: Calculate gradient
# Hint: gradient = error × sigmoid_derivative(weighted_sum) × x
gradient = None

# TODO: Update weights
new_weights = None

# Print results
print(f"Original weights: {w}")
print(f"New weights: {new_weights}")

# Verify the new output is closer to the target
new_output = sigmoid(np.dot(new_weights, x))
print(f"Original output: {output:.4f}")
print(f"New output: {new_output:.4f}")
print(f"Target: {target}")

# Solutions

The solutions to the exercises are provided below. Try to solve the exercises yourself before looking at the solutions!

### Solution to Exercise 1

The dot product calculation:
(1 × 0) + (2 × 1) + (-1 × 2) + (3 × -1) + (0 × 3) = 
0 + 2 + (-2) + (-3) + 0 = -3

### Solution to Exercise 2

Here's how to calculate the cosine similarity:

1. Dot product:
   (2×1) + (1×1) + (0×1) + (2×0) + (0×1) + (1×0) + (1×1) = 2 + 1 + 0 + 0 + 0 + 0 + 1 = 4

2. Magnitudes:
   ||doc1|| = √(2² + 1² + 0² + 2² + 0² + 1² + 1²) = √(4 + 1 + 0 + 4 + 0 + 1 + 1) = √11
   ||doc2|| = √(1² + 1² + 1² + 0² + 1² + 0² + 1²) = √(1 + 1 + 1 + 0 + 1 + 0 + 1) = √5

3. Cosine similarity:
   cos(θ) = 4 / (√11 × √5) = 4 / √55 ≈ 0.5394

### Solution to Exercise 3

Here's the step-by-step solution:

1. Calculate weighted sum (dot product):
   w·x = (0.4 × 0.5) + (0.3 × -0.2) + (0.6 × 0.1)
   = 0.2 - 0.06 + 0.06
   = 0.2

2. Calculate output:
   ŷ = sigmoid(0.2) = 1/(1 + e⁻⁰·²) ≈ 0.5498

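3. Calculate error:
   y - ŷ = 0.8 - 0.5498 ≈ 0.2502, so E = (y - ŷ)² ≈ 0.0626

4. Calculate gradient:
   gradient = (y - ŷ) × sigmoid'(w·x) × x
   = (0.8 - 0.5498) × (0.5498 × (1 - 0.5498)) × [0.5, -0.2, 0.1]
   ≈ 0.2502 × 0.2475 × [0.5, -0.2, 0.1]
   ≈ [0.0310, -0.0124, 0.0062]
   This is the negative gradient of ½(y - ŷ)² with respect to w, which is why the update adds it.

5. Update weights:
   w_new = w + α × gradient
   ≈ [0.4, 0.3, 0.6] + 0.1 × [0.0310, -0.0124, 0.0062]
   ≈ [0.4031, 0.2988, 0.6006]
   New output: sigmoid(w_new·x) ≈ 0.5503 (up from 0.5498)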

This will move the output closer to the target value of 0.8.
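To confirm the sign convention, the sketch below compares the analytic update direction (y - ŷ) σ'(w·x) x with a central finite-difference estimate of the negative gradient of L = ½(y - ŷ)². The ½ factor is an assumption made here so that the 2 from differentiating the square cancels; with E = (y - ŷ)² the gradient is exactly twice as large, which only rescales the learning rate.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -0.2, 0.1])
w = np.array([0.4, 0.3, 0.6])
target = 0.8

def loss(weights):
    # Half squared error; the 1/2 cancels the 2 from differentiating the square
    return 0.5 * (target - sigmoid(np.dot(weights, x))) ** 2

# Analytic negative gradient: (y - y_hat) * sigma'(z) * x, with sigma'(z) = s * (1 - s)
z = np.dot(w, x)
y_hat = sigmoid(z)
analytic = (target - y_hat) * y_hat * (1 - y_hat) * x

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for k in range(len(w)):
    step = np.zeros_like(w)
    step[k] = eps
    numeric[k] = -(loss(w + step) - loss(w - step)) / (2 * eps)

print("Analytic negative gradient:", analytic)
print("Numeric negative gradient: ", numeric)
print("Match:", np.allclose(analytic, numeric))
```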

In [None]:
import numpy as np

print("Exercise 1 Solution:")
a = np.array([1, 2, -1, 3, 0])
b = np.array([0, 1, 2, -1, 3])
dot_product = np.dot(a, b)
print(f"Dot product: {dot_product}")
print()

print("Exercise 2 Solution:")
doc1 = np.array([2, 1, 0, 2, 0, 1, 1])
doc2 = np.array([1, 1, 1, 0, 1, 0, 1])

# Calculate dot product
dot_product = np.dot(doc1, doc2)

# Calculate magnitudes
magnitude_doc1 = np.sqrt(np.sum(doc1**2))
magnitude_doc2 = np.sqrt(np.sum(doc2**2))

# Calculate cosine similarity
cosine_sim = dot_product / (magnitude_doc1 * magnitude_doc2)  # avoids overwriting the cosine_similarity() function

print(f"Dot product: {dot_product}")
print(f"Magnitude of doc1: {magnitude_doc1:.4f}")
print(f"Magnitude of doc2: {magnitude_doc2:.4f}")
print(f"Cosine similarity: {cosine_sim:.4f}")
print()

print("Exercise 3 Solution:")
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Given values
x = np.array([0.5, -0.2, 0.1])
w = np.array([0.4, 0.3, 0.6])
learning_rate = 0.1
target = 0.8

# Calculate weighted sum
weighted_sum = np.dot(w, x)

# Calculate output
output = sigmoid(weighted_sum)

# Error signal (not squared): target - output
error = target - output

# Gradient step: negative gradient of 0.5 * (target - output)**2 with respect to w
gradient = error * sigmoid_derivative(weighted_sum) * x

# Update weights
new_weights = w + learning_rate * gradient

print(f"Original weights: {w}")
print(f"Gradient: {gradient}")
print(f"New weights: {new_weights}")
print(f"Original output: {output:.4f}")
print(f"New output: {sigmoid(np.dot(new_weights, x)):.4f}")
print(f"Target: {target}")