# Lecture 6: Norms and Distances in Machine Learning

[![Watch the Video](https://img.shields.io/badge/Watch%20on%20YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white)](https://youtube.com/your-channel)

In this lecture, we'll explore how to measure the size of vectors and the distances between them, two concepts that underpin many machine learning algorithms.

## Learning Objectives
- Understand different types of norms and their properties
- Learn when to use different distance metrics
- Apply distance metrics in machine learning contexts
- Implement custom metrics for specific problems

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances

# Matplotlib 3.6 renamed the bundled 'seaborn' style to 'seaborn-v0_8'; fall back for older versions
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')
%matplotlib inline

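def plot_norm_circle(p, points=1000):
    """Return (x, y) coordinates of the unit circle {v : ||v||_p = 1} for a p-norm."""
    theta = np.linspace(0, 2*np.pi, points)
    x = np.cos(theta)
    y = np.sin(theta)

    # Rescale each direction so the resulting point has p-norm exactly 1
    if np.isinf(p):
        # L∞ norm: the general formula degenerates to r = 1 (a round circle) at p = inf
        r = np.maximum(np.abs(x), np.abs(y))
    else:
        r = (np.abs(x)**p + np.abs(y)**p)**(1/p)
    x = x/r
    y = y/r

    return x, y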

## 1. Understanding Norms

A norm is a function that assigns a length or size to each vector. Every norm satisfies four properties:
1. Non-negativity: $\|\mathbf{x}\| \geq 0$
2. Definiteness: $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$
3. Homogeneity: $\|\alpha\mathbf{x}\| = |\alpha|\|\mathbf{x}\|$
4. Triangle Inequality: $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$

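Common norms (special cases of the $p$-norm $\|\mathbf{x}\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$ for $p \geq 1$):
- L1 (Manhattan, $p = 1$): $\|\mathbf{x}\|_1 = \sum_i |x_i|$
- L2 (Euclidean, $p = 2$): $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$
- L∞ (Maximum, the limit $p \to \infty$): $\|\mathbf{x}\|_\infty = \max_i |x_i|$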
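A minimal from-scratch sketch of the general $p$-norm $\left(\sum_i |x_i|^p\right)^{1/p}$, checked against NumPy's `np.linalg.norm`. The helper name `p_norm` is hypothetical. The last lines show why the definition requires $p \geq 1$: with $p = 0.5$ the same formula breaks the triangle inequality, so it is not a norm.

```python
import numpy as np

def p_norm(x, p):
    """p-norm of a 1-D vector for p >= 1; p = np.inf gives the max norm."""
    x = np.abs(np.asarray(x, dtype=float))
    if np.isinf(p):
        return float(np.max(x))  # limit of the p-norm as p -> infinity
    return float(np.sum(x ** p) ** (1.0 / p))

v = np.array([3.0, -4.0])
for p in [1, 2, np.inf]:
    # Compare the from-scratch value with NumPy's built-in vector norm
    print(p, p_norm(v, p), np.linalg.norm(v, ord=p))
# 1 7.0 7.0
# 2 5.0 5.0
# inf 4.0 4.0

# For p < 1 the formula violates the triangle inequality
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(p_norm(a + b, 0.5), p_norm(a, 0.5) + p_norm(b, 0.5))  # 4.0 2.0
```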

In [None]:
# Visualize different p-norm unit circles
plt.figure(figsize=(15, 5))

# L1 norm
plt.subplot(131)
x1, y1 = plot_norm_circle(1)
plt.plot(x1, y1, 'b-', label='L1 norm')
plt.grid(True)
plt.axis('equal')
plt.title('L1 (Manhattan) Norm\nUnit Circle')
plt.legend()

# L2 norm
plt.subplot(132)
x2, y2 = plot_norm_circle(2)
plt.plot(x2, y2, 'r-', label='L2 norm')
plt.grid(True)
plt.axis('equal')
plt.title('L2 (Euclidean) Norm\nUnit Circle')
plt.legend()

# L∞ norm
plt.subplot(133)
x_inf, y_inf = plot_norm_circle(float('inf'))
plt.plot(x_inf, y_inf, 'g-', label='L∞ norm')
plt.grid(True)
plt.axis('equal')
plt.title('L∞ (Maximum) Norm\nUnit Circle')
plt.legend()

plt.tight_layout()
plt.show()

## 2. Distance Metrics

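Every norm induces a distance $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$ that measures the separation between two vectors; Euclidean and Manhattan distance are of this form. Cosine distance is different: it compares directions rather than positions, is undefined for the zero vector, and is not a true metric, since it is zero for any two vectors pointing the same way and can violate the triangle inequality. Common distance measures include: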

1. Euclidean Distance: $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_2$
2. Manhattan Distance: $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_1$
3. Cosine Distance: $d(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}$
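A small NumPy sketch of the three formulas above (the function names are illustrative, not a library API). It uses two vectors that point in the same direction, so the norm-based distances are positive while the cosine distance is zero, and then exhibits three vectors for which cosine distance breaks the triangle inequality. The clip to $[0, 2]$ is an assumption added only to absorb floating-point round-off.

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y, ord=2))

def manhattan(x, y):
    return float(np.linalg.norm(x - y, ord=1))

def cosine_distance(x, y):
    # Undefined when either vector is zero; clip to [0, 2] to absorb round-off
    sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.clip(1.0 - sim, 0.0, 2.0))

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])  # same direction as x, twice as long
print(f"{euclidean(x, y):.4f}")        # 2.2361
print(f"{manhattan(x, y):.4f}")        # 3.0000
print(f"{cosine_distance(x, y):.4f}")  # 0.0000

# d(a, c) exceeds d(a, b) + d(b, c), so cosine distance is not a true metric
a, b, c = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])
print(cosine_distance(a, c) > cosine_distance(a, b) + cosine_distance(b, c))  # True
```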

In [None]:
# Generate some random 2D points
np.random.seed(42)
points = np.random.randn(5, 2)
reference_point = np.array([0, 0])

# Calculate distances using different metrics
euclidean_dist = euclidean_distances([reference_point], points)[0]
manhattan_dist = manhattan_distances([reference_point], points)[0]

# Plotting
plt.figure(figsize=(12, 6))

# Plot points and distances
plt.subplot(121)
plt.scatter(points[:, 0], points[:, 1], c='blue', label='Points')
plt.scatter([0], [0], c='red', label='Reference')

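# Draw the straight-line (Euclidean) path and the axis-aligned (Manhattan) path to each point
for point in points:
    plt.plot([0, point[0]], [0, point[1]], 'g--', alpha=0.5)
    plt.plot([0, point[0], point[0]], [0, 0, point[1]], 'm:', alpha=0.5)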
    
plt.grid(True)
plt.axis('equal')
plt.title('Points and Their Distances')
plt.legend()

# Bar plot comparing distances
plt.subplot(122)
x = np.arange(len(points))
width = 0.35

plt.bar(x - width/2, euclidean_dist, width, label='Euclidean')
plt.bar(x + width/2, manhattan_dist, width, label='Manhattan')
plt.xticks(x, x + 1)  # number bars 1..5 to match the printed output
plt.xlabel('Point')
plt.ylabel('Distance')
plt.title('Distance Comparison')
plt.legend()

plt.tight_layout()
plt.show()

# Print the distances
print("Distances from origin:")
for i in range(len(points)):
    print(f"\nPoint {i+1} at {points[i]}:")
    print(f"Euclidean distance: {euclidean_dist[i]:.2f}")
    print(f"Manhattan distance: {manhattan_dist[i]:.2f}")
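For any $\mathbf{x} \in \mathbb{R}^n$ the standard norms are ordered as $\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_2 \leq \|\mathbf{x}\|_1 \leq \sqrt{n}\,\|\mathbf{x}\|_2$. This is why each Manhattan bar in the plot is at least as tall as its Euclidean bar, and in 2-D never more than $\sqrt{2}$ times taller. The sketch below spot-checks the chain on random vectors; the dimension and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
holds = True
for _ in range(1000):
    x = rng.standard_normal(n)
    l1 = np.linalg.norm(x, ord=1)
    l2 = np.linalg.norm(x, ord=2)
    linf = np.linalg.norm(x, ord=np.inf)
    # Small tolerance guards against floating-point round-off
    holds &= bool(linf <= l2 + 1e-12 and l2 <= l1 + 1e-12 and l1 <= np.sqrt(n) * l2 + 1e-12)
print(holds)  # True
```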

## 3. Applications in Machine Learning

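Different norms and distances appear throughout ML:

1. **L1 Regularization (Lasso)**
   - Adds a penalty $\alpha\|\mathbf{w}\|_1$ to the loss
   - Drives some coefficients exactly to zero (sparsity), which acts as built-in feature selection

2. **L2 Regularization (Ridge)**
   - Adds a penalty $\alpha\|\mathbf{w}\|_2^2$ to the loss
   - Shrinks all coefficients toward zero, which reduces overfitting and keeps the solution stable when features are correlated

3. **Distance-based Algorithms**
   - k-Nearest Neighbors: predicts from the $k$ training points closest to a query under the chosen metric
   - k-Means Clustering: assigns each point to its nearest centroid, minimizing within-cluster squared Euclidean distance
   - DBSCAN: groups points that have enough neighbors within a radius $\varepsilon$ under the chosen metric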
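The metric is a real modeling choice for distance-based algorithms. This sketch finds the single nearest neighbor (the core step of k-NN with $k = 1$) of the origin among two hand-picked candidates, and the answer changes with the norm. `nearest` and the candidate points are illustrative assumptions.

```python
import numpy as np

query = np.array([0.0, 0.0])
candidates = {"a": np.array([1.5, 0.0]), "b": np.array([1.0, 1.0])}

def nearest(query, candidates, p):
    # Key of the candidate closest to query under the p-norm distance ||c - query||_p
    return min(candidates, key=lambda k: np.linalg.norm(candidates[k] - query, ord=p))

for p in [1, 2, np.inf]:
    print(p, nearest(query, candidates, p))
# 1 a     (L1: a = 1.5,  b = 2.0)
# 2 b     (L2: a = 1.5,  b ≈ 1.414)
# inf b   (L∞: a = 1.5,  b = 1.0)
```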

In [None]:
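# Example: Effect of different norms in regularization
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
X = np.random.randn(100, 2)
y = 3*X[:, 0] + 0.5*X[:, 1] + np.random.randn(100) * 0.1

# Standardize features so the penalty treats both coefficients on the same scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit models over a log-spaced grid of regularization strengths.
# scikit-learn's Lasso divides its squared-error term by 2 * n_samples while Ridge does not,
# so a given alpha penalizes far more strongly in Lasso; the wide grid shows both behaviors.
alphas = np.logspace(-3, 3, 30)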
lasso_coefs = []
ridge_coefs = []

for alpha in alphas:
    # Lasso (L1)
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_scaled, y)
    lasso_coefs.append(lasso.coef_)
    
    # Ridge (L2)
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_scaled, y)
    ridge_coefs.append(ridge.coef_)

# Plot coefficients vs regularization strength
lasso_coefs = np.array(lasso_coefs)
ridge_coefs = np.array(ridge_coefs)

plt.figure(figsize=(12, 5))

plt.subplot(121)
plt.plot(alphas, lasso_coefs[:, 0], 'b-', label='Feature 1')
plt.plot(alphas, lasso_coefs[:, 1], 'r-', label='Feature 2')
plt.xscale('log')
plt.xlabel('Alpha (regularization strength)')
plt.ylabel('Coefficient value')
plt.title('Lasso (L1) Regularization')
plt.legend()
plt.grid(True)

plt.subplot(122)
plt.plot(alphas, ridge_coefs[:, 0], 'b-', label='Feature 1')
plt.plot(alphas, ridge_coefs[:, 1], 'r-', label='Feature 2')
plt.xscale('log')
plt.xlabel('Alpha (regularization strength)')
plt.ylabel('Coefficient value')
plt.title('Ridge (L2) Regularization')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
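The plots should show Lasso coefficients reaching exactly zero while Ridge coefficients only shrink. A simplified case makes the reason visible: assuming an orthonormal design ($X^\top X = I$) and writing $\mathbf{z} = X^\top \mathbf{y}$ for the unpenalized coefficients, minimizing $\tfrac12\|\mathbf{y} - X\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1$ gives soft-thresholding of $\mathbf{z}$, while $\tfrac12\|\mathbf{y} - X\mathbf{w}\|_2^2 + \tfrac{\lambda}{2}\|\mathbf{w}\|_2^2$ gives $\mathbf{z}/(1+\lambda)$. The $\lambda$ here is illustrative and does not map one-to-one to scikit-learn's `alpha`; `z` is made-up data.

```python
import numpy as np

def soft_threshold(z, lam):
    # Elementwise minimizer of 0.5*(z - w)**2 + lam*|w|  (L1 / Lasso case)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    # Elementwise minimizer of 0.5*(z - w)**2 + 0.5*lam*w**2  (L2 / Ridge case)
    return z / (1.0 + lam)

z = np.array([3.0, 0.5, -0.2])  # illustrative unpenalized coefficients
for lam in [0.1, 1.0, 5.0]:
    l1_nonzero = np.count_nonzero(soft_threshold(z, lam))
    l2_nonzero = np.count_nonzero(ridge_shrink(z, lam))
    print(f"lam={lam}: L1 nonzeros={l1_nonzero}, L2 nonzeros={l2_nonzero}")
# lam=0.1: L1 nonzeros=3, L2 nonzeros=3
# lam=1.0: L1 nonzeros=1, L2 nonzeros=3
# lam=5.0: L1 nonzeros=0, L2 nonzeros=3
```

Soft-thresholding subtracts a fixed amount from every magnitude and stops at zero, so small coefficients vanish; the L2 update rescales every coefficient by the same factor, so none ever becomes exactly zero.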

## 4. Custom Distance Metrics

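Sometimes the standard norms don't fit the data, and we need a distance designed for the problem:

1. Time series data: Dynamic Time Warping (DTW), which aligns sequences that are shifted or stretched in time before comparing them
2. Text data: Edit (Levenshtein) distance, the minimum number of single-character insertions, deletions, and substitutions that turn one string into another
3. Structured data: Domain-specific metrics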
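Minimal sketches of the first two measures using their standard dynamic-programming recurrences. `edit_distance` and `dtw_distance` are hypothetical helper names; the DTW version assumes 1-D numeric series, uses the absolute difference as the local cost, and applies no warping-window constraint. DTW can compare series of different lengths, where a plain Euclidean distance is not even defined, but it can violate the triangle inequality, so it is not a true metric.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from "" to each prefix of b
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[n]

def dtw_distance(s, t):
    """DTW distance between 1-D sequences s and t with |s_i - t_j| as local cost."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three allowed alignment moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

print(edit_distance("kitten", "sitting"))  # 3

s = [0, 1, 2, 3, 2, 1, 0]
t = [0, 0, 1, 2, 3, 2, 1, 0]  # same shape, delayed by one step
print(dtw_distance(s, t))  # 0.0
```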

In [None]:
def custom_distance(x, y, weights=None):
    """
    Custom weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)**2)

    Parameters:
    -----------
    x, y: array-like
        Vectors to compute distance between
    weights: array-like, optional
        Non-negative importance weights for each dimension (default: all ones)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if weights is None:
        weights = np.ones_like(x)
    return np.sqrt(np.sum(weights * (x - y)**2))

# Example usage with feature importance weights
point1 = np.array([1, 2])
point2 = np.array([4, 6])
weights = np.array([2, 1])  # First feature's squared difference counts twice

standard_dist = np.linalg.norm(point1 - point2)
weighted_dist = custom_distance(point1, point2, weights)

print(f"Standard Euclidean distance: {standard_dist:.2f}")
print(f"Weighted distance: {weighted_dist:.2f}")
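A useful way to read the weights: weighted Euclidean distance equals the ordinary Euclidean distance after rescaling each feature $i$ by $\sqrt{w_i}$, so a weight of 2 scales that feature's contribution by $\sqrt{2}$, not 2. The sketch repeats the weighted formula from `custom_distance` under the hypothetical name `weighted_euclidean` so it runs on its own.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    # Same formula as custom_distance: sqrt(sum_i w_i * (x_i - y_i)**2)
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
w = np.array([2.0, 1.0])

# Rescale every feature by sqrt(w_i), then apply the plain Euclidean distance
scale = np.sqrt(w)
plain_on_scaled = float(np.linalg.norm(x * scale - y * scale))

# The two values agree up to floating-point round-off (both are sqrt(34))
print(weighted_euclidean(x, y, w), plain_on_scaled)
```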

## 5. Practice Exercises

1. Implement different p-norms from scratch
2. Compare clustering results using different distance metrics
3. Create a custom distance metric for a specific problem
4. Explore the effect of different norms in regularization

Write your solutions in the cell below:

In [None]:
# Your solution here


## Next Steps

In the next lecture, we'll introduce matrices and explore how they can represent linear transformations.

### Preparation for Next Lecture
1. Review vector operations and norms
2. Think about how we might represent transformations of vectors
3. Consider why matrices are useful in machine learning

### Additional Resources
- [Interactive Norm Visualization](../../resources/visualizations/norms.html)
- [Distance Metrics Cheat Sheet](../../resources/cheat_sheets/distances.pdf)
- [Regularization in Machine Learning](../../resources/articles/regularization.md)