# ML LeetCode - Part 2: Core ML Algorithms from Scratch 🤖

This notebook contains algorithmic challenges focused on implementing fundamental machine learning algorithms from scratch. Each problem follows the LeetCode format with multiple solution approaches and complexity analysis.

## 🎯 Learning Objectives
- Implement classic ML algorithms without libraries
- Understand algorithmic complexity and optimization
- Master different approaches to the same problem
- Practice efficient coding for ML problems

## 📊 Difficulty Levels
- 🟢 **Easy**: Basic implementations
- 🟡 **Medium**: Optimized versions with edge cases
- 🔴 **Hard**: Advanced algorithms with multiple optimizations

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Tuple, Optional, Dict, Any
import time
import math
from collections import defaultdict, Counter
import heapq

def test_algorithm(func, test_cases, name="Algorithm"):
    """Test an algorithm with multiple test cases."""
    print(f"\n🧪 Testing {name}:")
    for i, (inputs, expected) in enumerate(test_cases, 1):
        try:
            result = func(*inputs)
            if isinstance(expected, (list, np.ndarray)):
                passed = np.allclose(result, expected, rtol=1e-6)
            else:
                passed = abs(result - expected) < 1e-6 if isinstance(result, (int, float)) else result == expected
            status = "✅" if passed else "❌"
            print(f"  {status} Test {i}: Expected {expected}, Got {result}")
        except Exception as e:
            print(f"  ❌ Test {i}: Error - {e}")

plt.style.use('seaborn-v0_8')
np.random.seed(42)

## Problem 1: K-Means Clustering Implementation 🟡

**Difficulty**: Medium

**Problem**: Implement K-means clustering algorithm with multiple initialization strategies and convergence criteria.

**Constraints**:
- 1 ≤ n_samples ≤ 1000
- 1 ≤ n_features ≤ 50
- 1 ≤ k ≤ min(20, n_samples)
- max_iters ≤ 1000
- Must handle edge cases (empty clusters, convergence)

**Example**:
```python
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
k = 2
labels, centers = KMeansImplementation(k=k).fit(X)
# Expected: Two clusters around [1, 2] and [10, 2]
```
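
To make the expected result concrete, here is a minimal sketch of a single assignment-and-update step on the example data. The starting centers (the first and fourth points) are an assumption chosen for illustration, and `lloyd_step` is a hypothetical helper, not part of the solution class.

```python
import numpy as np

def lloyd_step(X, centers):
    """One k-means iteration: assign each point, then recompute the means."""
    # Pairwise squared distances, shape (n_samples, k)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Mean of the points in each cluster (assumes no cluster ends up empty)
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
    return labels, new_centers

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
labels, centers = lloyd_step(X, X[[0, 3]])  # assumed starting centers
print(labels.tolist())   # [0, 0, 0, 1, 1, 1]
print(centers.tolist())  # [[1.0, 2.0], [10.0, 2.0]]
```

With these starting centers the centers do not move after the first step, so the algorithm would stop on the next convergence check.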

In [None]:
class KMeansImplementation:
    
    def __init__(self, k: int, max_iters: int = 100, tol: float = 1e-4, init: str = 'random'):
        self.k = k
        self.max_iters = max_iters
        self.tol = tol
        self.init = init
    
    def fit(self, X: List[List[float]]) -> Tuple[List[int], List[List[float]]]:
        """
        K-means clustering algorithm.
        
        Time Complexity: O(n*k*d*iterations)
        Space Complexity: O(n*k*d), dominated by the broadcast difference
            tensor built in _assign_clusters
        
        Returns:
            labels: Cluster assignment for each point
            centers: Final cluster centers
        """
        X = np.array(X)
        n_samples, n_features = X.shape
        
        if self.k > n_samples:
            raise ValueError(f"k ({self.k}) cannot be larger than n_samples ({n_samples})")
        
        # Initialize centers
        centers = self._init_centers(X)
        
        for iteration in range(self.max_iters):
            # Assign points to nearest centers
            labels = self._assign_clusters(X, centers)
            
            # Update centers
            new_centers = self._update_centers(X, labels)
            
            # Check convergence
            center_shift = np.max([np.linalg.norm(new_centers[i] - centers[i]) 
                                  for i in range(self.k)])
            
            centers = new_centers
            
            if center_shift < self.tol:
                break
        
        return labels.tolist(), centers.tolist()
    
    def _init_centers(self, X: np.ndarray) -> np.ndarray:
        """Initialize cluster centers."""
        n_samples, n_features = X.shape
        
        if self.init == 'random':
            # Random initialization
            indices = np.random.choice(n_samples, self.k, replace=False)
            return X[indices].copy()
        
        elif self.init == 'kmeans++':
            # K-means++ initialization
            centers = np.zeros((self.k, n_features))
            
            # Choose first center randomly
            centers[0] = X[np.random.choice(n_samples)]
            
            for i in range(1, self.k):
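                # Squared distance from each point to its nearest chosen center
                distances = np.array([min([np.linalg.norm(x - c)**2 
                                          for c in centers[:i]]) for x in X])
                
                # Choose next center with probability proportional to squared distance;
                # fall back to uniform sampling if every point coincides with a center
                total = distances.sum()
                probabilities = distances / total if total > 0 else np.full(n_samples, 1.0 / n_samples)
                centers[i] = X[np.random.choice(n_samples, p=probabilities)]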
            
            return centers
        
        else:
            raise ValueError(f"Unknown initialization method: {self.init}")
    
    def _assign_clusters(self, X: np.ndarray, centers: np.ndarray) -> np.ndarray:
        """Assign each point to the nearest cluster center."""
        distances = np.sqrt(((X - centers[:, np.newaxis])**2).sum(axis=2))
        return np.argmin(distances, axis=0)
    
    def _update_centers(self, X: np.ndarray, labels: np.ndarray) -> np.ndarray:
        """Update cluster centers based on current assignments."""
        centers = np.zeros((self.k, X.shape[1]))
        
        for i in range(self.k):
            mask = labels == i
            if np.any(mask):
                centers[i] = X[mask].mean(axis=0)
            else:
                # Handle empty cluster by reinitializing
                centers[i] = X[np.random.choice(len(X))]
        
        return centers
    
    def inertia(self, X: List[List[float]], labels: List[int], centers: List[List[float]]) -> float:
        """Calculate within-cluster sum of squares."""
        X = np.array(X)
        centers = np.array(centers)
        labels = np.array(labels)
        
        inertia = 0
        for i in range(self.k):
            mask = labels == i
            if np.any(mask):
                inertia += np.sum((X[mask] - centers[i])**2)
        
        return inertia

# Test K-means implementation
print("=== Problem 1: K-Means Clustering ===")

# Test case 1: Simple 2D data
X1 = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans1 = KMeansImplementation(k=2, init='random')
labels1, centers1 = kmeans1.fit(X1)

print(f"\nTest Case 1: 2D data with k=2")
print(f"Labels: {labels1}")
print(f"Centers: {[[round(x, 2) for x in center] for center in centers1]}")
print(f"Inertia: {kmeans1.inertia(X1, labels1, centers1):.2f}")

# Test case 2: K-means++ initialization
kmeans2 = KMeansImplementation(k=2, init='kmeans++')
labels2, centers2 = kmeans2.fit(X1)

print(f"\nTest Case 2: K-means++ initialization")
print(f"Labels: {labels2}")
print(f"Centers: {[[round(x, 2) for x in center] for center in centers2]}")
print(f"Inertia: {kmeans2.inertia(X1, labels2, centers2):.2f}")

# Visualize results
X1_np = np.array(X1)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for idx, (labels, centers, title) in enumerate([
    (labels1, centers1, "Random Init"),
    (labels2, centers2, "K-means++ Init")
]):
    
    colors = ['red', 'blue', 'green', 'orange', 'purple']
    for i in sorted(set(labels)):  # iterate over the labels actually present
        mask = np.array(labels) == i
        axes[idx].scatter(X1_np[mask, 0], X1_np[mask, 1], 
                         c=colors[i], label=f'Cluster {i}', alpha=0.7, s=100)
    
    # Plot centers
    centers_np = np.array(centers)
    axes[idx].scatter(centers_np[:, 0], centers_np[:, 1], 
                     c='black', marker='x', s=200, linewidths=3, label='Centers')
    
    axes[idx].set_title(title)
    axes[idx].set_xlabel('Feature 1')
    axes[idx].set_ylabel('Feature 2')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
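
The assignment step in `_assign_clusters` builds a `(k, n, d)` difference tensor before reducing it. One common way to avoid that intermediate, sketched here as an assumption rather than as the notebook's method, is to expand the squared distance as `||x||^2 - 2 x·c + ||c||^2`, which only needs an `(n, k)` buffer. The name `assign_clusters_lowmem` is hypothetical.

```python
import numpy as np

def assign_clusters_lowmem(X, centers):
    """Nearest-center labels using an (n, k) squared-distance matrix."""
    x_sq = np.einsum("ij,ij->i", X, X)              # squared norms of points, (n,)
    c_sq = np.einsum("ij,ij->i", centers, centers)  # squared norms of centers, (k,)
    d2 = x_sq[:, None] - 2.0 * (X @ centers.T) + c_sq[None, :]
    # argmin is unaffected by the square root, so squared distances suffice
    return np.argmin(d2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
centers = X[:4].copy()
labels = assign_clusters_lowmem(X, centers)

# Reference: the broadcast form used by _assign_clusters
ref = np.argmin(((X - centers[:, None]) ** 2).sum(axis=2), axis=0)
print(np.array_equal(labels, ref))
```

The expanded form can lose a little precision when points lie very close to a center, so it is best used for picking the nearest center rather than for reporting exact distances.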

## Problem 2: Decision Tree from Scratch 🔴

**Difficulty**: Hard

**Problem**: Implement a decision tree classifier with multiple splitting criteria, pre-pruning limits (maximum depth and minimum sample counts), and feature importance calculation.

**Constraints**:
- 1 ≤ n_samples ≤ 5000
- 1 ≤ n_features ≤ 100
- max_depth ≤ 20
- min_samples_split ≥ 2
- Features are split with numeric thresholds (`x <= t`); categorical features must be integer-encoded first

**Example**:
```python
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR problem
tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X, y)
predictions = tree.predict(X)
```
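
XOR is a useful stress test for greedy splitting because no single axis-aligned split reduces impurity at the root. The sketch below works this out with Gini impurity and contrasts it with AND labels on the same points. The helpers `gini` and `split_gain` are hypothetical names used only for this illustration.

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label array."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def split_gain(X, y, feature, threshold):
    """Impurity decrease for the split X[:, feature] <= threshold."""
    left = X[:, feature] <= threshold
    n_left, n_right = left.sum(), (~left).sum()
    weighted = (n_left * gini(y[left]) + n_right * gini(y[~left])) / len(y)
    return gini(y) - weighted

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])
y_and = np.array([0, 0, 0, 1])

xor_gains = [float(split_gain(X, y_xor, f, 0)) for f in (0, 1)]
and_gains = [float(split_gain(X, y_and, f, 0)) for f in (0, 1)]
print(xor_gains)  # [0.0, 0.0]
print(and_gains)  # [0.125, 0.125]
```

A tree that refuses zero-gain splits therefore stops at the root on XOR, while one that accepts them separates the classes at depth 2.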

In [None]:
class DecisionTreeNode:
    def __init__(self):
        self.feature_idx = None
        self.threshold = None
        self.left = None
        self.right = None
        self.value = None  # For leaf nodes
        self.samples = 0
        self.impurity = 0

class DecisionTreeClassifier:
    
    def __init__(self, max_depth: int = 10, min_samples_split: int = 2, 
                 min_samples_leaf: int = 1, criterion: str = 'gini'):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.criterion = criterion
        self.root = None
        self.feature_importances = None
        self.n_features = 0
        self.n_classes = 0
    
    def fit(self, X: List[List[float]], y: List[int]) -> 'DecisionTreeClassifier':
        """
        Build decision tree classifier.
        
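        Time Complexity: O(n^2 * m) per tree level as implemented, since every
            unique value of every feature is tried and each candidate rescans
            the node's samples
        Space Complexity: O(n * m) per recursion level for the copied subsets,
            plus one node object per split and leaf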
        """
        X = np.array(X)
        y = np.array(y)
        
        self.n_features = X.shape[1]
        self.n_classes = len(np.unique(y))
        self.feature_importances = np.zeros(self.n_features)
        
        self.root = self._build_tree(X, y, depth=0)
        
        # Normalize feature importances
        if self.feature_importances.sum() > 0:
            self.feature_importances /= self.feature_importances.sum()
        
        return self
    
    def _build_tree(self, X: np.ndarray, y: np.ndarray, depth: int) -> DecisionTreeNode:
        """Recursively build decision tree."""
        n_samples, n_features = X.shape
        n_classes = len(np.unique(y))
        
        node = DecisionTreeNode()
        node.samples = n_samples
        node.impurity = self._calculate_impurity(y)
        
        # Stopping criteria
        if (depth >= self.max_depth or 
            n_samples < self.min_samples_split or 
            n_classes == 1):
            node.value = self._most_common_class(y)
            return node
        
        # Find best split (best_feature is None when no valid split exists)
        best_feature, best_threshold, best_gain = self._find_best_split(X, y)
        
        if best_feature is None:
            node.value = self._most_common_class(y)
            return node
        
        # Split data
        left_mask = X[:, best_feature] <= best_threshold
        right_mask = ~left_mask
        
        # Update feature importance only for splits that are actually made
        self.feature_importances[best_feature] += max(best_gain, 0.0) * n_samples
        
        node.feature_idx = best_feature
        node.threshold = best_threshold
        node.left = self._build_tree(X[left_mask], y[left_mask], depth + 1)
        node.right = self._build_tree(X[right_mask], y[right_mask], depth + 1)
        
        return node
    
    def _find_best_split(self, X: np.ndarray, y: np.ndarray) -> Tuple[Optional[int], Optional[float], float]:
        """Find the best feature and threshold to split on.
        
        Zero-gain splits are accepted so that problems such as XOR, where no
        single split reduces impurity at the root, can still be separated.
        Returns (None, None, -inf) when no valid split exists.
        """
        best_gain = -np.inf
        best_feature = None
        best_threshold = None
        
        parent_impurity = self._calculate_impurity(y)
        min_leaf = max(self.min_samples_leaf, 1)
        
        for feature_idx in range(X.shape[1]):
            thresholds = np.unique(X[:, feature_idx])
            
            for threshold in thresholds:
                left_mask = X[:, feature_idx] <= threshold
                right_mask = ~left_mask
                n_left, n_right = np.sum(left_mask), np.sum(right_mask)
                
                # Skip splits that would leave a child with too few samples
                if n_left < min_leaf or n_right < min_leaf:
                    continue
                
                # Calculate information gain
                left_impurity = self._calculate_impurity(y[left_mask])
                right_impurity = self._calculate_impurity(y[right_mask])
                weighted_impurity = (n_left * left_impurity + n_right * right_impurity) / len(y)
                
                gain = parent_impurity - weighted_impurity
                
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature_idx
                    best_threshold = threshold
        
        return best_feature, best_threshold, best_gain
    
    def _calculate_impurity(self, y: np.ndarray) -> float:
        """Calculate impurity measure (Gini or Entropy)."""
        if len(y) == 0:
            return 0
        
        proportions = np.bincount(y) / len(y)
        
        if self.criterion == 'gini':
            return 1 - np.sum(proportions ** 2)
        elif self.criterion == 'entropy':
            return -np.sum(proportions * np.log2(proportions + 1e-15))
        else:
            raise ValueError(f"Unknown criterion: {self.criterion}")
    
    def _most_common_class(self, y: np.ndarray) -> int:
        """Return the most common class in y."""
        return np.bincount(y).argmax()
    
    def predict(self, X: List[List[float]]) -> List[int]:
        """Make predictions for input samples."""
        X = np.array(X)
        return [self._predict_sample(sample, self.root) for sample in X]
    
    def _predict_sample(self, sample: np.ndarray, node: DecisionTreeNode) -> int:
        """Predict class for a single sample."""
        if node.value is not None:
            return node.value
        
        if sample[node.feature_idx] <= node.threshold:
            return self._predict_sample(sample, node.left)
        else:
            return self._predict_sample(sample, node.right)
    
    def get_depth(self) -> int:
        """Calculate tree depth."""
        def _depth(node):
            if node is None or node.value is not None:
                return 0
            return 1 + max(_depth(node.left), _depth(node.right))
        
        return _depth(self.root)

# Test Decision Tree implementation
print("\n=== Problem 2: Decision Tree Implementation ===")

# Test case 1: Simple linearly separable data
X_simple = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_simple = [0, 0, 0, 1]

dt1 = DecisionTreeClassifier(max_depth=3, criterion='gini')
dt1.fit(X_simple, y_simple)
pred1 = dt1.predict(X_simple)

print(f"\nTest Case 1: Simple AND logic")
print(f"Predictions: {pred1}")
print(f"Accuracy: {np.mean(np.array(pred1) == np.array(y_simple)):.2f}")
print(f"Tree depth: {dt1.get_depth()}")
print(f"Feature importances: {[f'{x:.3f}' for x in dt1.feature_importances]}")

# Test case 2: XOR problem (non-linearly separable)
X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]

dt2 = DecisionTreeClassifier(max_depth=5, criterion='entropy')
dt2.fit(X_xor, y_xor)
pred2 = dt2.predict(X_xor)

print(f"\nTest Case 2: XOR problem")
print(f"Predictions: {pred2}")
print(f"Accuracy: {np.mean(np.array(pred2) == np.array(y_xor)):.2f}")
print(f"Tree depth: {dt2.get_depth()}")
print(f"Feature importances: {[f'{x:.3f}' for x in dt2.feature_importances]}")

# Visualize decision boundaries
def plot_decision_boundary(X, y, model, title):
    X = np.array(X)
    h = 0.01
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                        np.arange(y_min, y_max, h))
    
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    Z = model.predict(mesh_points.tolist())
    Z = np.array(Z).reshape(xx.shape)
    
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    colors = ['red', 'blue']
    for class_val in np.unique(y):
        mask = np.array(y) == class_val
        plt.scatter(X[mask, 0], X[mask, 1], c=colors[class_val], 
                   label=f'Class {class_val}', s=100, edgecolors='black')
    
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.grid(True, alpha=0.3)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

plt.subplot(1, 2, 1)
plot_decision_boundary(X_simple, y_simple, dt1, "AND Logic (Gini)")

plt.subplot(1, 2, 2)
plot_decision_boundary(X_xor, y_xor, dt2, "XOR Problem (Entropy)")

plt.tight_layout()
plt.show()
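
The exhaustive search in `_find_best_split` recomputes both child impurities from scratch for every candidate threshold. A common alternative, sketched here for a single feature under the Gini criterion, sorts the feature once and derives every candidate's class counts from cumulative sums. The function name `best_split_sorted` and its `n_classes` argument are hypothetical; this is an illustration under those assumptions, not a drop-in replacement for the class above.

```python
import numpy as np

def best_split_sorted(x, y, n_classes):
    """Best Gini threshold for one feature using a single sorted sweep.

    Returns (threshold, gain); threshold is None if the feature is constant.
    The split rule is x <= threshold, matching the tree above.
    """
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    n = len(y)

    # Cumulative class counts: row i counts the labels of samples 0..i
    left_counts = np.cumsum(np.eye(n_classes)[y_sorted], axis=0)
    total = left_counts[-1]

    # A split after position i is valid only where the feature value changes
    valid = np.nonzero(x_sorted[:-1] < x_sorted[1:])[0]
    if valid.size == 0:
        return None, 0.0

    n_left = valid + 1
    n_right = n - n_left
    lc = left_counts[valid]
    rc = total - lc
    gini_left = 1.0 - np.sum((lc / n_left[:, None]) ** 2, axis=1)
    gini_right = 1.0 - np.sum((rc / n_right[:, None]) ** 2, axis=1)
    weighted = (n_left * gini_left + n_right * gini_right) / n

    parent = 1.0 - np.sum((total / n) ** 2)
    best = np.argmin(weighted)
    return x_sorted[valid[best]], parent - weighted[best]

# Feature 0 of the AND data: splitting at 0 isolates two class-0 samples
x = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([0, 0, 0, 1])
threshold, gain = best_split_sorted(x, y, n_classes=2)
print(float(threshold), float(gain))  # 0.0 0.125
```

Per feature this costs one sort plus a linear pass, at the price of an extra `(n, n_classes)` array for the cumulative counts.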

## Problem 3: Linear Regression with Regularization 🟡

**Difficulty**: Medium

**Problem**: Implement linear regression with L1 (Lasso), L2 (Ridge), and Elastic Net regularization. Lasso and Elastic Net are fitted with coordinate descent; Ridge uses its closed-form solution.

**Constraints**:
- 1 ≤ n_samples ≤ 10000
- 1 ≤ n_features ≤ 1000
- 0 ≤ alpha ≤ 1000 (regularization strength)
- 0 ≤ l1_ratio ≤ 1 (for Elastic Net)
- max_iter ≤ 10000

**Example**:
```python
X = [[1, 2], [2, 3], [3, 4]]
y = [1, 2, 3]
model = RegularizedLinearRegression(alpha=0.1, penalty='ridge')
model.fit(X, y)
predictions = model.predict(X)
```
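
For reference, the coordinate-descent update used for the L1-type penalties can be written out explicitly. The formulas below assume the unnormalized objective (squared loss not divided by the sample count), which is the convention that matches the update rule in the solution that follows; $r$ stands for `l1_ratio`, and $r = 1$ recovers plain Lasso.

$$
\min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \alpha\, r\,\lVert\beta\rVert_1 + \tfrac{1}{2}\,\alpha\,(1 - r)\,\lVert\beta\rVert_2^2
$$

$$
\rho_j = \sum_{i} x_{ij}\Bigl(y_i - \sum_{l \neq j} x_{il}\,\beta_l\Bigr),
\qquad
S(z, \gamma) = \operatorname{sign}(z)\,\max\bigl(\lvert z\rvert - \gamma,\ 0\bigr)
$$

$$
\beta_j \leftarrow \frac{S(\rho_j,\ \alpha\, r)}{\sum_i x_{ij}^2 + \alpha\,(1 - r)}
$$

The intercept is left unpenalized, so its update is simply $\beta_0 \leftarrow \rho_0 / \sum_i x_{i0}^2$.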

In [None]:
class RegularizedLinearRegression:
    
    def __init__(self, alpha: float = 1.0, penalty: str = 'ridge', 
                 l1_ratio: float = 0.5, max_iter: int = 1000, tol: float = 1e-4):
        self.alpha = alpha
        self.penalty = penalty
        self.l1_ratio = l1_ratio
        self.max_iter = max_iter
        self.tol = tol
        self.coef_ = None
        self.intercept_ = None
        self.n_iter_ = None
    
    def fit(self, X: List[List[float]], y: List[float]) -> 'RegularizedLinearRegression':
        """
        Fit regularized linear regression.
        
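        Time Complexity: O(n * m^2 + m^3) for the closed-form OLS/Ridge solve;
            O(n * m^2) per coordinate-descent sweep for Lasso/Elastic Net
            (n samples, m features)
        Space Complexity: O(n * m) for the design matrix, plus O(m^2) for X^T X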
        """
        X = np.array(X)
        y = np.array(y)
        
        # Add intercept column
        X_with_intercept = np.column_stack([np.ones(X.shape[0]), X])
        
        if self.penalty == 'ridge':
            self._fit_ridge(X_with_intercept, y)
        elif self.penalty == 'lasso':
            self._fit_lasso(X_with_intercept, y)
        elif self.penalty == 'elastic_net':
            self._fit_elastic_net(X_with_intercept, y)
        elif self.penalty in ('none', None):
            self._fit_ols(X_with_intercept, y)
        else:
            raise ValueError(f"Unknown penalty: {self.penalty}")
        
        return self
    
    def _fit_ols(self, X: np.ndarray, y: np.ndarray):
        """Ordinary Least Squares using normal equations."""
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]
        self.n_iter_ = 1
    
    def _fit_ridge(self, X: np.ndarray, y: np.ndarray):
        """Ridge regression using analytical solution."""
        n_features = X.shape[1]
        I = np.eye(n_features)
        I[0, 0] = 0  # Don't penalize intercept
        
        beta = np.linalg.solve(X.T @ X + self.alpha * I, X.T @ y)
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]
        self.n_iter_ = 1
    
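    def _fit_lasso(self, X: np.ndarray, y: np.ndarray):
        """Lasso regression using coordinate descent.
        
        Minimizes 0.5 * ||y - X b||^2 + alpha * ||b||_1. The squared loss is
        not divided by n_samples, so alpha is on a different scale than in
        scikit-learn's Lasso.
        """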
        n_samples, n_features = X.shape
        beta = np.zeros(n_features)
        
        # Precompute X^T X diagonal
        XTX_diag = np.sum(X**2, axis=0)
        
        for iteration in range(self.max_iter):
            beta_old = beta.copy()
            
            for j in range(n_features):
                # Calculate residual without j-th feature
                r_j = y - X @ beta + beta[j] * X[:, j]
                
                # Coordinate update
                rho_j = X[:, j] @ r_j
                
                if j == 0:  # Don't penalize intercept
                    beta[j] = rho_j / XTX_diag[j]
                else:
                    # Soft thresholding
                    if rho_j > self.alpha:
                        beta[j] = (rho_j - self.alpha) / XTX_diag[j]
                    elif rho_j < -self.alpha:
                        beta[j] = (rho_j + self.alpha) / XTX_diag[j]
                    else:
                        beta[j] = 0
            
            # Check convergence
            if np.max(np.abs(beta - beta_old)) < self.tol:
                break
        
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]
        self.n_iter_ = iteration + 1
    
    def _fit_elastic_net(self, X: np.ndarray, y: np.ndarray):
        """Elastic Net regression using coordinate descent."""
        n_samples, n_features = X.shape
        beta = np.zeros(n_features)
        
        # Precompute X^T X diagonal
        XTX_diag = np.sum(X**2, axis=0)
        
        l1_penalty = self.alpha * self.l1_ratio
        l2_penalty = self.alpha * (1 - self.l1_ratio)
        
        for iteration in range(self.max_iter):
            beta_old = beta.copy()
            
            for j in range(n_features):
                # Calculate residual without j-th feature
                r_j = y - X @ beta + beta[j] * X[:, j]
                
                # Coordinate update
                rho_j = X[:, j] @ r_j
                
                if j == 0:  # Don't penalize intercept
                    beta[j] = rho_j / XTX_diag[j]
                else:
                    # Elastic net soft thresholding
                    denominator = XTX_diag[j] + l2_penalty
                    
                    if rho_j > l1_penalty:
                        beta[j] = (rho_j - l1_penalty) / denominator
                    elif rho_j < -l1_penalty:
                        beta[j] = (rho_j + l1_penalty) / denominator
                    else:
                        beta[j] = 0
            
            # Check convergence
            if np.max(np.abs(beta - beta_old)) < self.tol:
                break
        
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]
        self.n_iter_ = iteration + 1
    
    def predict(self, X: List[List[float]]) -> List[float]:
        """Make predictions using fitted model."""
        X = np.array(X)
        return (X @ self.coef_ + self.intercept_).tolist()
    
    def score(self, X: List[List[float]], y: List[float]) -> float:
        """Calculate R-squared score."""
        y_pred = self.predict(X)
        y_true = np.array(y)
        y_pred = np.array(y_pred)
        
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        
        return 1 - (ss_res / ss_tot)

# Test regularized linear regression
print("\n=== Problem 3: Regularized Linear Regression ===")

# Generate synthetic data with noise
np.random.seed(42)
n_samples, n_features = 100, 5
X_reg = np.random.randn(n_samples, n_features)
true_coef = np.array([1.5, -2.0, 0.5, 0, 0])  # Some zero coefficients
y_reg = X_reg @ true_coef + 0.1 * np.random.randn(n_samples)

X_reg_list = X_reg.tolist()
y_reg_list = y_reg.tolist()

print(f"\nTrue coefficients: {true_coef}")

# Test different regularization methods
methods = {
    'OLS': RegularizedLinearRegression(alpha=0, penalty='none'),
    'Ridge': RegularizedLinearRegression(alpha=1.0, penalty='ridge'),
    'Lasso': RegularizedLinearRegression(alpha=0.1, penalty='lasso'),
    'Elastic Net': RegularizedLinearRegression(alpha=0.1, penalty='elastic_net', l1_ratio=0.5)
}

results = {}
for name, model in methods.items():
    model.fit(X_reg_list, y_reg_list)
    score = model.score(X_reg_list, y_reg_list)
    
    results[name] = {
        'coef': model.coef_,
        'intercept': model.intercept_,
        'score': score,
        'n_iter': model.n_iter_
    }
    
    print(f"\n{name}:")
    print(f"  Coefficients: {[f'{x:.3f}' for x in model.coef_]}")
    print(f"  R² Score: {score:.4f}")
    print(f"  Iterations: {model.n_iter_}")
    
    # Calculate sparsity (number of zero coefficients)
    sparsity = np.sum(np.abs(model.coef_) < 1e-6)
    print(f"  Sparsity: {sparsity}/{len(model.coef_)} coefficients ≈ 0")

# Compare fitted coefficients with the true coefficients
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.ravel()

for idx, (name, result) in enumerate(results.items()):
    coef = result['coef']
    axes[idx].bar(range(len(coef)), coef, alpha=0.7, label='Estimated')
    axes[idx].bar(range(len(true_coef)), true_coef, alpha=0.3, color='red', label='True')
    axes[idx].set_title(f'{name} Coefficients')
    axes[idx].set_xlabel('Feature Index')
    axes[idx].set_ylabel('Coefficient Value')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Analysis:")
print("• OLS: No regularization, may overfit")
print("• Ridge: Shrinks coefficients, keeps all features")
print("• Lasso: Performs feature selection (sets coefficients to 0)")
print("• Elastic Net: Combines Ridge and Lasso benefits")
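
The coordinate-descent loops above recompute `X @ beta` for every coordinate, which costs O(n·m) per coordinate. A standard refinement, sketched below as an assumption rather than as the notebook's method, keeps a running residual and patches it in O(n) whenever a coefficient changes. The function `lasso_cd_residual` is a hypothetical name, omits the intercept for brevity, and uses the same unnormalized objective as the class above.

```python
import numpy as np

def lasso_cd_residual(X, y, alpha, max_iter=1000, tol=1e-6):
    """Lasso by coordinate descent with a running residual (no intercept).

    Minimizes 0.5 * ||y - X b||^2 + alpha * ||b||_1.
    """
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    residual = y.astype(float).copy()   # equals y - X @ beta while beta is zero
    col_sq = np.sum(X ** 2, axis=0)

    for _ in range(max_iter):
        max_change = 0.0
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at 0
            # x_j . (residual with feature j's own contribution added back)
            rho = X[:, j] @ residual + col_sq[j] * beta[j]
            new_beta = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual -= delta * X[:, j]  # O(n) patch keeps residual in sync
                beta[j] = new_beta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=100)
print(np.round(lasso_cd_residual(X, y, alpha=10.0), 3))
```

With `alpha=0` the routine reduces to a Gauss-Seidel solve of the least-squares problem, which gives a convenient sanity check against `np.linalg.lstsq`.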

## Problem 4: K-Nearest Neighbors with Custom Distance Metrics 🟢

**Difficulty**: Easy

**Problem**: Implement KNN classifier with multiple distance metrics and efficient nearest neighbor search.

**Constraints**:
- 1 ≤ k ≤ min(50, n_samples)
- 1 ≤ n_samples ≤ 5000
- Support for Euclidean, Manhattan, and Minkowski distances
- Handle both classification and regression

**Example**:
```python
X_train = [[0, 0], [1, 0], [0, 1], [1, 1]]
y_train = [0, 0, 1, 1]
knn = KNN(k=3, metric='euclidean')
knn.fit(X_train, y_train)
predictions = knn.predict([[0.5, 0.5]])
```
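
One caveat about this example: the query `[0.5, 0.5]` is equidistant from all four training points, so which three neighbors are selected depends on tie-breaking rather than on the data. The short check below makes that visible; `minkowski` is a hypothetical helper written for this illustration.

```python
import numpy as np

X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
query = np.array([0.5, 0.5])

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

for p in (1, 2):
    dists = [minkowski(query, x, p) for x in X_train]
    print(p, np.round(dists, 4).tolist())
# 1 [1.0, 1.0, 1.0, 1.0]
# 2 [0.7071, 0.7071, 0.7071, 0.7071]
```

Moving the query slightly off center, for example to `[0.4, 0.6]`, removes the tie and makes the prediction well defined.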

In [None]:
class KNearestNeighbors:
    
    def __init__(self, k: int = 3, metric: str = 'euclidean', p: float = 2, task: str = 'classification'):
        self.k = k
        self.metric = metric
        self.p = p  # For Minkowski distance
        self.task = task
        self.X_train = None
        self.y_train = None
    
    def fit(self, X: List[List[float]], y: List[int]) -> 'KNearestNeighbors':
        """
        Store training data.
        
        Time Complexity: O(n * m) to copy the data; no model is trained
        Space Complexity: O(n * m)
        """
        self.X_train = np.array(X)
        self.y_train = np.array(y)
        return self
    
    def _distance(self, x1: np.ndarray, x2: np.ndarray) -> float:
        """Calculate distance between two points."""
        if self.metric == 'euclidean':
            return np.sqrt(np.sum((x1 - x2) ** 2))
        elif self.metric == 'manhattan':
            return np.sum(np.abs(x1 - x2))
        elif self.metric == 'minkowski':
            return np.sum(np.abs(x1 - x2) ** self.p) ** (1/self.p)
        elif self.metric == 'cosine':
            dot_product = np.dot(x1, x2)
            norm_x1 = np.linalg.norm(x1)
            norm_x2 = np.linalg.norm(x2)
            return 1 - dot_product / (norm_x1 * norm_x2 + 1e-15)
        else:
            raise ValueError(f"Unknown metric: {self.metric}")
    
    def _find_k_nearest(self, x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Find k nearest neighbors for a point.
        
        Time Complexity: O(n * d) for the distances plus O(n) on average for
            the partial selection with argpartition
        Space Complexity: O(n)
        """
        distances = np.array([self._distance(x, x_train) for x_train in self.X_train])
        
        # Get indices of k smallest distances
        k_nearest_indices = np.argpartition(distances, min(self.k, len(distances)-1))[:self.k]
        
        return k_nearest_indices, distances[k_nearest_indices]
    
    def predict(self, X: List[List[float]]) -> List[int]:
        """
        Make predictions for input samples.
        
        Time Complexity: O(m * n * d) for m test samples
        Space Complexity: O(n) per query for the distance array, plus O(m)
            for the predictions
        """
        X = np.array(X)
        predictions = []
        
        for x in X:
            k_indices, k_distances = self._find_k_nearest(x)
            k_labels = self.y_train[k_indices]
            
            if self.task == 'classification':
                # Majority voting
                prediction = np.bincount(k_labels).argmax()
            else:  # regression
                # Average of k nearest neighbors
                prediction = np.mean(k_labels)
            
            predictions.append(prediction)
        
        return predictions
    
    def predict_proba(self, X: List[List[float]]) -> List[List[float]]:
        """
        Predict class probabilities.
        """
        if self.task != 'classification':
            raise ValueError("predict_proba only available for classification")
        
        X = np.array(X)
        n_classes = len(np.unique(self.y_train))
        probabilities = []
        
        for x in X:
            k_indices, k_distances = self._find_k_nearest(x)
            k_labels = self.y_train[k_indices]
            
            # Calculate class probabilities
            class_counts = np.bincount(k_labels, minlength=n_classes)
            class_probs = class_counts / len(k_labels)
            
            probabilities.append(class_probs.tolist())
        
        return probabilities
    
    def score(self, X: List[List[float]], y: List[int]) -> float:
        """Calculate accuracy for classification or R² for regression."""
        predictions = self.predict(X)
        
        if self.task == 'classification':
            return np.mean(np.array(predictions) == np.array(y))
        else:  # regression
            y_true = np.array(y)
            y_pred = np.array(predictions)
            ss_res = np.sum((y_true - y_pred) ** 2)
            ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
            return 1 - (ss_res / ss_tot)

# Test KNN implementation
print("\n=== Problem 4: K-Nearest Neighbors ===")

# Generate test data
from sklearn.datasets import make_classification
X_knn, y_knn = make_classification(n_samples=200, n_features=2, n_redundant=0, 
                                  n_informative=2, n_clusters_per_class=1, random_state=42)

# Split data
split_idx = int(0.7 * len(X_knn))
X_train_knn = X_knn[:split_idx].tolist()
y_train_knn = y_knn[:split_idx].tolist()
X_test_knn = X_knn[split_idx:].tolist()
y_test_knn = y_knn[split_idx:].tolist()

print(f"Training samples: {len(X_train_knn)}")
print(f"Test samples: {len(X_test_knn)}")

# Test different distance metrics
metrics = ['euclidean', 'manhattan', 'cosine']
k_values = [1, 3, 5, 10]

results_knn = {}
for metric in metrics:
    results_knn[metric] = {}
    for k in k_values:
        knn = KNearestNeighbors(k=k, metric=metric, task='classification')
        knn.fit(X_train_knn, y_train_knn)
        accuracy = knn.score(X_test_knn, y_test_knn)
        results_knn[metric][k] = accuracy

# Display results
print("\nAccuracy Results:")
print("k\t" + "\t".join(f"{metric:>10}" for metric in metrics))
for k in k_values:
    row = f"{k}\t"
    for metric in metrics:
        row += f"{results_knn[metric][k]:>10.3f}\t"
    print(row)

# Find best configuration
best_metric, best_k, best_accuracy = None, None, 0
for metric in metrics:
    for k in k_values:
        if results_knn[metric][k] > best_accuracy:
            best_accuracy = results_knn[metric][k]
            best_metric = metric
            best_k = k

print(f"\nBest configuration: k={best_k}, metric={best_metric}, accuracy={best_accuracy:.3f}")

# Visualize decision boundaries for best model
best_knn = KNearestNeighbors(k=best_k, metric=best_metric, task='classification')
best_knn.fit(X_train_knn, y_train_knn)

# Plot results
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Performance comparison
for metric in metrics:
    accuracies = [results_knn[metric][k] for k in k_values]
    axes[0].plot(k_values, accuracies, 'o-', label=metric, linewidth=2, markersize=8)

axes[0].set_xlabel('k (Number of Neighbors)')
axes[0].set_ylabel('Test Accuracy')
axes[0].set_title('KNN Performance vs k and Distance Metric')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_xticks(k_values)

# Decision boundary visualization
h = 0.1
X_train_np = np.array(X_train_knn)
x_min, x_max = X_train_np[:, 0].min() - 1, X_train_np[:, 0].max() + 1
y_min, y_max = X_train_np[:, 1].min() - 1, X_train_np[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                    np.arange(y_min, y_max, h))

mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = best_knn.predict(mesh_points.tolist())
Z = np.array(Z).reshape(xx.shape)

axes[1].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
colors = ['red', 'blue']
for class_val in [0, 1]:
    mask = np.array(y_train_knn) == class_val
    X_class = X_train_np[mask]
    axes[1].scatter(X_class[:, 0], X_class[:, 1], c=colors[class_val], 
                   label=f'Class {class_val}', s=50, alpha=0.8, edgecolors='black')

axes[1].set_xlabel('Feature 1')
axes[1].set_ylabel('Feature 2')
axes[1].set_title(f'Decision Boundary (k={best_k}, {best_metric})')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 KNN Analysis:")
print("• Small k: More complex decision boundary, may overfit")
print("• Large k: Smoother decision boundary, may underfit")
print("• Distance metric choice depends on data distribution")
print("• Euclidean: Good for continuous features")
print("• Manhattan: Less sensitive than Euclidean to a large difference in a single feature")
print("• Cosine: Good for high-dimensional sparse data")
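
The per-query Python loop in `predict` dominates the runtime when drawing the decision boundary. As a rough sketch of a batched alternative for the Euclidean case only, the distances for all queries can be computed with one matrix product and the neighbors selected row-wise. The function `knn_predict_batch` is hypothetical and assumes integer class labels starting at 0 and `k` no larger than the number of training points.

```python
import numpy as np

def knn_predict_batch(X_train, y_train, X_query, k):
    """Majority-vote KNN (Euclidean) for a whole batch of queries at once."""
    # Squared distances via ||q||^2 - 2 q.x + ||x||^2, shape (n_query, n_train)
    d2 = (
        np.sum(X_query ** 2, axis=1)[:, None]
        - 2.0 * X_query @ X_train.T
        + np.sum(X_train ** 2, axis=1)[None, :]
    )
    # Indices of the k smallest distances per row (unordered within the k)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]                      # (n_query, k)
    n_classes = int(y_train.max()) + 1
    # Count votes per class for each query row; ties go to the lowest label
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.2], [5.5, 5.5]])
preds = knn_predict_batch(X_train, y_train, X_query, k=3)
print(preds.tolist())  # [0, 1]
```

The `(n_query, n_train)` distance matrix trades memory for speed, so very large query sets would need to be processed in chunks.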

## Summary and Next Steps 🎓

### 🏆 Problems Completed:

1. **K-Means Clustering** 🟡
   - Multiple initialization strategies (random, k-means++)
   - Convergence criteria and empty cluster handling
   - **Key Insight**: K-means++ initialization typically reduces the number of iterations needed to converge

2. **Decision Tree Classifier** 🔴
   - Gini and entropy splitting criteria
   - Recursive tree building with pre-pruning parameters (`max_depth`, `min_samples_split`, `min_samples_leaf`)
   - **Key Insight**: Trees can overfit without proper regularization

3. **Regularized Linear Regression** 🟡
   - Ridge, Lasso, and Elastic Net implementations
   - Coordinate descent optimization
   - **Key Insight**: Lasso performs automatic feature selection

4. **K-Nearest Neighbors** 🟢
   - Multiple distance metrics and efficiency considerations
   - Classification and regression support
   - **Key Insight**: Distance metric choice affects performance

### ⚡ Performance Analysis:

| Algorithm | Time Complexity | Space Complexity | Best Use Case |
|-----------|----------------|------------------|---------------|
| **K-Means** | O(n·k·d·i) | O(n·k·d) as implemented | Roughly spherical clusters |
| **Decision Tree** | O(n²·d) per level as implemented | O(n·d) per recursion level | Interpretable models |
| **Ridge/Lasso** | O(n·d² + d³) closed form; O(n·d²) per coordinate-descent sweep | O(n·d + d²) | High-dimensional data |
| **KNN** | O(n·d) per query | O(n·d) | Non-parametric problems |

Here n = samples, d = features, k = clusters, and i = iterations.

### 🎯 Key Implementation Insights:

1. **Initialization Matters**: K-means++ usually converges faster than random initialization
2. **Regularization Trade-offs**: Ridge vs Lasso vs Elastic Net serve different purposes
3. **Tree Complexity**: Depth and split criteria affect generalization
4. **Distance Metrics**: Euclidean, Manhattan, and Cosine have different properties

### 🚀 Next Notebook Preview:
**Part 3: Data Structures and Efficiency** will cover:
- Efficient nearest neighbor search (KD-trees, LSH)
- Priority queues for streaming algorithms
- Hash tables for feature hashing
- Memory-efficient data structures