
# Week 3: Optimization Techniques in Neural Networks
## Objective:
In this notebook, we explore optimization techniques used in deep learning: Gradient Descent (GD), Stochastic Gradient Descent (SGD), and Mini-Batch Gradient Descent. We look at how these methods work and implement each in Python with NumPy, using a small linear model so the update rules stay easy to follow.



## 3.1 Theory: Introduction to Optimization
Optimization is the process of minimizing a loss function by updating the model's parameters (weights and biases). In neural networks, the most common optimization technique is **Gradient Descent (GD)**, which repeatedly computes the gradient of the loss with respect to the parameters and moves the parameters a small step in the opposite direction.
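
As a sketch of what one step looks like, assume a linear model with a squared-error loss scaled by $\frac{1}{2m}$ (the scaling is an assumption, chosen so that the gradient matches the code cells in this notebook). The symbols $\eta$ (learning rate), $\mathbf{w}$ (weights), $b$ (bias), and $m$ (number of samples) are notation introduced here for illustration:

$$
L(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)^2, \qquad \hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b
$$

$$
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \frac{1}{m} X^\top (\hat{\mathbf{y}} - \mathbf{y}), \qquad b \leftarrow b - \eta \, \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)
$$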

### Key Optimization Methods:
- **Gradient Descent (GD)**: Computes the gradient over the entire dataset for every update. Each step is stable, but it is computationally expensive for large datasets.
- **Stochastic Gradient Descent (SGD)**: Updates weights using a single data point at a time. Each update is cheap, but the updates are noisier.
- **Mini-Batch Gradient Descent**: Combines the advantages of GD and SGD by updating weights using small batches of data, balancing the cost of each update against the noise in the gradient estimate.
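
One way to make the difference between the three methods concrete is to count how many parameter updates each performs in a single pass over the data. The helper below is a hypothetical illustration (the name `updates_per_epoch` and the default `batch_size=32` are assumptions, not part of this notebook's code):

```python
import math

def updates_per_epoch(m, method, batch_size=32):
    """Number of parameter updates in one pass over m samples."""
    if method == "gd":
        return 1                      # one update from the full dataset
    if method == "sgd":
        return m                      # one update per sample
    if method == "mini-batch":
        return math.ceil(m / batch_size)  # one update per batch
    raise ValueError(f"unknown method: {method}")

m = 1000
for method in ("gd", "sgd", "mini-batch"):
    print(method, updates_per_epoch(m, method))
```

For 1000 samples this gives 1 update for GD, 1000 for SGD, and 32 for mini-batches of 32.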

### Learning Rate:
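The **learning rate** controls how much the model's weights are adjusted in response to the gradient. If it is too small, convergence is slow; if it is too large, the updates can overshoot the minimum and diverge. Choosing an appropriate learning rate is therefore crucial for model performance.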
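
The effect of the learning rate can be seen on a toy problem. The sketch below minimizes $f(w) = w^2$ (gradient $2w$, minimum at $w = 0$); the function `minimize_quadratic` and the three learning rates are chosen here purely for illustration:

```python
def minimize_quadratic(lr, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # step against the gradient
    return w

for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {minimize_quadratic(lr):.4f}")
```

With `lr=0.01` the parameter moves toward 0 only slowly, with `lr=0.1` it gets close to 0 within 20 steps, and with `lr=1.1` each step overshoots so the magnitude of `w` grows instead of shrinking.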


In [None]:

# Simple Gradient Descent Implementation

import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    weights = np.zeros(n)
    bias = 0
    
    for _ in range(epochs):
        y_pred = np.dot(X, weights) + bias
        error = y_pred - y
        
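        # Update weights and bias by stepping against the gradient of the
        # loss (1/(2m)) * sum(error**2), averaged over all m samples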
        weights -= lr * (1/m) * np.dot(X.T, error)
        bias -= lr * (1/m) * np.sum(error)
    
    return weights, bias

# Example dataset
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([5, 7, 9, 11])

# Running Gradient Descent
weights, bias = gradient_descent(X, y)
print("Weights:", weights)
print("Bias:", bias)
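
To check that the updates are actually reducing the error, one option is to record the loss at every epoch. The variant below is a sketch under that assumption: `gradient_descent_with_history` is a hypothetical name, and the loss it records is the halved mean squared error implied by the update rule above.

```python
import numpy as np

def gradient_descent_with_history(X, y, lr=0.01, epochs=1000):
    """Full-batch gradient descent that also records the loss per epoch."""
    m, n = X.shape
    weights = np.zeros(n)
    bias = 0.0
    history = []
    for _ in range(epochs):
        error = X @ weights + bias - y
        history.append(np.mean(error ** 2) / 2)  # loss before this update
        weights -= lr * (1 / m) * (X.T @ error)
        bias -= lr * (1 / m) * np.sum(error)
    return weights, bias, history

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([5, 7, 9, 11], dtype=float)

weights, bias, history = gradient_descent_with_history(X, y)
print("Initial loss:", history[0])
print("Final loss:", history[-1])
```

Plotting `history` against the epoch number gives a convergence curve, which is a convenient way to compare learning rates or optimization methods.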



## 3.2 Stochastic Gradient Descent (SGD)
Unlike Gradient Descent, which updates weights using the entire dataset, **SGD** updates the weights using only a single data point at a time. Each update is therefore much cheaper, which matters most for large datasets, but the gradient from a single sample is a noisy estimate of the full gradient, so the weight updates fluctuate more.


In [None]:

# Stochastic Gradient Descent Implementation

def stochastic_gradient_descent(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    weights = np.zeros(n)
    bias = 0
    
    for _ in range(epochs):
        # Visit the samples in a new random order each epoch
        for i in np.random.permutation(m):
            y_pred = np.dot(X[i], weights) + bias
            error = y_pred - y[i]
            
            # Update weights and bias
            weights -= lr * error * X[i]
            bias -= lr * error
    
    return weights, bias

# Running SGD
weights_sgd, bias_sgd = stochastic_gradient_descent(X, y)
print("Weights (SGD):", weights_sgd)
print("Bias (SGD):", bias_sgd)



## 3.3 Mini-Batch Gradient Descent
**Mini-Batch Gradient Descent** is a compromise between GD and SGD. It divides the dataset into small batches and performs weight updates on each batch. This approach provides a balance between the stability of GD and the speed of SGD.


In [None]:

# Mini-Batch Gradient Descent Implementation

def mini_batch_gradient_descent(X, y, lr=0.01, epochs=1000, batch_size=2):
    m, n = X.shape
    weights = np.zeros(n)
    bias = 0
    
    for _ in range(epochs):
        indices = np.random.permutation(m)
        X_shuffled = X[indices]
        y_shuffled = y[indices]
        
        for i in range(0, m, batch_size):
            X_batch = X_shuffled[i:i+batch_size]
            y_batch = y_shuffled[i:i+batch_size]
            y_pred = np.dot(X_batch, weights) + bias
            error = y_pred - y_batch
            
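            # Update weights and bias, averaging over the actual batch size
            # (the last batch is smaller when m is not divisible by batch_size)
            weights -= lr * (1/len(X_batch)) * np.dot(X_batch.T, error)
            bias -= lr * (1/len(X_batch)) * np.sum(error)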
    
    return weights, bias

# Running Mini-Batch Gradient Descent
weights_mini_batch, bias_mini_batch = mini_batch_gradient_descent(X, y)
print("Weights (Mini-Batch GD):", weights_mini_batch)
print("Bias (Mini-Batch GD):", bias_mini_batch)



## 3.4 Exercises:
- Compare the performance of GD, SGD, and Mini-Batch GD by testing them on different datasets.
- Experiment with different learning rates and observe their effect on the speed of convergence.
- Use a larger dataset and compare how fast each optimization method converges to the solution.
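
As a possible starting point for these exercises, the sketch below generates a larger synthetic regression dataset with known parameters and defines a mean squared error helper for comparing fitted models. The names `make_linear_dataset` and `mse`, the dataset size, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def make_linear_dataset(m=1000, n=3, noise=0.1, seed=0):
    """Generate y = X @ w + b + Gaussian noise with known w and b."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(m, n))
    true_weights = rng.normal(size=n)
    true_bias = rng.normal()
    y = X @ true_weights + true_bias + noise * rng.normal(size=m)
    return X, y, true_weights, true_bias

def mse(X, y, weights, bias):
    """Mean squared error of a linear model on (X, y)."""
    return np.mean((X @ weights + bias - y) ** 2)

X_large, y_large, true_w, true_b = make_linear_dataset()
print(X_large.shape, y_large.shape)
print("MSE of the true parameters:", mse(X_large, y_large, true_w, true_b))
```

The weights and bias returned by each optimizer can be passed to `mse` on the same data, and the result compared against the error of the true parameters.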
