## Exercises: Extending the Neural Network

1. **Build a Deeper Network with Multiple Hidden Layers**  
   - **Task:** Extend the network architecture to include two or three hidden layers instead of one.  
   - **Details:**  
     - Replace the single hidden layer with multiple layers, each with its own weight and bias tensors.  
     - Experiment with different activation functions (e.g., Tanh or Sigmoid) between layers.  
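     - Compare accuracy, loss curves, and training behavior against the original single-hidden-layer network.  
   - **Challenge:** Understand how depth affects gradient flow and learning stability. For example, stacking saturating activations such as Sigmoid can shrink the gradients that reach the earlier layers.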

2. **Implement Mini-Batch Gradient Descent**  
   - **Task:** Modify the training loop to use mini-batches instead of processing the entire training set at once.  
   - **Details:**  
     - Shuffle the training set at the start of each epoch, then split it into small batches (for example, 16 or 32 samples per batch).  
     - In each epoch, iterate through each mini-batch, compute the loss, perform the backward pass, and update the parameters.  
     - Observe how mini-batch training affects convergence speed and accuracy compared to full-batch gradient descent.  
   - **Challenge:** Explore the trade-off between computational efficiency and gradient noise: smaller batches are cheaper per update and give more updates per epoch, but each update is noisier.

3. **Train on a More Challenging Dataset (e.g., MNIST)**  
   - **Task:** Replace the Iris dataset with a more complex dataset such as MNIST (handwritten digit classification with 10 classes).  
   - **Details:**  
     - Load and preprocess the MNIST dataset (e.g., scale the pixel values from the 0 to 255 range to [0, 1]).  
     - Adjust the network’s input layer to accept 28×28 images (flattened to 784 features) and change the output layer to 10 units, one per class.  
     - Modify the training loop and hyperparameters as needed for the increased complexity.  
   - **Challenge:** Understand what changes to the network design and training setup are needed when scaling up to a larger, higher-dimensional dataset.

4. **Add L2 Regularization (Weight Decay)**  
   - **Task:** Introduce L2 regularization manually in the loss function to penalize large weights and potentially reduce overfitting.  
   - **Details:**  
     - After computing the cross-entropy loss, add a regularization term (e.g., lambda * (sum of squared weights)) to the loss.  
     - Experiment with different values of the regularization coefficient (lambda) and observe its effect on training and generalization.  
     - Optionally exclude the biases from the penalty; biases are commonly left unregularized.  
   - **Challenge:** Learn how regularization can improve generalization and reduce overfitting in neural networks.

5. **Implement Dropout Manually**  
   - **Task:** Add dropout to the network by randomly zeroing out a fraction of the hidden activations during training.  
   - **Details:**  
     - In the forward pass, after computing the activations of a hidden layer (e.g., the first), create a dropout mask: a tensor of zeros and ones in which each entry is 0 with the dropout probability p (e.g., p = 0.5).  
     - Multiply the activations element-wise by this mask to “drop” certain neurons during training.  
     - Turn dropout off during evaluation (inference). To keep the expected activations the same in both modes, either scale the kept activations by 1 / (1 - p) during training ("inverted dropout") or, if you skip that scaling, multiply the activations by (1 - p) at inference.  
   - **Challenge:** Implementing dropout from scratch reinforces how randomness can make a model more robust by reducing co-adaptation of neurons.
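
A minimal sketch for exercise 1 (deeper network), assuming the original network is written with raw PyTorch tensors and manual gradient-descent updates; the source does not show its code. The helpers `init_params`, `forward`, and `train` are hypothetical, the hidden widths are arbitrary, and a synthetic 4-feature, 3-class dataset stands in for Iris so the snippet runs on its own. Passing a list of layer sizes makes it easy to compare one hidden layer against two, and Tanh against Sigmoid.

```python
import torch
import torch.nn.functional as F


def init_params(sizes):
    """Create one (weight, bias) pair per layer, e.g. sizes=[4, 16, 16, 3]."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Scaling by 1/sqrt(fan_in) keeps initial pre-activations at a moderate size.
        W = (torch.randn(fan_in, fan_out) / fan_in ** 0.5).requires_grad_()
        b = torch.zeros(fan_out, requires_grad=True)
        params.append((W, b))
    return params


def forward(params, x, activation=torch.tanh):
    """Apply each layer in turn; the activation is skipped on the output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = activation(x)
    return x  # raw logits: F.cross_entropy applies log-softmax internally


def train(params, X, y, lr=0.1, epochs=200, activation=torch.tanh):
    """Full-batch gradient descent with manual parameter updates."""
    losses = []
    for _ in range(epochs):
        loss = F.cross_entropy(forward(params, X, activation), y)
        for W, b in params:
            W.grad, b.grad = None, None  # clear gradients from the previous step
        loss.backward()
        with torch.no_grad():
            for W, b in params:
                W -= lr * W.grad
                b -= lr * b.grad
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
# Synthetic stand-in for Iris: 4 features, 3 classes set by the signs of two features.
X = torch.randn(150, 4)
y = (X[:, 0] > 0).long() + (X[:, 1] > 0).long()

shallow = train(init_params([4, 16, 3]), X, y)
deep = train(init_params([4, 16, 16, 3]), X, y)
deep_sigmoid = train(init_params([4, 16, 16, 3]), X, y, activation=torch.sigmoid)
print(f"final loss  shallow: {shallow[-1]:.3f}  deep/tanh: {deep[-1]:.3f}  "
      f"deep/sigmoid: {deep_sigmoid[-1]:.3f}")
```

Plotting the three loss lists side by side is one way to see how depth and the choice of activation change the training curve.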
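
For exercise 2 (mini-batch gradient descent), here is a sketch under the same assumption of a raw-tensor PyTorch network with one hidden layer. `iterate_minibatches` is a hypothetical helper that reshuffles the data every epoch, and the data are synthetic. The key change from full-batch training is that the parameters are updated once per mini-batch rather than once per epoch.

```python
import torch
import torch.nn.functional as F


def iterate_minibatches(X, y, batch_size):
    """Yield (X_batch, y_batch) pairs in a fresh random order; the last batch may be smaller."""
    perm = torch.randperm(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        yield X[idx], y[idx]


torch.manual_seed(0)
X = torch.randn(150, 4)  # synthetic stand-in for Iris
y = (X[:, 0] > 0).long() + (X[:, 1] > 0).long()

# One hidden layer of 16 units (sizes are assumptions, not taken from the original).
W1 = (torch.randn(4, 16) / 2).requires_grad_()
b1 = torch.zeros(16, requires_grad=True)
W2 = (torch.randn(16, 3) / 4).requires_grad_()
b2 = torch.zeros(3, requires_grad=True)
params = [W1, b1, W2, b2]

lr, batch_size, epochs = 0.1, 16, 50
epoch_losses = []
for epoch in range(epochs):
    total, count = 0.0, 0
    for xb, yb in iterate_minibatches(X, y, batch_size):
        logits = torch.relu(xb @ W1 + b1) @ W2 + b2
        loss = F.cross_entropy(logits, yb)  # mean loss over this mini-batch
        for p in params:
            p.grad = None
        loss.backward()
        with torch.no_grad():
            for p in params:
                p -= lr * p.grad  # one update per mini-batch, not one per epoch
        total += loss.item() * len(yb)
        count += len(yb)
    epoch_losses.append(total / count)  # sample-weighted average over the epoch

print(f"epoch 1: {epoch_losses[0]:.3f}  epoch {epochs}: {epoch_losses[-1]:.3f}")
```

Setting `batch_size = X.shape[0]` recovers full-batch gradient descent, which gives a direct baseline for the comparison the exercise asks for.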
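
For exercise 3 (MNIST), the sketch below shows only the shape changes, again assuming a raw-tensor PyTorch network. A random uint8 batch stands in for real MNIST images so it runs offline; one common way to load the real data is `torchvision.datasets.MNIST`. The `preprocess` and `init_layer` helpers and the 128-unit hidden layer are hypothetical choices.

```python
import torch
import torch.nn.functional as F


def preprocess(images):
    """(N, 28, 28) uint8 pixels in 0..255 -> (N, 784) float32 features in [0, 1]."""
    return images.reshape(images.shape[0], -1).float() / 255.0


def init_layer(fan_in, fan_out):
    W = (torch.randn(fan_in, fan_out) / fan_in ** 0.5).requires_grad_()
    b = torch.zeros(fan_out, requires_grad=True)
    return W, b


torch.manual_seed(0)
# Random stand-in for one MNIST batch; swap in the real images and labels.
images = torch.randint(0, 256, (64, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (64,))

X = preprocess(images)
W1, b1 = init_layer(784, 128)  # input layer: 784 features instead of Iris's 4
W2, b2 = init_layer(128, 10)   # output layer: 10 classes instead of 3

logits = torch.relu(X @ W1 + b1) @ W2 + b2
loss = F.cross_entropy(logits, labels)
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
print(X.shape, logits.shape, f"loss={loss.item():.3f}")
```

With 60,000 training images, full-batch updates become expensive, so this exercise pairs naturally with the mini-batch loop from exercise 2.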
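
For exercise 4 (L2 regularization), a minimal sketch assuming a raw-tensor PyTorch network: `l2_penalty` is a hypothetical helper that computes lambda * (sum of squared weights), and only the weight matrices are passed to it, so the biases stay unregularized. The data are a synthetic stand-in.

```python
import torch
import torch.nn.functional as F


def l2_penalty(weights, lam):
    """lam * (sum of squared entries of every tensor in `weights`)."""
    return lam * sum((W ** 2).sum() for W in weights)


torch.manual_seed(0)
X = torch.randn(32, 4)            # synthetic stand-in for an Iris batch
y = torch.randint(0, 3, (32,))
W1 = (torch.randn(4, 16) / 2).requires_grad_()
b1 = torch.zeros(16, requires_grad=True)
W2 = (torch.randn(16, 3) / 4).requires_grad_()
b2 = torch.zeros(3, requires_grad=True)

lam = 1e-3  # regularization coefficient; sweep e.g. 1e-4, 1e-3, 1e-2
logits = torch.relu(X @ W1 + b1) @ W2 + b2
data_loss = F.cross_entropy(logits, y)
# Only the weight matrices are penalized; b1 and b2 are left unregularized.
loss = data_loss + l2_penalty([W1, W2], lam)
loss.backward()  # W1.grad and W2.grad now include an extra 2 * lam * W term
print(f"data loss {data_loss.item():.4f}  total loss {loss.item():.4f}")
```

With plain gradient descent, that extra gradient term means each update first multiplies W by (1 - 2 * lr * lam) and then takes the usual step, which is why the L2 penalty is also called weight decay.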
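
For exercise 5 (dropout), a sketch of inverted dropout with raw PyTorch tensors. The `dropout` and `forward` helpers are hypothetical, and the layer layout is assumed. The mask keeps each entry with probability 1 - p, and the survivors are scaled by 1 / (1 - p) so that evaluation can simply skip the mask.

```python
import torch


def dropout(h, p, training):
    """Inverted dropout: zero each entry with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return h  # evaluation: dropout is turned off, no rescaling needed
    mask = (torch.rand_like(h) >= p).float()  # 1 = keep (prob 1 - p), 0 = drop (prob p)
    return h * mask / (1.0 - p)


def forward(X, W1, b1, W2, b2, p=0.5, training=True):
    h = torch.relu(X @ W1 + b1)   # first hidden layer activations
    h = dropout(h, p, training)   # dropout applied after the activation
    return h @ W2 + b2            # logits


torch.manual_seed(0)
h = torch.ones(1000, 100)
train_out = dropout(h, 0.5, training=True)
eval_out = dropout(h, 0.5, training=False)
print("fraction dropped:", (train_out == 0).float().mean().item())  # close to 0.5
print("mean activation:", train_out.mean().item())  # close to 1.0, matching eval_out
```

The other option from the exercise, leaving training activations unscaled and multiplying by (1 - p) at inference, gives the same expected activations; the inverted form is used here so that the evaluation path is a plain identity.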