# Deep Learning From Scratch: A Self-Learning Journey

Welcome to this deep learning tutorial! This notebook guides you through building a neural network from scratch in Python, using NumPy for the network itself. scikit-learn is used only to load the data, preprocess it, and compute evaluation metrics.

## 📚 Learning Objectives

By the end of this notebook, you will:
- Understand the fundamental concepts of neural networks
- Implement a neural network from scratch without using deep learning frameworks
- Master forward propagation and backpropagation algorithms
- Apply your knowledge to classify the famous Iris dataset
- Evaluate and optimize your neural network's performance

## 🛠️ Prerequisites

- Basic Python programming knowledge
- Understanding of linear algebra (vectors, matrices, dot products)
- Basic calculus (derivatives, chain rule)
- Familiarity with NumPy

## 📖 Table of Contents

1. **Introduction to Neural Networks**
2. **Mathematical Foundations**
3. **Data Preparation: The Iris Dataset**
4. **Building Our Neural Network Architecture**
5. **Training Our Neural Network**
6. **Model Evaluation and Visualization**
7. **Exercises and Challenges**
8. **Next Steps and Advanced Topics**

---

## ⚡ Getting Started

Let's begin by importing the necessary libraries and setting up our environment.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib for better plots
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ All libraries imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

✅ All libraries imported successfully!
NumPy version: 2.3.2
Pandas version: 2.3.1


# 1. Introduction to Neural Networks 🧠

## What is a Neural Network?

A neural network is a computational model inspired by the way biological neural networks in animal brains process information. It consists of:

- **Neurons (Nodes)**: Processing units that receive inputs, perform calculations, and produce outputs
- **Weights**: Parameters that determine the strength of connections between neurons
- **Biases**: Offsets added to each neuron's weighted sum, letting the neuron shift the point at which it activates
- **Activation Functions**: Functions that introduce non-linearity to the network
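
As a minimal sketch of how these four pieces fit together, the snippet below computes the output of a single neuron. The input, weight, and bias values are made up for illustration, and the `_demo` names are not used elsewhere in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only to make the arithmetic easy to follow
x_demo = np.array([0.5, -1.0, 2.0])   # inputs to the neuron
w_demo = np.array([0.4, 0.3, -0.2])   # one weight per input
b_demo = 0.1                          # bias

z_demo = np.dot(w_demo, x_demo) + b_demo   # weighted sum plus bias
a_demo = sigmoid(z_demo)                   # activation (the neuron's output)

print(f"z = {z_demo:.3f}, a = {a_demo:.3f}")  # z = -0.400, a = 0.401
```

A full layer repeats this computation for many neurons at once, which is why the weights become a matrix later in the notebook.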

## Key Components:

1. **Input Layer**: Receives the input data
2. **Hidden Layer(s)**: Processes the data through weighted connections
3. **Output Layer**: Produces the final predictions

## Why Build From Scratch?

Building neural networks from scratch helps you:
- Understand the underlying mathematics
- Debug and optimize models more effectively
- Implement custom architectures
- Gain deep intuition about how neural networks work

# 2. Mathematical Foundations 📐

## Essential Mathematical Concepts

### Activation Functions

Activation functions introduce non-linearity to neural networks, allowing them to learn complex patterns.
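
To see why the non-linearity matters, here is a small sketch (with random, illustrative matrices) showing that two stacked linear layers without an activation collapse into a single linear layer, while inserting ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
X_lin = rng.normal(size=(5, 4))    # 5 samples, 4 features (illustrative)
W_a = rng.normal(size=(4, 8))      # first "layer"
W_b = rng.normal(size=(8, 3))      # second "layer"

# Two linear layers in sequence...
two_linear_layers = (X_lin @ W_a) @ W_b
# ...are equivalent to one linear layer with weights W_a @ W_b
one_linear_layer = X_lin @ (W_a @ W_b)
collapses = np.allclose(two_linear_layers, one_linear_layer)

# With a ReLU in between, the result can no longer be written that way
with_relu = np.maximum(0, X_lin @ W_a) @ W_b
differs = not np.allclose(with_relu, one_linear_layer)

print(collapses, differs)  # True True
```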

Let's implement some common activation functions:

In [None]:
class ActivationFunctions:
    """Collection of activation functions and their derivatives"""
    
    @staticmethod
    def sigmoid(x):
        """Sigmoid activation function: f(x) = 1 / (1 + e^(-x))"""
        # Clip x to prevent overflow
        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))
    
    @staticmethod
    def sigmoid_derivative(a):
        """Derivative of sigmoid, written in terms of its output a = sigmoid(z)"""
        return a * (1 - a)
    
    @staticmethod
    def relu(x):
        """ReLU activation function: f(x) = max(0, x)"""
        return np.maximum(0, x)
    
    @staticmethod
    def relu_derivative(x):
        """Derivative of ReLU: 1 where the input is positive, 0 elsewhere"""
        return (x > 0).astype(float)
    
    @staticmethod
    def tanh(x):
        """Hyperbolic tangent activation function"""
        return np.tanh(x)
    
    @staticmethod
    def tanh_derivative(a):
        """Derivative of tanh, written in terms of its output a = tanh(z)"""
        return 1 - a**2
    
    @staticmethod
    def softmax(x):
        """Softmax activation function for multi-class classification"""
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Let's visualize these activation functions
x = np.linspace(-5, 5, 100)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Common Activation Functions', fontsize=16)

# Sigmoid
axes[0,0].plot(x, ActivationFunctions.sigmoid(x), 'b-', linewidth=2)
axes[0,0].set_title('Sigmoid Function')
axes[0,0].set_xlabel('x')
axes[0,0].set_ylabel('f(x)')
axes[0,0].grid(True)

# ReLU
axes[0,1].plot(x, ActivationFunctions.relu(x), 'r-', linewidth=2)
axes[0,1].set_title('ReLU Function')
axes[0,1].set_xlabel('x')
axes[0,1].set_ylabel('f(x)')
axes[0,1].grid(True)

# Tanh
axes[1,0].plot(x, ActivationFunctions.tanh(x), 'g-', linewidth=2)
axes[1,0].set_title('Tanh Function')
axes[1,0].set_xlabel('x')
axes[1,0].set_ylabel('f(x)')
axes[1,0].grid(True)

# Comparison
axes[1,1].plot(x, ActivationFunctions.sigmoid(x), 'b-', label='Sigmoid', linewidth=2)
axes[1,1].plot(x, ActivationFunctions.relu(x), 'r-', label='ReLU', linewidth=2)
axes[1,1].plot(x, ActivationFunctions.tanh(x), 'g-', label='Tanh', linewidth=2)
axes[1,1].set_title('Function Comparison')
axes[1,1].set_xlabel('x')
axes[1,1].set_ylabel('f(x)')
axes[1,1].legend()
axes[1,1].grid(True)

plt.tight_layout()
plt.show()

print("📊 Activation functions visualized!")
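
One detail worth checking: the sigmoid derivative used in this notebook is written in terms of the sigmoid *output*, not the raw input. The sketch below compares both conventions against a finite-difference estimate. It redefines `sigmoid` locally so it can run on its own, and the step size `h` is an arbitrary small value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z_check = np.linspace(-3, 3, 7)
h = 1e-5  # finite-difference step (an arbitrary small value)

# Numerical derivative via central differences
numeric = (sigmoid(z_check + h) - sigmoid(z_check - h)) / (2 * h)

a_check = sigmoid(z_check)
from_output = a_check * (1 - a_check)   # formula applied to the output a
from_input = z_check * (1 - z_check)    # same formula applied to z by mistake

max_err_output = np.max(np.abs(numeric - from_output))
max_err_input = np.max(np.abs(numeric - from_input))

print(f"error using output a: {max_err_output:.2e}")
print(f"error using input z:  {max_err_input:.2e}")
```

The first error should be tiny and the second large, which is why backpropagation later passes `A1` (the activation) to `sigmoid_derivative` rather than `Z1`.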

# 3. Data Preparation: The Iris Dataset 🌸

The Iris dataset is a classic dataset in machine learning, perfect for learning classification. It contains measurements of iris flowers from three different species.

## Dataset Features:
- **Sepal Length** (cm)
- **Sepal Width** (cm) 
- **Petal Length** (cm)
- **Petal Width** (cm)

## Target Classes:
- **Setosa** (0)
- **Versicolor** (1)
- **Virginica** (2)

Let's load and explore the data:

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Convert to DataFrame for easier manipulation
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = iris.target_names[y]
df['target'] = y

print("🌸 Iris Dataset Overview:")
print(f"Dataset shape: {df.shape}")
print(f"Features: {list(iris.feature_names)}")
print(f"Classes: {list(iris.target_names)}")
print("\nFirst 5 rows:")
print(df.head())

print("\nDataset Statistics:")
print(df.describe())

print("\nClass Distribution:")
print(df['species'].value_counts())

In [None]:
# Visualize the dataset
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Iris Dataset Exploration', fontsize=16)

# Pairplot-style visualization
features = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
colors = ['red', 'blue', 'green']
species = iris.target_names

# Scatter plots
axes[0,0].scatter(df[features[0]], df[features[1]], c=[colors[i] for i in df['target']], alpha=0.7)
axes[0,0].set_xlabel(features[0])
axes[0,0].set_ylabel(features[1])
axes[0,0].set_title('Sepal Length vs Sepal Width')

axes[0,1].scatter(df[features[2]], df[features[3]], c=[colors[i] for i in df['target']], alpha=0.7)
axes[0,1].set_xlabel(features[2])
axes[0,1].set_ylabel(features[3])
axes[0,1].set_title('Petal Length vs Petal Width')

# Box plots
df.boxplot(column=features[0], by='species', ax=axes[1,0])
axes[1,0].set_title('Sepal Length by Species')
axes[1,0].set_xlabel('Species')

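df.boxplot(column=features[2], by='species', ax=axes[1,1])
axes[1,1].set_title('Petal Length by Species')
axes[1,1].set_xlabel('Species')

# pandas' boxplot(by=...) replaces the figure title, so set it again
fig.suptitle('Iris Dataset Exploration', fontsize=16)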

# Add legend
legend_elements = [plt.Line2D([0], [0], marker='o', color='w', markerfacecolor=colors[i], 
                             markersize=10, label=species[i]) for i in range(3)]
fig.legend(handles=legend_elements, loc='upper right')

plt.tight_layout()
plt.show()

print("📈 Data visualization complete!")

## Data Preprocessing

Before training our neural network, we need to:
1. Split the data into training and testing sets
2. Standardize the features to zero mean and unit variance, so that no feature dominates the weighted sums and gradient descent converges more reliably
3. Convert labels to one-hot encoding for multi-class classification
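
Steps 2 and 3 can be sketched by hand with NumPy to show what they compute. The small arrays below are invented for illustration, and the `_demo` names are separate from the variables used in the rest of the notebook.

```python
import numpy as np

# Tiny made-up "training" and "test" sets with two features
train_demo = np.array([[5.1, 3.5], [6.2, 2.9], [7.0, 3.2], [4.9, 3.0]])
test_demo = np.array([[5.8, 2.7]])

# Statistics come from the training data only
mean_demo = train_demo.mean(axis=0)
std_demo = train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

train_scaled_demo = (train_demo - mean_demo) / std_demo
test_scaled_demo = (test_demo - mean_demo) / std_demo   # reuse training statistics

# One-hot encoding: pick rows of the identity matrix by label
labels_demo = np.array([0, 2, 1, 0])
one_hot_demo = np.eye(3)[labels_demo]

print(train_scaled_demo.mean(axis=0).round(6), train_scaled_demo.std(axis=0))
print(one_hot_demo)
```

Reusing the training mean and standard deviation on the test set keeps information about the test data from leaking into preprocessing.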

In [None]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert labels to one-hot encoding
def to_one_hot(y, num_classes):
    """Convert integer labels to one-hot encoding"""
    one_hot = np.zeros((y.shape[0], num_classes))
    one_hot[np.arange(y.shape[0]), y] = 1
    return one_hot

y_train_onehot = to_one_hot(y_train, 3)
y_test_onehot = to_one_hot(y_test, 3)

print("🔄 Data preprocessing completed!")
print(f"Training set shape: {X_train_scaled.shape}")
print(f"Test set shape: {X_test_scaled.shape}")
print(f"Training labels shape: {y_train_onehot.shape}")
print(f"Test labels shape: {y_test_onehot.shape}")

# Show example of one-hot encoding
print("\nExample of one-hot encoding:")
print(f"Original labels: {y_train[:5]}")
print(f"One-hot encoded:\n{y_train_onehot[:5]}")

# Verify class distribution in train/test splits
print("\nClass distribution in training set:")
unique, counts = np.unique(y_train, return_counts=True)
for cls, count in zip(unique, counts):
    print(f"{iris.target_names[cls]}: {count} samples")

print("\nClass distribution in test set:")
unique, counts = np.unique(y_test, return_counts=True)
for cls, count in zip(unique, counts):
    print(f"{iris.target_names[cls]}: {count} samples")

# 4. Building Our Neural Network Architecture 🏗️

Now let's build our neural network from scratch! We'll create a class with a single hidden layer whose input, hidden, and output sizes are configurable.

## Network Architecture:
- **Input Layer**: 4 neurons (for 4 features)
- **Hidden Layer**: 8 neurons (with sigmoid activation)
- **Output Layer**: 3 neurons (for 3 classes, with softmax activation)
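
As a quick sanity check on this architecture, the sketch below traces array shapes through the two layers and counts the trainable parameters. The zero-filled arrays and the batch size of 5 are placeholders for illustration only.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 4, 8, 3
batch_size = 5  # arbitrary, for illustration only

# Placeholder parameters with the shapes this architecture implies
W1_demo = np.zeros((n_inputs, n_hidden))
b1_demo = np.zeros((1, n_hidden))
W2_demo = np.zeros((n_hidden, n_outputs))
b2_demo = np.zeros((1, n_outputs))

X_batch = np.zeros((batch_size, n_inputs))
hidden_demo = X_batch @ W1_demo + b1_demo       # (5, 4) @ (4, 8) -> (5, 8)
output_demo = hidden_demo @ W2_demo + b2_demo   # (5, 8) @ (8, 3) -> (5, 3)

n_params = sum(p.size for p in (W1_demo, b1_demo, W2_demo, b2_demo))
print(hidden_demo.shape, output_demo.shape, n_params)  # (5, 8) (5, 3) 67
```

The biases have shape `(1, n)` so that NumPy broadcasting adds them to every row of the batch.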

## Key Methods:
1. `forward_propagation()`: Calculate predictions
2. `backward_propagation()`: Calculate gradients
3. `update_parameters()`: Update network parameters
4. `train()`: Train the network
5. `predict()`: Make predictions

In [None]:
class NeuralNetwork:
    """A simple neural network built from scratch"""
    
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.1):
        """
        Initialize the neural network
        
        Args:
            input_size: Number of input features
            hidden_size: Number of neurons in hidden layer
            output_size: Number of output classes
            learning_rate: Learning rate for gradient descent
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = learning_rate
        
        # Initialize weights and biases
        self.initialize_parameters()
        
        # Store training history
        self.loss_history = []
        self.accuracy_history = []
        
    def initialize_parameters(self):
        """Initialize weights with He-style scaling, sqrt(2 / fan_in), and biases with zeros"""
        # Weights from input to hidden layer
        self.W1 = np.random.randn(self.input_size, self.hidden_size) * np.sqrt(2.0 / self.input_size)
        self.b1 = np.zeros((1, self.hidden_size))
        
        # Weights from hidden to output layer
        self.W2 = np.random.randn(self.hidden_size, self.output_size) * np.sqrt(2.0 / self.hidden_size)
        self.b2 = np.zeros((1, self.output_size))
        
    def forward_propagation(self, X):
        """
        Perform forward propagation
        
        Args:
            X: Input data
            
        Returns:
            A2: Output predictions
        """
        # Hidden layer
        self.Z1 = np.dot(X, self.W1) + self.b1  # Linear transformation
        self.A1 = ActivationFunctions.sigmoid(self.Z1)  # Activation
        
        # Output layer
        self.Z2 = np.dot(self.A1, self.W2) + self.b2  # Linear transformation
        self.A2 = ActivationFunctions.softmax(self.Z2)  # Softmax for multi-class
        
        return self.A2
    
    def compute_loss(self, y_true, y_pred):
        """
        Compute categorical cross-entropy loss
        
        Args:
            y_true: True labels (one-hot encoded)
            y_pred: Predicted probabilities
            
        Returns:
            loss: Cross-entropy loss
        """
        m = y_true.shape[0]  # Number of samples
        
        # Prevent log(0) by adding small epsilon
        epsilon = 1e-15
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        
        # Cross-entropy loss
        loss = -np.sum(y_true * np.log(y_pred)) / m
        return loss
    
    def backward_propagation(self, X, y_true):
        """
        Perform backward propagation (backpropagation)
        
        Args:
            X: Input data
            y_true: True labels (one-hot encoded)
        """
        m = X.shape[0]  # Number of samples
        
        # Calculate gradients for output layer
        dZ2 = self.A2 - y_true  # Softmax + cross-entropy gradient w.r.t Z2 (the 1/m factor is applied below)
        dW2 = np.dot(self.A1.T, dZ2) / m  # Gradient w.r.t W2
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m  # Gradient w.r.t b2
        
        # Calculate gradients for hidden layer
        dA1 = np.dot(dZ2, self.W2.T)  # Gradient w.r.t A1
        dZ1 = dA1 * ActivationFunctions.sigmoid_derivative(self.A1)  # Gradient w.r.t Z1
        dW1 = np.dot(X.T, dZ1) / m  # Gradient w.r.t W1
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m  # Gradient w.r.t b1
        
        # Store gradients
        self.dW1, self.db1 = dW1, db1
        self.dW2, self.db2 = dW2, db2
    
    def update_parameters(self):
        """Update weights and biases using gradient descent"""
        self.W1 -= self.learning_rate * self.dW1
        self.b1 -= self.learning_rate * self.db1
        self.W2 -= self.learning_rate * self.dW2
        self.b2 -= self.learning_rate * self.db2
    
    def train(self, X, y, epochs=1000, verbose=True):
        """
        Train the neural network
        
        Args:
            X: Training data
            y: Training labels (one-hot encoded)
            epochs: Number of training epochs
            verbose: Whether to print training progress
        """
        for epoch in range(epochs):
            # Forward propagation
            predictions = self.forward_propagation(X)
            
            # Compute loss
            loss = self.compute_loss(y, predictions)
            self.loss_history.append(loss)
            
            # Compute accuracy
            accuracy = self.compute_accuracy(y, predictions)
            self.accuracy_history.append(accuracy)
            
            # Backward propagation
            self.backward_propagation(X, y)
            
            # Update parameters
            self.update_parameters()
            
            # Print progress
            if verbose and (epoch + 1) % 100 == 0:
                print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")
    
    def predict(self, X):
        """
        Make predictions
        
        Args:
            X: Input data
            
        Returns:
            predictions: Predicted class probabilities
            predicted_classes: Predicted class labels
        """
        predictions = self.forward_propagation(X)
        predicted_classes = np.argmax(predictions, axis=1)
        return predictions, predicted_classes
    
    def compute_accuracy(self, y_true, y_pred):
        """
        Compute classification accuracy
        
        Args:
            y_true: True labels (one-hot encoded)
            y_pred: Predicted probabilities
            
        Returns:
            accuracy: Classification accuracy
        """
        predicted_classes = np.argmax(y_pred, axis=1)
        true_classes = np.argmax(y_true, axis=1)
        accuracy = np.mean(predicted_classes == true_classes)
        return accuracy

print("🏗️ Neural Network class created successfully!")
print("Ready to build and train our model.")
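
The output-layer gradient `A2 - y_true` comes from combining softmax with cross-entropy. As a hedged sanity check, not part of the class above, the sketch below compares that analytic gradient (including the `1/m` averaging) against a finite-difference estimate on random logits. The helper names and the step size `h` are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, y_onehot):
    """Mean categorical cross-entropy of softmax(logits) against one-hot labels."""
    p = softmax(logits)
    return -np.sum(y_onehot * np.log(p)) / logits.shape[0]

rng = np.random.default_rng(0)
logits_demo = rng.normal(size=(4, 3))          # 4 samples, 3 classes
y_demo = np.eye(3)[[0, 2, 1, 1]]               # made-up one-hot labels

# Analytic gradient of the mean loss with respect to the logits
analytic = (softmax(logits_demo) - y_demo) / logits_demo.shape[0]

# Finite-difference gradient, one logit at a time
numeric = np.zeros_like(logits_demo)
h = 1e-6
for i in range(logits_demo.shape[0]):
    for j in range(logits_demo.shape[1]):
        up = logits_demo.copy()
        down = logits_demo.copy()
        up[i, j] += h
        down[i, j] -= h
        numeric[i, j] = (cross_entropy(up, y_demo) - cross_entropy(down, y_demo)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(f"max difference: {max_diff:.2e}")
```

In the class, `dZ2` is stored without the `1/m` factor and the division happens when `dW2` and `db2` are computed, which gives the same parameter gradients.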

# 5. Training Our Neural Network 🚀

Now let's create and train our neural network on the Iris dataset!

In [None]:
# Create neural network
# Architecture: 4 inputs → 8 hidden neurons → 3 outputs
nn = NeuralNetwork(input_size=4, hidden_size=8, output_size=3, learning_rate=0.1)

print("🧠 Neural Network Architecture:")
print(f"Input Layer: {nn.input_size} neurons")
print(f"Hidden Layer: {nn.hidden_size} neurons (Sigmoid activation)")
print(f"Output Layer: {nn.output_size} neurons (Softmax activation)")
print(f"Learning Rate: {nn.learning_rate}")
print("\nTraining the neural network...\n")

# Train the network
nn.train(X_train_scaled, y_train_onehot, epochs=1000, verbose=True)

print("\n✅ Training completed!")

# 6. Model Evaluation and Visualization 📊

Let's evaluate our trained neural network and visualize the results.

In [None]:
# Make predictions on both training and test sets
train_predictions, train_pred_classes = nn.predict(X_train_scaled)
test_predictions, test_pred_classes = nn.predict(X_test_scaled)

# Calculate accuracies
train_accuracy = accuracy_score(y_train, train_pred_classes)
test_accuracy = accuracy_score(y_test, test_pred_classes)

print("🎯 Model Performance:")
print(f"Training Accuracy: {train_accuracy:.4f} ({train_accuracy*100:.2f}%)")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

# Detailed classification report
print("\n📈 Detailed Classification Report (Test Set):")
print(classification_report(y_test, test_pred_classes, target_names=iris.target_names))

# Confusion Matrix
cm = confusion_matrix(y_test, test_pred_classes)
print("\n🔍 Confusion Matrix:")
print(cm)

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=iris.target_names, 
            yticklabels=iris.target_names)
plt.title('Confusion Matrix - Test Set')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
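
The per-class precision and recall in the classification report can be read directly off a confusion matrix. The sketch below uses a made-up matrix (not this model's results) and assumes scikit-learn's layout, where rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm_demo = np.array([[10, 0, 0],
                    [0, 9, 1],
                    [0, 2, 8]])

true_positives = np.diag(cm_demo)
precision_demo = true_positives / cm_demo.sum(axis=0)   # correct / all predicted as class
recall_demo = true_positives / cm_demo.sum(axis=1)      # correct / all truly in class
accuracy_demo = true_positives.sum() / cm_demo.sum()

print("precision:", precision_demo.round(3))  # [1.    0.818 0.889]
print("recall:   ", recall_demo.round(3))     # [1.  0.9 0.8]
print("accuracy: ", round(accuracy_demo, 3))  # 0.9
```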

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Loss curve
axes[0].plot(nn.loss_history, 'b-', linewidth=2)
axes[0].set_title('Training Loss Over Time')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Cross-Entropy Loss')
axes[0].grid(True)

# Accuracy curve
axes[1].plot(nn.accuracy_history, 'g-', linewidth=2)
axes[1].set_title('Training Accuracy Over Time')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].grid(True)

plt.tight_layout()
plt.show()

print(f"📉 Final Training Loss: {nn.loss_history[-1]:.4f}")
print(f"📈 Final Training Accuracy: {nn.accuracy_history[-1]:.4f}")

In [None]:
# Show some prediction examples
print("🔮 Prediction Examples (Test Set):")
print("=" * 60)

for i in range(10):  # Show first 10 test examples
    true_class = y_test[i]
    pred_class = test_pred_classes[i]
    confidence = np.max(test_predictions[i])
    
    true_name = iris.target_names[true_class]
    pred_name = iris.target_names[pred_class]
    
    status = "✅" if true_class == pred_class else "❌"
    
    print(f"{status} Sample {i+1}: True: {true_name:<12} | Predicted: {pred_name:<12} | Confidence: {confidence:.3f}")

print("\n🎲 Prediction Probabilities for First 5 Test Samples:")
print("=" * 70)
for i in range(5):
    print(f"Sample {i+1}: {test_predictions[i]} (True: {iris.target_names[y_test[i]]})")

# 7. Exercises and Challenges 💪

Now it's your turn to experiment and learn! Try these exercises to deepen your understanding:

## 🎯 Exercise 1: Experiment with Different Architectures
Try changing the number of hidden neurons and see how it affects performance.

In [None]:
# TODO: Experiment with different hidden layer sizes
# Try: 4, 8, 16, 32 hidden neurons

hidden_sizes = [4, 8, 16, 32]
results = {}

print("🧪 Experimenting with different hidden layer sizes...\n")

for hidden_size in hidden_sizes:
    print(f"Testing with {hidden_size} hidden neurons...")
    
    # Create and train network
    nn_exp = NeuralNetwork(input_size=4, hidden_size=hidden_size, 
                          output_size=3, learning_rate=0.1)
    nn_exp.train(X_train_scaled, y_train_onehot, epochs=1000, verbose=False)
    
    # Evaluate
    _, test_pred = nn_exp.predict(X_test_scaled)
    test_acc = accuracy_score(y_test, test_pred)
    
    results[hidden_size] = test_acc
    print(f"Test Accuracy: {test_acc:.4f}\n")

# Plot results
plt.figure(figsize=(10, 6))
# Use string labels so the bars are evenly spaced
size_labels = [str(size) for size in results]
plt.bar(size_labels, list(results.values()), color='skyblue', alpha=0.8)
plt.title('Test Accuracy vs Hidden Layer Size')
plt.xlabel('Number of Hidden Neurons')
plt.ylabel('Test Accuracy')
plt.ylim(0.8, 1.0)
for idx, acc in enumerate(results.values()):
    plt.text(idx, acc + 0.01, f'{acc:.3f}', ha='center')
plt.grid(True, alpha=0.3)
plt.show()

print("💡 What do you notice about the relationship between hidden layer size and performance?")

## 🎯 Exercise 2: Learning Rate Sensitivity
Investigate how different learning rates affect training.
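
Before running the full experiment, it can help to see the effect of the learning rate on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose gradient is 2w. The function, starting point, and learning rates are chosen purely for illustration.

```python
def run_gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with fixed-step gradient descent; return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2 * w      # derivative of w**2
        w -= lr * grad    # gradient descent update
    return abs(w)

# Distance from the minimum (w = 0) after 20 steps
final_distance = {lr: run_gradient_descent(lr) for lr in (0.01, 0.1, 1.1)}

for lr, dist in final_distance.items():
    print(f"lr={lr}: |w| after 20 steps = {dist:.4f}")
```

Each step multiplies `w` by `(1 - 2 * lr)`, so a small rate shrinks it slowly, a moderate rate shrinks it quickly, and a rate above 1 makes it grow in magnitude and diverge.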

In [None]:
# TODO: Experiment with different learning rates
learning_rates = [0.01, 0.05, 0.1, 0.5, 1.0]
lr_results = {}

print("🎛️ Experimenting with different learning rates...\n")

plt.figure(figsize=(12, 8))

for i, lr in enumerate(learning_rates):
    print(f"Testing with learning rate = {lr}...")
    
    # Create and train network
    nn_lr = NeuralNetwork(input_size=4, hidden_size=8, 
                         output_size=3, learning_rate=lr)
    nn_lr.train(X_train_scaled, y_train_onehot, epochs=500, verbose=False)
    
    # Evaluate
    _, test_pred = nn_lr.predict(X_test_scaled)
    test_acc = accuracy_score(y_test, test_pred)
    
    lr_results[lr] = test_acc
    print(f"Final Test Accuracy: {test_acc:.4f}")
    
    # Plot training curves
    plt.subplot(2, 3, i+1)
    plt.plot(nn_lr.loss_history, label=f'LR={lr}')
    plt.title(f'Learning Rate: {lr}')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.grid(True)
    
    if i == len(learning_rates) - 1:  # Last subplot - show comparison
        plt.subplot(2, 3, i+2)
        # Use string labels so the bars are evenly spaced instead of overlapping near 0
        lr_labels = [str(lr_val) for lr_val in lr_results]
        plt.bar(lr_labels, list(lr_results.values()), color='lightcoral', alpha=0.8)
        plt.title('Final Test Accuracy')
        plt.xlabel('Learning Rate')
        plt.ylabel('Accuracy')
        for idx, acc in enumerate(lr_results.values()):
            plt.text(idx, acc + 0.01, f'{acc:.3f}', ha='center', fontsize=8)
        plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Questions to consider:")
print("- Which learning rate converged fastest?")
print("- Which learning rate achieved the best final accuracy?")
print("- What happens with very high learning rates?")

## 🎯 Exercise 3: Different Activation Functions
Modify the neural network to use different activation functions in the hidden layer.

In [None]:
# TODO: Create a modified neural network class that can use different activation functions
# Hint: You'll need to modify the forward_propagation and backward_propagation methods

class FlexibleNeuralNetwork(NeuralNetwork):
    """Neural network with configurable activation function"""
    
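    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.1, activation='sigmoid'):
        # Fail early on a typo instead of silently skipping the activation later
        if activation not in ('sigmoid', 'relu', 'tanh'):
            raise ValueError(f"Unknown activation: {activation!r}")
        super().__init__(input_size, hidden_size, output_size, learning_rate)
        self.activation = activation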
        
    def forward_propagation(self, X):
        """Forward propagation with configurable activation function"""
        # Hidden layer
        self.Z1 = np.dot(X, self.W1) + self.b1
        
        # Apply chosen activation function
        if self.activation == 'sigmoid':
            self.A1 = ActivationFunctions.sigmoid(self.Z1)
        elif self.activation == 'relu':
            self.A1 = ActivationFunctions.relu(self.Z1)
        elif self.activation == 'tanh':
            self.A1 = ActivationFunctions.tanh(self.Z1)
        
        # Output layer (always softmax for classification)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = ActivationFunctions.softmax(self.Z2)
        
        return self.A2
    
    def backward_propagation(self, X, y_true):
        """Backpropagation with configurable activation function"""
        m = X.shape[0]
        
        # Output layer gradients
        dZ2 = self.A2 - y_true
        dW2 = np.dot(self.A1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m
        
        # Hidden layer gradients
        dA1 = np.dot(dZ2, self.W2.T)
        
        # Apply appropriate derivative
        if self.activation == 'sigmoid':
            dZ1 = dA1 * ActivationFunctions.sigmoid_derivative(self.A1)
        elif self.activation == 'relu':
            dZ1 = dA1 * ActivationFunctions.relu_derivative(self.Z1)
        elif self.activation == 'tanh':
            dZ1 = dA1 * ActivationFunctions.tanh_derivative(self.A1)
        
        dW1 = np.dot(X.T, dZ1) / m
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m
        
        # Store gradients
        self.dW1, self.db1 = dW1, db1
        self.dW2, self.db2 = dW2, db2

# Test different activation functions
activations = ['sigmoid', 'relu', 'tanh']
activation_results = {}

print("🔧 Testing different activation functions...\n")

for activation in activations:
    print(f"Testing with {activation} activation...")
    
    # Create and train network
    nn_act = FlexibleNeuralNetwork(input_size=4, hidden_size=8, 
                                  output_size=3, learning_rate=0.1, 
                                  activation=activation)
    nn_act.train(X_train_scaled, y_train_onehot, epochs=1000, verbose=False)
    
    # Evaluate
    _, test_pred = nn_act.predict(X_test_scaled)
    test_acc = accuracy_score(y_test, test_pred)
    
    activation_results[activation] = {
        'accuracy': test_acc,
        'loss_history': nn_act.loss_history
    }
    print(f"Test Accuracy: {test_acc:.4f}\n")

# Plot comparison
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Loss curves
for activation in activations:
    axes[0].plot(activation_results[activation]['loss_history'], 
                label=f'{activation.capitalize()}', linewidth=2)
axes[0].set_title('Training Loss by Activation Function')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True)

# Final accuracies
accuracies = [activation_results[act]['accuracy'] for act in activations]
axes[1].bar(activations, accuracies, color=['blue', 'red', 'green'], alpha=0.7)
axes[1].set_title('Final Test Accuracy by Activation Function')
axes[1].set_xlabel('Activation Function')
axes[1].set_ylabel('Test Accuracy')
axes[1].set_ylim(0.8, 1.0)
for i, acc in enumerate(accuracies):
    axes[1].text(i, acc + 0.01, f'{acc:.3f}', ha='center')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("💡 Observations:")
for activation in activations:
    acc = activation_results[activation]['accuracy']
    print(f"- {activation.capitalize()}: {acc:.4f} accuracy")

## 🏆 Challenge: Build a Deeper Network

Can you modify the neural network to have multiple hidden layers? This is more advanced!

In [None]:
# TODO: Create a deep neural network with multiple hidden layers
# This is a challenging exercise - try to implement a network with 2-3 hidden layers

print("🏆 CHALLENGE: Build a Deep Neural Network")
print("="*50)
print("Task: Implement a neural network with multiple hidden layers")
print("Suggested architecture: 4 → 16 → 8 → 4 → 3")
print("")
print("Tips:")
print("- You'll need to track multiple weight matrices and bias vectors")
print("- Forward propagation becomes more complex with more layers")
print("- Backpropagation requires careful chain rule application")
print("- Consider using ReLU activation for hidden layers to avoid vanishing gradients")
print("")
print("This is an advanced exercise - don't worry if it's challenging!")
print("The key is understanding the concepts we've learned so far.")

# Starter code structure (you fill in the implementation)
class DeepNeuralNetwork:
    """Deep neural network with multiple hidden layers"""
    
    def __init__(self, layer_sizes, learning_rate=0.01):
        """
        Initialize deep neural network
        
        Args:
            layer_sizes: List of layer sizes [input, hidden1, hidden2, ..., output]
            learning_rate: Learning rate for gradient descent
        """
        # TODO: Initialize weights and biases for multiple layers
        pass
    
    def forward_propagation(self, X):
        """Forward propagation through multiple layers"""
        # TODO: Implement forward pass through all layers
        pass
    
    def backward_propagation(self, X, y_true):
        """Backpropagation through multiple layers"""
        # TODO: Implement backward pass through all layers
        pass
    
    def train(self, X, y, epochs=1000):
        """Train the deep network"""
        # TODO: Implement training loop
        pass

print("\n🤔 Think about:")
print("- How would you store weights for multiple layers?")
print("- How does backpropagation change with more layers?")
print("- What are the advantages and disadvantages of deeper networks?")
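
If you want a nudge on the first question, here is one possible way (a sketch, not the reference solution) to store parameters for any number of layers and run a forward pass. The helper names `init_layers` and `forward` are hypothetical, ReLU is assumed for the hidden layers, and backpropagation is deliberately left out.

```python
import numpy as np

def init_layers(layer_sizes, rng):
    """Create one weight matrix and one bias row per pair of adjacent layers."""
    weights, biases = [], []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append(rng.normal(size=(fan_in, fan_out)) * np.sqrt(2.0 / fan_in))
        biases.append(np.zeros((1, fan_out)))
    return weights, biases

def forward(X, weights, biases):
    """Return the activations of every layer; the last entry is the softmax output."""
    activations = [X]
    # Hidden layers: linear step followed by ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        activations.append(np.maximum(0, activations[-1] @ W + b))
    # Output layer: linear step followed by a numerically stable softmax
    logits = activations[-1] @ weights[-1] + biases[-1]
    exp_shifted = np.exp(logits - logits.max(axis=1, keepdims=True))
    activations.append(exp_shifted / exp_shifted.sum(axis=1, keepdims=True))
    return activations

rng = np.random.default_rng(0)
weights_demo, biases_demo = init_layers([4, 16, 8, 4, 3], rng)
acts_demo = forward(rng.normal(size=(6, 4)), weights_demo, biases_demo)

print([a.shape for a in acts_demo])
```

Keeping every layer's activation in a list is useful because backpropagation needs them again, walking the same list in reverse.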

# 8. Next Steps and Advanced Topics 🚀

Congratulations! You've successfully built and trained a neural network from scratch. Here's what you've learned:

## ✅ What You've Accomplished:
- Implemented forward propagation
- Implemented backpropagation and gradient descent
- Trained a neural network on real data
- Evaluated model performance
- Experimented with different hyperparameters

## 🎯 Advanced Topics to Explore Next:

### 1. **Regularization Techniques**
- L1/L2 regularization to prevent overfitting
- Dropout for better generalization
- Early stopping

### 2. **Optimization Algorithms**
- Momentum
- Adam optimizer
- Learning rate scheduling

### 3. **Deep Learning Frameworks**
- TensorFlow/Keras
- PyTorch
- JAX

### 4. **Advanced Architectures**
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Transformer networks

### 5. **Specialized Applications**
- Computer Vision
- Natural Language Processing
- Reinforcement Learning

In [None]:
# Final summary and resources
print("🎉 CONGRATULATIONS! 🎉")
print("="*50)
print("You have successfully completed the Deep Learning from Scratch tutorial!")
print("")
print("📊 Your Neural Network Performance Summary:")
print(f"- Final Training Accuracy: {train_accuracy:.4f} ({train_accuracy*100:.2f}%)")
print(f"- Final Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print(f"- Network Architecture: {nn.input_size} → {nn.hidden_size} → {nn.output_size}")
print(f"- Training Epochs: {len(nn.loss_history)}")
print(f"- Final Loss: {nn.loss_history[-1]:.4f}")
print("")
print("🧠 Key Concepts Mastered:")
concepts = [
    "Neural network architecture",
    "Forward propagation",
    "Backpropagation algorithm",
    "Gradient descent optimization",
    "Activation functions",
    "Loss functions (cross-entropy)",
    "Model evaluation metrics",
    "Hyperparameter tuning"
]

for i, concept in enumerate(concepts, 1):
    print(f"  {i}. ✅ {concept}")

print("")
print("📚 Recommended Next Steps:")
print("1. Implement regularization techniques (L2, Dropout)")
print("2. Try the network on other classification datasets (Wine, Breast Cancer, etc.)")
print("3. Experiment with deeper architectures")
print("4. Learn about convolutional neural networks for image data")
print("5. Explore modern deep learning frameworks (PyTorch, TensorFlow)")
print("")
print("🌟 Keep learning and building amazing AI applications!")

# 📖 Additional Resources

## Books:
- **"Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville** - The comprehensive mathematical foundation
- **"Neural Networks and Deep Learning" by Michael Nielsen** - Excellent intuitive explanations
- **"Hands-On Machine Learning" by Aurélien Géron** - Practical implementation focus

## Online Courses:
- **Andrew Ng's Deep Learning Specialization (Coursera)** - Systematic and thorough
- **Fast.ai Practical Deep Learning** - Top-down approach
- **CS231n (Stanford)** - Computer vision focus

## Practice Platforms:
- **Kaggle** - Real-world datasets and competitions
- **Google Colab** - Free GPU access for experimentation
- **Papers With Code** - Latest research with implementations

## Key Mathematical Topics to Strengthen:
- Linear Algebra (matrices, eigenvalues, SVD)
- Calculus (partial derivatives, chain rule)
- Probability and Statistics
- Information Theory

---

**Remember**: The best way to learn deep learning is by doing. Keep experimenting, building projects, and don't be afraid to make mistakes. Every expert was once a beginner! 🚀