<a href="https://colab.research.google.com/github/usshaa/Deep-Learning-For-all/blob/main/Module_3_Deep_Neural_Networks_(DNNs).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **📌 Module 3: Deep Neural Networks (DNNs)**  
### 🎯 **Learning Objectives**  
- Understand the **architecture of Deep Neural Networks (DNNs)**  
- Learn about **Hyperparameter Tuning** (Learning Rate, Batch Size, Dropout)  
- Explore **Regularization Techniques** (L1, L2, Dropout, Batch Normalization)  
- Understand the **Vanishing & Exploding Gradient Problem**  
- Learn different **Weight Initialization Techniques**  

---

## **🔹 1. Architecture of Deep Neural Networks (DNNs)**  
A **Deep Neural Network (DNN)** is a **Feedforward Neural Network (FNN)** with multiple **hidden layers** between the input and output layers.  

### ✅ **Typical DNN Architecture**
1. **Input Layer**: Takes raw data (e.g., images, text, numerical data).  
2. **Hidden Layers**: Multiple layers with neurons applying activation functions.  
3. **Output Layer**: Produces the final prediction (e.g., classification, regression).  
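
✅ **Sketch (NumPy): what the three layer types compute.** The snippet below is a minimal illustration, not the Keras implementation. It assumes 10 input features, hidden layers of 64, 128, and 64 units, and one sigmoid output, with small random weights chosen only for demonstration. The helper names (`relu`, `sigmoid`, `forward`) are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [10, 64, 128, 64, 1]  # input, three hidden layers, output (assumed sizes)

# One (weights, biases) pair for each pair of consecutive layers.
params = [
    (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    a = x                                # input layer: raw features
    for W, b in params[:-1]:
        a = relu(a @ W + b)              # hidden layers: affine map + activation
    W_out, b_out = params[-1]
    return sigmoid(a @ W_out + b_out)    # output layer: value in (0, 1)

x = rng.normal(size=(5, 10))             # batch of 5 samples
y_hat = forward(x, params)
print(y_hat.shape)                       # (5, 1)
```

Each row of `y_hat` can be read as a predicted probability for one sample.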

### ✅ **Code: Build a Simple DNN in TensorFlow**


In [162]:
import tensorflow as tf
from tensorflow import keras

In [163]:
# Create a Deep Neural Network (DNN)
model = keras.Sequential([
    keras.Input(shape=(10,)),                     # Input: 10 features per sample
    keras.layers.Dense(64, activation='relu'),    # Hidden Layer 1
    keras.layers.Dense(128, activation='relu'),   # Hidden Layer 2
    keras.layers.Dense(64, activation='relu'),    # Hidden Layer 3
    keras.layers.Dense(1, activation='sigmoid')   # Output Layer (Binary Classification)
])

In [164]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()  # summary() prints the layer table itself; print(model.summary()) also prints "None"


✅ **Explanation:**  
- **3 hidden layers** (64, 128, and 64 units) with ReLU activation  
- **Output layer** uses a single **Sigmoid** unit, which outputs a probability for binary classification  
- **Adam optimizer**, which adapts the step size for each weight  
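
✅ **Sketch: counting trainable parameters.** A Dense layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The snippet below applies that rule to the layer sizes used in this section (10 → 64 → 128 → 64 → 1). The helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [10, 64, 128, 64, 1]

per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)     # [704, 8320, 8256, 65]
print(total_params)  # 17345
```

The total should match the trainable-parameter count reported by `model.summary()` for the same architecture.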

---

## **🔹 2. Hyperparameter Tuning**  
Hyperparameters affect the learning process and performance of the model.

| Hyperparameter | Description | Best Practices |
|---------------|------------|---------------|
| **Learning Rate (LR)** | Controls step size in weight updates | Start with `0.001`, adjust based on performance |
| **Batch Size** | Number of samples per training step | Small batch → Noisier gradient estimates, more updates per epoch; Large batch → Smoother estimates, faster epochs, more memory |
| **Dropout** | Prevents overfitting by randomly deactivating neurons | Typical values: `0.2 - 0.5` |
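
✅ **Sketch: how batch size changes the number of updates.** One epoch performs one weight update per batch, so the batch size directly sets how many updates the model receives. The dataset size of 10,000 samples is an assumption for illustration, and `steps_per_epoch` is a hypothetical helper.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_samples = 10_000  # assumed dataset size

for batch_size in (32, 128, 512):
    print(batch_size, steps_per_epoch(n_samples, batch_size))
# 32 313
# 128 79
# 512 20
```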

### ✅ **Code: Experimenting with Learning Rate**


In [165]:
# Compile the model with an explicit learning rate (0.001 is Adam's default); change the value to experiment
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

✅ **Tip:** If the loss falls very slowly, try a **larger learning rate**; if the loss oscillates or diverges, try a **smaller learning rate**.
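
✅ **Sketch: too small, reasonable, and too large learning rates.** The toy example below runs plain gradient descent (not Adam) on the one-dimensional function `f(w) = w**2`, whose minimum is at `w = 0`. The function, the starting point, and the three rates are assumptions chosen to make the behaviour easy to see.

```python
def minimize_quadratic(learning_rate, steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

results = {lr: minimize_quadratic(lr) for lr in (0.01, 0.4, 1.1)}

for lr, w in results.items():
    print(f"lr={lr}: w after 20 steps = {w:.3g}")
# lr=0.01 is still far from 0 (slow), lr=0.4 is essentially 0 (converged),
# and lr=1.1 has moved away from 0 (diverged).
```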

---

## **🔹 3. Regularization Techniques**  
Regularization reduces **overfitting** by constraining model complexity, for example by penalizing large weights or injecting noise during training.

### ✅ **L1 & L2 Regularization (Lasso & Ridge)**
| Regularization | Formula | Effect |
|--------------|---------|--------|
| **L1 (Lasso)** | \( L1 = \lambda \sum_i \lvert w_i \rvert \) | Encourages sparsity (zero weights) |
| **L2 (Ridge)** | \( L2 = \lambda \sum_i w_i^2 \) | Prevents large weight values |

✅ **Example: Adding L1 & L2 Regularization**


In [166]:
from tensorflow.keras.regularizers import l1, l2

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', kernel_regularizer=l1(0.01)),  # L1 Regularization
    keras.layers.Dense(128, activation='relu', kernel_regularizer=l2(0.01)),  # L2 Regularization
    keras.layers.Dense(1, activation='sigmoid')
])
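
✅ **Sketch: computing the L1 and L2 penalties by hand.** The penalty is added to the training loss, so larger weights cost more. The weight values and `lam = 0.01` below are assumptions for illustration; `l1_penalty` and `l2_penalty` are hypothetical helpers that mirror the formulas in the table.

```python
def l1_penalty(weights, lam):
    """lam * sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """lam * sum of squared weight values."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -1.0, 0.0, 2.0]  # assumed example weights
lam = 0.01

print(round(l1_penalty(weights, lam), 4))  # 0.035
print(round(l2_penalty(weights, lam), 4))  # 0.0525
```

The largest weight (2.0) accounts for most of the L2 penalty because it is squared, which is why L2 discourages individual large weights.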

### ✅ **Dropout**
- Randomly sets a fraction of a layer's **outputs** to zero at each training step, so the network cannot rely on any single feature. Dropout is inactive at inference time.

✅ **Example: Adding Dropout**


In [167]:
from tensorflow.keras.layers import Dropout

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu'),
    Dropout(0.3),  # Zeroes 30% of the previous layer's outputs during training
    keras.layers.Dense(1, activation='sigmoid')
])
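
✅ **Sketch (NumPy): what a dropout layer does.** The snippet below is a minimal version of "inverted" dropout: surviving activations are scaled by `1 / (1 - rate)` during training so their expected value is unchanged, and nothing happens at inference. The function name and the all-ones input are assumptions for illustration.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero each element with probability `rate` and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True where the activation is kept
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((4, 8))                # assumed example activations

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
eval_out = dropout(activations, rate=0.3, training=False, rng=rng)

print(np.unique(np.round(train_out, 3)))     # only zeros and the rescaled value
print(np.array_equal(eval_out, activations)) # True
```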

### ✅ **Batch Normalization**
- Normalizes each layer's activations over the current mini-batch, then applies a learned scale and shift. This stabilizes learning and speeds up convergence.

✅ **Example: Adding Batch Normalization**


In [168]:
from tensorflow.keras.layers import BatchNormalization

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu'),
    BatchNormalization(),  # Normalizes the 128 activations over each mini-batch
    keras.layers.Dense(1, activation='sigmoid')
])
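
✅ **Sketch (NumPy): the training-time batch normalization step.** The snippet below normalizes each feature over the batch and then applies a scale (`gamma`) and shift (`beta`). It is a simplified illustration: a real layer learns `gamma` and `beta` and keeps moving averages of the mean and variance for use at inference. The function name and input data are assumptions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature (column) over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # assumed activations: mean 5, std 3

out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print(np.round(out.mean(axis=0), 3))  # close to 0 for every feature
print(np.round(out.std(axis=0), 3))   # close to 1 for every feature
```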

## **🔹 4. Vanishing & Exploding Gradient Problem**  
### **❌ Problem:**
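- **Vanishing Gradients**: Gradients shrink as they are backpropagated through many layers → Early layers learn slowly → Poor performance.  
- **Exploding Gradients**: Gradients grow as they are backpropagated through many layers → Unstable updates → Training divergence.  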
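
✅ **Sketch: why depth makes gradients vanish or explode.** Backpropagation multiplies one factor per layer. As a rough illustration, assume every layer contributes the same factor: 0.25 (the maximum slope of the sigmoid function) for the vanishing case and 1.5 (an arbitrary factor above 1) for the exploding case. Real networks have different factors per layer; `backprop_scale` is a hypothetical helper.

```python
def backprop_scale(per_layer_factor, n_layers):
    """Overall gradient scale after multiplying the same factor once per layer."""
    return per_layer_factor ** n_layers

vanishing = backprop_scale(0.25, 10)  # sigmoid slope is at most 0.25
exploding = backprop_scale(1.5, 10)   # assumed factor greater than 1

print(f"{vanishing:.2e}")  # about 9.5e-07
print(f"{exploding:.1f}")  # about 57.7
```

With only 10 layers the gradient reaching the first layer is either about a million times smaller or more than 50 times larger than at the output.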

### **✅ Solutions**
| Solution | Description |
|----------|------------|
| **ReLU Activation** | Mitigates vanishing gradients: its derivative is 1 for positive inputs, whereas Sigmoid/Tanh saturate |
| **Batch Normalization** | Keeps activations stable |
| **Gradient Clipping** | Limits gradient values to avoid explosion |
| **Proper Weight Initialization** | Ensures proper signal propagation |

✅ **Example: Using Gradient Clipping**


In [169]:
model.compile(optimizer=keras.optimizers.Adam(clipnorm=1.0), loss='binary_crossentropy')
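
✅ **Sketch (NumPy): clipping a gradient by its norm.** With `clipnorm=1.0`, a gradient whose L2 norm exceeds 1 is rescaled to have norm exactly 1, keeping its direction; smaller gradients pass through unchanged. The function below is a minimal illustration for a single gradient array, and the example values are assumptions.

```python
import numpy as np

def clip_by_norm(grad, clipnorm):
    """Rescale `grad` so its L2 norm is at most `clipnorm`; direction is preserved."""
    norm = np.linalg.norm(grad)
    if norm <= clipnorm:
        return grad
    return grad * (clipnorm / norm)

large_grad = np.array([3.0, 4.0])   # L2 norm is 5
small_grad = np.array([0.3, 0.4])   # L2 norm is 0.5

clipped = clip_by_norm(large_grad, clipnorm=1.0)
untouched = clip_by_norm(small_grad, clipnorm=1.0)

print(clipped)    # [0.6 0.8]
print(untouched)  # [0.3 0.4]
```

In Keras, `clipnorm` applies this rule to each weight tensor's gradient separately, while `global_clipnorm` applies it to the combined norm of all gradients.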

## **🔹 5. Weight Initialization Techniques**  
Proper weight initialization **helps avoid vanishing or exploding gradients** by keeping the scale of activations and gradients roughly constant from layer to layer.

| Initialization | Purpose | Formula |
|---------------|---------|---------|
| **Random Initialization** | Assigns small random values | e.g. \( w \sim U(-0.5, 0.5) \) |
| **Xavier (Glorot)** | Keeps variance stable across layers | \( w \sim \mathcal{N}(0, \frac{2}{n_{in} + n_{out}}) \) |
| **He Initialization** | Designed for ReLU networks | \( w \sim \mathcal{N}(0, \frac{2}{n_{in}}) \) |

Here \( n_{in} \) and \( n_{out} \) are the number of inputs and outputs of the layer, and the second argument of \( \mathcal{N} \) is the variance.

✅ **Example: Using He Initialization (for ReLU)**


In [170]:
from tensorflow.keras.initializers import HeNormal

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', kernel_initializer=HeNormal()),
    keras.layers.Dense(1, activation='sigmoid')
])
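
✅ **Sketch (NumPy): sampling He and Xavier (Glorot) weights.** The snippet below draws weights from a normal distribution with variance `2 / fan_in` (He) and `2 / (fan_in + fan_out)` (Glorot), where `fan_in` and `fan_out` are the layer's input and output counts, and checks the empirical standard deviation. It uses a plain normal distribution for simplicity, whereas Keras draws from a truncated normal. The layer size of 256 → 128 and the helper names are assumptions.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Normal weights with variance 2 / fan_in (intended for ReLU layers)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng):
    """Normal weights with variance 2 / (fan_in + fan_out)."""
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # assumed layer size

w_he = he_normal(fan_in, fan_out, rng)
w_glorot = glorot_normal(fan_in, fan_out, rng)

print(f"He:     target std {np.sqrt(2.0 / fan_in):.4f}, sampled std {w_he.std():.4f}")
print(f"Glorot: target std {np.sqrt(2.0 / (fan_in + fan_out)):.4f}, sampled std {w_glorot.std():.4f}")
```

He initialization uses the larger variance because ReLU zeroes roughly half of its inputs, so the weights need to compensate to keep the signal scale constant.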

# **📊 Summary**
| Topic | Key Takeaways |
|-------|--------------|
| **DNN Architecture** | Multiple hidden layers stacked between the input and output layers |
| **Hyperparameter Tuning** | Learning Rate, Batch Size, Dropout |
| **Regularization** | L1/L2 (weight penalties), Dropout, Batch Normalization |
| **Gradient Problems** | Vanishing (slow learning), Exploding (unstable updates) |
| **Weight Initialization** | He (ReLU), Xavier (Sigmoid/Tanh) |

---