# Regularization

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping
import tensorflow_datasets as tfds


In [None]:
# Set memory growth to avoid allocating all GPU memory at once
physical_devices = tf.config.list_physical_devices("GPU")
if physical_devices:
    print(f"Found {len(physical_devices)} GPU(s)")
    for device in physical_devices:
        tf.config.experimental.set_memory_growth(device, True)
        print(f"Memory growth set to True for {device}")
else:
    print("No GPU found, using CPU")

Found 1 GPU(s)
Memory growth set to True for PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')


In [None]:
# Set random seed for reproducibility
tf.random.set_seed(42)

In [None]:
# Use mixed precision to reduce memory usage (can speed up training on newer GPUs)
# Only enable if your GPU supports it
try:
    policy = tf.keras.mixed_precision.Policy("mixed_float16")
    tf.keras.mixed_precision.set_global_policy(policy)
    print("Using mixed precision policy")
except Exception:  # a bare except would also swallow KeyboardInterrupt
    print("Mixed precision not supported or enabled")

Using mixed precision policy


In [None]:
print("Loading Imagenette dataset...")
dataset, info = tfds.load("imagenette/160px", as_supervised=True, with_info=True)

Loading Imagenette dataset...


I0000 00:00:1744570800.791981   49553 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5563 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9


In [None]:
# After loading the dataset
print("Training samples:", info.splits["train"].num_examples)
print("Validation samples:", info.splits["validation"].num_examples)

Training samples: 12894
Validation samples: 500


In [None]:
num_classes = info.features["label"].num_classes
class_names = info.features["label"].names
train_ds = dataset["train"]
valid_ds = dataset["validation"]

In [None]:
# Target size for all images
TARGET_SIZE = (160, 160)

## Data Augmentation

- Random horizontal flips
- Random brightness adjustments (±20%)
- Random contrast adjustments (0.8-1.2)
- Random rotation (by 90-degree increments)

In [None]:
# Define data augmentation
def augment_image(image):
    # Random flip left-right
    image = tf.image.random_flip_left_right(image)

    # Random brightness adjustment
    image = tf.image.random_brightness(image, 0.2)

    # Random contrast adjustment
    image = tf.image.random_contrast(image, 0.8, 1.2)

    # Random rotation (by 90-degree increments)
    # Draw k in {0, 1, 2, 3} directly as an integer number of quarter turns
    image = tf.image.rot90(image, tf.random.uniform([], 0, 4, dtype=tf.int32))

    # Ensure pixel values remain in valid range [0, 255]
    return tf.clip_by_value(image, 0, 255)
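
As a rough NumPy illustration of the flip and rotation steps (not the TensorFlow ops used above), the sketch below shows that an odd number of quarter turns swaps height and width on a non-square image. That is one reason the resize to a fixed size has to come after augmentation. The array `img` and its shape are made-up example values.

```python
import numpy as np

# Hypothetical non-square "image": height 2, width 3, 1 channel
img = np.arange(6).reshape(2, 3, 1)

# Mirror left-right; the shape is unchanged
flipped = np.fliplr(img)

# Rotate by k quarter turns and record the resulting shape for each k
shapes = {k: np.rot90(img, k).shape for k in range(4)}

print(flipped[..., 0].tolist())
print(shapes)
```

Odd values of `k` give shape `(3, 2, 1)`, even values keep `(2, 3, 1)`.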



In [None]:
# Preprocess the data - including resizing to handle varying dimensions
def preprocess_train_data(image, label):
    # Apply data augmentation
    image = augment_image(image)

    # Resize images to consistent dimensions
    image = tf.image.resize(image, TARGET_SIZE)
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0,1]
    return image, tf.one_hot(label, num_classes)

In [None]:
def preprocess_val_data(image, label):
    # No augmentation for validation data
    image = tf.image.resize(image, TARGET_SIZE)
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0,1]
    return image, tf.one_hot(label, num_classes)

In [None]:
BATCH_SIZE = 16
AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.map(preprocess_train_data, num_parallel_calls=AUTOTUNE)
train_ds = train_ds.shuffle(1000).batch(BATCH_SIZE).prefetch(AUTOTUNE)

valid_ds = valid_ds.map(preprocess_val_data, num_parallel_calls=AUTOTUNE)
valid_ds = valid_ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)
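
As a quick sanity check on the pipeline above, the number of batches per epoch can be worked out by hand. `batch()` keeps the final partial batch by default, so the count is a ceiling division. The sample counts are the ones printed earlier; the variable names are only for this illustration.

```python
import math

train_examples = 12894  # info.splits["train"].num_examples, printed above
valid_examples = 500    # info.splits["validation"].num_examples, printed above
batch_size = 16         # BATCH_SIZE above

# The last, smaller batch is kept, hence the ceiling
train_steps = math.ceil(train_examples / batch_size)
valid_steps = math.ceil(valid_examples / batch_size)

print(train_steps, valid_steps)  # 806 32
```

These match the `806/806` and `32/32` step counters in the training and evaluation logs.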

## Building the Model

### Architecture Overview

#### Architecture Components

##### Input
- **Shape**: (160, 160, 3)

##### Convolutional Blocks (3)
Each block follows the same pattern with increasing filter counts:
1. **Block 1**: 32 filters
2. **Block 2**: 64 filters
3. **Block 3**: 128 filters

Each block contains:
- **Two Conv2D layers** with 3×3 kernels, ReLU activation, and "same" padding
- **BatchNormalization** after each convolution
- **MaxPooling2D** (2×2) at the end of the block
- **Dropout** (0.25) for regularization

##### Fully Connected Section
- **Flatten** layer to convert 2D feature maps to 1D
- **Dense** layer with 256 units and ReLU activation
- **BatchNormalization**
- **Dropout** (0.5) - higher rate for dense layer
- **Output Dense** layer with softmax activation (num_classes units)
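
To make the layer list concrete, here is a small pure-Python sketch that traces the feature-map shape through the three blocks. It assumes square 160×160 inputs, "same" padding in every convolution, and 2×2 pooling with stride 2; `trace_shapes` is a helper written for this note, not part of Keras.

```python
def trace_shapes(size=160, filters=(32, 64, 128)):
    """Return (height, width, channels) after each block's 2x2 max pool."""
    shapes = []
    for f in filters:
        size //= 2  # "same" convolutions keep the size; pooling halves it
        shapes.append((size, size, f))
    return shapes

shapes = trace_shapes()
h, w, c = shapes[-1]
flat_units = h * w * c  # length of the vector produced by Flatten

print(shapes)      # [(80, 80, 32), (40, 40, 64), (20, 20, 128)]
print(flat_units)  # 51200
```

The 51,200-element flattened vector is what makes the first dense layer by far the largest layer in the model.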

#### Design Reasoning

##### Convolutional Structure
- **Paired Conv2D layers**: Two stacked 3×3 convolutions cover a 5×5 receptive field with fewer weights than a single 5×5 kernel, and add an extra nonlinearity
- **Incremental filter growth** (32→64→128): Captures increasingly complex features as spatial dimensions reduce
- **Same padding**: Keeps the spatial dimensions unchanged through each convolution, so only the pooling layers downsample
- **MaxPooling**: Reduces spatial dimensions while retaining important features
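
The parameter saving from stacking small kernels can be checked with a little arithmetic. The sketch below compares two stacked 3×3 convolutions against one 5×5 convolution at 64 channels in and out; the channel width is an example chosen to match Block 2, and `conv_weights` is a helper defined only for this comparison.

```python
def conv_weights(kernel, c_in, c_out):
    """Weights plus biases of one Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

channels = 64  # example width, as in Block 2

stacked_3x3 = 2 * conv_weights(3, channels, channels)
single_5x5 = conv_weights(5, channels, channels)

print(stacked_3x3, single_5x5)  # 73856 102464
```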

##### Regularization Strategy
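- **BatchNormalization**: Stabilizes training and permits higher learning rates; it was originally motivated as reducing internal covariate shift, and the noise in per-batch statistics also has a mild regularizing effect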
- **Dropout layers**: Prevents overfitting by randomly deactivating neurons during training
  - 0.25 rate after pooling in convolutional blocks
  - Higher 0.5 rate in dense layers (where overfitting risk is greater)
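
For intuition about what a dropout layer does during training, here is a minimal NumPy sketch of inverted dropout, the variant in which surviving activations are scaled by `1 / (1 - rate)` so the expected activation is unchanged. It is an illustration under that assumption, not the Keras implementation; the function name and the all-ones input are made up for the example.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of activations and rescale the rest."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is the identity
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return x * keep / (1.0 - rate)      # rescale to preserve the expected value

rng = np.random.default_rng(0)
x = np.ones((1000, 256))

dropped = inverted_dropout(x, rate=0.5, rng=rng)
print(float(dropped.mean()))        # close to 1.0
print(np.unique(dropped).tolist())  # [0.0, 2.0]
```

With `training=False` the input is returned untouched, which is why no rescaling is needed at evaluation time.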

In [None]:
def build_regularized_cnn_model():
    return models.Sequential([
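        # Declare the input shape with an explicit Input layer, which Keras
        # recommends over passing input_shape to the first layer
        layers.Input(shape=(160, 160, 3)),

        # First Convolutional Block
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),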
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),  # Add dropout after pooling

        # Second Convolutional Block
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),  # Add dropout after pooling

        # Third Convolutional Block
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),  # Add dropout after pooling

        # Fully Connected Layers
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),  # Higher dropout rate for fully connected layer
        layers.Dense(num_classes, activation="softmax"),
    ])

In [None]:
# Create and compile the model
print("Building and compiling the regularized model...")
model = build_regularized_cnn_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()

Building and compiling the regularized model...




```mermaid
flowchart TD
    subgraph "Input"
        input[Input Image 160×160×3]
    end

    subgraph "Block 1"
        conv1_1[Conv2D 3×3, 32]
        bn1_1[BatchNorm]
        conv1_2[Conv2D 3×3, 32]
        bn1_2[BatchNorm]
        pool1[MaxPool 2×2]
        drop1[Dropout 0.25]
    end

    subgraph "Block 2"
        conv2_1[Conv2D 3×3, 64]
        bn2_1[BatchNorm]
        conv2_2[Conv2D 3×3, 64]
        bn2_2[BatchNorm]
        pool2[MaxPool 2×2]
        drop2[Dropout 0.25]
    end

    subgraph "Block 3"
        conv3_1[Conv2D 3×3, 128]
        bn3_1[BatchNorm]
        conv3_2[Conv2D 3×3, 128]
        bn3_2[BatchNorm]
        pool3[MaxPool 2×2]
        drop3[Dropout 0.25]
    end

    subgraph "Fully Connected"
        flat[Flatten]
        fc1[Dense 256]
        bn_fc[BatchNorm]
        drop_fc[Dropout 0.5]
        output[Dense 10 with Softmax]
    end

    input --> conv1_1 --> bn1_1 --> conv1_2 --> bn1_2 --> pool1 --> drop1
    drop1 --> conv2_1 --> bn2_1 --> conv2_2 --> bn2_2 --> pool2 --> drop2
    drop2 --> conv3_1 --> bn3_1 --> conv3_2 --> bn3_2 --> pool3 --> drop3
    drop3 --> flat --> fc1 --> bn_fc --> drop_fc --> output
```

In [None]:
# Save checkpoints only when validation improves (reduces disk I/O)
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_regularized_model.weights.h5",
    save_best_only=True,
    save_weights_only=True,
    monitor="val_loss",
    mode="min",
    verbose=1,
)

In [None]:
# Implement early stopping to prevent overfitting
early_stopping = EarlyStopping(
    monitor="val_loss",
    patience=7,  # Increased patience to allow model to learn with regularization
    restore_best_weights=True,
    verbose=1,
)
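
The patience rule can be sketched in a few lines of plain Python. This is a simplified model of the idea (stop once the monitored loss has failed to improve for `patience` consecutive epochs), not the Keras callback itself, and the loss values are made up rather than taken from this run.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up validation losses, not taken from the run above
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]

print(early_stopping_epoch(losses, patience=3))  # (6, 3)
print(early_stopping_epoch(losses, patience=7))  # (None, 3)
```

With `restore_best_weights=True`, the weights kept after stopping would be those from the best epoch (epoch 3 in this toy sequence), not from the last one.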

In [None]:
# Reduce learning rate when plateauing
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=3,
    min_lr=0.00001,
    verbose=1,
)
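
The plateau rule configured above can be illustrated the same way. The sketch below is a simplified stand-in that ignores the callback's `cooldown` and `min_delta` options; the defaults mirror the values used in this notebook, and the losses are invented for the example.

```python
def plateau_schedule(val_losses, lr=1e-3, factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (no cooldown, no min_delta)."""
    best = float("inf")
    wait = 0
    rates = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below min_lr
                wait = 0
        rates.append(lr)
    return rates

# Made-up validation losses, not taken from the run above
losses = [1.0, 0.9, 0.95, 0.92, 0.91, 0.85, 0.86, 0.87, 0.88]
rates = plateau_schedule(losses)
print(rates)
```

In this toy run the rate is halved after epoch 5 and again after epoch 9, each time following three epochs without a new best loss.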

In [None]:
print("Training the regularized model...")
epochs = 50

Training the regularized model...


In [None]:
# Train with history stored but with memory-efficient callbacks
history_regularized = model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=epochs,
    callbacks=[early_stopping, reduce_lr, checkpoint_callback],
    verbose=2,  # Less output to console
)

Epoch 1/50




Epoch 1: val_loss improved from inf to 3.16996, saving model to best_regularized_model.weights.h5
806/806 - 56s - 69ms/step - accuracy: 0.2887 - loss: 2.3027 - val_accuracy: 0.2000 - val_loss: 3.1700 - learning_rate: 1.0000e-03
Epoch 2/50

Epoch 2: val_loss improved from 3.16996 to 1.21180, saving model to best_regularized_model.weights.h5
806/806 - 17s - 21ms/step - accuracy: 0.4635 - loss: 1.6062 - val_accuracy: 0.6140 - val_loss: 1.2118 - learning_rate: 1.0000e-03
Epoch 3/50

Epoch 3: val_loss improved from 1.21180 to 1.12532, saving model to best_regularized_model.weights.h5
806/806 - 17s - 22ms/step - accuracy: 0.5491 - loss: 1.3634 - val_accuracy: 0.6280 - val_loss: 1.1253 - learning_rate: 1.0000e-03
Epoch 4/50

Epoch 4: val_loss improved from 1.12532 to 0.97990, saving model to best_regularized_model.weights.h5
806/806 - 17s - 21ms/step - accuracy: 0.5789 - loss: 1.2656 - val_accuracy: 0.6940 - val_loss: 0.9799 - learning_rate: 1.0000e-03
Epoch 5/50

Epoch 5: val_loss did not i

In [None]:
# Plot loss and accuracy curves, save the figure to disk, then close it to free memory
def plot_metrics(history, filename="training_history_regularized.png"):
    plt.figure(figsize=(12, 4))

    # Plot training and validation loss
    plt.subplot(1, 2, 1)
    plt.plot(history.history["loss"], label="Training Loss")
    plt.plot(history.history["val_loss"], label="Validation Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.title("Training and Validation Loss")

    # Plot training and validation accuracy
    plt.subplot(1, 2, 2)
    plt.plot(history.history["accuracy"], label="Training Accuracy")
    plt.plot(history.history["val_accuracy"], label="Validation Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.title("Training and Validation Accuracy")

    plt.tight_layout()
    plt.savefig(filename)
    plt.close()  # Close to free memory

# Plot and save metrics
plot_metrics(history_regularized)

![Loss and Accuracy](training_history_regularized.png)

In [None]:
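# Evaluate on the validation split. No separate test split is loaded in this
# notebook, so the "Test" figures printed below are validation metrics.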
print("Evaluating the regularized model...")
test_loss, test_accuracy = model.evaluate(valid_ds, verbose=2)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Generate a report
print("\n--- Regularized Model Report ---")
print("Architecture:")
model.summary()
print("\nTraining Results:")
print(f"Final Training Loss: {history_regularized.history['loss'][-1]:.4f}")
print(f"Final Training Accuracy: {history_regularized.history['accuracy'][-1]:.4f}")
print(f"Final Validation Loss: {history_regularized.history['val_loss'][-1]:.4f}")
print(f"Final Validation Accuracy: {history_regularized.history['val_accuracy'][-1]:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
print("\nTraining stopped after {0} epochs".format(len(history_regularized.history['loss'])))


Evaluating the regularized model...
32/32 - 0s - 13ms/step - accuracy: 0.8400 - loss: 0.5492
Test Loss: 0.5492
Test Accuracy: 0.8400

--- Regularized Model Report ---
Architecture:



Training Results:
Final Training Loss: 0.3196
Final Training Accuracy: 0.8973
Final Validation Loss: 0.5665
Final Validation Accuracy: 0.8380
Test Accuracy: 0.8400

Training stopped after 39 epochs


## Comparison

### 1. **Training Dynamics and Epochs**

- **Basic Model:**
  - **Epochs:** Training stopped after **7 epochs**.
  - **Training Accuracy:** Final training accuracy is extremely high (≈99.3%), suggesting that the model nearly “memorized” the training examples.
  - **Validation/Test Accuracy:** Despite the very low training loss (0.0282), the validation and test accuracies are only ≈68.6% (validation) and ≈68.4% (test).  
  - **Interpretation:** The high training accuracy alongside a much lower validation/test accuracy indicates that the basic model is heavily overfitting. In other words, while it fits the training data almost perfectly, it generalizes poorly to unseen data.

- **Regularized Model:**
  - **Epochs:** Training ran for **39 epochs** (a much longer training duration).
  - **Training Accuracy:** The final training accuracy is lower (≈89.7%), indicating that the regularization methods (dropout and batch normalization) are effective in preventing the network from overfitting—even if it means that the network does not perfectly fit the training data.
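  - **Validation/Test Accuracy:** The final-epoch validation accuracy is ≈83.8%, and the reported "test" accuracy is ≈84.0%. In this notebook the test figure comes from `model.evaluate(valid_ds)`, so it is measured on the same validation split, after early stopping restored the best weights.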
  - **Interpretation:** The regularized model sacrifices some training performance (it doesn’t hit near-perfect training accuracy) but ends up with a much more robust model that generalizes significantly better. The gap between training and validation/test performance is much smaller, showing that the network has learned features that are more useful for unseen data.
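
The size of the generalization gap can be made explicit with a short calculation. The accuracies below are the rounded percentages quoted in the bullets above; the dictionary layout is just for illustration.

```python
# Accuracies in percent, copied from the bullets above
results = {
    "basic": {"train": 99.3, "val": 68.6},
    "regularized": {"train": 89.7, "val": 83.8},
}

# Gap between training and validation accuracy, in percentage points
gaps = {name: round(r["train"] - r["val"], 1) for name, r in results.items()}

print(gaps)  # {'basic': 30.7, 'regularized': 5.9}
```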

---

### 2. **Parameter Counts and Model Complexity**

- **Basic Model Parameters:**  
  - **Total Parameters:** ~39.6 million  
  - **Trainable Parameters:** ~13.2 million  
  - The large number of parameters comes especially from the fully connected dense layer, which can make the model susceptible to overfitting if not regularized.

- **Regularized Model Parameters:**  
  - **Total Parameters:** ~40.2 million  
  - **Trainable Parameters:** ~13.4 million  
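  - The model is slightly larger, but little of that comes from the regularization layers themselves: dropout has no parameters, and the seven batch normalization layers add only about 2.8 thousand (scale, shift, and moving statistics per channel). Most trainable parameters sit in the first dense layer. The total being roughly three times the trainable count is consistent with the summary also counting optimizer state, although the summary output is not shown here. The regularization layers help keep the network from fitting the noise in the training data.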
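
The trainable count for the regularized model can be reproduced by hand from the architecture defined earlier. This sketch assumes 160×160×3 inputs and 10 output classes; the helper functions are defined only for this calculation, and the basic model is not included because its definition does not appear in this notebook.

```python
def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out  # kernel weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out          # weights + biases

def bn_params(channels):
    return 4 * channels                  # gamma, beta, moving mean, moving variance

conv_io = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
conv = sum(conv_params(i, o) for i, o in conv_io)
dense = dense_params(20 * 20 * 128, 256) + dense_params(256, 10)
bn = sum(bn_params(c) for c in [32, 32, 64, 64, 128, 128, 256])

# Only gamma and beta (half of the batch-norm parameters) are trained
trainable = conv + dense + bn // 2

print(conv, dense, bn, trainable)  # 287008 13110026 2816 13398442
```

The result, about 13.4 million, agrees with the trainable figure quoted above, and the dense layers account for roughly 13.1 million of it.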

---

### 3. **Training Curves and Convergence Behavior**

- **Basic Model Training Behavior:**
  - Rapid convergence to extremely high training accuracy in just a few epochs.
  - Large divergence between training and validation performance very early, indicating overfitting.
  
- **Regularized Model Training Behavior:**
  - Slower convergence: The model takes more epochs (39 instead of 7) to settle into its performance level.
  - The inclusion of dropout forces the network to learn redundant and robust representations.
  - Lower training accuracy compared to the basic model is a typical sign that regularization is working by “penalizing” or dropping some activations, but the overall generalization has improved as seen from validation/test metrics.

---

### 4. **Summary of Key Differences**

| Aspect                     | Basic Model                                    | Regularized Model                            |
|----------------------------|------------------------------------------------|----------------------------------------------|
| **Architecture**           | Simple three-block ConvNet with two dense layers | Additional batch normalization and dropout layers in each block; slightly deeper structure |
| **Epochs to Convergence**  | 7 epochs                                       | 39 epochs                                    |
| **Training Accuracy**      | ~99.3% (nearly perfect fit)                    | ~89.7% (more moderate fit, regularized)      |
| **Validation/Test Accuracy** | ~68.6% / 68.4% (poor generalization)       | ~83.8% / 84.0% (much better generalization)    |
| **Overfitting**            | High – large gap between training and validation/test metrics | Mitigated – smaller gap between training and validation/test metrics |
| **Regularization Techniques** | None                                        | Batch Normalization and Dropout applied, plus data augmentation in pre-processing |

---

### 5. **Takeaway**

- **Without Regularization:**  
  The basic model is able to “memorize” the training data quickly, but because it lacks mechanisms to reduce complexity and mitigate overfitting, it performs significantly worse on unseen data (validation/test set), as evidenced by only ~68% accuracy on these sets.

- **With Regularization:**  
  The regularized model shows a controlled training process—training accuracy is lower because the model is not overfitting, and the validation and test accuracies improve significantly (up to ~84%).  
  This is a textbook example of how techniques like dropout and batch normalization, when combined with data augmentation, can improve generalization even in a deep network with many parameters.

---

### Final Conclusion

While the basic CNN achieves nearly perfect performance on the training data within very few epochs, its generalization performance is lacking (around 68% accuracy on unseen data) due to overfitting. The regularized CNN, although trained over more epochs and achieving a lower training accuracy, generalizes much better, reaching around 84% accuracy on held-out data. Note that the regularized model's "test" figure in this notebook is computed on the validation split after restoring the best weights, so it is not an independent test measurement. This comparison demonstrates the importance of incorporating regularization techniques, especially when dealing with large networks susceptible to overfitting.

In [None]:
# Clean up to free memory
import gc

del model
gc.collect()
if physical_devices:
    tf.keras.backend.clear_session()
