Trey Tuscai and Gordon Doore

Spring 2025

CS 444: Deep Learning

Project 2: Branch Neural Networks

#### Week 2: Residual networks

The focus this week is on the ResNet architecture. You will build several neural networks in the ResNet family and and train them on CIFAR-10 and CIFAR-100.

In [1]:
import numpy as np
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt

plt.style.use(['seaborn-v0_8-colorblind', 'seaborn-v0_8-darkgrid'])
plt.rcParams.update({'font.size': 20})

np.set_printoptions(suppress=True, precision=7)

# Automatically reload your external source code
%load_ext autoreload
%autoreload 2

2025-04-06 16:58:57.263229: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-06 16:59:04.345661: I tensorflow/core/platform/cpu_feature_guard.cc:211] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Task 6: The Residual Block

This task focuses on implementing and testing the **Residual Block** in preparation of creating the first ResNet (**ResNet-8**). 

Much like how Inception Blocks represent the building blocks of Inception Net, stacks of Residual Blocks represent the basis of ResNet. Residual Blocks possess a simpler structure than Inception Blocks — they only contain two parallel branches with fewer layers. Here is a refresher on the structure of the branches:

**Main branch:** sequence of two 2D convolutional layers.

**Residual branch:** the input signal to the Residual Block passes through "as-is", without modification (usually).

Like Inception Block, the output of both branches comes together at the end of the block. However, the branch outputs are SUMMED together rather than being concatenated.


This is the story for most Residual Blocks, however, like most CNNs:
1. the spatial resolution of the activations occasionally decreases
2. the number of conv filters/neurons increases

as we go deeper in a ResNet. Both of these factors tend to change *at the same time* in a small number of Residual Blocks located at various depths of the ResNet. Put another way, the spatial resolution and number of filters tends to remain constant across most successive Residual Blocks and they only changes in a few blocks throughout the net.
1. The decrease in spatial resolution is implemented in these small number of Residual Blocks with a convolutional stride > 1.
2. A 1x1 convolutional layer is needed as the "special sauce" along the residual branch to make the shapes of signals in both branches match (*otherwise they could not be summed!*).

### 6a. Implement and test the Residual Block

The class is in `residual_block.py`.

In [2]:
from residual_block import ResidualBlock

#### Test: `ResidualBlock` Stride 1 (1/2)

In [3]:
# Testing architecture and shapes
# Stride 1
tf.random.set_seed(0)
res1 = ResidualBlock('TestResidualBlock_S1', 7, prev_layer_or_block=None, strides=1)
res1(tf.ones([1, 4, 4, 7]))
print(res1)

2025-04-06 17:00:12.147111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20601 MB memory:  -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9


2025-04-06 17:00:17.751333: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 90400


TestResidualBlock_S1:
	Conv2D layer output(TestResidualBlock_S1/main_3x3conv_2) shape: [1, 4, 4, 7]
	Conv2D layer output(TestResidualBlock_S1/main_3x3conv_1) shape: [1, 4, 4, 7]


The above cell should print:

```
TestResidualBlock_S1:
	Conv2D layer output(TestResidualBlock_S1/main_3x3conv_2) shape: [1, 4, 4, 7]
	Conv2D layer output(TestResidualBlock_S1/main_3x3conv_1) shape: [1, 4, 4, 7]
```

In [4]:
# Test activations
tf.random.set_seed(0)
net_acts1 = res1(tf.random.uniform([2, 4, 4, 7]))
print(f'The shape of the netAct output from the block is {net_acts1.shape} and should be (2, 4, 4, 7)')
print(f'The first few activations are:\n{net_acts1[0,:,:, 0]}')
print('and should be:')
print('''[[0.        0.520823  0.1888617 0.       ]
 [0.        0.7621396 0.1734907 0.8486798]
 [0.        0.6156113 0.4272216 0.       ]
 [0.5561852 0.4888234 1.0138503 0.5533389]]''')

The shape of the netAct output from the block is (2, 4, 4, 7) and should be (2, 4, 4, 7)
The first few activations are:
[[0.        0.5208229 0.1888617 0.       ]
 [0.        0.7621396 0.1734906 0.8486798]
 [0.        0.6156112 0.4272216 0.       ]
 [0.5561852 0.4888234 1.0138503 0.5533389]]
and should be:
[[0.        0.520823  0.1888617 0.       ]
 [0.        0.7621396 0.1734907 0.8486798]
 [0.        0.6156113 0.4272216 0.       ]
 [0.5561852 0.4888234 1.0138503 0.5533389]]


#### Test: `ResidualBlock` Stride 2 (2/2)

In [5]:
# Testing architecture and shapes
# Stride 2
tf.random.set_seed(0)
res2 = ResidualBlock('TestResidualBlock_S2', 5, prev_layer_or_block=None, strides=2)
res2(tf.ones([1, 6, 6, 5]))
print(res2)

TestResidualBlock_S2:
	Conv2D layer output(TestResidualBlock_S2/main_3x3conv_2) shape: [1, 3, 3, 5]
	Conv2D layer output(TestResidualBlock_S2/main_3x3conv_1) shape: [1, 3, 3, 5]
	-->Conv2D1x1 layer output(TestResidualBlock_S2/skip_conv1x1) shape: [1, 3, 3, 5]-->


The above cell should print:

```
TestResidualBlock_S2:
	Conv2D layer output(TestResidualBlock_S2/main_3x3conv_2) shape: [1, 3, 3, 5]
	Conv2D layer output(TestResidualBlock_S2/main_3x3conv_1) shape: [1, 3, 3, 5]
	-->Conv2D1x1 layer output(TestResidualBlock_S2/skip_conv1x1) shape: [1, 3, 3, 5]-->
```

*The layer with the --> is the residual branch.*

In [6]:
# Test activations
tf.random.set_seed(0)
net_acts2 = res2(tf.random.uniform([3, 6, 6, 5]))
print(f'The shape of the netAct output from the block is {net_acts2.shape} and should be (3, 3, 3, 5)')
print(f'The first few activations are:\n{net_acts2[0,:,:, :]}')
print('and should be:')
print('''[[[0.2404823 0.        0.        0.2851936 0.       ]
  [0.        0.        0.        0.1339086 0.6898913]
  [0.        0.        0.        0.4596353 0.2781557]]

 [[0.        0.        0.        0.6591434 1.3703969]
  [0.2665227 0.        0.        0.9614864 0.       ]
  [0.        0.        0.        0.3844326 0.7111533]]

 [[0.0933782 0.        0.        0.1378801 0.3006183]
  [0.1873689 0.        0.        0.4464224 1.1067129]
  [0.        0.        0.        0.7910071 0.345379 ]]]''')

The shape of the netAct output from the block is (3, 3, 3, 5) and should be (3, 3, 3, 5)
The first few activations are:
[[[0.2404822 0.        0.        0.2851935 0.       ]
  [0.        0.        0.        0.1339087 0.6898913]
  [0.        0.        0.        0.4596352 0.2781556]]

 [[0.        0.        0.        0.6591434 1.3703969]
  [0.2665228 0.        0.        0.9614864 0.       ]
  [0.        0.        0.        0.3844326 0.7111532]]

 [[0.0933783 0.        0.        0.1378801 0.3006182]
  [0.1873689 0.        0.        0.4464225 1.106713 ]
  [0.        0.        0.        0.7910072 0.345379 ]]]
and should be:
[[[0.2404823 0.        0.        0.2851936 0.       ]
  [0.        0.        0.        0.1339086 0.6898913]
  [0.        0.        0.        0.4596353 0.2781557]]

 [[0.        0.        0.        0.6591434 1.3703969]
  [0.2665227 0.        0.        0.9614864 0.       ]
  [0.        0.        0.        0.3844326 0.7111533]]

 [[0.0933782 0.        0.        0.1378801 0.

## Task 7: ResNet-8

Assemble the Residual Blocks and several other layers to build ResNet-8:

Conv2D → ResidualBlock → ResidualBlock → ResidualBlock → GlobalAveragePooling2D → Dense

After an overfit test to help check whether the network is working, you will train the network on both CIFAR-10 and CIFAR-100.

In [7]:
from resnets import ResNet8

### 7a. Build ResNet-8

Implement the following classes in `resnets.py`:
1. `ResNet`: Parent class of all specific ResNets (e.g. ResNet-8, ResNet-18, etc.). Having this class helps reduce code size/duplication because the forward pass thru all ResNets is exactly the same!
2. `ResNet8`: Assemble the first (*and smallest*) net in the family!

#### Test: `ResNet8` architecture and shapes

In [8]:
res8 = ResNet8(C=3, input_feats_shape=(32, 32, 3))
res8.compile()

---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 3]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 128]
ResidualBlock_3:
	Conv2D layer output(ResidualBlock_3/main_3x3conv_2) shape: [1, 8, 8, 128]
	Conv2D layer output(ResidualBlock_3/main_3x3conv_1) shape: [1, 8, 8, 128]
	-->Conv2D1x1 layer output(ResidualBlock_3/skip_conv1x1) shape: [1, 8, 8, 128]-->
ResidualBlock_2:
	Conv2D layer output(ResidualBlock_2/main_3x3conv_2) shape: [1, 16, 16, 64]
	Conv2D layer output(ResidualBlock_2/main_3x3conv_1) shape: [1, 16, 16, 64]
	-->Conv2D1x1 layer output(ResidualBlock_2/skip_conv1x1) shape: [1, 16, 16, 64]-->
ResidualBlock_1:
	Conv2D layer output(ResidualBlock_1/main_3x3conv_2) shape: [1, 32, 32, 32]
	Conv2D layer output(ResidualBlock_1/main_3x3conv_1) shape: [1, 32, 32, 32]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 32]


The above cell should print:

```
---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 3]
Global Avg Pooling 2D layer output(GlobalAvgPool2D) shape: [1, 128]
ResidualBlock_3:
	Conv2D layer output(ResidualBlock_3/main_3x3conv_2) shape: [1, 8, 8, 128]
	Conv2D layer output(ResidualBlock_3/main_3x3conv_1) shape: [1, 8, 8, 128]
	-->Conv2D1x1 layer output(ResidualBlock_3/skip_conv1x1) shape: [1, 8, 8, 128]-->
ResidualBlock_2:
	Conv2D layer output(ResidualBlock_2/main_3x3conv_2) shape: [1, 16, 16, 64]
	Conv2D layer output(ResidualBlock_2/main_3x3conv_1) shape: [1, 16, 16, 64]
	-->Conv2D1x1 layer output(ResidualBlock_2/skip_conv1x1) shape: [1, 16, 16, 64]-->
ResidualBlock_1:
	Conv2D layer output(ResidualBlock_1/main_3x3conv_2) shape: [1, 32, 32, 32]
	Conv2D layer output(ResidualBlock_1/main_3x3conv_1) shape: [1, 32, 32, 32]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 32]
```

### 7b. CIFAR-10 overfit test

In the cell below, import CIFAR-10 and reproduce our usual overfit protocol:
1. Create a dev set from the 1st 500 training CIFAR-10 samples.
2. Train your net on the dev set for `80` epochs (turn off early stopping for this test). *Do not use any regularization.* 

Your training loss should start out at ~2.3 after the first epoch and rapidly plummet to 0.01 or less by about 70 epochs.

**Note:** If you coded `fit` to assume there will always be a validation set present, no problem, just plug in the dev set for both the train and val sets.

In [9]:
from datasets import get_dataset

In [10]:
x_train, y_train, x_val, y_val, x_test, y_test, classnames = get_dataset('cifar10')
x_dev = x_train[:500]
y_dev = y_train[:500]

In [None]:
tf.keras.backend.clear_session()
tf.random.set_seed(0)

# YOUR CODE HERE
model = ResNet8(10, (32,32,3), reg = 0)
model.compile(optimizer='adamw')
model.fit(x_dev, y_dev, x_dev, y_dev, max_epochs = 80, val_every = 1, verbose = True)

---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 10]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 128]
ResidualBlock_3:
	Conv2D layer output(ResidualBlock_3/main_3x3conv_2) shape: [1, 8, 8, 128]
	Conv2D layer output(ResidualBlock_3/main_3x3conv_1) shape: [1, 8, 8, 128]
	-->Conv2D1x1 layer output(ResidualBlock_3/skip_conv1x1) shape: [1, 8, 8, 128]-->
ResidualBlock_2:
	Conv2D layer output(ResidualBlock_2/main_3x3conv_2) shape: [1, 16, 16, 64]
	Conv2D layer output(ResidualBlock_2/main_3x3conv_1) shape: [1, 16, 16, 64]
	-->Conv2D1x1 layer output(ResidualBlock_2/skip_conv1x1) shape: [1, 16, 16, 64]-->
ResidualBlock_1:
	Conv2D layer output(ResidualBlock_1/main_3x3conv_2) shape: [1, 32, 32, 32]
	Conv2D layer output(ResidualBlock_1/main_3x3conv_1) shape: [1, 32, 32, 32]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 32]


I0000 00:00:1743958870.215802     584 service.cc:145] XLA service 0x799e0e54e640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1743958870.215851     584 service.cc:153]   StreamExecutor device (0): NVIDIA L4, Compute Capability 8.9


2025-04-06 17:01:10.628313: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.


I0000 00:00:1743958871.801254     584 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


Epoch 1: Training Loss = 2.5854, Validation Loss = 2.3263, Validation Accuracy = 0.1473
Epoch 1/80 took 13.3567 seconds
Epoch 2: Training Loss = 2.2497, Validation Loss = 2.1903, Validation Accuracy = 0.1920
Epoch 2/80 took 0.1009 seconds
Epoch 3: Training Loss = 2.1547, Validation Loss = 2.1454, Validation Accuracy = 0.2188
Epoch 3/80 took 0.0528 seconds


Epoch 4: Training Loss = 2.1165, Validation Loss = 2.0979, Validation Accuracy = 0.2031
Epoch 4/80 took 0.0546 seconds
Epoch 5: Training Loss = 2.0975, Validation Loss = 2.0723, Validation Accuracy = 0.2299
Epoch 5/80 took 0.0524 seconds
Epoch 6: Training Loss = 2.0260, Validation Loss = 2.0176, Validation Accuracy = 0.2388
Epoch 6/80 took 0.0517 seconds
Epoch 7: Training Loss = 2.0233, Validation Loss = 2.0264, Validation Accuracy = 0.2277
Epoch 7/80 took 0.0515 seconds


Epoch 8: Training Loss = 2.0125, Validation Loss = 1.9709, Validation Accuracy = 0.2946
Epoch 8/80 took 0.0540 seconds
Epoch 9: Training Loss = 1.9300, Validation Loss = 1.9440, Validation Accuracy = 0.2701
Epoch 9/80 took 0.0575 seconds
Epoch 10: Training Loss = 1.9202, Validation Loss = 1.8925, Validation Accuracy = 0.2701
Epoch 10/80 took 0.0519 seconds
Epoch 11: Training Loss = 1.8509, Validation Loss = 1.8679, Validation Accuracy = 0.2946
Epoch 11/80 took 0.0522 seconds


Epoch 12: Training Loss = 1.8899, Validation Loss = 1.8093, Validation Accuracy = 0.3661
Epoch 12/80 took 0.0550 seconds
Epoch 13: Training Loss = 1.8793, Validation Loss = 1.8117, Validation Accuracy = 0.3013
Epoch 13/80 took 0.0515 seconds
Epoch 14: Training Loss = 1.7787, Validation Loss = 1.7602, Validation Accuracy = 0.3214
Epoch 14/80 took 0.0514 seconds
Epoch 15: Training Loss = 1.7517, Validation Loss = 1.7021, Validation Accuracy = 0.3884
Epoch 15/80 took 0.0523 seconds


Epoch 16: Training Loss = 1.7173, Validation Loss = 1.7423, Validation Accuracy = 0.3348
Epoch 16/80 took 0.0522 seconds
Epoch 17: Training Loss = 1.6980, Validation Loss = 1.6522, Validation Accuracy = 0.3817
Epoch 17/80 took 0.0516 seconds
Epoch 18: Training Loss = 1.5991, Validation Loss = 1.6928, Validation Accuracy = 0.3906
Epoch 18/80 took 0.0516 seconds
Epoch 19: Training Loss = 1.5919, Validation Loss = 1.5886, Validation Accuracy = 0.4330
Epoch 19/80 took 0.0521 seconds


Epoch 20: Training Loss = 1.6312, Validation Loss = 1.6474, Validation Accuracy = 0.3929
Epoch 20/80 took 0.0538 seconds
Epoch 21: Training Loss = 1.6032, Validation Loss = 1.5041, Validation Accuracy = 0.4754
Epoch 21/80 took 0.0515 seconds
Epoch 22: Training Loss = 1.5456, Validation Loss = 1.5439, Validation Accuracy = 0.4286
Epoch 22/80 took 0.0529 seconds
Epoch 23: Training Loss = 1.5534, Validation Loss = 1.3962, Validation Accuracy = 0.5446
Epoch 23/80 took 0.0515 seconds


Epoch 24: Training Loss = 1.4298, Validation Loss = 1.4104, Validation Accuracy = 0.5134
Epoch 24/80 took 0.0531 seconds
Epoch 25: Training Loss = 1.4482, Validation Loss = 1.4102, Validation Accuracy = 0.4888
Epoch 25/80 took 0.0519 seconds
Epoch 26: Training Loss = 1.3011, Validation Loss = 1.3455, Validation Accuracy = 0.5223
Epoch 26/80 took 0.0522 seconds
Epoch 27: Training Loss = 1.2721, Validation Loss = 1.3218, Validation Accuracy = 0.5156
Epoch 27/80 took 0.0518 seconds


Epoch 28: Training Loss = 1.3519, Validation Loss = 1.2503, Validation Accuracy = 0.5446
Epoch 28/80 took 0.0522 seconds
Epoch 29: Training Loss = 1.1772, Validation Loss = 1.1688, Validation Accuracy = 0.6049
Epoch 29/80 took 0.0516 seconds
Epoch 30: Training Loss = 1.1537, Validation Loss = 1.1478, Validation Accuracy = 0.5982
Epoch 30/80 took 0.0521 seconds
Epoch 31: Training Loss = 1.1815, Validation Loss = 1.1110, Validation Accuracy = 0.6004
Epoch 31/80 took 0.0525 seconds


Epoch 32: Training Loss = 1.1633, Validation Loss = 1.1085, Validation Accuracy = 0.6116
Epoch 32/80 took 0.0547 seconds
Epoch 33: Training Loss = 1.0988, Validation Loss = 1.1109, Validation Accuracy = 0.6161
Epoch 33/80 took 0.0517 seconds
Epoch 34: Training Loss = 1.0707, Validation Loss = 1.0710, Validation Accuracy = 0.6116
Epoch 34/80 took 0.0524 seconds
Epoch 35: Training Loss = 0.9533, Validation Loss = 0.9490, Validation Accuracy = 0.7054
Epoch 35/80 took 0.0520 seconds


Epoch 36: Training Loss = 1.0494, Validation Loss = 1.1507, Validation Accuracy = 0.5625
Epoch 36/80 took 0.0547 seconds
Epoch 37: Training Loss = 1.0818, Validation Loss = 0.9885, Validation Accuracy = 0.6674
Epoch 37/80 took 0.0524 seconds
Epoch 38: Training Loss = 0.9804, Validation Loss = 1.0272, Validation Accuracy = 0.6250
Epoch 38/80 took 0.0518 seconds
Epoch 39: Training Loss = 0.9783, Validation Loss = 0.9010, Validation Accuracy = 0.6920
Epoch 39/80 took 0.0515 seconds


Epoch 40: Training Loss = 0.9178, Validation Loss = 0.8643, Validation Accuracy = 0.7098
Epoch 40/80 took 0.0530 seconds
Epoch 41: Training Loss = 0.8779, Validation Loss = 0.9718, Validation Accuracy = 0.6897
Epoch 41/80 took 0.0516 seconds
Epoch 42: Training Loss = 0.9576, Validation Loss = 0.7852, Validation Accuracy = 0.7232
Epoch 42/80 took 0.0519 seconds
Epoch 43: Training Loss = 0.7696, Validation Loss = 0.8293, Validation Accuracy = 0.7121
Epoch 43/80 took 0.0516 seconds


Epoch 44: Training Loss = 0.8142, Validation Loss = 0.7174, Validation Accuracy = 0.7768
Epoch 44/80 took 0.0522 seconds
Epoch 45: Training Loss = 0.6934, Validation Loss = 0.6895, Validation Accuracy = 0.7679
Epoch 45/80 took 0.0518 seconds
Epoch 46: Training Loss = 0.6646, Validation Loss = 0.7909, Validation Accuracy = 0.7076
Epoch 46/80 took 0.0519 seconds
Epoch 47: Training Loss = 0.7035, Validation Loss = 0.7284, Validation Accuracy = 0.7545
Epoch 47/80 took 0.0521 seconds


Epoch 48: Training Loss = 0.7651, Validation Loss = 0.7334, Validation Accuracy = 0.7210
Epoch 48/80 took 0.0527 seconds
Epoch 49: Training Loss = 0.8158, Validation Loss = 0.7277, Validation Accuracy = 0.7567
Epoch 49/80 took 0.0521 seconds
Epoch 50: Training Loss = 0.7384, Validation Loss = 0.8172, Validation Accuracy = 0.7076
Epoch 50/80 took 0.0520 seconds
Epoch 51: Training Loss = 0.7698, Validation Loss = 0.8071, Validation Accuracy = 0.7321
Epoch 51/80 took 0.0516 seconds


Epoch 52: Training Loss = 0.7460, Validation Loss = 0.7683, Validation Accuracy = 0.7500
Epoch 52/80 took 0.0529 seconds
Epoch 53: Training Loss = 0.6499, Validation Loss = 0.6927, Validation Accuracy = 0.7500
Epoch 53/80 took 0.0522 seconds
Epoch 54: Training Loss = 0.6052, Validation Loss = 0.6268, Validation Accuracy = 0.7790
Epoch 54/80 took 0.0518 seconds
Epoch 55: Training Loss = 0.6178, Validation Loss = 0.5428, Validation Accuracy = 0.8259
Epoch 55/80 took 0.0521 seconds


Epoch 56: Training Loss = 0.5608, Validation Loss = 0.5425, Validation Accuracy = 0.8326
Epoch 56/80 took 0.0543 seconds
Epoch 57: Training Loss = 0.5394, Validation Loss = 0.4458, Validation Accuracy = 0.8571
Epoch 57/80 took 0.0519 seconds
Epoch 58: Training Loss = 0.5119, Validation Loss = 0.4871, Validation Accuracy = 0.8371
Epoch 58/80 took 0.0669 seconds
Epoch 59: Training Loss = 0.4394, Validation Loss = 0.4280, Validation Accuracy = 0.8705
Epoch 59/80 took 0.0521 seconds


Epoch 60: Training Loss = 0.3429, Validation Loss = 0.4125, Validation Accuracy = 0.8728
Epoch 60/80 took 0.0534 seconds
Epoch 61: Training Loss = 0.3609, Validation Loss = 0.2834, Validation Accuracy = 0.9353
Epoch 61/80 took 0.0517 seconds
Epoch 62: Training Loss = 0.3262, Validation Loss = 0.3017, Validation Accuracy = 0.9085
Epoch 62/80 took 0.0519 seconds
Epoch 63: Training Loss = 0.2894, Validation Loss = 0.2906, Validation Accuracy = 0.9040
Epoch 63/80 took 0.0520 seconds


Epoch 64: Training Loss = 0.2480, Validation Loss = 0.2902, Validation Accuracy = 0.9107
Epoch 64/80 took 0.0528 seconds
Epoch 65: Training Loss = 0.2477, Validation Loss = 0.2212, Validation Accuracy = 0.9420
Epoch 65/80 took 0.0517 seconds
Epoch 66: Training Loss = 0.1816, Validation Loss = 0.2300, Validation Accuracy = 0.9420
Epoch 66/80 took 0.0516 seconds
Epoch 67: Training Loss = 0.2251, Validation Loss = 0.1557, Validation Accuracy = 0.9754
Epoch 67/80 took 0.0518 seconds


Epoch 68: Training Loss = 0.1607, Validation Loss = 0.1625, Validation Accuracy = 0.9621
Epoch 68/80 took 0.0529 seconds
Epoch 69: Training Loss = 0.1851, Validation Loss = 0.1756, Validation Accuracy = 0.9598
Epoch 69/80 took 0.0518 seconds
Epoch 70: Training Loss = 0.1694, Validation Loss = 0.2109, Validation Accuracy = 0.9308
Epoch 70/80 took 0.0522 seconds
Epoch 71: Training Loss = 0.1941, Validation Loss = 0.1960, Validation Accuracy = 0.9464
Epoch 71/80 took 0.0520 seconds


Epoch 72: Training Loss = 0.1501, Validation Loss = 0.1372, Validation Accuracy = 0.9732
Epoch 72/80 took 0.0525 seconds
Epoch 73: Training Loss = 0.1382, Validation Loss = 0.1261, Validation Accuracy = 0.9754
Epoch 73/80 took 0.0515 seconds
Epoch 74: Training Loss = 0.1231, Validation Loss = 0.1195, Validation Accuracy = 0.9732
Epoch 74/80 took 0.0521 seconds
Epoch 75: Training Loss = 0.1503, Validation Loss = 0.1631, Validation Accuracy = 0.9487
Epoch 75/80 took 0.0520 seconds


Epoch 76: Training Loss = 0.1663, Validation Loss = 0.1322, Validation Accuracy = 0.9754
Epoch 76/80 took 0.0541 seconds
Epoch 77: Training Loss = 0.1120, Validation Loss = 0.1144, Validation Accuracy = 0.9777
Epoch 77/80 took 0.0519 seconds
Epoch 78: Training Loss = 0.1122, Validation Loss = 0.0961, Validation Accuracy = 0.9888
Epoch 78/80 took 0.0517 seconds
Epoch 79: Training Loss = 0.0958, Validation Loss = 0.0852, Validation Accuracy = 0.9844
Epoch 79/80 took 0.0515 seconds


Epoch 80: Training Loss = 0.0722, Validation Loss = 0.0679, Validation Accuracy = 0.9933
Epoch 80/80 took 0.0529 seconds
Finished training after 80 epochs!


([2.5854297,
  2.2497468,
  2.1547399,
  2.1165285,
  2.0975192,
  2.0260031,
  2.0233045,
  2.0125296,
  1.93002,
  1.9201531,
  1.8509347,
  1.889932,
  1.8792644,
  1.7786869,
  1.7516723,
  1.7173429,
  1.697966,
  1.5990942,
  1.5918684,
  1.6312492,
  1.6032494,
  1.545558,
  1.5533642,
  1.4297822,
  1.4482217,
  1.301119,
  1.2720656,
  1.3518972,
  1.1772319,
  1.1537479,
  1.1814579,
  1.1633275,
  1.098788,
  1.0707209,
  0.95325476,
  1.0494305,
  1.0817847,
  0.98039925,
  0.9782729,
  0.9178203,
  0.87792253,
  0.95758,
  0.76955914,
  0.8141868,
  0.69339216,
  0.6646242,
  0.7035171,
  0.7651026,
  0.8157638,
  0.7384411,
  0.7697588,
  0.74597955,
  0.6499058,
  0.6051854,
  0.61780906,
  0.56082433,
  0.53941995,
  0.5118757,
  0.43944666,
  0.3429498,
  0.3608862,
  0.32623875,
  0.28941652,
  0.24795817,
  0.24768978,
  0.1816405,
  0.22513843,
  0.16066617,
  0.185137,
  0.16936228,
  0.19414638,
  0.15012631,
  0.1381653,
  0.12310675,
  0.15025103,
  0.16629529,


### 7c. Train ResNet-8 on CIFAR-10

Repeat our usual training and evaluation protocol:
1. Train ResNet-8 on CIFAR-10. Use regularization strength of `1.5`, a patience of `15`, learning rate patience of `4`, and keep the rest of the hyperparameters to their defaults.
2. Print the test accuracy.

If everything is working as expected, you should get a test accuracy in the 80s.

In [12]:
tf.keras.backend.clear_session()
tf.random.set_seed(0)

# YOUR CODE HERE
model = ResNet8(10, (32,32,3), reg = 1.5)
model.compile(optimizer='adamw')
model.fit(x_train, y_train, x_val, y_val, val_every = 1, verbose = True, patience=15, lr_patience=4)
print(f"Test acc: {model.evaluate(x_test, y_test)[0]}")

---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 10]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 128]
ResidualBlock_3:
	Conv2D layer output(ResidualBlock_3/main_3x3conv_2) shape: [1, 8, 8, 128]
	Conv2D layer output(ResidualBlock_3/main_3x3conv_1) shape: [1, 8, 8, 128]
	-->Conv2D1x1 layer output(ResidualBlock_3/skip_conv1x1) shape: [1, 8, 8, 128]-->
ResidualBlock_2:
	Conv2D layer output(ResidualBlock_2/main_3x3conv_2) shape: [1, 16, 16, 64]
	Conv2D layer output(ResidualBlock_2/main_3x3conv_1) shape: [1, 16, 16, 64]
	-->Conv2D1x1 layer output(ResidualBlock_2/skip_conv1x1) shape: [1, 16, 16, 64]-->
ResidualBlock_1:
	Conv2D layer output(ResidualBlock_1/main_3x3conv_2) shape: [1, 32, 32, 32]
	Conv2D layer output(ResidualBlock_1/main_3x3conv_1) shape: [1, 32, 32, 32]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 32]


Epoch 1: Training Loss = 1.7492, Validation Loss = 1.5184, Validation Accuracy = 0.4453
Epoch 1/10000 took 6.6293 seconds


Epoch 2: Training Loss = 1.4277, Validation Loss = 1.3107, Validation Accuracy = 0.5226
Epoch 2/10000 took 3.1285 seconds


Epoch 3: Training Loss = 1.3299, Validation Loss = 1.2617, Validation Accuracy = 0.5461
Epoch 3/10000 took 3.1370 seconds


Epoch 4: Training Loss = 1.2878, Validation Loss = 1.1990, Validation Accuracy = 0.5723
Epoch 4/10000 took 3.1521 seconds


Epoch 5: Training Loss = 1.2960, Validation Loss = 1.2461, Validation Accuracy = 0.5501
Epoch 5/10000 took 3.2221 seconds


Epoch 6: Training Loss = 1.2777, Validation Loss = 1.2594, Validation Accuracy = 0.5535
Epoch 6/10000 took 3.1624 seconds


Epoch 7: Training Loss = 1.2627, Validation Loss = 1.1886, Validation Accuracy = 0.5831
Epoch 7/10000 took 3.1776 seconds


Epoch 8: Training Loss = 1.2336, Validation Loss = 1.1834, Validation Accuracy = 0.5815
Epoch 8/10000 took 3.1750 seconds


Epoch 9: Training Loss = 1.2484, Validation Loss = 1.1796, Validation Accuracy = 0.5857
Epoch 9/10000 took 3.1867 seconds


Epoch 10: Training Loss = 1.2276, Validation Loss = 1.2603, Validation Accuracy = 0.5505
Epoch 10/10000 took 3.2105 seconds


Epoch 11: Training Loss = 1.2297, Validation Loss = 1.1390, Validation Accuracy = 0.5859
Epoch 11/10000 took 3.2149 seconds


Epoch 12: Training Loss = 1.2252, Validation Loss = 1.2287, Validation Accuracy = 0.5601
Epoch 12/10000 took 3.2049 seconds


Epoch 13: Training Loss = 1.2241, Validation Loss = 1.2764, Validation Accuracy = 0.5611
Epoch 13/10000 took 3.2073 seconds


Current lr= 0.001 Updated lr= 0.0005
Epoch 14: Training Loss = 1.2332, Validation Loss = 1.2713, Validation Accuracy = 0.5513
Epoch 14/10000 took 3.2331 seconds


Epoch 15: Training Loss = 1.1729, Validation Loss = 1.1867, Validation Accuracy = 0.5677
Epoch 15/10000 took 3.2256 seconds


Epoch 16: Training Loss = 1.1565, Validation Loss = 1.1570, Validation Accuracy = 0.5988
Epoch 16/10000 took 3.2314 seconds


Epoch 17: Training Loss = 1.1674, Validation Loss = 1.2069, Validation Accuracy = 0.5689
Epoch 17/10000 took 3.2467 seconds


Epoch 18: Training Loss = 1.1725, Validation Loss = 1.1347, Validation Accuracy = 0.5960
Epoch 18/10000 took 3.2366 seconds


Epoch 19: Training Loss = 1.1752, Validation Loss = 1.1831, Validation Accuracy = 0.5697
Epoch 19/10000 took 3.2346 seconds


Epoch 20: Training Loss = 1.1776, Validation Loss = 1.1544, Validation Accuracy = 0.5917
Epoch 20/10000 took 3.2369 seconds


Current lr= 0.0005 Updated lr= 0.00025
Epoch 21: Training Loss = 1.1681, Validation Loss = 1.2113, Validation Accuracy = 0.5659
Epoch 21/10000 took 3.2365 seconds


Epoch 22: Training Loss = 1.1304, Validation Loss = 1.1113, Validation Accuracy = 0.6114
Epoch 22/10000 took 3.2264 seconds


Epoch 23: Training Loss = 1.1412, Validation Loss = 1.1097, Validation Accuracy = 0.6070
Epoch 23/10000 took 3.2285 seconds


Epoch 24: Training Loss = 1.1295, Validation Loss = 1.1107, Validation Accuracy = 0.6142
Epoch 24/10000 took 3.2474 seconds


Epoch 25: Training Loss = 1.1200, Validation Loss = 1.1513, Validation Accuracy = 0.5791
Epoch 25/10000 took 3.2386 seconds


Epoch 26: Training Loss = 1.1178, Validation Loss = 1.0878, Validation Accuracy = 0.6238
Epoch 26/10000 took 3.2272 seconds


Epoch 27: Training Loss = 1.1254, Validation Loss = 1.0905, Validation Accuracy = 0.6204
Epoch 27/10000 took 3.2170 seconds


Epoch 28: Training Loss = 1.1198, Validation Loss = 1.1413, Validation Accuracy = 0.6012
Epoch 28/10000 took 3.2156 seconds


Current lr= 0.00025 Updated lr= 0.000125
Epoch 29: Training Loss = 1.1397, Validation Loss = 1.1166, Validation Accuracy = 0.6088
Epoch 29/10000 took 3.2225 seconds


Epoch 30: Training Loss = 1.0867, Validation Loss = 1.0618, Validation Accuracy = 0.6298
Epoch 30/10000 took 3.2261 seconds


Epoch 31: Training Loss = 1.0909, Validation Loss = 1.0827, Validation Accuracy = 0.6162
Epoch 31/10000 took 3.2175 seconds


Epoch 32: Training Loss = 1.0929, Validation Loss = 1.0738, Validation Accuracy = 0.6234
Epoch 32/10000 took 3.2021 seconds


Epoch 33: Training Loss = 1.1016, Validation Loss = 1.0840, Validation Accuracy = 0.6186
Epoch 33/10000 took 3.2372 seconds


Epoch 34: Training Loss = 1.0965, Validation Loss = 1.0726, Validation Accuracy = 0.6336
Epoch 34/10000 took 3.2140 seconds


Epoch 35: Training Loss = 1.0947, Validation Loss = 1.0890, Validation Accuracy = 0.6192
Epoch 35/10000 took 3.2231 seconds


Epoch 36: Training Loss = 1.1007, Validation Loss = 1.0872, Validation Accuracy = 0.6204
Epoch 36/10000 took 3.2124 seconds


Epoch 37: Training Loss = 1.0958, Validation Loss = 1.0597, Validation Accuracy = 0.6312
Epoch 37/10000 took 3.2140 seconds


Epoch 38: Training Loss = 1.0964, Validation Loss = 1.0909, Validation Accuracy = 0.6230
Epoch 38/10000 took 3.2418 seconds


Epoch 39: Training Loss = 1.0947, Validation Loss = 1.0756, Validation Accuracy = 0.6262
Epoch 39/10000 took 3.2152 seconds


Current lr= 0.000125 Updated lr= 6.25e-05
Epoch 40: Training Loss = 1.0961, Validation Loss = 1.0785, Validation Accuracy = 0.6264
Epoch 40/10000 took 3.2212 seconds


Epoch 41: Training Loss = 1.0791, Validation Loss = 1.0546, Validation Accuracy = 0.6348
Epoch 41/10000 took 3.2188 seconds


Epoch 42: Training Loss = 1.0777, Validation Loss = 1.0730, Validation Accuracy = 0.6228
Epoch 42/10000 took 3.2185 seconds


Epoch 43: Training Loss = 1.0730, Validation Loss = 1.0727, Validation Accuracy = 0.6264
Epoch 43/10000 took 3.2187 seconds


Epoch 44: Training Loss = 1.0814, Validation Loss = 1.0580, Validation Accuracy = 0.6356
Epoch 44/10000 took 3.2230 seconds


Epoch 45: Training Loss = 1.0641, Validation Loss = 1.0590, Validation Accuracy = 0.6342
Epoch 45/10000 took 3.2132 seconds


Epoch 46: Training Loss = 1.0727, Validation Loss = 1.0573, Validation Accuracy = 0.6284
Epoch 46/10000 took 3.2159 seconds


Epoch 47: Training Loss = 1.0786, Validation Loss = 1.0612, Validation Accuracy = 0.6304
Epoch 47/10000 took 3.2223 seconds


Epoch 48: Training Loss = 1.0791, Validation Loss = 1.0647, Validation Accuracy = 0.6264
Epoch 48/10000 took 3.2221 seconds


Current lr= 6.25e-05 Updated lr= 3.125e-05
Epoch 49: Training Loss = 1.0721, Validation Loss = 1.0833, Validation Accuracy = 0.6182
Epoch 49/10000 took 3.2165 seconds


Epoch 50: Training Loss = 1.0576, Validation Loss = 1.0592, Validation Accuracy = 0.6294
Epoch 50/10000 took 3.2159 seconds


Epoch 51: Training Loss = 1.0612, Validation Loss = 1.0441, Validation Accuracy = 0.6400
Epoch 51/10000 took 3.2148 seconds


Epoch 52: Training Loss = 1.0626, Validation Loss = 1.0661, Validation Accuracy = 0.6248
Epoch 52/10000 took 3.2474 seconds


Epoch 53: Training Loss = 1.0562, Validation Loss = 1.0397, Validation Accuracy = 0.6404
Epoch 53/10000 took 3.2129 seconds


Epoch 54: Training Loss = 1.0576, Validation Loss = 1.0464, Validation Accuracy = 0.6348
Epoch 54/10000 took 3.2155 seconds


Epoch 55: Training Loss = 1.0483, Validation Loss = 1.0608, Validation Accuracy = 0.6318
Epoch 55/10000 took 3.2127 seconds


Current lr= 3.125e-05 Updated lr= 1.5625e-05
Epoch 56: Training Loss = 1.0493, Validation Loss = 1.0526, Validation Accuracy = 0.6352
Epoch 56/10000 took 3.2199 seconds


Epoch 57: Training Loss = 1.0492, Validation Loss = 1.0408, Validation Accuracy = 0.6390
Epoch 57/10000 took 3.2108 seconds


Epoch 58: Training Loss = 1.0485, Validation Loss = 1.0425, Validation Accuracy = 0.6390
Epoch 58/10000 took 3.2155 seconds


Epoch 59: Training Loss = 1.0485, Validation Loss = 1.0396, Validation Accuracy = 0.6356
Epoch 59/10000 took 3.2323 seconds


Epoch 60: Training Loss = 1.0494, Validation Loss = 1.0426, Validation Accuracy = 0.6374
Epoch 60/10000 took 3.2128 seconds


Epoch 61: Training Loss = 1.0501, Validation Loss = 1.0429, Validation Accuracy = 0.6394
Epoch 61/10000 took 3.2189 seconds


Current lr= 1.5625e-05 Updated lr= 7.8125e-06
Epoch 62: Training Loss = 1.0506, Validation Loss = 1.0463, Validation Accuracy = 0.6388
Epoch 62/10000 took 3.2114 seconds


Epoch 63: Training Loss = 1.0471, Validation Loss = 1.0399, Validation Accuracy = 0.6380
Epoch 63/10000 took 3.2149 seconds


Epoch 64: Training Loss = 1.0486, Validation Loss = 1.0373, Validation Accuracy = 0.6414
Epoch 64/10000 took 3.2161 seconds


Epoch 65: Training Loss = 1.0401, Validation Loss = 1.0346, Validation Accuracy = 0.6384
Epoch 65/10000 took 3.2205 seconds


Epoch 66: Training Loss = 1.0482, Validation Loss = 1.0333, Validation Accuracy = 0.6416
Epoch 66/10000 took 3.2465 seconds


Epoch 67: Training Loss = 1.0421, Validation Loss = 1.0326, Validation Accuracy = 0.6420
Epoch 67/10000 took 3.2209 seconds


Epoch 68: Training Loss = 1.0409, Validation Loss = 1.0326, Validation Accuracy = 0.6412
Epoch 68/10000 took 3.2178 seconds


Epoch 69: Training Loss = 1.0433, Validation Loss = 1.0352, Validation Accuracy = 0.6412
Epoch 69/10000 took 3.2266 seconds


Current lr= 7.8125e-06 Updated lr= 3.90625e-06
Epoch 70: Training Loss = 1.0379, Validation Loss = 1.0412, Validation Accuracy = 0.6394
Epoch 70/10000 took 3.2248 seconds


Epoch 71: Training Loss = 1.0422, Validation Loss = 1.0324, Validation Accuracy = 0.6392
Epoch 71/10000 took 3.2115 seconds


Epoch 72: Training Loss = 1.0394, Validation Loss = 1.0326, Validation Accuracy = 0.6390
Epoch 72/10000 took 3.2131 seconds


Epoch 73: Training Loss = 1.0415, Validation Loss = 1.0322, Validation Accuracy = 0.6394
Epoch 73/10000 took 3.2155 seconds


Epoch 74: Training Loss = 1.0357, Validation Loss = 1.0314, Validation Accuracy = 0.6398
Epoch 74/10000 took 3.2258 seconds


Epoch 75: Training Loss = 1.0368, Validation Loss = 1.0322, Validation Accuracy = 0.6424
Epoch 75/10000 took 3.2165 seconds


Epoch 76: Training Loss = 1.0352, Validation Loss = 1.0329, Validation Accuracy = 0.6438
Epoch 76/10000 took 3.2131 seconds


Epoch 77: Training Loss = 1.0405, Validation Loss = 1.0303, Validation Accuracy = 0.6420
Epoch 77/10000 took 3.2196 seconds


Epoch 78: Training Loss = 1.0496, Validation Loss = 1.0347, Validation Accuracy = 0.6394
Epoch 78/10000 took 3.2165 seconds


Epoch 79: Training Loss = 1.0343, Validation Loss = 1.0323, Validation Accuracy = 0.6424
Epoch 79/10000 took 3.2120 seconds


Current lr= 3.90625e-06 Updated lr= 1.953125e-06
Epoch 80: Training Loss = 1.0360, Validation Loss = 1.0319, Validation Accuracy = 0.6440
Epoch 80/10000 took 3.2439 seconds


Epoch 81: Training Loss = 1.0366, Validation Loss = 1.0299, Validation Accuracy = 0.6434
Epoch 81/10000 took 3.2066 seconds


Epoch 82: Training Loss = 1.0324, Validation Loss = 1.0305, Validation Accuracy = 0.6426
Epoch 82/10000 took 3.2159 seconds


Epoch 83: Training Loss = 1.0383, Validation Loss = 1.0311, Validation Accuracy = 0.6396
Epoch 83/10000 took 3.2135 seconds


Epoch 84: Training Loss = 1.0393, Validation Loss = 1.0305, Validation Accuracy = 0.6396
Epoch 84/10000 took 3.2189 seconds


Epoch 85: Training Loss = 1.0333, Validation Loss = 1.0299, Validation Accuracy = 0.6430
Epoch 85/10000 took 3.2131 seconds


Epoch 86: Training Loss = 1.0444, Validation Loss = 1.0310, Validation Accuracy = 0.6412
Epoch 86/10000 took 3.2119 seconds


Epoch 87: Training Loss = 1.0403, Validation Loss = 1.0303, Validation Accuracy = 0.6424
Epoch 87/10000 took 3.2203 seconds


Epoch 88: Training Loss = 1.0427, Validation Loss = 1.0285, Validation Accuracy = 0.6414
Epoch 88/10000 took 3.2100 seconds


Epoch 89: Training Loss = 1.0438, Validation Loss = 1.0306, Validation Accuracy = 0.6446
Epoch 89/10000 took 3.2382 seconds


Epoch 90: Training Loss = 1.0367, Validation Loss = 1.0316, Validation Accuracy = 0.6406
Epoch 90/10000 took 3.2214 seconds


Current lr= 1.953125e-06 Updated lr= 9.765625e-07
Epoch 91: Training Loss = 1.0432, Validation Loss = 1.0296, Validation Accuracy = 0.6436
Epoch 91/10000 took 3.2137 seconds


Epoch 92: Training Loss = 1.0272, Validation Loss = 1.0294, Validation Accuracy = 0.6418
Epoch 92/10000 took 3.2105 seconds


Epoch 93: Training Loss = 1.0356, Validation Loss = 1.0288, Validation Accuracy = 0.6426
Epoch 93/10000 took 3.2243 seconds


Epoch 94: Training Loss = 1.0383, Validation Loss = 1.0304, Validation Accuracy = 0.6432
Epoch 94/10000 took 3.2329 seconds


Epoch 95: Training Loss = 1.0382, Validation Loss = 1.0293, Validation Accuracy = 0.6414
Epoch 95/10000 took 3.2246 seconds


Current lr= 9.765625e-07 Updated lr= 4.882813e-07
Epoch 96: Training Loss = 1.0352, Validation Loss = 1.0295, Validation Accuracy = 0.6412
Epoch 96/10000 took 3.2248 seconds


Epoch 97: Training Loss = 1.0314, Validation Loss = 1.0296, Validation Accuracy = 0.6422
Epoch 97/10000 took 3.2129 seconds


Epoch 98: Training Loss = 1.0409, Validation Loss = 1.0290, Validation Accuracy = 0.6408
Epoch 98/10000 took 3.2202 seconds


Epoch 99: Training Loss = 1.0364, Validation Loss = 1.0288, Validation Accuracy = 0.6410
Epoch 99/10000 took 3.2124 seconds


Epoch 100: Training Loss = 1.0368, Validation Loss = 1.0300, Validation Accuracy = 0.6420
Epoch 100/10000 took 3.2166 seconds


Epoch 101: Training Loss = 1.0380, Validation Loss = 1.0294, Validation Accuracy = 0.6420
Epoch 101/10000 took 3.2096 seconds


Current lr= 4.882813e-07 Updated lr= 2.4414064e-07
Epoch 102: Training Loss = 1.0346, Validation Loss = 1.0290, Validation Accuracy = 0.6410
Early stopping triggered at epoch 102
Finished training after 102 epochs!


Test acc: 0.6266025900840759


### 7d. Train ResNet-8 on CIFAR-100

Repeat what you did with CIFAR-10, but this time with CIFAR-100.

The test accuracy that you achieve should be better than chance, but should NOT be satisfying.

In [13]:
tf.keras.backend.clear_session()
tf.random.set_seed(0)

x100_train, y100_train, x100_val, y100_val, x100_test, y100_test, classnames = get_dataset('cifar100')

model = ResNet8(100, (32,32,3), reg = 1.5)
model.compile(optimizer='adamw')
model.fit(x100_train, y100_train, x100_val, y100_val, val_every = 1, verbose = True, patience=15, lr_patience=4)
print(f"Test acc: {model.evaluate(x_test, y_test)[0]}")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz


     8192/169001437 [..............................] - ETA: 0s

   122880/169001437 [..............................] - ETA: 1:09

   630784/169001437 [..............................] - ETA: 26s 

  2285568/169001437 [..............................] - ETA: 11s

  5193728/169001437 [..............................] - ETA: 6s 

  8126464/169001437 [>.............................] - ETA: 4s

 11001856/169001437 [>.............................] - ETA: 4s

 13910016/169001437 [=>............................] - ETA: 3s

 16826368/169001437 [=>............................] - ETA: 3s

 19750912/169001437 [==>...........................] - ETA: 3s

 22626304/169001437 [===>..........................] - ETA: 3s

 25534464/169001437 [===>..........................] - ETA: 3s

 28442624/169001437 [====>.........................] - ETA: 2s

 31350784/169001437 [====>.........................] - ETA: 2s

 34283520/169001437 [=====>........................] - ETA: 2s

 37167104/169001437 [=====>........................] - ETA: 2s



























































---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 100]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 128]
ResidualBlock_3:
	Conv2D layer output(ResidualBlock_3/main_3x3conv_2) shape: [1, 8, 8, 128]
	Conv2D layer output(ResidualBlock_3/main_3x3conv_1) shape: [1, 8, 8, 128]
	-->Conv2D1x1 layer output(ResidualBlock_3/skip_conv1x1) shape: [1, 8, 8, 128]-->
ResidualBlock_2:
	Conv2D layer output(ResidualBlock_2/main_3x3conv_2) shape: [1, 16, 16, 64]
	Conv2D layer output(ResidualBlock_2/main_3x3conv_1) shape: [1, 16, 16, 64]
	-->Conv2D1x1 layer output(ResidualBlock_2/skip_conv1x1) shape: [1, 16, 16, 64]-->
ResidualBlock_1:
	Conv2D layer output(ResidualBlock_1/main_3x3conv_2) shape: [1, 32, 32, 32]
	Conv2D layer output(ResidualBlock_1/main_3x3conv_1) shape: [1, 32, 32, 32]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 32]


Epoch 1: Training Loss = 4.0777, Validation Loss = 3.8198, Validation Accuracy = 0.1138
Epoch 1/10000 took 6.8931 seconds


Epoch 2: Training Loss = 3.5954, Validation Loss = 3.5259, Validation Accuracy = 0.1611
Epoch 2/10000 took 3.2170 seconds


Epoch 3: Training Loss = 3.3591, Validation Loss = 3.4013, Validation Accuracy = 0.1843
Epoch 3/10000 took 3.2229 seconds


Epoch 4: Training Loss = 3.2372, Validation Loss = 3.2863, Validation Accuracy = 0.1957
Epoch 4/10000 took 3.2675 seconds


Epoch 5: Training Loss = 3.1385, Validation Loss = 3.2131, Validation Accuracy = 0.2169
Epoch 5/10000 took 3.2474 seconds


Epoch 6: Training Loss = 3.0977, Validation Loss = 3.1697, Validation Accuracy = 0.2226
Epoch 6/10000 took 3.2627 seconds


Epoch 7: Training Loss = 3.0435, Validation Loss = 3.1140, Validation Accuracy = 0.2382
Epoch 7/10000 took 3.2644 seconds


Epoch 8: Training Loss = 3.0060, Validation Loss = 3.0404, Validation Accuracy = 0.2522
Epoch 8/10000 took 3.2504 seconds


Epoch 9: Training Loss = 2.9963, Validation Loss = 3.1313, Validation Accuracy = 0.2414
Epoch 9/10000 took 3.2402 seconds


Epoch 10: Training Loss = 2.9722, Validation Loss = 3.0262, Validation Accuracy = 0.2588
Epoch 10/10000 took 3.2358 seconds


Epoch 11: Training Loss = 2.9763, Validation Loss = 3.0136, Validation Accuracy = 0.2596
Epoch 11/10000 took 3.2295 seconds


Epoch 12: Training Loss = 2.9703, Validation Loss = 3.0335, Validation Accuracy = 0.2572
Epoch 12/10000 took 3.2299 seconds


Epoch 13: Training Loss = 2.9459, Validation Loss = 3.0852, Validation Accuracy = 0.2518
Epoch 13/10000 took 3.2312 seconds


Current lr= 0.001 Updated lr= 0.0005
Epoch 14: Training Loss = 2.9322, Validation Loss = 3.0826, Validation Accuracy = 0.2466
Epoch 14/10000 took 3.2271 seconds


Epoch 15: Training Loss = 2.7682, Validation Loss = 2.8718, Validation Accuracy = 0.2895
Epoch 15/10000 took 3.2186 seconds


Epoch 16: Training Loss = 2.7297, Validation Loss = 2.9567, Validation Accuracy = 0.2698
Epoch 16/10000 took 3.2175 seconds


Epoch 17: Training Loss = 2.7366, Validation Loss = 2.8812, Validation Accuracy = 0.2790
Epoch 17/10000 took 3.2144 seconds


Epoch 18: Training Loss = 2.7248, Validation Loss = 2.8225, Validation Accuracy = 0.2963
Epoch 18/10000 took 3.2146 seconds


Epoch 19: Training Loss = 2.7305, Validation Loss = 2.8197, Validation Accuracy = 0.2987
Epoch 19/10000 took 3.2227 seconds


Epoch 20: Training Loss = 2.7191, Validation Loss = 2.8531, Validation Accuracy = 0.2975
Epoch 20/10000 took 3.2299 seconds


Epoch 21: Training Loss = 2.7084, Validation Loss = 2.8706, Validation Accuracy = 0.2845
Epoch 21/10000 took 3.2214 seconds


Epoch 22: Training Loss = 2.7331, Validation Loss = 2.8014, Validation Accuracy = 0.3075
Epoch 22/10000 took 3.1989 seconds


Epoch 23: Training Loss = 2.7222, Validation Loss = 2.8485, Validation Accuracy = 0.2983
Epoch 23/10000 took 3.2011 seconds


Epoch 24: Training Loss = 2.7385, Validation Loss = 2.8615, Validation Accuracy = 0.2959
Epoch 24/10000 took 3.1969 seconds


Current lr= 0.0005 Updated lr= 0.00025
Epoch 25: Training Loss = 2.7344, Validation Loss = 2.8503, Validation Accuracy = 0.2957
Epoch 25/10000 took 3.1992 seconds


Epoch 26: Training Loss = 2.5950, Validation Loss = 2.7369, Validation Accuracy = 0.3211
Epoch 26/10000 took 3.2123 seconds


Epoch 27: Training Loss = 2.5790, Validation Loss = 2.7756, Validation Accuracy = 0.3171
Epoch 27/10000 took 3.2069 seconds


Epoch 28: Training Loss = 2.5636, Validation Loss = 2.7076, Validation Accuracy = 0.3131
Epoch 28/10000 took 3.2107 seconds


Epoch 29: Training Loss = 2.5831, Validation Loss = 2.7093, Validation Accuracy = 0.3235
Epoch 29/10000 took 3.2222 seconds


Epoch 30: Training Loss = 2.5894, Validation Loss = 2.7103, Validation Accuracy = 0.3207
Epoch 30/10000 took 3.2149 seconds


Current lr= 0.00025 Updated lr= 0.000125
Epoch 31: Training Loss = 2.5756, Validation Loss = 2.7540, Validation Accuracy = 0.3169
Epoch 31/10000 took 3.2202 seconds


Epoch 32: Training Loss = 2.4789, Validation Loss = 2.6383, Validation Accuracy = 0.3403
Epoch 32/10000 took 3.2181 seconds


Epoch 33: Training Loss = 2.4627, Validation Loss = 2.6625, Validation Accuracy = 0.3323
Epoch 33/10000 took 3.2274 seconds


Epoch 34: Training Loss = 2.4710, Validation Loss = 2.6230, Validation Accuracy = 0.3397
Epoch 34/10000 took 3.2438 seconds


Epoch 35: Training Loss = 2.4769, Validation Loss = 2.6326, Validation Accuracy = 0.3403
Epoch 35/10000 took 3.2325 seconds


Epoch 36: Training Loss = 2.4597, Validation Loss = 2.6549, Validation Accuracy = 0.3393
Epoch 36/10000 took 3.2247 seconds


Current lr= 0.000125 Updated lr= 6.25e-05
Epoch 37: Training Loss = 2.4661, Validation Loss = 2.6266, Validation Accuracy = 0.3472
Epoch 37/10000 took 3.2311 seconds


Epoch 38: Training Loss = 2.3962, Validation Loss = 2.5987, Validation Accuracy = 0.3466
Epoch 38/10000 took 3.2326 seconds


Epoch 39: Training Loss = 2.4017, Validation Loss = 2.5965, Validation Accuracy = 0.3492
Epoch 39/10000 took 3.2258 seconds


Epoch 40: Training Loss = 2.3947, Validation Loss = 2.6127, Validation Accuracy = 0.3417
Epoch 40/10000 took 3.2329 seconds


Epoch 41: Training Loss = 2.3854, Validation Loss = 2.5735, Validation Accuracy = 0.3508
Epoch 41/10000 took 3.2383 seconds


Epoch 42: Training Loss = 2.3933, Validation Loss = 2.5874, Validation Accuracy = 0.3498
Epoch 42/10000 took 3.2300 seconds


Epoch 43: Training Loss = 2.3834, Validation Loss = 2.5593, Validation Accuracy = 0.3548
Epoch 43/10000 took 3.2275 seconds


Epoch 44: Training Loss = 2.3804, Validation Loss = 2.5902, Validation Accuracy = 0.3506
Epoch 44/10000 took 3.2286 seconds


Epoch 45: Training Loss = 2.3803, Validation Loss = 2.5757, Validation Accuracy = 0.3534
Epoch 45/10000 took 3.2291 seconds


Current lr= 6.25e-05 Updated lr= 3.125e-05
Epoch 46: Training Loss = 2.3848, Validation Loss = 2.5765, Validation Accuracy = 0.3488
Epoch 46/10000 took 3.2347 seconds


Epoch 47: Training Loss = 2.3441, Validation Loss = 2.5482, Validation Accuracy = 0.3560
Epoch 47/10000 took 3.2278 seconds


Epoch 48: Training Loss = 2.3479, Validation Loss = 2.5438, Validation Accuracy = 0.3594
Epoch 48/10000 took 3.2419 seconds


Epoch 49: Training Loss = 2.3352, Validation Loss = 2.5322, Validation Accuracy = 0.3584
Epoch 49/10000 took 3.2178 seconds


Epoch 50: Training Loss = 2.3388, Validation Loss = 2.5394, Validation Accuracy = 0.3592
Epoch 50/10000 took 3.2308 seconds


Epoch 51: Training Loss = 2.3329, Validation Loss = 2.5356, Validation Accuracy = 0.3630
Epoch 51/10000 took 3.2138 seconds


Current lr= 3.125e-05 Updated lr= 1.5625e-05
Epoch 52: Training Loss = 2.3291, Validation Loss = 2.5452, Validation Accuracy = 0.3558
Epoch 52/10000 took 3.2171 seconds


Epoch 53: Training Loss = 2.3197, Validation Loss = 2.5225, Validation Accuracy = 0.3640
Epoch 53/10000 took 3.2152 seconds


Epoch 54: Training Loss = 2.3160, Validation Loss = 2.5255, Validation Accuracy = 0.3628
Epoch 54/10000 took 3.2131 seconds


Epoch 55: Training Loss = 2.3116, Validation Loss = 2.5188, Validation Accuracy = 0.3644
Epoch 55/10000 took 3.2239 seconds


Epoch 56: Training Loss = 2.3171, Validation Loss = 2.5214, Validation Accuracy = 0.3612
Epoch 56/10000 took 3.2175 seconds


Epoch 57: Training Loss = 2.3156, Validation Loss = 2.5218, Validation Accuracy = 0.3634
Epoch 57/10000 took 3.2181 seconds


Epoch 58: Training Loss = 2.3241, Validation Loss = 2.5161, Validation Accuracy = 0.3634
Epoch 58/10000 took 3.2187 seconds


Epoch 59: Training Loss = 2.3177, Validation Loss = 2.5197, Validation Accuracy = 0.3628
Epoch 59/10000 took 3.2443 seconds


Epoch 60: Training Loss = 2.3081, Validation Loss = 2.5294, Validation Accuracy = 0.3596
Epoch 60/10000 took 3.2264 seconds


Current lr= 1.5625e-05 Updated lr= 7.8125e-06
Epoch 61: Training Loss = 2.3038, Validation Loss = 2.5246, Validation Accuracy = 0.3624
Epoch 61/10000 took 3.2252 seconds


Epoch 62: Training Loss = 2.2906, Validation Loss = 2.5194, Validation Accuracy = 0.3666
Epoch 62/10000 took 3.2430 seconds


Epoch 63: Training Loss = 2.3128, Validation Loss = 2.5058, Validation Accuracy = 0.3654
Epoch 63/10000 took 3.2524 seconds


Epoch 64: Training Loss = 2.2976, Validation Loss = 2.5126, Validation Accuracy = 0.3684
Epoch 64/10000 took 3.2276 seconds


Epoch 65: Training Loss = 2.3004, Validation Loss = 2.5141, Validation Accuracy = 0.3678
Epoch 65/10000 took 3.2286 seconds


Current lr= 7.8125e-06 Updated lr= 3.90625e-06
Epoch 66: Training Loss = 2.2916, Validation Loss = 2.5090, Validation Accuracy = 0.3680
Epoch 66/10000 took 3.2271 seconds


Epoch 67: Training Loss = 2.2767, Validation Loss = 2.5035, Validation Accuracy = 0.3660
Epoch 67/10000 took 3.2229 seconds


Epoch 68: Training Loss = 2.3021, Validation Loss = 2.5013, Validation Accuracy = 0.3692
Epoch 68/10000 took 3.2232 seconds


Epoch 69: Training Loss = 2.2806, Validation Loss = 2.5027, Validation Accuracy = 0.3684
Epoch 69/10000 took 3.2263 seconds


Epoch 70: Training Loss = 2.2866, Validation Loss = 2.5005, Validation Accuracy = 0.3688
Epoch 70/10000 took 3.2238 seconds


Epoch 71: Training Loss = 2.2851, Validation Loss = 2.4989, Validation Accuracy = 0.3658
Epoch 71/10000 took 3.2257 seconds


Epoch 72: Training Loss = 2.2905, Validation Loss = 2.4986, Validation Accuracy = 0.3656
Epoch 72/10000 took 3.2223 seconds


Epoch 73: Training Loss = 2.2860, Validation Loss = 2.5001, Validation Accuracy = 0.3672
Epoch 73/10000 took 3.2261 seconds


Epoch 74: Training Loss = 2.2881, Validation Loss = 2.5017, Validation Accuracy = 0.3650
Epoch 74/10000 took 3.2302 seconds


Current lr= 3.90625e-06 Updated lr= 1.953125e-06
Epoch 75: Training Loss = 2.2826, Validation Loss = 2.4989, Validation Accuracy = 0.3678
Epoch 75/10000 took 3.2555 seconds


Epoch 76: Training Loss = 2.2818, Validation Loss = 2.4969, Validation Accuracy = 0.3666
Epoch 76/10000 took 3.2565 seconds


Epoch 77: Training Loss = 2.2816, Validation Loss = 2.4990, Validation Accuracy = 0.3648
Epoch 77/10000 took 3.2387 seconds


Epoch 78: Training Loss = 2.2906, Validation Loss = 2.4986, Validation Accuracy = 0.3658
Epoch 78/10000 took 3.2422 seconds


Epoch 79: Training Loss = 2.2782, Validation Loss = 2.4985, Validation Accuracy = 0.3642
Epoch 79/10000 took 3.2288 seconds


Epoch 80: Training Loss = 2.2831, Validation Loss = 2.4978, Validation Accuracy = 0.3658
Epoch 80/10000 took 3.2262 seconds


Epoch 81: Training Loss = 2.2689, Validation Loss = 2.4980, Validation Accuracy = 0.3666
Epoch 81/10000 took 3.2257 seconds


Epoch 82: Training Loss = 2.2847, Validation Loss = 2.4965, Validation Accuracy = 0.3652
Epoch 82/10000 took 3.2258 seconds


Epoch 83: Training Loss = 2.2774, Validation Loss = 2.4953, Validation Accuracy = 0.3670
Epoch 83/10000 took 3.2357 seconds


Epoch 84: Training Loss = 2.2774, Validation Loss = 2.4987, Validation Accuracy = 0.3686
Epoch 84/10000 took 3.2308 seconds


Epoch 85: Training Loss = 2.2637, Validation Loss = 2.4969, Validation Accuracy = 0.3694
Epoch 85/10000 took 3.2237 seconds


Current lr= 1.953125e-06 Updated lr= 9.765625e-07
Epoch 86: Training Loss = 2.2731, Validation Loss = 2.4985, Validation Accuracy = 0.3658
Epoch 86/10000 took 3.2259 seconds


Epoch 87: Training Loss = 2.2750, Validation Loss = 2.4960, Validation Accuracy = 0.3664
Epoch 87/10000 took 3.2414 seconds


Epoch 88: Training Loss = 2.2737, Validation Loss = 2.4964, Validation Accuracy = 0.3670
Epoch 88/10000 took 3.2214 seconds


Epoch 89: Training Loss = 2.2746, Validation Loss = 2.4960, Validation Accuracy = 0.3668
Epoch 89/10000 took 3.2400 seconds


Epoch 90: Training Loss = 2.2881, Validation Loss = 2.4953, Validation Accuracy = 0.3664
Epoch 90/10000 took 3.2520 seconds


Epoch 91: Training Loss = 2.2849, Validation Loss = 2.4960, Validation Accuracy = 0.3650
Epoch 91/10000 took 3.2300 seconds


Epoch 92: Training Loss = 2.2603, Validation Loss = 2.4952, Validation Accuracy = 0.3664
Epoch 92/10000 took 3.2202 seconds


Epoch 93: Training Loss = 2.2688, Validation Loss = 2.4955, Validation Accuracy = 0.3654
Epoch 93/10000 took 3.2300 seconds


Epoch 94: Training Loss = 2.2768, Validation Loss = 2.4954, Validation Accuracy = 0.3662
Epoch 94/10000 took 3.2229 seconds


Epoch 95: Training Loss = 2.2810, Validation Loss = 2.4951, Validation Accuracy = 0.3646
Epoch 95/10000 took 3.2289 seconds


Epoch 96: Training Loss = 2.2734, Validation Loss = 2.4943, Validation Accuracy = 0.3660
Epoch 96/10000 took 3.2239 seconds


Epoch 97: Training Loss = 2.2770, Validation Loss = 2.4960, Validation Accuracy = 0.3678
Epoch 97/10000 took 3.2406 seconds


Epoch 98: Training Loss = 2.2863, Validation Loss = 2.4945, Validation Accuracy = 0.3648
Epoch 98/10000 took 3.2249 seconds


Current lr= 9.765625e-07 Updated lr= 4.882813e-07
Epoch 99: Training Loss = 2.2831, Validation Loss = 2.4950, Validation Accuracy = 0.3656
Epoch 99/10000 took 3.2343 seconds


Epoch 100: Training Loss = 2.2720, Validation Loss = 2.4953, Validation Accuracy = 0.3664
Epoch 100/10000 took 3.2193 seconds


Epoch 101: Training Loss = 2.2789, Validation Loss = 2.4942, Validation Accuracy = 0.3666
Epoch 101/10000 took 3.2203 seconds


Epoch 102: Training Loss = 2.2769, Validation Loss = 2.4946, Validation Accuracy = 0.3658
Epoch 102/10000 took 3.2288 seconds


Epoch 103: Training Loss = 2.2782, Validation Loss = 2.4953, Validation Accuracy = 0.3664
Epoch 103/10000 took 3.2258 seconds


Current lr= 4.882813e-07 Updated lr= 2.4414064e-07
Epoch 104: Training Loss = 2.2848, Validation Loss = 2.4950, Validation Accuracy = 0.3666
Epoch 104/10000 took 3.2492 seconds


Epoch 105: Training Loss = 2.2772, Validation Loss = 2.4950, Validation Accuracy = 0.3656
Epoch 105/10000 took 3.2259 seconds


Epoch 106: Training Loss = 2.2708, Validation Loss = 2.4955, Validation Accuracy = 0.3650
Epoch 106/10000 took 3.2442 seconds


Epoch 107: Training Loss = 2.2654, Validation Loss = 2.4950, Validation Accuracy = 0.3664
Epoch 107/10000 took 3.2307 seconds


Epoch 108: Training Loss = 2.2816, Validation Loss = 2.4947, Validation Accuracy = 0.3652
Epoch 108/10000 took 3.2230 seconds


Epoch 109: Training Loss = 2.2753, Validation Loss = 2.4945, Validation Accuracy = 0.3654
Epoch 109/10000 took 3.2328 seconds


Epoch 110: Training Loss = 2.2762, Validation Loss = 2.4945, Validation Accuracy = 0.3680
Epoch 110/10000 took 3.2321 seconds


Epoch 111: Training Loss = 2.2695, Validation Loss = 2.4942, Validation Accuracy = 0.3674
Epoch 111/10000 took 3.2250 seconds


Epoch 112: Training Loss = 2.2708, Validation Loss = 2.4940, Validation Accuracy = 0.3664
Epoch 112/10000 took 3.2256 seconds


Epoch 113: Training Loss = 2.2853, Validation Loss = 2.4943, Validation Accuracy = 0.3670
Epoch 113/10000 took 3.2275 seconds


Epoch 114: Training Loss = 2.2884, Validation Loss = 2.4944, Validation Accuracy = 0.3660
Epoch 114/10000 took 3.2253 seconds


Epoch 115: Training Loss = 2.2744, Validation Loss = 2.4942, Validation Accuracy = 0.3664
Epoch 115/10000 took 3.2416 seconds


Epoch 116: Training Loss = 2.2715, Validation Loss = 2.4944, Validation Accuracy = 0.3652
Epoch 116/10000 took 3.2303 seconds


Epoch 117: Training Loss = 2.2748, Validation Loss = 2.4944, Validation Accuracy = 0.3652
Epoch 117/10000 took 3.2296 seconds


Epoch 118: Training Loss = 2.2595, Validation Loss = 2.4937, Validation Accuracy = 0.3664
Epoch 118/10000 took 3.2287 seconds


Epoch 119: Training Loss = 2.2657, Validation Loss = 2.4940, Validation Accuracy = 0.3662
Epoch 119/10000 took 3.2343 seconds


Epoch 120: Training Loss = 2.2744, Validation Loss = 2.4934, Validation Accuracy = 0.3652
Epoch 120/10000 took 3.2341 seconds


Epoch 121: Training Loss = 2.2671, Validation Loss = 2.4941, Validation Accuracy = 0.3662
Epoch 121/10000 took 3.2429 seconds


Epoch 122: Training Loss = 2.2732, Validation Loss = 2.4940, Validation Accuracy = 0.3654
Epoch 122/10000 took 3.2273 seconds


Epoch 123: Training Loss = 2.2643, Validation Loss = 2.4940, Validation Accuracy = 0.3670
Epoch 123/10000 took 3.2309 seconds


Epoch 124: Training Loss = 2.2803, Validation Loss = 2.4943, Validation Accuracy = 0.3664
Epoch 124/10000 took 3.2176 seconds


Epoch 125: Training Loss = 2.2752, Validation Loss = 2.4942, Validation Accuracy = 0.3662
Epoch 125/10000 took 3.2359 seconds


Epoch 126: Training Loss = 2.2773, Validation Loss = 2.4946, Validation Accuracy = 0.3668
Epoch 126/10000 took 3.2318 seconds


Epoch 127: Training Loss = 2.2633, Validation Loss = 2.4937, Validation Accuracy = 0.3650
Epoch 127/10000 took 3.2305 seconds


Epoch 128: Training Loss = 2.2684, Validation Loss = 2.4944, Validation Accuracy = 0.3654
Epoch 128/10000 took 3.2141 seconds


Epoch 129: Training Loss = 2.2678, Validation Loss = 2.4945, Validation Accuracy = 0.3662
Epoch 129/10000 took 3.2352 seconds


Epoch 130: Training Loss = 2.2693, Validation Loss = 2.4948, Validation Accuracy = 0.3670
Epoch 130/10000 took 3.2401 seconds


Epoch 131: Training Loss = 2.2871, Validation Loss = 2.4948, Validation Accuracy = 0.3650
Epoch 131/10000 took 3.2474 seconds


Epoch 132: Training Loss = 2.2738, Validation Loss = 2.4945, Validation Accuracy = 0.3666
Epoch 132/10000 took 3.2417 seconds


Epoch 133: Training Loss = 2.2795, Validation Loss = 2.4944, Validation Accuracy = 0.3656
Epoch 133/10000 took 3.2235 seconds


Epoch 134: Training Loss = 2.2715, Validation Loss = 2.4938, Validation Accuracy = 0.3654
Early stopping triggered at epoch 134
Finished training after 134 epochs!


Test acc: 0.010016025975346565


### 7e. Questions

**Question 3:** Compare your ResNet-8 with Inception Net with respect to CIFAR-10 test accuracy, runtime (per epoch), and the train/val loss progression throughout training. 

**Question 4:** How did ResNet-8 do on at CIFAR-100 test set classification compared to Inception Net?

**Answer 3:**

**Answer 4:**

## Task 8: ResNet-18

ResNet is an incredibly flexible/extensible neural network architecture. To get a better sense of this, let's build a deeper ResNet then train it on CIFAR-100.

### 8a. Stacking multiple Residual Blocks together in sequence

In ResNet-8, the spatial resolution/number of filters changed in every residual block. In deeper ResNets, this is not usually the case — there is a "string"/sequence of Residual Blocks with the SAME resolution and filter count stacked together after the change occurs.

To streamline the process of stacking multiple Residual Blocks with the same hyperparameters together, write the `stack_residualblocks` function in `resnets.py`. This should save you lots of copy-pasting and/or typing!

In [25]:
from resnets import stack_residualblocks

#### Test: `stack_residualblocks`

In [26]:
for i in range(4):
    test_stack = stack_residualblocks('TestStack', 4, i+1, prev_layer_or_block=None, first_block_stride=1)
    print(f'There are {len(test_stack)} blocks in the residual stack. There should be {i+1}.')

strides_in_stack = [block.strides for block in test_stack]
print(f'The strides in each block are: {strides_in_stack}. They should be [1, 1, 1, 1]')

test_stack = stack_residualblocks('TestStack', 4, 3, prev_layer_or_block=None, first_block_stride=2)
strides_in_stack = [block.strides for block in test_stack]
print(f'The strides in each block are: {strides_in_stack}. They should be [2, 1, 1, 1]')


There are 1 blocks in the residual stack. There should be 1.
There are 2 blocks in the residual stack. There should be 2.
There are 3 blocks in the residual stack. There should be 3.
There are 4 blocks in the residual stack. There should be 4.
The strides in each block are: [1, 1, 1, 1]. They should be [1, 1, 1, 1]
The strides in each block are: [2, 1, 1]. They should be [2, 1, 1, 1]


In [27]:
print('The blocks are:')
for block in test_stack:
    print(block)

The blocks are:
TestStack/block_1:
	Conv2D layer output(TestStack/block_1/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_1/main_3x3conv_1) shape: None
	-->Conv2D1x1 layer output(TestStack/block_1/skip_conv1x1) shape: None-->
TestStack/block_2:
	Conv2D layer output(TestStack/block_2/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_2/main_3x3conv_1) shape: None
TestStack/block_3:
	Conv2D layer output(TestStack/block_3/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_3/main_3x3conv_1) shape: None


The above should print:

```
The blocks are:
TestStack/block_1:
	Conv2D layer output(TestStack/block_1/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_1/main_3x3conv_1) shape: None
	-->Conv2D1x1 layer output(TestStack/block_1/skip_conv1x1) shape: None-->
TestStack/block_2:
	Conv2D layer output(TestStack/block_2/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_2/main_3x3conv_1) shape: None
TestStack/block_3:
	Conv2D layer output(TestStack/block_3/main_3x3conv_2) shape: None
	Conv2D layer output(TestStack/block_3/main_3x3conv_1) shape: None
```

### 8b. Build ResNet-18

Implement the `ResNet18` class in `resnets.py`.

In [146]:
from resnets import ResNet18

#### Test: `ResNet18`

In [147]:
res18 = ResNet18(C=4, input_feats_shape=(32, 32, 3))
res18.compile()

---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 4]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 512]
stack_4/block_2:
	Conv2D layer output(stack_4/block_2/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack_4/block_2/main_3x3conv_1) shape: [1, 4, 4, 512]
stack_4/block_1:
	Conv2D layer output(stack_4/block_1/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack_4/block_1/main_3x3conv_1) shape: [1, 4, 4, 512]
	-->Conv2D1x1 layer output(stack_4/block_1/skip_conv1x1) shape: [1, 4, 4, 512]-->
stack_3/block_2:
	Conv2D layer output(stack_3/block_2/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack_3/block_2/main_3x3conv_1) shape: [1, 8, 8, 256]
stack_3/block_1:
	Conv2D layer output(stack_3/block_1/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack_3/block_1/main_3x3conv_1) shape: [1, 8, 8, 256]
	-->Conv2D1x1 layer output(stack_3/block_1/skip_conv1x1) shap

The above cell should print:

```
---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 4]
Global Avg Pooling 2D layer output(GlobalAvgPool2D) shape: [1, 512]
stack4/block_2:
	Conv2D layer output(stack4/block_2/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack4/block_2/main_3x3conv_1) shape: [1, 4, 4, 512]
stack4/block_1:
	Conv2D layer output(stack4/block_1/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack4/block_1/main_3x3conv_1) shape: [1, 4, 4, 512]
	-->Conv2D1x1 layer output(stack4/block_1/skip_conv1x1) shape: [1, 4, 4, 512]-->
stack3/block_2:
	Conv2D layer output(stack3/block_2/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack3/block_2/main_3x3conv_1) shape: [1, 8, 8, 256]
stack3/block_1:
	Conv2D layer output(stack3/block_1/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack3/block_1/main_3x3conv_1) shape: [1, 8, 8, 256]
	-->Conv2D1x1 layer output(stack3/block_1/skip_conv1x1) shape: [1, 8, 8, 256]-->
stack2/block_2:
	Conv2D layer output(stack2/block_2/main_3x3conv_2) shape: [1, 16, 16, 128]
	Conv2D layer output(stack2/block_2/main_3x3conv_1) shape: [1, 16, 16, 128]
stack2/block_1:
	Conv2D layer output(stack2/block_1/main_3x3conv_2) shape: [1, 16, 16, 128]
	Conv2D layer output(stack2/block_1/main_3x3conv_1) shape: [1, 16, 16, 128]
	-->Conv2D1x1 layer output(stack2/block_1/skip_conv1x1) shape: [1, 16, 16, 128]-->
stack1/block_2:
	Conv2D layer output(stack1/block_2/main_3x3conv_2) shape: [1, 32, 32, 64]
	Conv2D layer output(stack1/block_2/main_3x3conv_1) shape: [1, 32, 32, 64]
stack1/block_1:
	Conv2D layer output(stack1/block_1/main_3x3conv_2) shape: [1, 32, 32, 64]
	Conv2D layer output(stack1/block_1/main_3x3conv_1) shape: [1, 32, 32, 64]
Conv2D layer output(Conv2D_1) shape: [1, 32, 32, 64]
```

### 8c. Overfit ResNet-18 on CIFAR-100 dev set

Perform the usual overfitting protocol to test out your ResNet-18. However, this time use the 1st 500 samples of CIFAR-100 rather than CIFAR-10 to conduct the test.

In the cell below, import CIFAR-100 and reproduce our usual overfit protocol:
1. Create a dev set from the 1st 500 training CIFAR-100 samples.
2. Train your net on the dev set for `80` epochs (turn off early stopping for this test). *Do not use any regularization.* 

Your training loss should start out at ~4.7 after the first epoch and rapidly plummet to 0.01 or less after about 30 epochs.

In [148]:
x100_train, y100_train, x100_val, y100_val, x100_test, y100_test, classnames100 = get_dataset('cifar10')
x100_dev = x100_train[:500]
y100_dev = y100_train[:500]

In [149]:
tf.keras.backend.clear_session()
tf.random.set_seed(0)

# YOUR CODE HERE
model = ResNet18(100, (32,32,3), reg = 0)
model.compile(optimizer='adamw')
model.fit(x100_dev, y100_dev, x100_dev, y100_dev, max_epochs = 80, val_every = 1, verbose = True)

---------------------------------------------------------------------------
Dense layer output(Output) shape: [1, 100]
Global Avg Pooling 2D layer output(GlobalAveragePool2D) shape: [1, 512]
stack_4/block_2:
	Conv2D layer output(stack_4/block_2/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack_4/block_2/main_3x3conv_1) shape: [1, 4, 4, 512]
stack_4/block_1:
	Conv2D layer output(stack_4/block_1/main_3x3conv_2) shape: [1, 4, 4, 512]
	Conv2D layer output(stack_4/block_1/main_3x3conv_1) shape: [1, 4, 4, 512]
	-->Conv2D1x1 layer output(stack_4/block_1/skip_conv1x1) shape: [1, 4, 4, 512]-->
stack_3/block_2:
	Conv2D layer output(stack_3/block_2/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack_3/block_2/main_3x3conv_1) shape: [1, 8, 8, 256]
stack_3/block_1:
	Conv2D layer output(stack_3/block_1/main_3x3conv_2) shape: [1, 8, 8, 256]
	Conv2D layer output(stack_3/block_1/main_3x3conv_1) shape: [1, 8, 8, 256]
	-->Conv2D1x1 layer output(stack_3/block_1/skip_conv1x1) sh

([9.090593,
  3.7235336,
  3.4705682,
  3.1563995,
  2.8880653,
  2.7615976,
  2.5381956,
  2.3496616,
  2.3408196,
  2.2965195,
  2.327388,
  2.3135433,
  2.28564,
  2.2746449,
  2.2715006,
  2.192308,
  2.155007,
  2.126965,
  2.1231234,
  2.0798275,
  2.0611181,
  1.9823172,
  1.9691477,
  1.9479959,
  1.9123169,
  1.821917,
  1.9092288,
  1.885099,
  1.7659696,
  1.767422,
  1.6690844,
  1.5585985,
  1.5412904,
  1.432499,
  1.4774876,
  1.5017985,
  1.3096576,
  1.1933943,
  1.1150746,
  0.8963085,
  0.7793431,
  0.82272464,
  0.5462443,
  0.58766615,
  0.48381257,
  0.5348976,
  0.59862524,
  0.43222013,
  0.36099023,
  0.20350632,
  0.2105668,
  0.121621594,
  0.1438151,
  0.06145041,
  0.09728874,
  0.124000885,
  0.13201112,
  0.19484606,
  0.082511134,
  0.057344172,
  0.045132745,
  0.058786523,
  0.03651157,
  0.033744987,
  0.049093366,
  0.03879678,
  0.07588796,
  0.20495546,
  0.16777228,
  0.16636345,
  0.1398708,
  0.08132751,
  0.04630975,
  0.020889256,
  0.02153016

### 8d. Train ResNet-18 on CIFAR-100

In the cell below, train your ResNet-18 on CIFAR-100. Print out the test set after training concludes.

Use regularization strength of `1.5`, a patience of `15`, learning rate patience of `4`, and keep the rest of the hyperparameters to their defaults.

In [0]:
tf.keras.backend.clear_session()
tf.random.set_seed(0)

model = ResNet18(100, (32,32,3), reg = 0)
model.compile(optimizer='adamw')
model.fit(x100_train, y100_train, x100_val, y100_val, max_epochs = 80, val_every = 1, verbose = True)
print(f"Test acc: {model.evaluate(x_test, y_test)[0]}")

### 8e. Visualize the predictions made by ResNet-18 on the CIFAR-100 test set.

In the cell below, use your trained ResNet-18 to get the predicted classes of thr 1st 225 test set images.

Run the code below to create a 15x15 grid of CIFAR-100 images with the true and predicted classes in the title. The predicted classes are color-coded  blue if they are correct, red if they are incorrect.

In [0]:
_,_,_,_, x100_test_vis, y100_test_vis, classnames = get_dataset('cifar100', standardize_ds=False)

panel_sz = 4
grid = 15
fig, axes, = plt.subplots(nrows=grid, ncols=grid, figsize=(grid*panel_sz, grid*panel_sz))

for r in range(grid):
    for c in range(grid):
        ind = grid*r + c
        axes[r,c].imshow(x100_test_vis[ind])
        axes[r,c].set_xticks([])
        axes[r,c].set_yticks([])
        title = f'{classnames[y100_test[ind]]}\nPredicted: '
        title += f'{classnames[y_pred[ind]]}'

        color = 'blue'
        if y100_test[ind] != y_pred[ind]:
            color = 'red'

        axes[r,c].set_title(title, color=color)
plt.tight_layout()
plt.show()


### 8f. Questions

**Question 5:** Take a look at the above montage. Does the mistakes made by ResNet-18 seem reasonable? Provide some specific examples to support your conclusion.

**Answer 5:**

## Extensions

### General guidelines

1. Never integrate extensions into your base project so that they change the expected behavior of core functions. If your extension changes the core design/behavior, no problem, duplicate your working base project and add features from there.
2. Check the rubric to keep in mind how extensions on this project will be graded.
3. While I may consult your code and "written log" of what you did, **I am grading your extensions based on what you present in your 3-5 min video.**
3. I suggest documenting your explorations in a "log" or "lab notebook" style (i.e. documenting your thought/progression/discovery/learning process). I'm not grading your writing, so you can keep it succinct. **Whatever is most useful to you to remember what you did.** 
4. I suggest taking a hypothesis driven approach. For example "I was curious about X so I explored Y. I found Z, which was not what I expected because..., so then tried A..."
5. Make plots to help showcase your results.
6. **More is not necessarily better.** Generally, a small number of "in-depth" extensions count for more than many "shallow" extensions.

### AI guidelines

You may use AI in mostly any capacity for extensions. However, keep in mind:
1. There is no need to use AI at all!
2. You are welcome to use AI as a tool (e.g. automate something that is tedious, help you get unstuck, etc.). However, you should be coding, you should be thinking, you should be writing, you should be creating. If you are spending most (or even close to most) of your time typing into a chatbot and copy-pasting, you have probably gone too far with AI use.
3. I don't find large volumes of AI generated code/text/plots to be particularly impressive and you risk losing my interest while grading. Remember: I'm grading your extensions based on your video presentation. **More is not necessarily better.**

### Video guidelines

1. Please try to keep your video to 5 minutes (*I have other projects to grade!*). If you turn in a longer video, I make no promise that I will watch more than 5 minutes.
2. Your screen should be shared as you show me what you did. A live video of your face should also appear somewhere on the screen (e.g. picture-in-picture overlay / split screen).
3. Your partner should join you for the video and take turns talking, but, if necessary, it is fine to have one team member present during the record the video.
4. Do not simply read text from your notebook, do not read from a prepared script. I am not grading how polished your video presentation is (see extension grading criteria on rubric). 
5. I am looking for original and creative explorations sparked by your curiosity/interest/passion in a topic. This should be apparent in your video.
6. Be natural,, don't feel the need to impress me with fancy language. If it is helpful, imagine that we are talking one-on-one about your extension. Tell me what you did :)

### Extension ideas

#### 1. ResNet-34

Create and train the well-known network of the ResNet family called ResNet-34. Here is a suggested network configuration to experiment with:

```
block_units = [64, 128, 256, 512]
num_blocks = [3, 4, 6, 3]
first_block_strides = [1, 2, 2, 2]
```

#### 2. ResNet-50

Create and train the well-known network of the ResNet family called ResNet-50. Given its depth, it uses a "Bottleneck block" rather than a normal Residual Block, but the overall structure is very similar. Here is a suggested network configuration to experiment with:

```
block_units = [64, 128, 256, 512]
num_blocks = [3, 4, 6, 3]
first_block_strides = [1, 2, 2, 2]
```

#### 3. VGG networks on CIFAR-100

How does one or more of your VGG networks do at classifying images in CIFAR-100?

#### 4. Other ResNets on CIFAR-10

How do the other ResNets do at classifying images in CIFAR-10?

#### 5. Multi-network comparison

Compare the accuracy, efficiency, etc of any number of networks from the VGG, Inception Net, and ResNet families.

#### 6. Add support for saving/loading network weights

A key limitation of your current deep learning library is that parameters that capture the learning in networks are completely reset/lost/wiped out when the notebook kernel is terminated. Add (and test!) support for saving network parameters to disk after (or periodically during) training. Add (and test!) support for loading network parameters back into the network from disk before training. 

Be careful to include the moving mean and standard deviation parameters in batch normalization layers otherwise the whole net will not work!

#### 7. Other image datasets

Apply any of the three deep network families to another dataset of your choice. 

#### 8. Hyperparameter tuning

Try and find hyperparameters that allow Inception Net and the ResNets to achieve better accuracy on CIFAR-10 and/or CIFAR-100.

#### 9. Build other Inception Nets

We only built a single network, but just like VGG and ResNet, you can modify the network depth while following the computational motifs of the Inception Net architecture. Design and experiment with your own Inception Net!

#### 10. Analyze errors made by one or more of the nets

Make a confusion matrix for CIFAR-10 or CIFAR-100 (*a challenge to make it useful!*).

Visualize the predictions made by Inception Net and/or a VGG net, perhaps similar what was done with the ResNet.