**Saad Khan and Veer Khosla**

Fall 2024

CS343: Neural Networks

Project 3: Convolutional Neural Networks

In [103]:
import numpy as np
import matplotlib.pyplot as plt

plt.style.use(['seaborn-v0_8-colorblind', 'seaborn-v0_8-darkgrid'])
plt.rcParams.update({'font.size': 20})

np.set_printoptions(suppress=True, precision=7)

# Automatically reload your external source code
%load_ext autoreload
%autoreload 2



## Goal

The goal of this notebook is to walk you through the recommended implementation order for your convolutional neural network and provide test code to help make sure your implementation works as expected every step of the way.

Next week, you will test your network on the STL-10 dataset.

**Global note: Make sure any debug printouts do not appear if `verbose=False`!**


## Task 4: Building a convolutional neural network

Now that you have the core convolution and max pool operations implemented, you can tackle the main task of building a convolutional neural network. This will be a "deep" 4-layer neural network (`ConvNet4`). Counting the `Flatten` step, which only reshapes activations, the forward pass has the following five stages:

1. Convolution (net-in), ReLU (net-act).
2. Max pool 2D (net-in), linear (net-act).
3. Flatten (net-in), linear (net-act).
4. Dense (net-in), ReLU (net-act).
5. Dense (net-in), softmax (net-act).

In the above outline, the first part is the layer's net-in type (e.g. convolution, max pool) and the second is the layer's activation function (e.g. rectified linear (ReLU), softmax).

Unlike the MLP project, your network will adopt an object-oriented, modular design that should make it straightforward to add/remove/customize layers with minimal code changes.

### 4a. Migrate existing code (`layer.py`)

Copy-paste your code from the last project to implement the following function:

- `one_hot`

### 4b. Network layer activation functions (`layer.py`)

Implement the following activation functions. Remember, an activation function transforms a layer's "net input" to "net activation".

- `linear`
- `relu`
- `softmax`
- `compute_net_act`

Equation for softmax:

$e^{x_{ij}} / \sum_{k=1}^C e^{x_{ik}}$ where $x_{ij}$ is the "net in" value for neuron $j$ for input $i$. $C$ corresponds to the number of classes in the dataset.
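
As an illustration of the softmax equation, here is a minimal NumPy sketch. The function name `softmax_sketch` is hypothetical and is not part of `layer.py`; the max-subtraction step is a common numerical-stability trick that the equation does not require, and it leaves the result mathematically unchanged.

```python
import numpy as np

def softmax_sketch(net_in):
    """Row-wise softmax of a (B, C) array of net-in values (hypothetical helper)."""
    # Subtracting each row's max does not change the ratio in the equation,
    # but it keeps np.exp from overflowing on large net-in values
    shifted = net_in - net_in.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    # Normalize over the C classes (axis 1) so each row sums to 1
    return exp / exp.sum(axis=1, keepdims=True)

probs = softmax_sketch(np.array([[1.0, 2.0, 3.0],
                                 [1000.0, 1000.0, 1000.0]]))
print(probs.sum(axis=1))  # each row sums to 1
```

The second row shows why the shift helps: exponentiating 1000 directly would overflow, yet the shifted version returns equal probabilities for equal net-in values.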

In [104]:
from layer import *

#### Test: `linear()`

In [105]:
test_layer = Layer(0, 'test')
test_layer.net_in = np.arange(10)
test_layer.linear()
print(f'{test_layer.net_act} should be [0 1 2 3 4 5 6 7 8 9]')

[0 1 2 3 4 5 6 7 8 9] should be [0 1 2 3 4 5 6 7 8 9]


#### Test: `relu()`

In [106]:
rng = np.random.default_rng(0)
test_layer = Layer(0, 'test')
test_layer.net_in = rng.random([3, 3]) - 0.5
test_layer.relu()
print(f'{test_layer.net_act}')

[[0.1369617 0.        0.       ]
 [0.        0.3132702 0.4127556]
 [0.1066358 0.2294966 0.043625 ]]


You should get:


    [[0.1369617 0.        0.       ]
     [0.        0.3132702 0.4127556]
     [0.1066358 0.2294966 0.043625 ]]

#### Test: `softmax()`

In [107]:
rng = np.random.default_rng(0)
test_layer = Layer(0, 'test')
test_layer.net_in = rng.random([2, 5])
test_layer.softmax()
print(f'{test_layer.net_act}')

[[0.2516215 0.1742953 0.1386479 0.1352996 0.3001356]
 [0.2334946 0.1719217 0.1943965 0.161423  0.2387641]]


You should get:

    [[0.2516215 0.1742953 0.1386479 0.1352996 0.3001356]
     [0.2334946 0.1719217 0.1943965 0.161423  0.2387641]]

### 4c. Implement loss function

In `layer.py`, implement cross-entropy loss `cross_entropy()` (see `loss()` for usage). 

Mathematical equation:

$-\frac{1}{B}\sum_{i=1}^B \log \left ( y_{ic} \right )$ 

where $y_{ic}$ is the softmax activation value $y$ for the NEURON CODING THE CORRECT CLASS $c$ for the $i^{th}$ input in the mini-batch ($i: 1...B$).
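
The equation can be read as "pick out the correct-class activation in each row, take the log, and average". The sketch below shows one way to do that with NumPy integer indexing. `cross_entropy_sketch` is a hypothetical standalone function; it takes the activations as an argument instead of reading `self.net_act` as the `Layer` method presumably does.

```python
import numpy as np

def cross_entropy_sketch(net_act, y):
    """Mean negative log of the correct-class activation (hypothetical helper).

    net_act: (B, C) softmax activations. y: (B,) int-coded correct classes.
    """
    B = net_act.shape[0]
    # Row i, column y[i] is y_ic: the activation of the neuron coding the correct class
    correct = net_act[np.arange(B), y]
    return -np.mean(np.log(correct))

# With uniform activations over 10 classes, the loss is log(10) regardless of the labels
uniform = np.full((4, 10), 0.1)
loss = cross_entropy_sketch(uniform, np.array([0, 3, 5, 9]))
print(f'{loss:.4f}')  # 2.3026
```

The uniform case is a handy sanity check: an untrained 10-class network should start with a loss near $\log(10) \approx 2.30$.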

#### Test: `cross_entropy()`

In [108]:
rng = np.random.default_rng(0)
y = np.array([0, 4, 1])
test_layer = Layer(0, 'test')
test_layer.net_in = rng.random([3, 5])
test_layer.softmax()
print(f'Your loss is {test_layer.cross_entropy(y):.5f} and it should be 1.65869')

Your loss is 1.65869 and it should be 1.65869


### 4d. Implement the forward pass of the convolution layer

Do this first because this is the first layer of the network (see above architecture).

Implement and test the following methods in `layer.py`:

- constructor in `Conv2D`
- `compute_net_in` in `Conv2D`
- `forward` in `Layer`. The `forward` method synthesizes your work so far and computes all the forward operations for this (and any other layers you create later on).

##### Test Conv2D initialization

In [109]:
conv2_layer = Conv2D(0, 'conv2', n_kers=2, ker_sz=2, wt_scale=1e-1, r_seed=2)
print(f'Your filter weights are\n{conv2_layer.wts}')
print(f'Your bias terms are\n{conv2_layer.b}')

Your filter weights are
[[[[ 0.0189053 -0.0522748]
   [-0.0413064 -0.2441467]]

  [[ 0.1799707  0.1144166]
   [-0.0325423  0.0773807]]

  [[ 0.0281211 -0.0553823]
   [ 0.0977567 -0.0310557]]]


 [[[-0.0328824 -0.0792147]
   [ 0.0454958 -0.0099198]]

  [[ 0.0545289 -0.0607186]
   [ 0.0126828 -0.0892274]]

  [[ 0.0841465  0.0188035]
   [ 0.0330571  0.0410504]]]]
Your bias terms are
[-0.1010758  0.0783181]


The above should yield:

```
Your filter weights are
[[[[ 0.0189053 -0.0522748]
   [-0.0413064 -0.2441467]]

  [[ 0.1799707  0.1144166]
   [-0.0325423  0.0773807]]

  [[ 0.0281211 -0.0553823]
   [ 0.0977567 -0.0310557]]]


 [[[-0.0328824 -0.0792147]
   [ 0.0454958 -0.0099198]]

  [[ 0.0545289 -0.0607186]
   [ 0.0126828 -0.0892274]]

  [[ 0.0841465  0.0188035]
   [ 0.0330571  0.0410504]]]]
Your bias terms are
[-0.1010758  0.0783181]
```

#####  Test `forward` using `Conv2D` layer

In [110]:
rng = np.random.default_rng(1)
# Create test net parameters
mini_batch_sz, n_kers, n_chans, ker_sz, img_y, img_x = 1, 2, 3, 4, 5, 5
# Create random test input
inputs = rng.standard_normal((mini_batch_sz, n_chans, img_y, img_x))

# Create a convolution layer with ReLU activation function
conv_layer = Conv2D(0, 'test', n_kers, ker_sz, n_chans=n_chans, wt_scale=1e-1, activation='relu', r_seed=3)

# Do a forward pass thru the layer
net_act = conv_layer.forward(inputs)

# Extract the computed net values
net_in = conv_layer.net_in
wts = conv_layer.get_wts()
inp = conv_layer.input

print(f'Your input stored in the net has shape: {inp.shape} and it should be (1, 3, 5, 5)')
print(f'Your network wts stored in the net has shape: {wts.shape} and it should be (2, 3, 4, 4)')
print(f'Your network activation has shape: {net_act.shape} and it should be (1, 2, 5, 5)')
print(f'Your net-in has shape: {net_in.shape} and it should be (1, 2, 5, 5)')
print()
print('The first chunk of your filters/weights is:\n', wts[0, 0])
print('The first chunk of your net_in is:\n', net_in[0,0])
print('The first chunk of your net_act is:\n', net_act[0,0])

batch_sz=1, n_chan=3, img_x=5, img_y=5
n_kers=2, n_ker_chans=3, ker_y=4, ker_x=4
Output shape: (1, 2, 5, 5)
Your input stored in the net has shape: (1, 3, 5, 5) and it should be (1, 3, 5, 5)
Your network wts stored in the net has shape: (2, 3, 4, 4) and it should be (2, 3, 4, 4)
Your network activation has shape: (1, 2, 5, 5) and it should be (1, 2, 5, 5)
Your net-in has shape: (1, 2, 5, 5) and it should be (1, 2, 5, 5)

The first chunk of your filters/weights is:
 [[ 0.2040919 -0.2555665  0.0418099 -0.056777 ]
 [-0.0452649 -0.0215597 -0.2019986 -0.0231932]
 [-0.0865213  0.3323     0.0225787 -0.0352631]
 [-0.0281287 -0.0668046 -0.1055151 -0.0390801]]
The first chunk of your net_in is:
 [[-0.3826142 -0.0280612 -0.1979558 -0.3622447  0.3818603]
 [ 0.0064714 -0.4900334  0.5608707 -0.8228845  0.6972164]
 [-0.1063832  0.5554649 -0.8752727  0.1988049  1.093694 ]
 [-0.0834898 -0.2933751  0.07192   -1.255426   0.591067 ]
 [ 0.4135758 -0.2120413 -0.7609696 -0.2705517 -0.2680388]]
The first chun

The expected output above is:

```
The first chunk of your filters/weights is:
 [[ 0.2040919 -0.2555665  0.0418099 -0.056777 ]
 [-0.0452649 -0.0215597 -0.2019986 -0.0231932]
 [-0.0865213  0.3323     0.0225787 -0.0352631]
 [-0.0281287 -0.0668046 -0.1055151 -0.0390801]]
The first chunk of your net_in is:
 [[-0.3826142 -0.0280612 -0.1979558 -0.3622447  0.3818603]
 [ 0.0064714 -0.4900334  0.5608707 -0.8228845  0.6972164]
 [-0.1063832  0.5554649 -0.8752727  0.1988049  1.093694 ]
 [-0.0834898 -0.2933751  0.07192   -1.255426   0.591067 ]
 [ 0.4135758 -0.2120413 -0.7609696 -0.2705517 -0.2680388]]
The first chunk of your net_act is:
 [[0.        0.        0.        0.        0.3818603]
 [0.0064714 0.        0.5608707 0.        0.6972164]
 [0.        0.5554649 0.        0.1988049 1.093694 ]
 [0.        0.        0.07192   0.        0.591067 ]
 [0.4135758 0.        0.        0.        0.       ]]
```

### 4e. Implement the forward pass of the max pooling layer

The second layer in the `ConvNet4` architecture is a `MaxPool2D` layer, which computes its `net_in` by max pooling the output (`net_act`) of the previous layer (`Conv2D`).

Implement and test the following methods:
-  `compute_net_in` in `MaxPool2D`
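
For intuition, here is a small loop-based sketch of a max pooling forward pass. It is a standalone function with a hypothetical name, not the `MaxPool2D` method itself, and it assumes no padding, so the output height is `(H - pool_size) // strides + 1` (and likewise for width). That assumption is consistent with the 6x6 to 3x3 shapes in the test below.

```python
import numpy as np

def max_pool_forward_sketch(x, pool_size=2, strides=2):
    """Max pool a (B, C, H, W) array without padding (hypothetical helper)."""
    B, C, H, W = x.shape
    out_h = (H - pool_size) // strides + 1
    out_w = (W - pool_size) // strides + 1
    out = np.zeros((B, C, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the current pooling window in the input
            r, c = i * strides, j * strides
            window = x[:, :, r:r + pool_size, c:c + pool_size]
            # Max over the two spatial axes, separately for every image and channel
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pooled = max_pool_forward_sketch(x)
print(pooled[0, 0])  # window maxima are 5, 7 (top row) and 13, 15 (bottom row)
```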

#####  Test `forward` using `MaxPool2D` layer

In [111]:
rng = np.random.default_rng(0)
# Create test net parameters
mini_batch_sz, n_kers, n_chans, ker_sz, img_y, img_x = 1, 2, 3, 4, 6, 6
# Create random test input
inputs = rng.standard_normal((mini_batch_sz, n_chans, img_y, img_x))

# Create a max pooling layer with default (linear) activation function
pool_layer = MaxPool2D(0, 'pool', pool_size=2, strides=2)

# Do a forward pass thru the layer
net_act = pool_layer.forward(inputs)

# Extract the computed net values
net_in = pool_layer.net_in
wts = pool_layer.wts
inp = pool_layer.input

print(f'Your input stored in the net has shape: {inp.shape} and it should be (1, 3, 6, 6)')
print(f'Your network wts stored is None (as it should be)? {wts is None}')
print(f'Your network activation has shape: {net_act.shape} and it should be (1, 3, 3, 3)')
print(f'Your net in has shape: {net_in.shape} and it should be (1, 3, 3, 3)')
print()
print('The first chunk of your net_in is:\n', net_in[0,0])
print('The first chunk of your net_act is:\n', net_act[0,0])

Input shape: mini_batch_sz=1, n_chans=3, height=6, width=6
Output shape: mini_batch_sz=1, n_chans=3, height=3, width=3
Your input stored in the net has shape: (1, 3, 6, 6) and it should be (1, 3, 6, 6)
Your network wts stored is None (as it should be)? True
Your network activation has shape: (1, 3, 3, 3) and it should be (1, 3, 3, 3)
Your net in has shape: (1, 3, 3, 3) and it should be (1, 3, 3, 3)

The first chunk of your net_in is:
 [[1.304     0.6404227 0.3615951]
 [1.0425134 1.3664635 0.3515101]
 [0.9034702 0.5408456 0.3553727]]
The first chunk of your net_act is:
 [[1.304     0.6404227 0.3615951]
 [1.0425134 1.3664635 0.3515101]
 [0.9034702 0.5408456 0.3553727]]


The expected output above is:

```
The first chunk of your net_in is:
 [[1.304     0.6404227 0.3615951]
 [1.0425134 1.3664635 0.3515101]
 [0.9034702 0.5408456 0.3553727]]
The first chunk of your net_act is:
 [[1.304     0.6404227 0.3615951]
 [1.0425134 1.3664635 0.3515101]
 [0.9034702 0.5408456 0.3553727]]
```

### 4f. Implement the Flatten layer

The 3rd layer is a "fake layer" that acts as glue to format the activations produced by the last max pooling layer (*2D spatial features*) in a way that is appropriate for subsequent dense layers (*1D features*).

Implement and test method: `compute_net_in` in `Flatten`.

**NOTE:** You will test this soon when you implement the next subtask involving the Dense layer forward pass.
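
A minimal sketch of the reshaping involved, assuming the flattened features are laid out in NumPy's default (row-major) order. The function name is hypothetical; the real method stores its result on the layer rather than returning it.

```python
import numpy as np

def flatten_forward_sketch(x):
    """Collapse (B, C, H, W) activations to (B, C*H*W) (hypothetical helper)."""
    # Keep the mini-batch axis and let NumPy infer the size of the flattened axis
    return x.reshape(x.shape[0], -1)

x = np.zeros((2, 3, 6, 6))
flat = flatten_forward_sketch(x)
print(flat.shape)  # (2, 108)
```

Because a reshape does not reorder values, the backward pass only has to reshape the incoming gradient back to the original input shape.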

### 4g. Implement the forward pass of the Dense layer

The 4th (hidden) and 5th (output) layers in the `ConvNet4` architecture compute a `Dense` `net_in` (these are like the layers in ADALINE/MLP).

Implement and test the following methods:
-  constructor in `Dense`
- `compute_net_in` in `Dense`
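
As a reminder of what a dense net-in looks like, here is a minimal sketch. It assumes weights of shape `(n_units_prev_layer, units)` and a bias of shape `(units,)`, which matches the shapes printed in the tests below; the function name is hypothetical.

```python
import numpy as np

def dense_net_in_sketch(x, wts, b):
    """Dense net-in: weighted sum of the inputs plus bias (hypothetical helper).

    x: (B, n_prev), wts: (n_prev, units), b: (units,). Returns (B, units).
    """
    # The bias broadcasts across the mini-batch axis
    return x @ wts + b

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 108))
wts = rng.standard_normal((108, 5))
b = np.zeros(5)
net_in = dense_net_in_sketch(x, wts, b)
print(net_in.shape)  # (2, 5)
```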

##### Test Dense initialization

In [112]:
hidden_layer = Dense(6, 'dense', units=10, n_units_prev_layer=3, wt_scale=1e-1, r_seed=1)
print(f'Your filter weights are\n{hidden_layer.wts}')
print(f'Your bias terms are\n{hidden_layer.b}')

Your filter weights are
[[ 0.000123   0.0298746 -0.0274138 -0.0890592 -0.0454671 -0.0991647
   0.0060144  0.1340215 -0.0492207 -0.0620475]
 [ 0.0489842  0.0356887  0.0105414 -0.0930468 -0.0029252  0.0695303
  -0.1344215 -0.0457616 -0.1901223 -0.1289538]
 [-0.1841735 -0.0235091 -0.1267446  0.0271264  0.0156751 -0.0186931
  -0.251676  -0.0538693 -0.0048501  0.0113309]]
Your bias terms are
[-0.1530136 -0.0477753 -0.0978519 -0.0808837  0.1060899 -0.0807535
 -0.0032522  0.088439  -0.05836   -0.0111702]


The above should yield:

    Your filter weights are
    [[ 0.000123   0.0298746 -0.0274138 -0.0890592 -0.0454671 -0.0991647
      0.0060144  0.1340215 -0.0492207 -0.0620475]
    [ 0.0489842  0.0356887  0.0105414 -0.0930468 -0.0029252  0.0695303
      -0.1344215 -0.0457616 -0.1901223 -0.1289538]
    [-0.1841735 -0.0235091 -0.1267446  0.0271264  0.0156751 -0.0186931
      -0.251676  -0.0538693 -0.0048501  0.0113309]]
    Your bias terms are
    [-0.1530136 -0.0477753 -0.0978519 -0.0808837  0.1060899 -0.0807535
    -0.0032522  0.088439  -0.05836   -0.0111702]

#### Test Dense layer forward pass

In [113]:
rng = np.random.default_rng(0)
mini_batch_sz, n_kers, n_chans, ker_sz, img_y, img_x = 2, 2, 3, 4, 6, 6
inputs = rng.standard_normal((mini_batch_sz, n_chans, img_y, img_x))

flat_layer = Flatten(4, 'flatten')
hidden_layer = Dense(5, 'hidden', units=5, wt_scale=1e-1, n_units_prev_layer=n_chans*img_y*img_x, activation='relu', r_seed=2)
hidden_layer.b -= 0.01

net_act_f = flat_layer.forward(inputs)
net_act = hidden_layer.forward(net_act_f)
net_in = hidden_layer.net_in
wts = hidden_layer.wts
inp = hidden_layer.input

print(f'Your flattened layer activations have shape: {net_act_f.shape} and it should be (2, 108)')
print(f'Your input stored in the net has shape: {inp.shape} and it should be (2, 108)')
print(f'Your network wts have shape {wts.shape} and it should be (108, 5)')
print(f'Your network activation has shape: {net_act.shape} and it should be (2, 5)')
print(f'Your net in has shape: {net_in.shape} and it should be (2, 5)')
print()
print('Your net_in is:\n', net_in)
print('Your net_act is:\n', net_act)

Your flattened layer activations have shape: (2, 108) and it should be (2, 108)
Your input stored in the net has shape: (2, 108) and it should be (2, 108)
Your network wts have shape (108, 5) and it should be (108, 5)
Your network activation has shape: (2, 5) and it should be (2, 5)
Your net in has shape: (2, 5) and it should be (2, 5)

Your net_in is:
 [[-0.1462589  0.3849604 -0.9584967 -1.4041901 -1.2886871]
 [ 2.3934785 -0.7663568 -0.5352005 -0.616648  -0.8445118]]
Your net_act is:
 [[0.        0.3849604 0.        0.        0.       ]
 [2.3934785 0.        0.        0.        0.       ]]


The expected output above is:

        Your net_in is:
        [[-0.1462589  0.3849604 -0.9584967 -1.4041901 -1.2886871]
        [ 2.3934785 -0.7663568 -0.5352005 -0.616648  -0.8445118]]
        Your net_act is:
        [[0.        0.3849604 0.        0.        0.       ]
        [2.3934785 0.        0.        0.        0.       ]]

### 4h. Implement network full forward pass

Now it's time to chain all the individual layers together into a network. 

In `network.py`, implement the following methods:

- `forward`. This is the network-level forward pass, which calls each layer's `forward` method that you implemented above. The result of this method will be the loss derived from the activation (`net_act`) of the Output layer, 5 layers deep.
- `wt_reg_reduce`. This is needed for a complete implementation of the full forward method.

Before you can test the forward pass of the network, you need to define what layers belong in the network and how they are arranged! This is done by making a subclass of `Network`. 

- Implement the constructor of `ConvNet4`, adding the layers (in forward pass order):

Conv2D → MaxPool2D → Flatten → Dense → Dense
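
When wiring the layers together, it helps to trace the activation shape through each stage so that the first `Dense` layer gets the right `n_units_prev_layer`. The sketch below does this arithmetic only. It assumes the convolution preserves the spatial size (as in the `Conv2D` test above, where a 5x5 input gives a 5x5 output) and that pooling uses no padding. The default values for `n_kers` and `dense_units` are placeholders for illustration, not the required `ConvNet4` defaults.

```python
def convnet4_shapes_sketch(input_shape=(3, 32, 32), n_kers=32, pool_size=2,
                           strides=2, dense_units=100, n_classes=10):
    """Per-sample activation shape after each stage (hypothetical helper)."""
    n_chans, h, w = input_shape
    # Assumption: convolution keeps height and width; channels become n_kers
    shapes = {'conv2d': (n_kers, h, w)}
    # Assumption: pooling without padding
    pool_h = (h - pool_size) // strides + 1
    pool_w = (w - pool_size) // strides + 1
    shapes['maxpool2d'] = (n_kers, pool_h, pool_w)
    # Flatten merges channels and spatial axes into one feature axis
    shapes['flatten'] = (n_kers * pool_h * pool_w,)
    shapes['dense_hidden'] = (dense_units,)
    shapes['dense_output'] = (n_classes,)
    return shapes

shapes = convnet4_shapes_sketch()
for name, shape in shapes.items():
    print(f'{name}: {shape}')
```

Under these placeholder settings, the flattened size printed for `flatten` is the value the hidden `Dense` layer would need as its number of input units.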

##### Test network forward pass

In [114]:
from network import ConvNet4

In [115]:
rng = np.random.default_rng(3)
n_inputs = 5
X = rng.standard_normal((n_inputs, 3, 32, 32))
y = rng.integers(10, size=n_inputs)

net = ConvNet4(wt_scale=5e-2, r_seed=10)
loss = net.forward(X, y)
print(f'Forward testing loss is {loss:.4f} and it should be 4.0370')

net.reg = 0.1
for layer in net.layers:
    if hasattr(layer, 'reg'):
        layer.reg = 0.1
loss = net.forward(X, y)
print(f'Forward testing regularized loss is {loss:.4f} and it should be 107.1789')
print()
print(f'Your output layer activation values are\n{net.layers[-1].net_act}')

Forward testing loss is 4.0370 and it should be 4.0370
Forward testing regularized loss is 107.1789 and it should be 107.1789

Your output layer activation values are
[[0.0460154 0.0166654 0.2381854 0.0720402 0.0145981 0.3102782 0.0425223
  0.0130873 0.1806306 0.065977 ]
 [0.03702   0.0359927 0.1820291 0.0583751 0.0070645 0.1171269 0.2426893
  0.0638169 0.1969492 0.0589363]
 [0.0253248 0.0110764 0.2864853 0.0469163 0.0250096 0.2960596 0.0550878
  0.0174456 0.2199306 0.0166642]
 [0.055229  0.0040458 0.1512233 0.1061286 0.0298962 0.351998  0.0631639
  0.0128086 0.192367  0.0331395]
 [0.0349401 0.0418617 0.1294357 0.0627794 0.0267826 0.3237357 0.0595409
  0.0247131 0.217747  0.0784637]]


The above should print:

    Your output layer activation values are
    [[0.0460154 0.0166654 0.2381854 0.0720402 0.0145981 0.3102782 0.0425223
      0.0130873 0.1806306 0.065977 ]
    [0.03702   0.0359927 0.1820291 0.0583751 0.0070645 0.1171269 0.2426893
      0.0638169 0.1969492 0.0589363]
    [0.0253248 0.0110764 0.2864853 0.0469163 0.0250096 0.2960596 0.0550878
      0.0174456 0.2199306 0.0166642]
    [0.055229  0.0040458 0.1512233 0.1061286 0.0298962 0.351998  0.0631639
      0.0128086 0.192367  0.0331395]
    [0.0349401 0.0418617 0.1294357 0.0627794 0.0267826 0.3237357 0.0595409
      0.0247131 0.217747  0.0784637]]

### 4i. Implement the backward pass of the convolutional neural network

Next, you are going to implement the backward pass of gradients that stem from the loss function and propagate all the way to the 1st layer of the network. 

As usual, we need to compute several types of gradients for EACH network layer (see instance variable placeholders in the constructor):
- `d_net_act`
- `d_net_wts` (for layers that have weights)
- `d_net_in`
- `d_b` (for layers that have weights)

#### The flow of the backward gradients

- `d_upstream` is the gradient passed down from the layer above, i.e. the gradient that gets us to the `net_act` stage of the current layer. Using `d_upstream`, we compute `d_net_in` via `backward_netAct_to_netIn` in `Layer`. This gets us to the `net_in` stage, like usual.
- Using `d_net_in`, we compute [`dprev_net_act`, `d_net_wts`, `d_b`] via `backward_netIn_to_prevLayer_netAct` in `Layer`, where `dprev_net_act` is the `net_act` gradient for the layer beneath the current one.
- `dprev_net_act` becomes `d_upstream` for the next layer down, and the process repeats...

We only need to store `d_b` and `d_net_wts` as instance variables in a layer because these are needed for weight updates during training/backprop. The other gradients are needed only temporarily, as a means of computing `d_b` and `d_net_wts` in the layers further down.
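
The two-step flow described above can be sketched for a dense layer with a ReLU activation as follows. These are hypothetical standalone functions, not the `Layer` methods, and they deliberately leave out any regularization term in the weight gradient. The dense formulas assume `net_in = layer_input @ wts + b`.

```python
import numpy as np

def relu_backward_sketch(d_upstream, net_in):
    """net_act -> net_in step for ReLU (hypothetical helper)."""
    # The gradient passes through only where the net-in was positive
    return d_upstream * (net_in > 0)

def dense_backward_sketch(d_net_in, layer_input, wts):
    """net_in -> previous-layer step for a dense layer (hypothetical helper)."""
    d_wts = layer_input.T @ d_net_in   # (n_prev, units), summed over the mini-batch
    d_b = d_net_in.sum(axis=0)         # (units,)
    dprev_net_act = d_net_in @ wts.T   # (B, n_prev), becomes d_upstream one layer down
    return dprev_net_act, d_wts, d_b

rng = np.random.default_rng(0)
layer_input = rng.random((5, 9))
wts = rng.random((9, 10))
net_in = rng.standard_normal((5, 10))  # stand-in net-in values with mixed signs
d_upstream = rng.random((5, 10))

d_net_in = relu_backward_sketch(d_upstream, net_in)
dprev_net_act, d_wts, d_b = dense_backward_sketch(d_net_in, layer_input, wts)
print(dprev_net_act.shape, d_wts.shape, d_b.shape)  # (5, 9) (9, 10) (10,)
```

Each gradient has the same shape as the quantity it belongs to, which is a quick check worth making before comparing numerical values.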

#### Goal of backward pass

We need to compute these variables (`d_net_act`, `d_net_wts`, `d_net_in`, `d_b`) for each network layer. Start working at the network level and drill down into the layer-specific implementations.

Implement the following:

- `backward` in `Network`
- `backward` in `Layer`
- `backward_netAct_to_netIn` in `Layer` (computes `d_net_in`)
- `backward_netIn_to_prevLayer_netAct` in `Dense` (computes [`dprev_net_act`, `d_net_wts`, `d_b`])
- `backward_netIn_to_prevLayer_netAct` in `Flatten` (computes `dprev_net_act`)
- `backward_netIn_to_prevLayer_netAct` in `MaxPool2D` (arguably the most challenging, so there are more detailed instructions)

(`backward_netIn_to_prevLayer_netAct` in `Conv2D` is already done for you :)
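
Since the max pooling backward pass is the trickiest item in the list, here is a deliberately slow, loop-based sketch of the idea: each upstream gradient value is routed to the input position that held the max of its pooling window, and every other position gets zero. The function name is hypothetical, it assumes no padding, and it sends the gradient to the first max when a window has ties.

```python
import numpy as np

def max_pool_backward_sketch(d_upstream, x, pool_size=2, strides=2):
    """Route (B, C, out_h, out_w) gradients back to a (B, C, H, W) input (hypothetical helper)."""
    B, C, H, W = x.shape
    _, _, out_h, out_w = d_upstream.shape
    dprev = np.zeros((B, C, H, W))
    for n in range(B):
        for k in range(C):
            for i in range(out_h):
                for j in range(out_w):
                    r, c = i * strides, j * strides
                    window = x[n, k, r:r + pool_size, c:c + pool_size]
                    # Row/column of the max inside the window (first one on ties)
                    wr, wc = np.unravel_index(np.argmax(window), window.shape)
                    # Accumulate so overlapping windows (stride < pool size) also work
                    dprev[n, k, r + wr, c + wc] += d_upstream[n, k, i, j]
    return dprev

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
d_up = np.array([[[[1.0, 2.0], [3.0, 4.0]]]])
dprev = max_pool_backward_sketch(d_up, x)
print(dprev[0, 0])
```

In this toy input the max of every 2x2 window sits in its bottom-right corner, so the four upstream values land at positions (1, 1), (1, 3), (3, 1), and (3, 3).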

#### 4i. (i) Test backwards thru output (Dense) layer

In [116]:
rng = np.random.default_rng(3)

n_inputs = 5
n_hidden = 10
n_chans, img_y, img_x = 1, 3, 3
n_units_prev_layer = n_chans*img_y*img_x

# Define test inputs/net quantities
inputs = rng.random((n_inputs, n_chans, img_y, img_x))  # 5, 1, 3, 3
wts = rng.random((n_units_prev_layer, n_hidden))  # 9, 10
b = rng.random((n_hidden,))  # 10
d_upstream = rng.random((n_inputs, n_hidden))  # 5, 10

# Create layer and fill it with the test values
f_layer = Flatten(8, 'Flatten')
dense_layer = Dense(10, 'Output', units=n_hidden, n_units_prev_layer=n_units_prev_layer)
f_layer.input = inputs
f_layer.compute_net_in()
dense_layer.input = f_layer.net_in
dense_layer.wts = wts
dense_layer.b = b
dense_layer.verbose = False  # Toggle this on/off as needed
dense_layer.compute_net_in()
dense_layer.compute_net_act()

# Do the backwards pass thru the layer
dprev_net_act, d_wts, d_b = dense_layer.backward_netIn_to_prevLayer_netAct(d_upstream)
print(f'Shapes: d_b {d_b.shape}, d_wts {d_wts.shape}, and dprev_net_act {dprev_net_act.shape}')
print(f'Shapes should be: d_b (10,), d_wts (9, 10), and (5, 9)')

print()
print(f'Your d_b is\n{d_b}')
print()
print(f'Your d_wts is\n{d_wts}')
print()
print(f'Your dprev_net_act is\n{dprev_net_act}')
print()

Shapes: d_b (10,), d_wts (9, 10), and dprev_net_act (5, 9)
Shapes should be: d_b (10,), d_wts (9, 10), and (5, 9)

Your d_b is
[2.939972  1.5288632 1.5934582 2.5569741 3.42369   1.9033922 3.0180793
 2.0253668 3.0044666 3.421824 ]

Your d_wts is
[[1.022262  0.4203976 0.7783139 0.9918202 1.5201265 0.7833104 0.9345904
  0.9171647 1.1861436 1.2538697]
 [1.2864636 0.4407391 0.4169773 0.9978163 1.1788296 0.648836  1.1450934
  0.8800866 1.0098424 1.1541508]
 [1.5680805 0.8982268 1.0854391 1.4559452 1.6867111 1.2276175 1.6481592
  1.138053  1.8404759 1.8155044]
 [1.3733557 0.6643123 0.648199  1.187348  1.7787889 0.8814593 1.6552009
  0.9005457 1.3065905 1.8090585]
 [1.5314267 0.637941  0.4818006 1.1545242 1.4317624 0.6647081 1.1984389
  1.0025069 1.2153267 1.3285968]
 [1.7655459 1.050588  0.9767144 1.506954  1.9827554 1.0557073 1.5921494
  1.1796107 1.8556087 1.9352423]
 [2.3887655 1.3817035 1.2739924 2.0335991 2.9047506 1.3827122 2.2102931
  1.5998073 2.4492552 2.7442209]
 [1.460473  0.390876

**The above gradients should be:**

    Your d_b is
    [2.939972  1.5288632 1.5934582 2.5569741 3.42369   1.9033922 3.0180793
    2.0253668 3.0044666 3.421824 ]

    Your d_wts is
    [[1.022262  0.4203976 0.7783139 0.9918202 1.5201265 0.7833104 0.9345904
      0.9171647 1.1861436 1.2538697]
    [1.2864636 0.4407391 0.4169773 0.9978163 1.1788296 0.648836  1.1450934
      0.8800866 1.0098424 1.1541508]
    [1.5680805 0.8982268 1.0854391 1.4559452 1.6867111 1.2276175 1.6481592
      1.138053  1.8404759 1.8155044]
    [1.3733557 0.6643123 0.648199  1.187348  1.7787889 0.8814593 1.6552009
      0.9005457 1.3065905 1.8090585]
    [1.5314267 0.637941  0.4818006 1.1545242 1.4317624 0.6647081 1.1984389
      1.0025069 1.2153267 1.3285968]
    [1.7655459 1.050588  0.9767144 1.506954  1.9827554 1.0557073 1.5921494
      1.1796107 1.8556087 1.9352423]
    [2.3887655 1.3817035 1.2739924 2.0335991 2.9047506 1.3827122 2.2102931
      1.5998073 2.4492552 2.7442209]
    [1.460473  0.3908762 0.5608205 1.1835016 1.5073256 0.8063068 1.2858772
      1.1173971 1.1810884 1.3509113]
    [1.614364  0.9954862 0.9571274 1.4355557 1.9242129 1.1056847 1.7704159
      1.05157   1.7707872 2.0211374]]

    Your dprev_net_act is
    [[1.9059042 2.256193  2.3887436 1.8283934 2.1172395 1.4954141 1.9496901
      2.9960784 1.6087471]
    [3.6419362 3.2440551 3.5220766 3.1352212 2.8402101 2.2186481 2.3807716
      3.7879065 2.376959 ]
    [2.2950536 2.2463427 2.4182406 2.3614061 2.0504833 1.5159432 1.8678636
      2.8608441 1.8203996]
    [2.6963963 2.618827  3.0134393 2.6907741 2.7390353 1.8948155 2.5595726
      3.9260347 1.959954 ]
    [3.3462921 3.1563749 3.6705977 2.9847695 3.4139881 2.623484  2.8994443
      4.0945974 2.3683561]]



#### 4i. (ii) Test backwards thru Flatten layer

In [117]:
rng = np.random.default_rng(1)

n_inputs = 5
n_hidden = 10
n_chans, img_y, img_x = 1, 3, 3
n_units_prev_layer = n_chans*img_y*img_x

# Define test inputs/net quantities
inputs = rng.random((n_inputs, n_chans, img_y, img_x))  # 5, 1, 3, 3
wts = rng.random((n_units_prev_layer, n_hidden))  # 9, 10
b = rng.random((n_hidden,))  # 10
d_upstream = rng.random((n_inputs, n_hidden))  # 5, 10

# Create layer and fill it with the test values
f_layer = Flatten(7, 'Flatten')
dense_layer = Dense(8, 'Output', units=n_hidden, n_units_prev_layer=n_units_prev_layer)
f_layer.input = inputs
f_layer.compute_net_in()
dense_layer.input = f_layer.net_in
dense_layer.wts = wts
dense_layer.b = b
dense_layer.verbose = False  # Toggle this on/off as needed
dense_layer.compute_net_in()
dense_layer.compute_net_act()

# Do the backwards pass thru the layer
dprev_net_act, _, _ = dense_layer.backward_netIn_to_prevLayer_netAct(d_upstream)
dprev_net_act, _, _ = f_layer.backward_netIn_to_prevLayer_netAct(dprev_net_act)
print(f'Shapes: {dprev_net_act.shape=} and it should be (5, 1, 3, 3).')
print(f'The first chunk of your dprev_net_act looks like:\n{dprev_net_act[0,0]}')
print('and it should look like:')
print('''[[3.0649991 2.6812171 2.3775691]
 [2.7365463 1.5570601 2.853221 ]
 [2.2260891 2.4357624 2.2484772]]''')

Shapes: dprev_net_act.shape=(5, 1, 3, 3) and it should be (5, 1, 3, 3).
The first chunk of your dprev_net_act looks like:
[[3.0649991 2.6812171 2.3775691]
 [2.7365463 1.5570601 2.853221 ]
 [2.2260891 2.4357624 2.2484772]]
and it should look like:
[[3.0649991 2.6812171 2.3775691]
 [2.7365463 1.5570601 2.853221 ]
 [2.2260891 2.4357624 2.2484772]]


#### 4i. (iii) Test backwards thru MaxPool2D layer

In [118]:
rng = np.random.default_rng(0)
n_inputs = 3

# Define test inputs/net quantities
inputs = rng.random((n_inputs, 3, 4, 4))
d_upstream = rng.random((n_inputs, 3, 2, 2))

pool_sz = 2
stride = 2

# Create layer and fill it with the test values
pool_layer = MaxPool2D(1, 'Pool', pool_size=pool_sz, strides=stride)

# Do the forward/backwards pass thru the layer
pool_layer.verbose = False
pool_layer.forward(inputs)
dprev_net_act, _, _ = pool_layer.backward(d_upstream, None)

print(f'Shape: {dprev_net_act.shape} and should be (3, 3, 4, 4).')
print()
print(f'Your d_net_in is\n{dprev_net_act}')

Shape: (3, 3, 4, 4) and should be (3, 3, 4, 4).

Your d_net_in is
[[[[0.        0.        0.        0.       ]
   [0.        0.4065103 0.        0.909959 ]
   [0.        0.0430669 0.8227063 0.       ]
   [0.        0.        0.        0.       ]]

  [[0.415384  0.        0.        0.       ]
   [0.        0.        0.829804  0.       ]
   [0.        0.        0.3650462 0.       ]
   [0.0099546 0.        0.        0.       ]]

  [[0.        0.        0.        0.       ]
   [0.        0.07863   0.6526146 0.       ]
   [0.        0.        0.        0.       ]
   [0.        0.2738491 0.        0.7026521]]]


 [[[0.        0.        0.1268171 0.       ]
   [0.9438014 0.        0.        0.       ]
   [0.        0.8647783 0.        0.       ]
   [0.        0.        0.        0.0594642]]

  [[0.        0.3807705 0.        0.4297741]
   [0.        0.        0.        0.       ]
   [0.        0.        0.        0.       ]
   [0.        0.4888495 0.9764623 0.       ]]

  [[0.7756912 0.      

**The above gradients should be:**
    
    Your d_net_in is
    [[[[0.        0.        0.        0.       ]
      [0.        0.4065103 0.        0.909959 ]
      [0.        0.0430669 0.8227063 0.       ]
      [0.        0.        0.        0.       ]]

      [[0.415384  0.        0.        0.       ]
      [0.        0.        0.829804  0.       ]
      [0.        0.        0.3650462 0.       ]
      [0.0099546 0.        0.        0.       ]]

      [[0.        0.        0.        0.       ]
      [0.        0.07863   0.6526146 0.       ]
      [0.        0.        0.        0.       ]
      [0.        0.2738491 0.        0.7026521]]]


    [[[0.        0.        0.1268171 0.       ]
      [0.9438014 0.        0.        0.       ]
      [0.        0.8647783 0.        0.       ]
      [0.        0.        0.        0.0594642]]

      [[0.        0.3807705 0.        0.4297741]
      [0.        0.        0.        0.       ]
      [0.        0.        0.        0.       ]
      [0.        0.4888495 0.9764623 0.       ]]

      [[0.7756912 0.        0.        0.       ]
      [0.        0.        0.        0.3088574]
      [0.        0.        0.        0.       ]
      [0.        0.2698368 0.8631202 0.       ]]]


    [[[0.        0.8813072 0.        0.       ]
      [0.        0.        0.        0.5107065]
      [0.        0.        0.        0.9949173]
      [0.        0.3442957 0.        0.       ]]

      [[0.        0.        0.        0.       ]
      [0.3159435 0.        0.1827124 0.       ]
      [0.        0.8800981 0.8123354 0.       ]
      [0.        0.        0.        0.       ]]

      [[0.        0.        0.9584136 0.       ]
      [0.6678894 0.        0.        0.       ]
      [0.9257146 0.        0.7482485 0.       ]
      [0.        0.        0.        0.       ]]]]

#### 4i. (iv) Test network full backwards pass

(phew)

In [119]:
rng = np.random.default_rng(10)
n_inputs = 2
X = rng.standard_normal((n_inputs, 3, 32, 32))
y = rng.integers(10, size=n_inputs)

# Do forwards and backwards pass thru network
net = ConvNet4(wt_scale=5e-2, r_seed=1)
loss = net.forward(X, y)
net.backward(y)

# Check various gradients in each layer
print('Output layer')
print('------------------------------------')
print(f'd_wts (1st chunk):\n{net.layers[-1].d_wts[0]}\n')
print(f'd_b (all):\n{net.layers[-1].d_b}\n')
print('------------------------------------')
print('Dense hidden layer')
print('------------------------------------')
print(f'd_wts (1st chunk):\n{net.layers[-2].d_wts[0]}\n')
print(f'd_b (all):\n{net.layers[-2].d_b}\n')
print('------------------------------------')
print('Conv2D layer')
print('------------------------------------')
print(f'd_wts (1st chunk):\n{net.layers[0].d_wts[0,0]}\n')
print(f'd_b (all):\n{net.layers[0].d_b}\n')
print('------------------------------------')

Output layer
------------------------------------
d_wts (1st chunk):
[ 0.3165798  0.1782949  0.0721525 -0.2767754  0.2875673  0.0554693
  0.0200577  0.0450502 -0.7083062  0.0099099]

d_b (all):
[ 0.2362607  0.1407872  0.0563604 -0.2678708  0.2053987  0.042177
  0.0159275  0.0332441 -0.4694663  0.0071815]

------------------------------------
Dense hidden layer
------------------------------------
d_wts (1st chunk):
[-0.0237098  0.0185573  0.        -0.0245772  0.0008611 -0.0354649
  0.0096263  0.012202   0.         0.0097156  0.         0.
  0.        -0.0008257 -0.0188472  0.0123615  0.         0.
  0.0160304  0.         0.0142881  0.0121159  0.0028966  0.012659
 -0.0051318 -0.0295967  0.0333496 -0.0047008 -0.0016876  0.0080984
  0.0090566  0.0450042  0.0347394  0.         0.0488393  0.0005306
  0.0522901  0.        -0.0278283  0.         0.0070284  0.0271142
  0.001595   0.        -0.0020228  0.        -0.        -0.0195113
  0.        -0.         0.0087875  0.         0.         0.0

**Above output should be:**

    Output layer
    ------------------------------------
    d_wts (1st chunk):
    [ 0.3165798  0.1782949  0.0721525 -0.2767754  0.2875673  0.0554693
      0.0200577  0.0450502 -0.7083062  0.0099099]

    d_b (all):
    [ 0.2362607  0.1407872  0.0563604 -0.2678708  0.2053987  0.042177
      0.0159275  0.0332441 -0.4694663  0.0071815]

    ------------------------------------
    Dense hidden layer
    ------------------------------------
    d_wts (1st chunk):
    [-0.0237098  0.0185573  0.        -0.0245772  0.0008611 -0.0354649
      0.0096263  0.012202   0.         0.0097156  0.         0.
      0.        -0.0008257 -0.0188472  0.0123615  0.         0.
      0.0160304  0.         0.0142881  0.0121159  0.0028966  0.012659
     -0.0051318 -0.0295967  0.0333496 -0.0047008 -0.0016876  0.0080984
      0.0090566  0.0450042  0.0347394  0.         0.0488393  0.0005306
      0.0522901  0.        -0.0278283  0.         0.0070284  0.0271142
      0.001595   0.        -0.0020228  0.         0.        -0.0195113
      0.         0.         0.0087875  0.         0.         0.0060993
     -0.0313341  0.         0.0038625  0.        -0.0264695 -0.0275798
      0.0271583  0.0085617  0.         0.        -0.0320055 -0.0290514
      0.0122436  0.         0.0098315  0.0100542  0.         0.0120326
      0.        -0.0080974 -0.0155733  0.         0.         0.0369122
      0.         0.         0.        -0.0002869  0.0199651 -0.0123693
      0.0068985  0.0059383  0.        -0.0012645  0.         0.
      0.0103865  0.0187316  0.0203199 -0.0282206  0.0250551  0.
      0.0097692  0.0065689  0.        -0.0128037]

    d_b (all):
    [-0.0319617  0.0229079  0.        -0.0315111  0.0004665 -0.0400804
      0.0140993  0.0178718  0.         0.0157856  0.         0.
      0.        -0.0005321 -0.0276048  0.0158184  0.         0.
      0.0234793  0.         0.0176377  0.0177458  0.0035757  0.0185412
     -0.0075163 -0.0368606  0.0399352 -0.0107246 -0.0027797  0.0118614
      0.0132649  0.0576509  0.0452603  0.         0.0664601  0.000655
      0.064549   0.        -0.0343524  0.         0.0044218  0.036849
      0.0037971  0.        -0.002497   0.         0.        -0.0295301
      0.         0.         0.0193303  0.         0.         0.0063513
     -0.03868    0.         0.0056573  0.        -0.0322584 -0.0344825
      0.0357836  0.0111612  0.         0.        -0.0468774 -0.0414528
      0.0186779  0.         0.0089871  0.009994   0.         0.0176237
      0.        -0.01186   -0.0197204  0.         0.         0.0455244
      0.         0.         0.         0.0001654  0.0292423 -0.0181169
      0.0097532  0.0073305  0.         0.0012459  0.         0.
      0.0083673  0.0242284  0.0213467 -0.036437   0.0292053  0.
      0.0104388  0.0050429  0.        -0.0172435]

    ------------------------------------
    Conv2D layer
    ------------------------------------
    d_wts (1st chunk):
    [[-0.182792   0.1736235  0.0098438  0.0037222 -0.1048032  0.1143987
      -0.0142714]
     [-0.1819231 -0.402189   0.0479303 -0.0156558  0.0550579  0.2623792
       0.1230332]
     [ 0.0897908  0.1647721 -0.1076144 -0.2968163 -0.0691064  0.1027714
       0.2315856]
     [ 0.0311     0.1674375  0.0197103 -0.0572587 -0.5249524 -0.2260723
      -0.0682909]
     [ 0.0682124  0.1102258 -0.045091  -0.1584036 -0.103004  -0.3712469
      -0.1213601]
     [ 0.1629225 -0.1253451 -0.0223373 -0.1040976  0.0881485  0.1211755
      -0.0062407]
     [-0.2631801 -0.0169998  0.1073822 -0.4687949 -0.0498018  0.1164087
      -0.0831877]]

    d_b (all):
    [ 0.1466253  0.1857728  0.0200845 -0.0353978  0.0724815 -0.2458079
      0.2931988  0.4281541 -0.2727701  0.102905   0.0486023  0.0725082
     -0.1016835 -0.0020042  0.3569887 -0.0063515  0.1889579  0.1298562
     -0.0415208 -0.2987216 -0.1146741  0.0079273 -0.0425284 -0.0282425
     -0.2826566 -0.1741347 -0.1274244 -0.0637546 -0.2428629 -0.0945485
      0.0168975  0.228136 ]

    ------------------------------------
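
Comparing this many numbers by eye is error-prone, so a numeric comparison may help. The sketch below is only an illustration: `matches_expected` is a hypothetical helper (not part of the project code), `your_d_b` is a stand-in for your own gradient array, and the tolerance is an assumption chosen to allow for the 7-decimal rounding of the printed values.

```python
import numpy as np

# Expected output-layer d_b, copied from the printout above.
expected_d_b = np.array([0.2362607, 0.1407872, 0.0563604, -0.2678708, 0.2053987,
                         0.042177, 0.0159275, 0.0332441, -0.4694663, 0.0071815])


def matches_expected(actual, expected, atol=1e-6):
    """Hypothetical helper: True if shapes agree and values match within atol."""
    actual = np.asarray(actual)
    # Check the shape first so a mismatch cannot be hidden by broadcasting.
    return actual.shape == expected.shape and np.allclose(actual, expected, atol=atol, rtol=0)


# Stand-in for your own gradient; replace it with the array your layer computed.
your_d_b = expected_d_b + 4e-8
print(matches_expected(your_d_b, expected_d_b))          # tiny rounding difference
print(matches_expected(your_d_b + 1e-3, expected_d_b))   # clearly different values
```

The same helper can be reused for the `d_wts` arrays by pasting in the corresponding expected values.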

### 4j. Dropout layer: forward pass

There are no Dropout layers in your ConvNet4 neural network, but you will make a version of the net with dropout next week.

To prepare for that, implement the `Dropout` layer class, starting with methods involved in the forward pass:
- constructor
- `set_mode(self, in_train_mode)`
- `compute_net_in(self)`
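
As a rough guide to what the forward pass computes, here is a minimal standalone sketch of inverted dropout. It assumes `rate` is the probability of dropping a unit and that kept values are scaled by `1 / (1 - rate)`, which is consistent with the expected training-mode values in the tests (kept entries are 5x the inputs when `rate=0.8`). The function name `dropout_forward` and the way the mask is drawn are assumptions for illustration, not the `Dropout` class API, so this sketch is not expected to reproduce either expected mask.

```python
import numpy as np


def dropout_forward(net_act_prev, rate, in_train_mode, rng):
    """Inverted dropout sketch. Returns (net_in, mask); mask is None outside training."""
    if not in_train_mode:
        # Prediction mode: pass the input through unchanged.
        return net_act_prev.copy(), None
    keep_prob = 1.0 - rate
    # Keep each unit independently with probability keep_prob (assumed mask recipe).
    mask = rng.random(net_act_prev.shape) < keep_prob
    # Zero the dropped units and scale the kept ones so the expected value is unchanged.
    return net_act_prev * mask / keep_prob, mask


rng = np.random.default_rng(0)
x = np.arange(1.0, 7.0).reshape(2, 3)
pred, _ = dropout_forward(x, rate=0.8, in_train_mode=False, rng=rng)
train, mask = dropout_forward(x, rate=0.8, in_train_mode=True, rng=rng)
print(np.array_equal(pred, x))                   # prediction mode is the identity
print(np.allclose(train[mask], x[mask] / 0.2))   # kept units are scaled up
print(np.all(train[~mask] == 0))                 # dropped units are zeroed
```

Keeping the mask around matters because the backward pass needs the same mask that was used in the forward pass.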

Test out your implementation below.

In [120]:
rng = np.random.default_rng(1)
inputs = rng.normal(loc=0, scale=0.25, size=(5, 3))

d_layer = Dropout(3, 'Drop', rate=0.8, r_seed=2)
print('Test: 1/2 (prediction):')
drop_netact = d_layer.forward(inputs.copy())
print('Your dropout netact:\n', drop_netact)
print('It should be:')
print(''' [[ 0.086396   0.2054045  0.0826093]
 [-0.3257893  0.226339   0.1115936]
 [-0.1342383  0.1452795  0.0911431]
 [ 0.0735331  0.0071056  0.1366782]
 [-0.1841135 -0.0407275 -0.1205298]]''')

Test: 1/2 (prediction):
Your dropout netact:
 [[ 0.086396   0.2054045  0.0826093]
 [-0.3257893  0.226339   0.1115936]
 [-0.1342383  0.1452795  0.0911431]
 [ 0.0735331  0.0071056  0.1366782]
 [-0.1841135 -0.0407275 -0.1205298]]
It should be:
 [[ 0.086396   0.2054045  0.0826093]
 [-0.3257893  0.226339   0.1115936]
 [-0.1342383  0.1452795  0.0911431]
 [ 0.0735331  0.0071056  0.1366782]
 [-0.1841135 -0.0407275 -0.1205298]]


In [121]:

print('Test: 2/2 (training):')
d_layer = Dropout(3, 'Drop', rate=0.8, r_seed=2)
d_layer.set_mode(True)
drop_netact = d_layer.forward(inputs.copy())
d_layer.set_mode(False)
print('Your dropout netact:\n', drop_netact)
print()
print('Depending on your implementation, your net_act above should only match ONE of the following options:')

print('OPTION 1:')
print('''[[ 0.         0.         0.       ]
 [-1.6289465  0.         0.       ]
 [-0.6711915  0.7263976  0.       ]
 [ 0.         0.         0.6833912]
 [-0.        -0.        -0.       ]]''')

print('\nOPTION 2:')
print(''' [[ 0.         0.         0.4130463]
 [-0.         0.         0.       ]
 [-0.         0.         0.       ]
 [ 0.         0.         0.       ]
 [-0.        -0.        -0.       ]]''')


Test: 2/2 (training):
Your dropout netact:
 [[ 0.         0.         0.4130463]
 [-0.         0.         0.       ]
 [-0.         0.         0.       ]
 [ 0.         0.         0.       ]
 [-0.        -0.        -0.       ]]

Depending on your implementation, your net_act above should only match ONE of the following options:
OPTION 1:
[[ 0.         0.         0.       ]
 [-1.6289465  0.         0.       ]
 [-0.6711915  0.7263976  0.       ]
 [ 0.         0.         0.6833912]
 [-0.        -0.        -0.       ]]

OPTION 2:
 [[ 0.         0.         0.4130463]
 [-0.         0.         0.       ]
 [-0.         0.         0.       ]
 [ 0.         0.         0.       ]
 [-0.        -0.        -0.       ]]
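
Whichever option your output matches, one property can be checked independently of the random mask: every nonzero entry should equal the corresponding input divided by `1 - rate`. The sketch below illustrates that check using the values printed above. It assumes the prediction-mode output from Test 1/2 equals the layer's inputs, and `is_valid_dropout_output` is a hypothetical helper, not part of the project code.

```python
import numpy as np

rate = 0.8
# Prediction-mode output from Test 1/2 (assumed equal to the layer's inputs).
inputs = np.array([[ 0.086396,   0.2054045,  0.0826093],
                   [-0.3257893,  0.226339,   0.1115936],
                   [-0.1342383,  0.1452795,  0.0911431],
                   [ 0.0735331,  0.0071056,  0.1366782],
                   [-0.1841135, -0.0407275, -0.1205298]])
# Expected training-mode outputs, copied from OPTION 1 and OPTION 2.
option_1 = np.array([[ 0.0,        0.0,       0.0],
                     [-1.6289465,  0.0,       0.0],
                     [-0.6711915,  0.7263976, 0.0],
                     [ 0.0,        0.0,       0.6833912],
                     [ 0.0,        0.0,       0.0]])
option_2 = np.zeros((5, 3))
option_2[0, 2] = 0.4130463


def is_valid_dropout_output(net_act, inputs, rate, atol=1e-6):
    """Hypothetical check: nonzero entries must equal inputs / (1 - rate)."""
    kept = net_act != 0
    return np.allclose(net_act[kept], inputs[kept] / (1 - rate), atol=atol)


print(is_valid_dropout_output(option_1, inputs, rate))
print(is_valid_dropout_output(option_2, inputs, rate))
```

If this check fails on your own output, the scaling factor is a more likely culprit than the random mask.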


### 4k. Dropout layer: backward pass

Implement the backward pass of the `Dropout` layer, then test it below:

- `backward_netIn_to_prevLayer_netAct(self, d_upstream)`
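
The idea behind the dropout backward pass can be sketched as follows: the upstream gradient flows only through the units that were kept in the forward pass, with the same `1 / (1 - rate)` scaling. This is a standalone illustration under the assumption of inverted dropout. The function `dropout_backward` and its arguments are hypothetical, and it returns only the gradient, whereas the test cell for this step unpacks three return values from the class method.

```python
import numpy as np


def dropout_backward(d_upstream, mask, rate):
    """Sketch: route the gradient through kept units only, with forward-pass scaling."""
    return d_upstream * mask / (1.0 - rate)


rate = 0.8
# Mask saved from a (hypothetical) training-mode forward pass: True means the unit was kept.
mask = np.array([[False, False, True],
                 [True,  False, False]])
d_upstream = np.array([[0.1, 0.2, 0.3],
                       [0.4, 0.5, 0.6]])
d_prev = dropout_backward(d_upstream, mask, rate)
print(d_prev)
```

Because the mask is reused, the nonzero positions of `d_prev` should line up with the nonzero positions of the training-mode forward output.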

In [122]:
rng = np.random.default_rng(1)
inputs = rng.normal(loc=0, scale=0.25, size=(5, 3))
d_upstream = rng.uniform(low=0, high=0.5, size=(5, 3))

# Test 1
d_layer = Dropout(3, 'Drop', rate=0.8, r_seed=2)
d_layer.set_mode(True)
drop_netact = d_layer.forward(inputs)
d_prev, _, _ = d_layer.backward_netIn_to_prevLayer_netAct(d_upstream)
d_layer.set_mode(False)
print('Your dropout d_prev:\n', d_prev)
print()
print('Depending on your implementation, your d_prev above should only match ONE of the following options:')

print('OPTION 1:')
print(''' [[0.        0.        0.       ]
 [0.5086381 0.        0.       ]
 [0.7010219 1.2129774 0.       ]
 [0.        0.        1.3530671]
 [0.        0.        0.       ]]''')

print('\nOPTION 2:')
print(''' [[0.        0.        1.0077825]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]]''')

Your dropout d_prev:
 [[0.        0.        1.0077825]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]]

Depending on your implementation, your d_prev above should only match ONE of the following options:
OPTION 1:
 [[0.        0.        0.       ]
 [0.5086381 0.        0.       ]
 [0.7010219 1.2129774 0.       ]
 [0.        0.        1.3530671]
 [0.        0.        0.       ]]

OPTION 2:
 [[0.        0.        1.0077825]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]
 [0.        0.        0.       ]]
