# Module 7: Network damage

In this example, we will try different ways to damage the PMSP model.

The damage API (e.g., `model.cut_connections`) is only available in models imported from `connectionist.models`.

The implementation details in this module are outside the scope of this course. Advanced users can refer to the source code for details.


1. Remove connections between any two layers: `model.cut_connections()`

2. Zero out a portion of the weights in a connection: `model.zero_out()`

3. Shrink the number of neurons in a layer: `model.shrink_layer()`

4. Add noise to a layer: `model.add_noise()`

5. Apply pressure to keep weights small: `model.apply_l2()`




In [None]:
!pip install connectionist

In [None]:
import tensorflow as tf
import numpy as np
from connectionist.data import ToyOP
from connectionist.models import PMSP

## Data

Identical to the last module.

- Input: word representations, fixed across time. 
- Output: letter representations, changing across time.

In [None]:
data = ToyOP()
letters, words = data.letters, data.words
x_train, y_train = data.x_train, data.y_train

## Model creation and training

In [None]:
model = PMSP(tau=0.1, h_units=10, p_units=9, c_units=5)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05), 
    loss=tf.keras.losses.BinaryCrossentropy()
)

model.fit(x_train, y_train, epochs=100, batch_size=20)

## Model predictions

In [None]:
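def decode_prediction(y):
    """Decode a prediction into a string by taking the most active letter at each step."""

    # y holds one row of letter activations per output step
    decoded = ''.join(letters[np.argmax(v)] for v in y)
    return decoded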

y_pred = model(x_train)

for i, word in enumerate(words):
    print(f"word: {word}; pred: {decode_prediction(y_pred['phonology'][i])}")

In [None]:
y_pred.keys()

## Damage I: Remove a connection (weights) between two layers

Implementation summary:

1. Remove the specified `connections` from the original `model.connections` -> `new_connections`
2. Build a new, empty model with `new_connections`
3. Transplant the weights and biases of all remaining connections
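
The three steps above can be sketched with plain Python dictionaries. This is only an illustration of the idea, not the library's implementation: the helper name `cut_connections_sketch`, the dictionary layout, and the weight shapes are assumptions made for this example.

```python
import numpy as np

def cut_connections_sketch(weights, connections):
    """Return a new weight dict without the cut connections (hypothetical helper)."""
    # Step 1: remove the cut connections from the original connection list
    new_connections = [name for name in weights if name not in connections]
    # Step 2: build a new, empty "model" that only has the remaining connections
    new_weights = {name: np.zeros_like(weights[name]) for name in new_connections}
    # Step 3: transplant the trained values into the new model
    for name in new_connections:
        new_weights[name][...] = weights[name]
    return new_weights

rng = np.random.default_rng(0)
# Toy stand-in for a trained model: connection name -> weight matrix (shapes are made up)
weights = {
    "oh": rng.normal(size=(4, 10)),
    "cp": rng.normal(size=(5, 9)),
    "pp": rng.normal(size=(9, 9)),
}
new_weights = cut_connections_sketch(weights, connections=["cp"])
print(sorted(new_weights))
```

The original dictionary is left untouched, which mirrors why the result has to be assigned to a new variable.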

In [None]:
# Draft API
# Assignment to a new variable is required (due to scoping issues)

# Design choice: for simplicity, a single connection is cut here. To cut multiple connections, use a simple for loop.
new_model = model.cut_connections(connections=['cp'])

The trainable weights of the original model include `cp`, but those of the new model do not.

In [None]:
[w.name for w in model.trainable_weights]

In [None]:
[w.name for w in new_model.trainable_weights]

## Damage II: Zero out some weights in a weight matrix

Steps:
1. Locate the weight matrix
2. Create an elementwise, untrainable mask according to the rate: 

    $p(v = 0) = rate$, $p(v = 1) = 1 - rate$, $rate \in [0, 1)$ where 
    $v$ is the value of the mask, and $rate$ is the rate of zeroing out. 

3. Zero out the actual weights with the mask (to get rid of the randomly initialized values):

    $w = w \times mask$

4. Mask the weights again in every forward pass, i.e.: 

    $w_{masked} = w \times mask$
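
The four steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the library's code: the toy 9 x 9 matrix, the `forward` helper, and the random input are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2

# Step 1: a toy 9 x 9 weight matrix standing in for the located weights
w = rng.normal(size=(9, 9))

# Step 2: elementwise mask with p(v = 0) = rate and p(v = 1) = 1 - rate
mask = (rng.random(w.shape) >= rate).astype(w.dtype)

# Step 3: zero out the stored weights once
w = w * mask

# Step 4: apply the mask again in every forward pass
def forward(x, w, mask):
    w_masked = w * mask
    return x @ w_masked

x = rng.normal(size=(3, 9))
y = forward(x, w, mask)
print(f"zeroed-out fraction: {1 - mask.mean():.3f}")
```

Because the mask is sampled, the zeroed-out fraction is only close to `rate`, not exactly equal to it.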


With this implementation, the backward pass returns a zero gradient for every masked weight (due to step 4). More specifically, during backpropagation, the gradient is calculated as:

$d_w = mask \times d_{w_{masked}}$

$mask = 0 \rightarrow d_w = 0$

No training signal can back-propagate through the mask.
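
The gradient rule above can be checked by hand on a toy linear layer. The sketch below is an assumption-laden illustration (hand-picked mask, toy loss $0.5 \sum y^2$, manual backward pass in NumPy), not the model's actual training code.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 3))
mask = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)
x = rng.normal(size=(5, 4))

# Forward pass with the masked weights and a toy loss = 0.5 * sum(y ** 2)
w_masked = w * mask
y = x @ w_masked

# Backward pass: d(loss)/d(y) = y, so d(loss)/d(w_masked) = x^T y
d_w_masked = x.T @ y

# Chain rule through w_masked = w * mask
d_w = mask * d_w_masked

print(d_w)
```

Every position where the mask is 0 receives a gradient of 0, so an optimizer step cannot move those weights away from zero.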

In [None]:
new_model = model.zero_out(rates={'pp': 0.2})

We can see all the masks by calling:

In [None]:
[w.name for w in new_model.weights if 'zero_out_mask' in w.name]

Most of them are all-inclusive masks that contain only values of 1.0, for example:

In [None]:
new_model.pmsp.cell.oh.zero_out_mask

More importantly, the `pp` mask is the one we set to zero out 20% of the weights:

In [None]:
new_model.pmsp.cell.pp.zero_out_mask

In [None]:
print(f"Actual zeroed-out rate = {1-tf.reduce_mean(new_model.pmsp.cell.pp.zero_out_mask).numpy():.03f}")

Also, the weights are zeroed out wherever the mask is zero:

In [None]:
new_model.pmsp.cell.pp.kernel  # "kernel" means weights in Keras terminology

To show that the zeroed-out weights behave like untrainable weights, we can train the new model further:

In [None]:
new_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05), 
    loss=tf.keras.losses.BinaryCrossentropy()
)
new_model.fit(x_train, y_train, epochs=100, batch_size=20, verbose=0)

The zeroed-out weights are indeed untrainable:

In [None]:
new_model.pmsp.cell.pp.kernel

## Damage III: Remove a portion of neurons in a layer

Implementation summary:

1. Calculate the new number of units: `n = int((1-rate) * original units)`
1. Randomly sample `n` indices from `range(original units)` to get a list of indices `i`
1. Depending on the `layer` argument of `shrink_layer`, find all incoming and outgoing connections (weights and biases) and subset the axis that links to the target layer by `i`
1. Create a new model and transplant the subsetted weights and biases
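
The steps above can be sketched in NumPy. This is a hedged illustration only: the three arrays stand in for the connections around a hidden layer, their input and output sizes are made up, and sampling without replacement is an assumption about how the indices are drawn.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.3
h_units = 10

# Toy weights around a hidden layer (the input and output sizes are made up)
w_in = rng.normal(size=(6, h_units))    # incoming: previous layer -> hidden
bias_h = rng.normal(size=(h_units,))    # hidden bias
w_out = rng.normal(size=(h_units, 9))   # outgoing: hidden -> next layer

# Step 1: new number of units
n = int((1 - rate) * h_units)

# Step 2: sample n of the original unit indices (assumed: without replacement)
i = np.sort(rng.choice(h_units, size=n, replace=False))

# Step 3: subset the axis that links to the hidden layer
new_w_in = w_in[:, i]
new_bias_h = bias_h[i]
new_w_out = w_out[i, :]

# Step 4 would copy these arrays into a freshly built, smaller model
print(n, new_w_in.shape, new_bias_h.shape, new_w_out.shape)
```

Note that incoming weights are subset along their last axis, while outgoing weights are subset along their first axis.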

In [None]:
new_model = model.shrink_layer(layer='hidden', rate=0.3)  # shrinking hidden layer by 30%

## Damage IV: Add noise to a layer

Notes:
- Noise is applied during both training and inference
- Noise is **added** to the existing noise instead of replacing it. For example, if the original `p_noise = 0.1`, running `add_noise` on phonology with a stddev of 0.1 gives the new model `p_noise = 0.2`

In [None]:
new_model = model.add_noise(layer='phonology', stddev=0.1)

Manually locate the phonology noise layer and test it with a tensor of ones:

In [None]:
new_model.pmsp.cell.noise['phonology'](tf.ones((3,3)), training=True)

Note that inside `PMSPCell.call()`, the noise layer is locked to `training=True`, so noise is added during both training and inference.
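
Both notes (noise is additive, and it is always on) can be mimicked with a small NumPy sketch. The class `NoiseSketch` and the helper `add_noise_sketch` are hypothetical names invented for this example; they are not part of `connectionist`.

```python
import numpy as np

class NoiseSketch:
    """Gaussian noise that is applied regardless of the training flag (hypothetical class)."""

    def __init__(self, stddev, seed=0):
        self.stddev = stddev
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, training=True):
        # The training flag is ignored on purpose: noise is always on
        return x + self.rng.normal(scale=self.stddev, size=x.shape)

def add_noise_sketch(layer, stddev):
    """Return a new noise layer whose stddev is the old one plus the added one."""
    return NoiseSketch(stddev=layer.stddev + stddev)

original = NoiseSketch(stddev=0.1)
damaged = add_noise_sketch(original, stddev=0.1)
noisy = damaged(np.ones((3, 3)), training=False)

print(damaged.stddev)
print(noisy)
```

Even with `training=False`, the output differs from the all-ones input, which is the behavior described for the locked noise layer.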

## Damage V: Apply pressure to keep the weights small

Weight decay and L2 regularization. 

Useful details: [blog post about weight decay vs. L2 regularization](https://towardsdatascience.com/weight-decay-l2-regularization-90a9e17713cd)

Worth mentioning: why does lesioning sound bad while regularization sounds good? Perhaps what doesn't kill you makes you stronger (more fault tolerant).


In [None]:
new_model = model.apply_l2(l2=0.3)

Train the new model to see the impact of adding L2:

In [None]:
new_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05), 
    loss=tf.keras.losses.BinaryCrossentropy()
)
new_model.fit(x_train, y_train, epochs=20, batch_size=20)

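Let's check whether the magnitude of the weights has decreased, using the sum of squared weights (a plain sum could hide large positive and negative weights that cancel out):

In [None]:
[f"{w.name}: {tf.reduce_sum(tf.square(w)).numpy():.3f}" for w in model.trainable_weights]

In [None]:
[f"{w.name}: {tf.reduce_sum(tf.square(w)).numpy():.3f}" for w in new_model.trainable_weights]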

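The following custom training loop shows what L2 regularization adds to the loss:


```python
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)

# Define a train step with a manual L2 penalty
def train_step(model, x, y, l2=0.0):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        if isinstance(y_pred, dict):  # the model may return a dict of layer outputs
            y_pred = y_pred['phonology']
        loss = loss_fn(y, y_pred)
        # L2 penalty: l2 * (sum of squared values over all trainable weights)
        loss += l2 * tf.add_n([tf.reduce_sum(tf.square(w)) for w in model.trainable_weights])
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss

# Train (full batch, one update per epoch)
epochs = 20
for _ in range(epochs):
    train_step(model, x_train, y_train, l2=0.3)

```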
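
To see the effect of the penalty in isolation, here is a small NumPy sketch on a toy linear regression rather than the PMSP model. Everything in it (the `fit_linear` helper, the synthetic data, the learning rate and step count) is an assumption chosen for illustration; only the form of the penalty, `l2 * sum(w ** 2)`, follows the training step above.

```python
import numpy as np

def fit_linear(x, y, l2=0.0, lr=0.01, steps=500):
    """Gradient descent on mean squared error plus l2 * sum(w ** 2)."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        grad_task = 2 * x.T @ (x @ w - y) / len(y)  # gradient of the task loss
        grad_l2 = 2 * l2 * w                        # gradient of the L2 penalty
        w -= lr * (grad_task + grad_l2)
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = x @ np.array([2.0, -3.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

w_plain = fit_linear(x, y, l2=0.0)
w_l2 = fit_linear(x, y, l2=0.3)

# Compare the sum of squared weights with and without the penalty
print(np.sum(np.square(w_plain)), np.sum(np.square(w_l2)))
```

The penalty gradient `2 * l2 * w` always points toward zero, which is why the regularized weights end up with a smaller sum of squares.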