# Activation maximization

This notebook guides you through activation maximization, a widely used technique for studying the behaviour of convolutional neural networks. You will apply it with a Python package called `keras-vis`.

The idea behind activation maximization is simple: generate an input image that maximizes the output activations of a given unit (filter) in the network.

The `keras-vis` package computes the gradient of the `ActivationMaximization` loss with respect to the input and uses it to update the input image iteratively. Because the package minimizes losses by gradient descent, the `ActivationMaximization` loss returns small values for large filter activations, so minimizing the loss maximizes the activation. The resulting image shows what sort of input patterns activate a particular filter. For example, a filter that helps detect the digit one (1) might activate when vertical lines are present in the input image.
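
The update rule can be illustrated on a toy problem. The following is a minimal NumPy sketch, not the `keras-vis` implementation: the unit (a single linear neuron with hypothetical random weights `w` and bias `b`), the step size, and the number of iterations are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "unit": one linear neuron over a flattened 28x28 input.
w = rng.normal(size=28 * 28)
b = 0.1

def activation(x):
    """Activation of the toy unit for input x."""
    return float(w @ x + b)

def loss_gradient(x):
    """Gradient of loss(x) = -activation(x) with respect to x."""
    return -w

# Start from a uniform mid-grey image.
x = np.full(28 * 28, 0.5)
initial_activation = activation(x)

step = 0.01  # assumed step size
for _ in range(100):
    x = x - step * loss_gradient(x)  # gradient descent on the loss
    x = np.clip(x, 0.0, 1.0)         # keep pixels inside the input range

final_activation = activation(x)
print("before:", initial_activation, "after:", final_activation)
```

Minimizing the negated activation drives pixels up where the weight is positive and down where it is negative, so the optimized input ends up mirroring the pattern the unit responds to.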

For the experiments, you are going to use the `mnist` dataset from LeCun et al. 1998.

------------------------------------------------
# First part: Creating a model

## Loading the packages

In [1]:
import sys
sys.setrecursionlimit(10000)

In [2]:
import tensorflow as tf

tf.compat.v1.disable_eager_execution()


In [3]:
!pip install tensorflow==2.7 keras==2.7
!pip install -I scipy==1.2.*


In [None]:
import numpy as np
from matplotlib import pyplot as pl

from keras.datasets import mnist
from keras.models import Model
from keras.layers.core import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import RMSprop
from keras.utils import np_utils
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Input
from sklearn import metrics as me
from scipy import stats
from tensorflow.keras import models
from tensorflow.keras import layers

%matplotlib inline

## Loading the data

Load the `mnist` dataset and scale the pixel values to the range [0, 1]

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

n_train, height, width = X_train.shape
n_test, _, _ = X_test.shape

X_train = X_train.reshape(n_train, height, width, 1).astype('float32')
X_test = X_test.reshape(n_test, height, width, 1).astype('float32')

X_train /= 255.0
X_test /= 255.0

n_classes = 10

print(n_train, 'train samples')
print(n_test, 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
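
As a small illustration of the two preprocessing steps above, the sketch below reproduces them with plain NumPy on a few hand-picked values. The helper `one_hot` is a hypothetical stand-in written for this example; the notebook itself uses `np_utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a (len(labels), n_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels)
    return np.eye(n_classes, dtype="float32")[labels]

# Scaling 8-bit pixel values to [0, 1], as done for X_train and X_test.
pixels = np.array([0, 128, 255], dtype="float32") / 255.0

# Encoding three example labels as binary class vectors.
encoded = one_hot([3, 0, 9], n_classes=10)

print(pixels)
print(encoded)
```

Each row of `encoded` contains a single 1 at the index of its class, which is the target format expected by the `categorical_crossentropy` loss used later.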

## Creating the network

Create the CNN and show its architecture

In [None]:
l0 = Input(shape=(height, width, 1), name='l0')

l1 = Conv2D(9, (5, 5), padding='same', activation='relu', name='l1')(l0)
l1_mp = MaxPooling2D(pool_size=(2, 2), name='l1_mp')(l1)

l2 = Conv2D(9, (5, 5), padding='same', activation='relu', name='l2')(l1_mp)
l2_mp = MaxPooling2D(pool_size=(2, 2), name='l2_mp')(l2)

l3 = Conv2D(16, (3, 3), padding='same', activation='relu', name='l3')(l2_mp)
l3_mp = MaxPooling2D(pool_size=(2, 2), name='l3_mp')(l3)

flat = Flatten(name='flat')(l3_mp)

l4 = Dense(25, activation='relu', name='l4')(flat)

l5 = Dense(n_classes, activation='softmax', name='l5')(l4)

model = Model(inputs=l0, outputs=l5)
model.summary()
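
The feature-map sizes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the standard 28x28 MNIST images, `padding='same'` convolutions (which keep the spatial size) and 2x2 max pooling (which halves it, rounding down); the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit for a dense layer."""
    return n_in * n_out + n_out

size = 28                      # MNIST image height and width
size_l1 = size // 2            # after l1_mp: 14
size_l2 = size_l1 // 2         # after l2_mp: 7
size_l3 = size_l2 // 2         # after l3_mp: 3
flat_units = size_l3 * size_l3 * 16

layer_params = {
    "l1": conv_params(5, 1, 9),
    "l2": conv_params(5, 9, 9),
    "l3": conv_params(3, 9, 16),
    "l4": dense_params(flat_units, 25),
    "l5": dense_params(25, 10),
}
total_params = sum(layer_params.values())

print(flat_units, layer_params, total_params)
```

The totals should match the summary above; a mismatch usually points to a different input size or padding mode.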

## Training the network

Define some constants and train the CNN

In [None]:
batch_size = 128
n_epoch = 10

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=n_epoch, verbose=1, validation_data=(X_test, Y_test))

Show the performance of the model

In [None]:
pl.plot(history.history['loss'], label='Training')
pl.plot(history.history['val_loss'], label='Testing')
pl.legend()
pl.grid()

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Confusion matrix

In [None]:
pred = model.predict_on_batch(X_test)
pred = np.argmax(pred, axis=-1)
me.confusion_matrix(y_test, pred)
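
To read the matrix, recall that rows correspond to the true labels and columns to the predicted labels. The sketch below builds a confusion matrix by hand for a made-up three-class example and derives the per-class recall from it; the labels and the helper `toy_confusion_matrix` are hypothetical and only serve as an illustration.

```python
import numpy as np

def toy_confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i (row) is predicted as class j (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for illustration only.
y_true_toy = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0, 2])

cm = toy_confusion_matrix(y_true_toy, y_pred_toy, n_classes=3)

# Per-class recall: correct predictions divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(recall)
```

Off-diagonal entries show which digits are confused with each other, which is useful context when interpreting the maximized images later on.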

-------------------------
# Second part: Maximizing activations

## Loading the packages

In [None]:
!pip install git+https://github.com/SimWalther/keras-vis.git -U

In [None]:
from __future__ import print_function
from vis.visualization import visualize_activation
from vis.utils import utils
from keras import activations


In [None]:
from vis.utils import utils
from vis.visualization import visualize_cam, overlay

## Activation maximization keeping the softmax activation at the output
Activation maximization does not work well when the output activation function is a softmax. The following cells demonstrate this behaviour.

In [None]:
# select the last layer
layer_idx = utils.find_layer_idx(model, 'l5')

In [None]:
pl.figure(figsize=(12,10))
for output_idx in np.arange(10):
    img = visualize_activation(model, layer_idx, filter_indices=int(output_idx), input_range=(0.0, 1.0))
    pl.subplot(3, 4, output_idx+1)
    pl.title('Maximization of output {}'.format(output_idx))
    pl.imshow(img[..., 0])
pl.tight_layout()

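It does not work! The reason is that, with a softmax activation, an output node can be maximized by minimizing the other outputs instead of by producing the pattern of its own class. Softmax normalizes across the whole layer, so each output depends on the pre-activations of all the other nodes; among the activations used in this network, it is the only one with this property.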
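
A small numerical check makes this dependence visible. The sketch below is a standalone NumPy illustration with made-up logits, not part of the `keras-vis` pipeline: it raises the softmax probability of class 0 without touching the logit of class 0, and it evaluates the derivative of that probability with respect to every logit.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Made-up logits; class 0 keeps the same logit in both cases.
logits = np.array([1.0, 2.0, 0.5])
lowered = np.array([1.0, -3.0, -4.5])  # only the other logits are pushed down

p_before = softmax(logits)[0]
p_after = softmax(lowered)[0]

# Derivative of p_0 with respect to each logit: p_0 * (delta_0j - p_j).
p = softmax(logits)
grad_p0 = p[0] * (np.eye(3)[0] - p)

print(p_before, p_after)
print(grad_p0)
```

The gradient is positive for the class's own logit and negative for all others, so an optimizer can increase the softmax output purely by suppressing the competing classes.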

## Activation maximization without the softmax activation at the output

The following cell replaces the softmax activation function with a linear activation function

In [None]:
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)

Visualize the image that maximizes the first output (0) of the network for several values of `tv_weight`

In [None]:
# Baseline without total variation regularization (kept for reference)
#img = visualize_activation(model, layer_idx, filter_indices=0, input_range=(0.0, 1.0), tv_weight=0)
#pl.imshow(img[..., 0])

# Sweep over several regularization strengths and plot one image per value
tv_weights = [0.125, 0.25, 0.5, 1, 2, 4, 8, 16]
pl.figure(figsize=(12,10))
for i, tv_weight in enumerate(tv_weights):
    img = visualize_activation(model, layer_idx, filter_indices=0, input_range=(0.0, 1.0), tv_weight=tv_weight)
    pl.subplot(2, 4, i+1)
    pl.title('tv_weight={}'.format(tv_weight))
    pl.imshow(img[..., 0])
pl.tight_layout()

In [None]:
# CELL ADDED

best_tv_weight = 0.5 # seems to be the best tv_weight

pl.figure(figsize=(12,10))
for output_idx in np.arange(10):
    img = visualize_activation(model, layer_idx, filter_indices=int(output_idx), input_range=(0.0, 1.0), tv_weight=best_tv_weight)
    pl.subplot(3, 4, output_idx+1)
    pl.title('Maximization of output {}'.format(output_idx))
    pl.imshow(img[..., 0])
pl.tight_layout()

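The commented-out call above disables the `tv_weight` regularizer (`tv_weight=0`). Increasing `tv_weight` strengthens a total variation penalty that discourages abrupt changes between neighbouring pixels, which generally makes the generated image more realistic. The last grid of images uses `tv_weight=0.5`.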
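
To give an intuition for what a total variation penalty measures, the sketch below computes a simple anisotropic total variation (the sum of absolute differences between neighbouring pixels) for a noisy image and for a smooth one. This is an illustrative definition under our own assumptions; the exact formula used inside `keras-vis` may differ.

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between vertical and horizontal neighbours."""
    dh = np.abs(np.diff(img, axis=0)).sum()
    dw = np.abs(np.diff(img, axis=1)).sum()
    return float(dh + dw)

rng = np.random.default_rng(0)

# A noisy image: independent uniform pixel values in [0, 1).
noisy = rng.random((28, 28))

# A smooth image: a horizontal ramp from 0 to 1, identical on every row.
smooth = np.tile(np.linspace(0.0, 1.0, 28), (28, 1))

tv_noisy = total_variation(noisy)
tv_smooth = total_variation(smooth)

print(tv_noisy, tv_smooth)
```

A larger weight on this kind of term pushes the optimizer toward images like `smooth` and away from images like `noisy`, at the cost of a lower activation when the weight becomes too large.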

## Questions

<div class="alert alert-block alert-info">
<ul>
    <li>Test different values of <code>tv_weight</code></li>
    <ul>
        <li>Try values between 0.1 and 20 (for example: [0.125, 0.25, 0.5, 1, 2, 4, 8, 16])</li>
        <li>Select the regularization parameter that gives the best images (more realistic)</li>
        <li>Show the images that maximize each one of the outputs of the network</li>
    </ul>
    <li>Maximize two outputs at the same time (filter_indices=[f1, f2])</li>
    <ul>
        <li>Try two classes with similar shape like 1 and 7 or 4 and 9</li>
        <li>Try two classes with very different shapes like 0 and 1 or 7 and 8</li>
        <li>How can activation maximization be useful for understanding a deep neural network? Explain</li>
    </ul>
</ul>
</div>