# Activation Maximization on VGGNet

## Dense Layer Visualizations

To visualize activations over the final dense layer outputs, we need to swap the `softmax` activation for `linear`, because with softmax the gradient of an output node depends on the activations of all the other nodes. Doing this in Keras is tricky, so we provide `utils.apply_modifications` to modify network parameters and rebuild the graph.

If this swap is not done, the results might be suboptimal. We will start by swapping out `softmax` for `linear`, and at the end compare what happens if we don't.
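As a rough illustration of why the gradient of a softmax output depends on every other node, the sketch below computes the softmax Jacobian for a small, made-up logit vector using plain NumPy. The helper names `softmax` and `softmax_jacobian` are hypothetical and are not part of keras-vis.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(z):
    """J[i, j] = d softmax(z)[i] / d z[j] = p[i] * (delta_ij - p[j])."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

# Made-up logits for a 3-class toy problem (an assumption for illustration).
logits = np.array([2.0, 1.0, 0.1])

# With a linear activation each output is just its own logit,
# so the Jacobian is the identity: no cross-node terms.
linear_jac = np.eye(len(logits))

# With softmax, every off-diagonal entry is non-zero (and negative):
# lowering any other logit raises the output we care about.
softmax_jac = softmax_jacobian(logits)

print(linear_jac)
print(softmax_jac)
```

The off-diagonal entries are what allow an optimizer to raise one class probability by suppressing the others rather than by strengthening evidence for the class itself.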

In [8]:
from keras.applications import VGG16
from vis.utils import utils
from keras import activations

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)

# Utility to search for layer index by name. 
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'predictions')

# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)

### Visualizing a specific output category

Let's try visualizing a specific output category. We will pick `ouzel`, which corresponds to ImageNet category `20`.

In [9]:
from vis.visualization import visualize_activation

from matplotlib import pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (18, 6)

# 20 is the imagenet category for 'ouzel'
img = visualize_activation(model, layer_idx, filter_indices=20)
plt.imshow(img)


Hmm, that sort of looks like a bird. Let's see if we can get better results with more iterations. This time, let's also look at the verbose output during the optimization process.

In [10]:
# 20 is the imagenet category for 'ouzel'
img = visualize_activation(model, layer_idx, filter_indices=20, max_iter=500, verbose=True)
plt.imshow(img)

Iteration: 1, named_losses: [('ActivationMax Loss', -0.23652837),
 ('L-6.0 Norm Loss', 0.063257359),
 ('TV(2.0) Loss', 6454.2188)], overall loss: 6454.04541016
Iteration: 2, named_losses: [('ActivationMax Loss', -0.38862073),
 ('L-6.0 Norm Loss', 0.062633291),
 ('TV(2.0) Loss', 3182.8979)], overall loss: 3182.57202148
Iteration: 3, named_losses: [('ActivationMax Loss', -0.80792999),
 ('L-6.0 Norm Loss', 0.062285963),
 ('TV(2.0) Loss', 1660.4106)], overall loss: 1659.66503906
Iteration: 4, named_losses: [('ActivationMax Loss', -1.3529515),
 ('L-6.0 Norm Loss', 0.062074915),
 ('TV(2.0) Loss', 848.22705)], overall loss: 846.936157227
Iteration: 5, named_losses: [('ActivationMax Loss', -1.9309249),
 ('L-6.0 Norm Loss', 0.061946813),
 ('TV(2.0) Loss', 414.09256)], overall loss: 412.223571777
Iteration: 6, named_losses: [('ActivationMax Loss', -2.6263728),
 ('L-6.0 Norm Loss', 0.0618737),
 ('TV(2.0) Loss', 215.12886)], overall loss: 212.564361572
Iteration: 7, named_losses: [('ActivationMax 

Iteration: 53, named_losses: [('ActivationMax Loss', -44.086586),
 ('L-6.0 Norm Loss', 0.061855465),
 ('TV(2.0) Loss', 29.958473)], overall loss: -14.0662574768
Iteration: 54, named_losses: [('ActivationMax Loss', -46.265251),
 ('L-6.0 Norm Loss', 0.061856702),
 ('TV(2.0) Loss', 30.644764)], overall loss: -15.558631897
Iteration: 55, named_losses: [('ActivationMax Loss', -46.109085),
 ('L-6.0 Norm Loss', 0.061856948),
 ('TV(2.0) Loss', 32.343224)], overall loss: -13.7040061951
Iteration: 56, named_losses: [('ActivationMax Loss', -46.154011),
 ('L-6.0 Norm Loss', 0.061859209),
 ('TV(2.0) Loss', 29.866631)], overall loss: -16.2255210876
Iteration: 57, named_losses: [('ActivationMax Loss', -48.1483),
 ('L-6.0 Norm Loss', 0.061861962),
 ('TV(2.0) Loss', 31.805061)], overall loss: -16.281375885
Iteration: 58, named_losses: [('ActivationMax Loss', -47.153053),
 ('L-6.0 Norm Loss', 0.061863355),
 ('TV(2.0) Loss', 32.083012)], overall loss: -15.0081787109
Iteration: 59, named_losses: [('Activa

Iteration: 109, named_losses: [('ActivationMax Loss', -67.502937),
 ('L-6.0 Norm Loss', 0.061970394),
 ('TV(2.0) Loss', 40.945988)], overall loss: -26.4949760437
Iteration: 110, named_losses: [('ActivationMax Loss', -66.85611),
 ('L-6.0 Norm Loss', 0.061974101),
 ('TV(2.0) Loss', 39.716278)], overall loss: -27.0778579712
Iteration: 111, named_losses: [('ActivationMax Loss', -68.598862),
 ('L-6.0 Norm Loss', 0.061975479),
 ('TV(2.0) Loss', 41.146049)], overall loss: -27.390838623
Iteration: 112, named_losses: [('ActivationMax Loss', -70.261124),
 ('L-6.0 Norm Loss', 0.061977986),
 ('TV(2.0) Loss', 39.597488)], overall loss: -30.6016540527
Iteration: 113, named_losses: [('ActivationMax Loss', -66.714874),
 ('L-6.0 Norm Loss', 0.061979216),
 ('TV(2.0) Loss', 41.231609)], overall loss: -25.4212837219
Iteration: 114, named_losses: [('ActivationMax Loss', -70.259338),
 ('L-6.0 Norm Loss', 0.061982453),
 ('TV(2.0) Loss', 40.249741)], overall loss: -29.9476165771
Iteration: 115, named_losses: 

Iteration: 165, named_losses: [('ActivationMax Loss', -85.390877),
 ('L-6.0 Norm Loss', 0.062107254),
 ('TV(2.0) Loss', 48.512482)], overall loss: -36.8162841797
Iteration: 166, named_losses: [('ActivationMax Loss', -89.280769),
 ('L-6.0 Norm Loss', 0.062109586),
 ('TV(2.0) Loss', 47.733635)], overall loss: -41.4850234985
Iteration: 167, named_losses: [('ActivationMax Loss', -86.773773),
 ('L-6.0 Norm Loss', 0.062111802),
 ('TV(2.0) Loss', 48.405296)], overall loss: -38.3063659668
Iteration: 168, named_losses: [('ActivationMax Loss', -89.266304),
 ('L-6.0 Norm Loss', 0.062114853),
 ('TV(2.0) Loss', 47.790852)], overall loss: -41.4133338928
Iteration: 169, named_losses: [('ActivationMax Loss', -85.502991),
 ('L-6.0 Norm Loss', 0.062117644),
 ('TV(2.0) Loss', 49.425331)], overall loss: -36.0155410767
Iteration: 170, named_losses: [('ActivationMax Loss', -85.752541),
 ('L-6.0 Norm Loss', 0.062120721),
 ('TV(2.0) Loss', 48.492397)], overall loss: -37.1980247498
Iteration: 171, named_losses

Iteration: 221, named_losses: [('ActivationMax Loss', -99.615746),
 ('L-6.0 Norm Loss', 0.062250301),
 ('TV(2.0) Loss', 54.42424)], overall loss: -45.1292572021
Iteration: 222, named_losses: [('ActivationMax Loss', -103.13307),
 ('L-6.0 Norm Loss', 0.062251575),
 ('TV(2.0) Loss', 54.386383)], overall loss: -48.6844406128
Iteration: 223, named_losses: [('ActivationMax Loss', -96.229507),
 ('L-6.0 Norm Loss', 0.062256515),
 ('TV(2.0) Loss', 56.048912)], overall loss: -40.1183395386
Iteration: 224, named_losses: [('ActivationMax Loss', -104.89292),
 ('L-6.0 Norm Loss', 0.062258039),
 ('TV(2.0) Loss', 55.30854)], overall loss: -49.5221252441
Iteration: 225, named_losses: [('ActivationMax Loss', -97.592621),
 ('L-6.0 Norm Loss', 0.062260509),
 ('TV(2.0) Loss', 55.895943)], overall loss: -41.6344146729
Iteration: 226, named_losses: [('ActivationMax Loss', -107.39828),
 ('L-6.0 Norm Loss', 0.062263034),
 ('TV(2.0) Loss', 55.099182)], overall loss: -52.2368392944
Iteration: 227, named_losses: 

Iteration: 277, named_losses: [('ActivationMax Loss', -113.67271),
 ('L-6.0 Norm Loss', 0.062397864),
 ('TV(2.0) Loss', 60.281197)], overall loss: -53.3291168213
Iteration: 278, named_losses: [('ActivationMax Loss', -113.04047),
 ('L-6.0 Norm Loss', 0.062401105),
 ('TV(2.0) Loss', 59.962502)], overall loss: -53.0155715942
Iteration: 279, named_losses: [('ActivationMax Loss', -110.9659),
 ('L-6.0 Norm Loss', 0.062404241),
 ('TV(2.0) Loss', 61.521461)], overall loss: -49.3820343018
Iteration: 280, named_losses: [('ActivationMax Loss', -115.2794),
 ('L-6.0 Norm Loss', 0.062406976),
 ('TV(2.0) Loss', 59.865814)], overall loss: -55.3511810303
Iteration: 281, named_losses: [('ActivationMax Loss', -113.83675),
 ('L-6.0 Norm Loss', 0.062409297),
 ('TV(2.0) Loss', 61.184658)], overall loss: -52.5896873474
Iteration: 282, named_losses: [('ActivationMax Loss', -113.69312),
 ('L-6.0 Norm Loss', 0.062412113),
 ('TV(2.0) Loss', 61.390404)], overall loss: -52.2403030396
Iteration: 283, named_losses: 

Iteration: 334, named_losses: [('ActivationMax Loss', -128.23158),
 ('L-6.0 Norm Loss', 0.062553898),
 ('TV(2.0) Loss', 66.244347)], overall loss: -61.9246749878
Iteration: 335, named_losses: [('ActivationMax Loss', -126.21046),
 ('L-6.0 Norm Loss', 0.062556483),
 ('TV(2.0) Loss', 68.55365)], overall loss: -57.59425354
Iteration: 336, named_losses: [('ActivationMax Loss', -131.94002),
 ('L-6.0 Norm Loss', 0.062559016),
 ('TV(2.0) Loss', 65.856522)], overall loss: -66.0209350586
Iteration: 337, named_losses: [('ActivationMax Loss', -121.91496),
 ('L-6.0 Norm Loss', 0.062561005),
 ('TV(2.0) Loss', 68.109398)], overall loss: -53.7430038452
Iteration: 338, named_losses: [('ActivationMax Loss', -127.00276),
 ('L-6.0 Norm Loss', 0.062565938),
 ('TV(2.0) Loss', 66.57309)], overall loss: -60.3671035767
Iteration: 339, named_losses: [('ActivationMax Loss', -128.661),
 ('L-6.0 Norm Loss', 0.062566891),
 ('TV(2.0) Loss', 68.095688)], overall loss: -60.502746582
Iteration: 340, named_losses: [('Ac

Iteration: 387, named_losses: [('ActivationMax Loss', -139.22398),
 ('L-6.0 Norm Loss', 0.062704168),
 ('TV(2.0) Loss', 74.081116)], overall loss: -65.0801696777
Iteration: 388, named_losses: [('ActivationMax Loss', -138.96829),
 ('L-6.0 Norm Loss', 0.062707596),
 ('TV(2.0) Loss', 70.89389)], overall loss: -68.0116882324
Iteration: 389, named_losses: [('ActivationMax Loss', -139.73682),
 ('L-6.0 Norm Loss', 0.062710412),
 ('TV(2.0) Loss', 72.549789)], overall loss: -67.1243133545
Iteration: 390, named_losses: [('ActivationMax Loss', -139.44196),
 ('L-6.0 Norm Loss', 0.06271337),
 ('TV(2.0) Loss', 71.712288)], overall loss: -67.6669540405
Iteration: 391, named_losses: [('ActivationMax Loss', -134.49084),
 ('L-6.0 Norm Loss', 0.06271641),
 ('TV(2.0) Loss', 72.558838)], overall loss: -61.8692932129
Iteration: 392, named_losses: [('ActivationMax Loss', -140.21024),
 ('L-6.0 Norm Loss', 0.062720843),
 ('TV(2.0) Loss', 72.034454)], overall loss: -68.113067627
Iteration: 393, named_losses: [(

Iteration: 438, named_losses: [('ActivationMax Loss', -147.75507),
 ('L-6.0 Norm Loss', 0.062860459),
 ('TV(2.0) Loss', 76.416542)], overall loss: -71.2756576538
Iteration: 439, named_losses: [('ActivationMax Loss', -146.35825),
 ('L-6.0 Norm Loss', 0.062862925),
 ('TV(2.0) Loss', 76.040337)], overall loss: -70.2550430298
Iteration: 440, named_losses: [('ActivationMax Loss', -148.93884),
 ('L-6.0 Norm Loss', 0.062865987),
 ('TV(2.0) Loss', 76.496346)], overall loss: -72.3796310425
Iteration: 441, named_losses: [('ActivationMax Loss', -149.18591),
 ('L-6.0 Norm Loss', 0.062868237),
 ('TV(2.0) Loss', 75.888054)], overall loss: -73.234992981
Iteration: 442, named_losses: [('ActivationMax Loss', -147.30473),
 ('L-6.0 Norm Loss', 0.062870204),
 ('TV(2.0) Loss', 76.196472)], overall loss: -71.0453948975
Iteration: 443, named_losses: [('ActivationMax Loss', -151.31589),
 ('L-6.0 Norm Loss', 0.062872902),
 ('TV(2.0) Loss', 75.758423)], overall loss: -75.4945983887
Iteration: 444, named_losses:

Iteration: 492, named_losses: [('ActivationMax Loss', -157.68198),
 ('L-6.0 Norm Loss', 0.063019305),
 ('TV(2.0) Loss', 78.353966)], overall loss: -79.2649917603
Iteration: 493, named_losses: [('ActivationMax Loss', -152.27164),
 ('L-6.0 Norm Loss', 0.063024096),
 ('TV(2.0) Loss', 80.214958)], overall loss: -71.9936599731
Iteration: 494, named_losses: [('ActivationMax Loss', -158.29787),
 ('L-6.0 Norm Loss', 0.063024633),
 ('TV(2.0) Loss', 78.70845)], overall loss: -79.5263977051
Iteration: 495, named_losses: [('ActivationMax Loss', -153.49835),
 ('L-6.0 Norm Loss', 0.063029215),
 ('TV(2.0) Loss', 81.051971)], overall loss: -72.3833465576
Iteration: 496, named_losses: [('ActivationMax Loss', -158.92822),
 ('L-6.0 Norm Loss', 0.063031986),
 ('TV(2.0) Loss', 78.715805)], overall loss: -80.1493835449
Iteration: 497, named_losses: [('ActivationMax Loss', -155.50137),
 ('L-6.0 Norm Loss', 0.063035131),
 ('TV(2.0) Loss', 81.072197)], overall loss: -74.3661422729
Iteration: 498, named_losses:


We can see that the overall loss keeps improving, with the rate of improvement slowing down, so more iterations do seem to give better output. One way to get crisper results is to use the `Jitter` input modifier. As the name suggests, `Jitter` moves pixels around in the image. Let's try this out.
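Before running it, here is a minimal sketch of what jittering could look like, assuming it means shifting the whole image by a small random offset with wrap-around at the borders. The `jitter` function below is a hypothetical stand-in written with NumPy, not the library's implementation.

```python
import numpy as np

def jitter(img, max_shift, rng):
    """Shift an (H, W, C) image by a random offset, wrapping at the borders."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll moves rows/columns cyclically, so no pixel values are lost.
    return np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(0)

# Tiny made-up 4x4 RGB "image" so the effect is easy to inspect.
img = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
shifted = jitter(img, max_shift=2, rng=rng)

print(img[..., 0])
print(shifted[..., 0])
```

Applying a fresh random shift at every optimization step discourages the optimizer from relying on exact pixel positions, which is one plausible reason the results look crisper.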

In [11]:
from vis.input_modifiers import Jitter

# 20 is the imagenet category for 'ouzel'
# Jitter 16 pixels along all dimensions during the optimization process.
img = visualize_activation(model, layer_idx, filter_indices=20, max_iter=500, input_modifiers=[Jitter(16)])
plt.imshow(img)


Look at that! Not only has the conv net captured what it means to be an ouzel, but it also seems to encode different orientations and scales, which hints at some degree of rotational and scale invariance.

Let's try this for a bunch of other random categories. This will take a while. Go grab a nice cup of coffee and prepare to be amused :)

In [12]:
import numpy as np
categories = np.random.permutation(1000)[:15]

vis_images = []
image_modifiers = [Jitter(16)]
for idx in categories:    
    img = visualize_activation(model, layer_idx, filter_indices=idx, max_iter=500, input_modifiers=image_modifiers)
    
    # Reverse lookup index to imagenet label and overlay it on the image.
    img = utils.draw_text(img, utils.get_imagenet_label(idx))
    vis_images.append(img)

# Generate stitched images with 5 cols (so it will have 3 rows).
plt.rcParams['figure.figsize'] = (50, 50)
stitched = utils.stitch_images(vis_images, cols=5)
plt.axis('off')
plt.imshow(stitched)
plt.show()


Some of them make sense if you stare at them for a while. There are ways of improving this. We will cover some ideas for this in the next section. You can come back here and try those out as an exercise.

## Visualizing Conv filters

In a CNN, each Conv layer has several learned *template matching* filters that maximize their output when a similar template pattern is found in the input image. The first Conv layer is easy to interpret: simply visualize the weights as an image. To see what it is doing, a simple option is to apply the filter over raw input pixels. Subsequent Conv filters operate over the outputs of previous Conv filters (which indicate the presence or absence of some templates), making them hard to interpret.

One way of interpreting them is to generate an input image that maximizes the filter output, which shows us the kind of pattern the filter responds to.
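The idea can be sketched on a toy problem. Below, a single hypothetical 3x3 filter (a Sobel-like edge template, chosen only for illustration) responds to a patch via an inner product, and gradient ascent on the patch drives it toward the template. This is a simplified NumPy sketch of the principle, not what `visualize_activation` does internally.

```python
import numpy as np

# Hypothetical 3x3 "filter": a Sobel-like vertical edge template.
template = np.array([[1.0, 0.0, -1.0],
                     [2.0, 0.0, -2.0],
                     [1.0, 0.0, -1.0]])

def activation(patch):
    """Filter response: inner product between the template and the patch."""
    return float(np.sum(template * patch))

rng = np.random.default_rng(0)
patch = rng.normal(size=(3, 3))
patch /= np.linalg.norm(patch)      # start from a random unit-norm input
start = activation(patch)

for _ in range(200):
    grad = template                 # d activation / d patch for a linear filter
    patch = patch + 0.1 * grad      # gradient ascent step on the input
    patch /= np.linalg.norm(patch)  # keep the input bounded

end = activation(patch)

# Cosine similarity between the optimized patch and the template.
cosine = end / np.linalg.norm(template)
print(start, end, cosine)
```

For a linear filter the maximizing input is simply the template itself; for filters deep in a network the gradient has to be backpropagated through all earlier layers, which is why the resulting images are far less obvious.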

Let's start by visualizing the second conv layer of VGGNet (named `block1_conv2`). Here is the VGG16 model for reference.

In [13]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [14]:
from vis.visualization import get_num_filters

# The name of the layer we want to visualize
# You can see this in the model definition.
layer_name = 'block1_conv2'
layer_idx = utils.find_layer_idx(model, layer_name)

# Visualize all filters in this layer.
filters = np.arange(get_num_filters(model.layers[layer_idx]))

# Generate input image for each filter.
vis_images = []
for idx in filters:
    img = visualize_activation(model, layer_idx, filter_indices=idx)
    
    # Utility to overlay text on image.
    img = utils.draw_text(img, 'Filter {}'.format(idx))    
    vis_images.append(img)

# Generate stitched image palette with 8 cols.
stitched = utils.stitch_images(vis_images, cols=8)    
plt.axis('off')
plt.imshow(stitched)
plt.title(layer_name)
plt.show()


They mostly seem to match specific color and directional patterns. Let's try a bunch of other layers.
We will visualize 10 randomly chosen filters within various layers.

In [15]:
selected_indices = []
for layer_name in ['block2_conv2', 'block3_conv3', 'block4_conv3', 'block5_conv3']:
    layer_idx = utils.find_layer_idx(model, layer_name)

    # Pick 10 random filters in this layer.
    filters = np.random.permutation(get_num_filters(model.layers[layer_idx]))[:10]
    selected_indices.append(filters)

    # Generate input image for each filter.
    vis_images = []
    for idx in filters:
        img = visualize_activation(model, layer_idx, filter_indices=idx)

        # Utility to overlay text on image.
        img = utils.draw_text(img, 'Filter {}'.format(idx))    
        vis_images.append(img)

    # Generate stitched image palette with 5 cols so we get 2 rows.
    stitched = utils.stitch_images(vis_images, cols=5)    
    plt.figure()
    plt.axis('off')
    plt.imshow(stitched)
    plt.show()


We can see how the filters progress from simple patterns in the early layers to complex, abstract patterns in the deeper ones.

We also notice that some of the filters in `block5_conv3` (the last one) failed to converge. This is usually because the regularization losses (total variation and LP norm) are overpowering the activation maximization loss (set `verbose=True` to observe). There are a few options to make this work better:

- Different regularization weights.
- Increase number of iterations.
- Add `Jitter` input_modifier.
- Try with 0 regularization weights, generate a converged image and use that as `seed_input` with regularization enabled.
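To make the regularization terms mentioned above concrete, here is a minimal NumPy sketch of a total variation penalty and an Lp norm penalty. The exact formulas, normalization, and weighting used by the library may differ, so `total_variation` and `lp_norm` below are assumptions for illustration. The last lines check, using the values printed for iteration 1 in the verbose log earlier, that the overall loss is the sum of the named losses.

```python
import numpy as np

def total_variation(img, beta=2.0):
    """Sum over pixels of (dx^2 + dy^2)^(beta / 2) for a 2-D image."""
    dy = img[1:, :-1] - img[:-1, :-1]   # vertical neighbour differences
    dx = img[:-1, 1:] - img[:-1, :-1]   # horizontal neighbour differences
    return float(np.sum((dx ** 2 + dy ** 2) ** (beta / 2.0)))

def lp_norm(img, p=6.0):
    """(sum |x|^p)^(1/p): dominated by the largest-magnitude pixels."""
    return float(np.sum(np.abs(img) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
noisy = rng.uniform(0.0, 1.0, size=(32, 32))           # high-frequency noise
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))   # gentle left-to-right ramp

tv_noisy, tv_smooth = total_variation(noisy), total_variation(smooth)
print('TV noisy:', tv_noisy, 'TV smooth:', tv_smooth)
print('L6 noisy:', lp_norm(noisy), 'L6 smooth:', lp_norm(smooth))

# Named losses copied from iteration 1 of the verbose log.
named_losses = [('ActivationMax Loss', -0.23652837),
                ('L-6.0 Norm Loss', 0.063257359),
                ('TV(2.0) Loss', 6454.2188)]
overall = sum(value for _, value in named_losses)
print('overall:', overall)  # compare with the logged overall loss, 6454.04541016
```

Because the terms are simply added, a large total variation value can swamp the activation term, which is consistent with the stalled filters: the optimizer gains more by smoothing the image than by exciting the filter.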

I will show a subset of these ideas here. Let's start by adding Jitter and disabling total variation.

In [16]:
layer_idx = utils.find_layer_idx(model, 'block5_conv3')

# We need to select the same random filters in order to compare the results.
filters = selected_indices[-1]
selected_indices.append(filters)

# Generate input image for each filter.
vis_images = []
for idx in filters:
    # We will jitter 5% relative to the image size.
    img = visualize_activation(model, layer_idx, filter_indices=idx, 
                               tv_weight=0.,
                               input_modifiers=[Jitter(0.05)])

    # Utility to overlay text on image.
    img = utils.draw_text(img, 'Filter {}'.format(idx))    
    vis_images.append(img)

# Generate stitched image palette with 5 cols so we get 2 rows.
stitched = utils.stitch_images(vis_images, cols=5)    
plt.figure()
plt.axis('off')
plt.imshow(stitched)
plt.show()


We can see how previously unconverged filters show something this time. Let's take each output from here and use it as a `seed_input`, with total variation enabled this time.

In [17]:
# Generate input image for each filter.
new_vis_images = []
for i, idx in enumerate(filters):
    # We will seed with optimized image this time.
    img = visualize_activation(model, layer_idx, filter_indices=idx, 
                               seed_input=vis_images[i],
                               input_modifiers=[Jitter(0.05)])

    # Utility to overlay text on image.
    img = utils.draw_text(img, 'Filter {}'.format(idx))    
    new_vis_images.append(img)

# Generate stitched image palette with 5 cols so we get 2 rows.
stitched = utils.stitch_images(new_vis_images, cols=5)    
plt.figure()
plt.axis('off')
plt.imshow(stitched)
plt.show()


And that, folks, is how we roll :)
This trick works pretty well to get those stubborn filters to converge.

## Other fun stuff

The API to `visualize_activation` accepts `filter_indices`. This is generally meant for *multi label* classifiers, but nothing prevents us from having some fun. 

By setting `filter_indices` to multiple output categories, we can generate an input that the network thinks belongs to both of those categories. Maybe we can generate a cool looking crab fish. I will leave this as an exercise for the reader. You might have to experiment with regularization weights a lot.

Ideally, we could use a GAN trained on ImageNet and use the discriminator loss as a regularizer. This is easily done using the `visualize_activation_with_losses` API. If you ever do this, please consider submitting a PR :)

## Visualizations without swapping softmax

As alluded to at the beginning of the tutorial, we want to compare and see what happens if we don't swap out softmax for a linear activation.

Let's try the `ouzel` visualization again.

In [21]:
layer_idx = utils.find_layer_idx(model, 'predictions')

# Swap linear back with softmax
model.layers[layer_idx].activation = activations.softmax
model = utils.apply_modifications(model)

img = visualize_activation(model, layer_idx, filter_indices=20, input_modifiers=[Jitter(16)])
plt.rcParams['figure.figsize'] = (18, 6)
plt.imshow(img)


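It does not work! The reason is that a softmax output node can be maximized by minimizing the other outputs instead of increasing the node's own pre-activation. Softmax is weird that way: unlike elementwise activations such as ReLU or sigmoid, each of its outputs depends on the other node outputs in the layer.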
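This behaviour can be reproduced on a toy problem. The sketch below treats three logits as free variables (an assumption made for simplicity; in the real network the optimization is over image pixels) and runs gradient ascent on either the raw logit or the softmax probability of class 0. The `ascend` helper is hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def ascend(logits, target, use_softmax, steps=50, lr=1.0):
    """Gradient ascent on the logits themselves, for either objective."""
    z = logits.copy()
    for _ in range(steps):
        if use_softmax:
            p = softmax(z)
            # d p[target] / d z[j] = p[target] * (delta_tj - p[j])
            grad = -p[target] * p
            grad[target] += p[target]
        else:
            # d z[target] / d z[j] = delta_tj
            grad = np.zeros_like(z)
            grad[target] = 1.0
        z += lr * grad
    return z

start = np.zeros(3)
z_linear = ascend(start, target=0, use_softmax=False)
z_softmax = ascend(start, target=0, use_softmax=True)

print('linear objective :', z_linear)
print('softmax objective:', z_softmax)
```

With the linear objective only the target logit moves, and it grows by a fixed amount every step. With the softmax objective the other logits are pushed down, and the gradient shrinks as the probability saturates, so the target logit ends up much smaller.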