## Gifsplanation Implementation

Gifsplanation is a method for generating counterfactuals by shifting an image's latent representation in the direction that changes the classifier's prediction the most. Using an encoder and a decoder, we can then generate images that show the visual changes needed to move from the current class to the target class.
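As a rough illustration of the latent shift idea, the sketch below moves a latent vector along the gradient of the classifier output with respect to that vector and decodes the result for several step sizes. The `encoder`, `decoder` and `classifier` here are tiny hypothetical stand-ins, and `latent_shift` and `lambdas` are names chosen only for this example; none of them come from this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical): the real encoder, decoder and classifier are far larger.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
decoder = nn.Sequential(nn.Linear(16, 3 * 8 * 8), nn.Unflatten(1, (3, 8, 8)))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1), nn.Sigmoid())


def latent_shift(image, lambdas):
    """Decode z + lam * d f(D(z)) / dz for each step size lam."""
    # Encode once and treat the latent vector as the variable we differentiate against.
    z = encoder(image).detach().requires_grad_(True)
    prediction = classifier(decoder(z)).sum()
    (grad,) = torch.autograd.grad(prediction, z)
    with torch.no_grad():
        return [decoder(z + lam * grad) for lam in lambdas]


image = torch.rand(1, 3, 8, 8)
frames = latent_shift(image, lambdas=[-10.0, 0.0, 10.0])

# Classifier score for each decoded frame; it should rise as lam increases.
scores = [classifier(frame).item() for frame in frames]
```

Played in sequence, the decoded frames form the animation that gives the method its name: negative step sizes push the image away from the class and positive ones push it towards the class.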

We already have an encoder: the backbone of our trained model. The main benefit of reusing it is that the CAVs we have generated are meaningful in the latent space of this model. We can therefore add the CAVs to the activations and, hopefully, change the generated image in the direction of each CAV, which should show us which visual features the CAVs represent.

To do this, we need to allow values to be added to the activations in the layers that we used as bottleneck layers for our automated concept extraction method. These changes were made to the Torchvision library files, which are added to the repo. The specific file changes were:

-

The primary change was to ensure that a passed tensor could be added to the activations passing through the network at the correct point.
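For readers who want to experiment without editing library files, a similar effect can be approximated with a PyTorch forward hook. This is a different technique from the Torchvision modification used here, shown only as a minimal sketch. The toy network and the helper name `add_to_activations` are hypothetical; the layer at index 2 plays the role of the bottleneck layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network (hypothetical); index 2 stands in for the bottleneck layer.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 4, kernel_size=3, padding=1),
    nn.ReLU(),
).eval()


def add_to_activations(layer, offset):
    """Register a hook that adds `offset` to the layer's output and return its handle."""
    def hook(module, inputs, output):
        # A value returned from a forward hook replaces the layer's output.
        return output + offset
    return layer.register_forward_hook(hook)


x = torch.rand(1, 3, 8, 8)
offset = torch.ones(1, 4, 8, 8)  # would be a CAV reshaped to the activation shape

with torch.no_grad():
    baseline = model(x)
    handle = add_to_activations(model[2], offset)
    shifted = model(x)
    handle.remove()  # the network behaves as before once the hook is removed
    restored = model(x)
```

The hook leaves the model definition untouched, which makes it easy to switch the shift on and off, but the offset must match the shape of the activations at the chosen layer.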

In [1]:
import torch
from torch.utils.data import DataLoader
from torchvision.models.resnet import resnet50

import Utils.ACE.ace_helpers as ace_helpers
from Utils.Training.dataset import MidogDataset, transforms
import Utils.Training.utils as utils

### Loading in the trained model

We want to use the backbone from our trained model, so we load the model, isolate the backbone, and copy its weights into a new ResNet50 model. This new model has no Feature Pyramid Network, but that does not matter here: we do not need activations from multiple layers to aid detection, only the final set of activations for use in a decoder.

In [24]:
# Load the old model in
bottleneck_layers = ['backbone.body.layer4.2.conv1']

# Create the model variable and set it to evaluation mode.
mymodel = ace_helpers.MyModel("mitotic", bottleneck_layers)
mymodel.model.eval()
mymodel.model.model

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): BottleneckInsertValues(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=T

In [3]:
# Take the transform from our original model so that inputs are preprocessed identically and the activations match.
transform = mymodel.model.model.transform

# Take the backbone from our trained model.
backbone = mymodel.model.model.backbone.body

In [4]:
# Take a new ResNet50 model.
resnet_backbone = resnet50()

In [5]:
# Load the state from the backbone from our model.
resnet_backbone.load_state_dict(backbone.state_dict(), strict=False)

# NOTE: We are missing the fully connected layer weights and biases, but this is okay as we will just take the activations from
# the last bottleneck layer.

_IncompatibleKeys(missing_keys=['fc.weight', 'fc.bias'], unexpected_keys=[])

### Loading in the Dataset

We will be making use of the tiles from our dataset as we want to see how well we can generate counterfactuals for each detection.

In [6]:
# Create a Dataset object with the given root path to the training data and a defined transformation.
midog = MidogDataset("D:/DS/DS4/Project/Training_mitotic_figures", transforms)

In [7]:
# Create a DataLoader over the dataset with a batch size of 1 and no shuffling, using a custom collate_fn to batch
# the output as desired.
data_loader = DataLoader(midog, batch_size=1, shuffle=False, collate_fn=utils.collate_fn)

In [8]:
output = next(iter(data_loader))

In [9]:
output[0].shape

torch.Size([1, 3, 512, 512])

In [18]:
imgs, targets = transform(output[0], output[1])

In [22]:
results = resnet_backbone(imgs.tensors)

In [23]:
results.shape

torch.Size([1, 1000])

We can see that the backbone produces output of the expected shape. At the moment the activations still pass through the fully connected layer, which is why the output has 1000 values. Once we specify the tensor we would like to add to the activations, the model will instead return the values from the last bottleneck layer.

### Creating the Decoder for Reconstruction

Now we need to take the latent representation from the encoder and create a decoder that can reconstruct the original image. This will need to be trained using a reconstruction loss, or cross entropy