# Deep Learning Based Semantic Segmentation to Enhance Local Surrogate Models

## Summary:  
- The image segmentation algorithms used as the first step of surrogate model based interpretability algorithms do not produce segments that are easy to interpret.
- We believe that replacing this image segmentation step with deep learning based semantic segmentation can produce much more interpretable surrogate models.


## Problem Description: 
- Local surrogate model based interpretability algorithms such as LIME create interpretable, model-agnostic explanations of image predictions by first segmenting the image with image processing algorithms such as slic, watershed, chan_vese, etc. (see https://scikit-image.org/docs/dev/api/skimage.segmentation.html)  
- The problem with this kind of image segmentation, from a model interpretability standpoint, is that even when important predictive features of an image are identified, they are not mapped to any semantic meaning.   
- In the example below, LIME, using slic to segment the image, identifies the red regions as important for predicting whether or not the image shows an astronaut. Unfortunately, those regions have no semantic meaning. They may be part of the helmet or the space suit, but what one really wants to know is which semantic features are important.   
- Is the space suit, the helmet, the face, or some combination of them used to make the prediction?
<img src='Image/suit.png'>
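
To make the role of the segmentation step concrete, here is a minimal sketch of how a LIME-style local surrogate uses segments. It is not LIME's actual implementation: the helper `explain_with_segments`, the toy image and the toy `predict_fn` are all hypothetical, and the surrogate is a plain unweighted least-squares fit, whereas LIME additionally weights the samples by their proximity to the original image. Segments are switched on and off at random, the black-box score is recorded, and a linear model assigns one weight per segment.

```python
import numpy as np


def explain_with_segments(image, segments, predict_fn, num_samples=200, seed=0):
    """Fit a linear surrogate with one weight per segment (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    segment_ids = np.unique(segments)

    # Each row says which segments stay visible (1) or are blacked out (0).
    masks = rng.integers(0, 2, size=(num_samples, len(segment_ids)))
    masks[0] = 1  # the first sample is the unperturbed image

    scores = np.empty(num_samples)
    for row, mask in enumerate(masks):
        visible = np.isin(segments, segment_ids[mask == 1])
        scores[row] = predict_fn(image * visible[..., None])

    # Least-squares fit: score ~ sum(weight_k * mask_k) + intercept.
    design = np.column_stack([masks, np.ones(num_samples)])
    coefficients, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return dict(zip(segment_ids.tolist(), coefficients[:-1]))


# Toy setup: segment 0 is the left half, segment 1 the right half.
image = np.ones((4, 4, 3))
segments = np.zeros((4, 4), dtype=int)
segments[:, 2:] = 1


def predict_fn(img):
    # Stand-in "model" that only looks at the right half of the image.
    return img[:, 2:].mean()


weights = explain_with_segments(image, segments, predict_fn)
print(weights)
```

The surrogate gives segment 1 a weight close to 1 and segment 0 a weight close to 0. The weights only say that "segment 1" matters; whether that is interpretable depends entirely on what segment 1 corresponds to in the image.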

## Deep Learning Based Semantic Segmentation:  
Deep learning based semantic segmentation has been shown to be able to segment images into regions that are much better suited for model interpretability. Rather than segmenting on color and brightness, these models identify and segment the objects in an image. In the example below, deep learning based semantic segmentation has identified the regions of the image corresponding to the people, the table and the TV.  

- In this project we propose replacing the image processing algorithms, which are based on color and other statistical properties of images, with deep learning based semantic segmentation for the segmentation step in surrogate model based interpretability algorithms such as LIME on images.

<img src='Image/person.png'>
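
One way this replacement could look in practice: a semantic segmentation network outputs a per-pixel class map, which can be relabeled into the contiguous integer segment ids that superpixel algorithms such as slic return. The sketch below assumes such a class map is already available; the function name `class_map_to_segments` and the class ids (0, 15, 20) are illustrative only. LIME's image explainer accepts a custom `segmentation_fn`, so a wrapper around a function like this could, under that assumption, take the place of the default segmenter.

```python
import numpy as np


def class_map_to_segments(class_map):
    """Relabel per-pixel class ids as contiguous segment ids 0..K-1."""
    class_ids, inverse = np.unique(class_map, return_inverse=True)
    # Reshape explicitly so the result has the image shape on any NumPy version.
    segments = inverse.reshape(class_map.shape)
    return segments, class_ids


# Hypothetical network output: 0 = background, 15 and 20 = two object classes.
class_map = np.array([[0, 0, 15],
                      [0, 15, 15],
                      [20, 20, 15]])

segments, class_ids = class_map_to_segments(class_map)
print(segments)
print(class_ids)
```

With this relabeling every pixel of a class ends up in the same segment, so two separate people would share one segment. Splitting each class into connected components would be a further design choice.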

## Some background:
### 1. Knowledge structure about Segmentation:  
<img src='Image/Concept.jpg'>

### 3. Why Semantic Segmentation:  
- Unlike classification, where only the end result of the very deep network matters, semantic segmentation requires not only <font color='red'>discrimination at pixel level</font> but also a mechanism to <font color='red'>project the discriminative features</font> learnt at different stages of the encoder onto the pixel space.  

- __A general semantic segmentation architecture can be broadly thought of as an encoder network followed by a decoder network:__  
The encoder is usually a pre-trained classification network such as VGG or ResNet.  
The task of the decoder is to semantically project the discriminative features (lower resolution) learnt by the encoder onto the pixel space (higher resolution) to get a dense classification.
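
The resolution bookkeeping behind this encoder/decoder split can be sketched with a few lines of arithmetic. The example assumes a VGG16-style encoder with five 2x2 max-pooling stages, each halving the spatial size and rounding down, and a decoder that upsamples by the matching factor of 32; the function names are illustrative.

```python
def encoder_output_size(size, num_pools=5):
    """Spatial size after `num_pools` stride-2 poolings (rounding down)."""
    for _ in range(num_pools):
        size //= 2
    return size


def decoder_output_size(size, upsample_factor=32):
    """Spatial size after upsampling by `upsample_factor`."""
    return size * upsample_factor


for height in (224, 500):
    low = encoder_output_size(height)
    print(height, "->", low, "->", decoder_output_size(low))
```

Under these assumptions a side length of 224 is recovered exactly (224 -> 7 -> 224), while 500 comes back as 480. Input sizes that are not multiples of 32 would therefore need cropping or padding before the dense prediction can be compared with a pixel-level annotation.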

### 4. VGG16:  
<img src='Image/vgg16.png'>  
#### VGG16 Architecture:  
<img src='Image/vgg16-neural-network.jpg'>  
<img src='Image/Capture-564x570.jpg'>
####  Hidden layers:  
   - All hidden layers are equipped with the rectification (ReLU) non-linearity.   
   
#### Fully-Connected (FC) layers:  
- Three FC layers follow a stack of convolutional layers (which has a different depth in different architectures): 
    - The first two have 4096 channels each.  
    - The third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class).  
    - The final layer is the soft-max layer.   
    - The configuration of the fully connected layers is the same in all networks.  
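
As a worked example of these layer sizes, the parameter counts of the three FC layers can be computed directly. The sketch assumes the standard VGG16 layout, in which the last convolutional block outputs a 7x7x512 feature map for a 224x224 input; `dense_params` is an illustrative helper, not part of any library.

```python
def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


fc6 = dense_params(7 * 7 * 512, 4096)
fc7 = dense_params(4096, 4096)
fc8 = dense_params(4096, 1000)
fc8_two_classes = dense_params(4096, 2)

print(fc6, fc7, fc8)      # 102764544 16781312 4097000
print(fc8_two_classes)    # 8194
```

The last line shows how small the final layer becomes when the 1000-way classifier is swapped for a two-class one, which is the setting used in the code that follows.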


In [1]:
import os
import sys

import numpy as np
import skimage.io as io
import tensorflow as tf

In [2]:
print(tf.__version__)

1.13.1


In [3]:
def get_kernel_size(factor):
    """
    Find the kernel size given the desired factor of upsampling.
    """
    return 2 * factor - factor % 2


In [4]:
def upsample_filt(size):
    """
    Make a 2D bilinear kernel suitable for upsampling of the given (h, w) size.
    """
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
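
To see why this kernel is called bilinear, it helps to look at a 1-D analogue. The sketch below is an illustration, not part of the original notebook: `bilinear_kernel_1d` follows the same formula in one dimension, and `transposed_conv_1d` is a hypothetical NumPy stand-in for a strided transposed convolution, with symmetric cropping chosen here as an assumption about how 'SAME'-style padding trims the result.

```python
import numpy as np


def bilinear_kernel_1d(factor):
    """1-D version of the bilinear kernel for a given upsampling factor."""
    size = 2 * factor - factor % 2
    half = (size + 1) // 2
    center = half - 1 if size % 2 == 1 else half - 0.5
    return 1 - np.abs(np.arange(size) - center) / half


def transposed_conv_1d(x, kernel, stride):
    """Scatter each input value, scaled by the kernel, then crop the borders."""
    full = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        full[i * stride:i * stride + len(kernel)] += value * kernel
    left = (len(kernel) - stride) // 2
    return full[left:left + len(x) * stride]


kernel = bilinear_kernel_1d(2)
upsampled = transposed_conv_1d(np.array([0.0, 4.0, 8.0]), kernel, stride=2)
print(kernel)
print(upsampled)
```

For a factor of 2 the kernel is `[0.25, 0.75, 0.75, 0.25]`, and the input `[0, 4, 8]` becomes `[0, 1, 3, 5, 7, 6]`. The interior values lie on the straight line through the input samples; only the last value deviates, because the signal is implicitly zero beyond the border.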

In [5]:
def bilinear_upsample_weights(factor, number_of_classes):
    """
    Create weights matrix for transposed convolution with bilinear filter
    initialization.
    """    
    filter_size = get_kernel_size(factor)
    
    weights = np.zeros((filter_size,
                        filter_size,
                        number_of_classes,
                        number_of_classes), dtype=np.float32)
    
    upsample_kernel = upsample_filt(filter_size)
    
    for i in range(number_of_classes):
        
        weights[:, :, i, i] = upsample_kernel
    
    return weights
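
A quick sanity check of these weights, as a sketch: the block re-declares the three helpers compactly so that it runs on its own, and the checks themselves are illustrative additions. It looks at the filter shape for the factor of 32 and two classes used later, confirms that the filter never mixes classes, and shows that each per-class kernel sums to the square of the upsampling factor.

```python
import numpy as np


def get_kernel_size(factor):
    return 2 * factor - factor % 2


def upsample_filt(size):
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)


def bilinear_upsample_weights(factor, number_of_classes):
    size = get_kernel_size(factor)
    weights = np.zeros((size, size, number_of_classes, number_of_classes),
                       dtype=np.float32)
    for i in range(number_of_classes):
        weights[:, :, i, i] = upsample_filt(size)
    return weights


weights = bilinear_upsample_weights(factor=32, number_of_classes=2)

print(weights.shape)                           # (64, 64, 2, 2)
print(np.count_nonzero(weights[:, :, 0, 1]))   # 0: class 0 never feeds class 1
print(float(weights[:, :, 0, 0].sum()))        # close to 32 * 32 = 1024
```

Because only the diagonal entries `weights[:, :, i, i]` are filled, each class score map is upsampled independently of the others.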

In [6]:
from __future__ import division
%matplotlib inline

# Make only the GPU with index 1 visible to TensorFlow
os.environ["CUDA_VISIBLE_DEVICES"] = '1'
# Folder containing the TF-Slim code (provides the `nets` and `preprocessing` packages)
sys.path.append("C:\\Projects\\(Deep_Learning)Deep_Learning_Based_Semantic_Segmentation_to_Enhance_Local_Surrogate_Models\\Model\\slim")
# Folder containing the pre-trained VGG-16 checkpoint
checkpoints_dir = 'C:\\Projects\\(Deep_Learning)Deep_Learning_Based_Semantic_Segmentation_to_Enhance_Local_Surrogate_Models\\Model\\checkpoint'


In [7]:
print(sys.path)

['C:\\Projects\\(Deep_Learning)Deep_Learning_Based_Semantic_Segmentation_to_Enhance_Local_Surrogate_Models', 'c:\\python37\\python37.zip', 'c:\\python37\\DLLs', 'c:\\python37\\lib', 'c:\\python37', '', 'c:\\python37\\lib\\site-packages', 'c:\\python37\\lib\\site-packages\\setuptools-41.4.0-py3.7.egg', 'c:\\python37\\lib\\site-packages\\win32', 'c:\\python37\\lib\\site-packages\\win32\\lib', 'c:\\python37\\lib\\site-packages\\Pythonwin', 'c:\\python37\\lib\\site-packages\\IPython\\extensions', 'C:\\Users\\Zi Wei Fan\\.ipython', 'C:\\Projects\\(Deep_Learning)Deep_Learning_Based_Semantic_Segmentation_to_Enhance_Local_Surrogate_Models\\Model\\slim']


In [8]:

image_filename = 'Image/train/cat.jpg'
annotation_filename = 'Image/train/cat_annotation.png'

image_filename_placeholder = tf.placeholder(tf.string)
annotation_filename_placeholder = tf.placeholder(tf.string)
is_training_placeholder = tf.placeholder(tf.bool)

feed_dict_to_use = {image_filename_placeholder: image_filename,
                    annotation_filename_placeholder: annotation_filename,
                    is_training_placeholder: True}

image_tensor = tf.read_file(image_filename_placeholder)
annotation_tensor = tf.read_file(annotation_filename_placeholder)

image_tensor = tf.image.decode_jpeg(image_tensor, channels=3)
annotation_tensor = tf.image.decode_png(annotation_tensor, channels=1)

# Build one binary mask per class instead of a single class index;
# this is needed for the cross-entropy loss later on. Ground-truth
# masks sometimes contain values other than 0 and 1, so every value
# other than 1 is treated as background.
class_labels_tensor = tf.equal(annotation_tensor, 1)
background_labels_tensor = tf.not_equal(annotation_tensor, 1)

# Convert the boolean values into floats so that the
# computations in the cross-entropy loss are correct
bit_mask_class = tf.to_float(class_labels_tensor)
bit_mask_background = tf.to_float(background_labels_tensor)

combined_mask = tf.concat(axis=2, values=[bit_mask_class,
                                          bit_mask_background])

# Let's reshape our labels so that they become suitable for
# tf.nn.softmax_cross_entropy_with_logits, which expects [batch_size, num_classes]
flat_labels = tf.reshape(tensor=combined_mask, shape=(-1, 2))

# numpy, tensorflow, sys and os were already imported in the first cell
from matplotlib import pyplot as plt

from nets import vgg
from preprocessing import vgg_preprocessing

# Load the mean pixel values and the function
# that performs the subtraction from each pixel
from preprocessing.vgg_preprocessing import (_mean_image_subtraction,
                                            _R_MEAN, _G_MEAN, _B_MEAN)
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

Instructions for updating:
Use tf.cast instead.

For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.



__Note__:  
   - `TypeError: concat() got an unexpected keyword argument 'concat_dim'`: the original code is not compatible with the current TensorFlow version. Change `concat_dim` to `axis`.
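
The label preparation in the cell above can be mirrored in plain NumPy to check the resulting shapes and values. This is only a sketch: the tiny `annotation` array is made up for illustration, and 255 stands for any value other than 0 or 1 that a ground-truth mask might contain.

```python
import numpy as np

# Hypothetical 2x3 annotation with a single channel.
annotation = np.array([[0, 1, 1],
                       [255, 1, 0]], dtype=np.uint8)[..., np.newaxis]

class_mask = (annotation == 1).astype(np.float32)
background_mask = (annotation != 1).astype(np.float32)

# Stack along the channel axis: shape (2, 3, 2), order [class, background].
combined_mask = np.concatenate([class_mask, background_mask], axis=2)

# One row per pixel, one column per class.
flat_labels = combined_mask.reshape(-1, 2)
print(flat_labels)
```

Every row contains exactly one 1, and the pixel labeled 255 lands in the background column, just like the pixels labeled 0.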

In [9]:
fig_size = [15, 4]
plt.rcParams["figure.figsize"] = fig_size

slim = tf.contrib.slim

In [10]:
upsample_factor = 32
number_of_classes = 2
log_folder = 'Log'

vgg_checkpoint_path = os.path.join(checkpoints_dir, 'vgg_16.ckpt')

In [11]:
# Convert image to float32 before subtracting the
# mean pixel value
image_float = tf.to_float(image_tensor, name='ToFloat')

# Subtract the mean pixel value from each pixel
mean_centered_image = _mean_image_subtraction(image_float,
                                          [_R_MEAN, _G_MEAN, _B_MEAN])

processed_images = tf.expand_dims(mean_centered_image, 0)

upsample_filter_np = bilinear_upsample_weights(upsample_factor,
                                               number_of_classes)

upsample_filter_tensor = tf.constant(upsample_filter_np)
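
The preprocessing in this cell amounts to a cast, a per-channel subtraction and an added batch dimension. A NumPy sketch of the same steps follows; the mean values are the ones commonly used for VGG preprocessing and are stated here as an assumption rather than read from `vgg_preprocessing`, and the gray test image is made up.

```python
import numpy as np

# Assumed per-channel means in (R, G, B) order.
VGG_MEANS = np.array([123.68, 116.78, 103.94], dtype=np.float32)

# Hypothetical 2x2 RGB image filled with mid-gray.
image = np.full((2, 2, 3), 128, dtype=np.uint8)

image_float = image.astype(np.float32)    # cast before subtracting
mean_centered = image_float - VGG_MEANS   # broadcast over height and width
batch = mean_centered[np.newaxis, ...]    # add the batch dimension

print(batch.shape)      # (1, 2, 2, 3)
print(batch[0, 0, 0])
```

The cast has to come first: subtracting from `uint8` data directly could not represent the negative values that appear for dark pixels.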


In [12]:
# Define the model that we want to use -- specify to use only two classes at the last layer
with slim.arg_scope(vgg.vgg_arg_scope()):
    
    logits, end_points = vgg.vgg_16(processed_images,
                           num_classes=2,
                           is_training=is_training_placeholder,
                           spatial_squeeze=False,
                           fc_conv_padding='SAME')

downsampled_logits_shape = tf.shape(logits)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [13]:

# Calculate the output size of the upsampled tensor
upsampled_logits_shape = tf.stack([
                                  downsampled_logits_shape[0],
                                  downsampled_logits_shape[1] * upsample_factor,
                                  downsampled_logits_shape[2] * upsample_factor,
                                  downsampled_logits_shape[3]
                                 ])

# Perform the upsampling
upsampled_logits = tf.nn.conv2d_transpose(logits, upsample_filter_tensor,
                                 output_shape=upsampled_logits_shape,
                                 strides=[1, upsample_factor, upsample_factor, 1])

# Flatten the predictions, so that we can compute cross-entropy for
# each pixel and get a sum of cross-entropies.
flat_logits = tf.reshape(tensor=upsampled_logits, shape=(-1, number_of_classes))

cross_entropies = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits,
                                                          labels=flat_labels)

cross_entropy_sum = tf.reduce_sum(cross_entropies)

# Tensor to get the final prediction for each pixel -- pay 
# attention that we don't need softmax in this case because
# we only need the final decision. If we also need the respective
# probabilities we will have to apply softmax.
pred = tf.argmax(upsampled_logits, dimension=3)

probabilities = tf.nn.softmax(upsampled_logits)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Instructions for updating:
Use the `axis` argument instead
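
What the loss and the prediction compute per pixel can be reproduced in NumPy. The sketch below uses made-up logits and labels for a 1x2x2 image with two classes; `softmax` is a local helper written for this example and is not the TensorFlow op.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical upsampled logits, shape (batch, height, width, classes).
upsampled_logits = np.array([[[[2.0, 0.0], [0.0, 0.0]],
                              [[0.0, 3.0], [1.0, 1.0]]]])
flat_logits = upsampled_logits.reshape(-1, 2)
flat_labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

probabilities = softmax(flat_logits)
cross_entropies = -(flat_labels * np.log(probabilities)).sum(axis=1)
cross_entropy_sum = cross_entropies.sum()

# The decision only needs the largest logit, so no softmax is required.
pred = upsampled_logits.argmax(axis=3)
print(cross_entropy_sum)
print(pred)
```

Softmax preserves the ordering of the logits, which is why taking the argmax of the raw logits gives the same decision as taking it over the probabilities.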


In [14]:

# Here we define an optimizer and put all the variables
# that will be created under a namespace of 'adam_vars'.
# This is done so that we can easily access them later.
# Those variables are used by adam optimizer and are not
# related to variables of the vgg model.

# We also retrieve the gradient Tensors for each of our variables.
# This way we can later visualize them in tensorboard.
# Calling optimizer.compute_gradients followed by
# optimizer.apply_gradients is equivalent to running:
# train_step = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cross_entropy_sum)
with tf.variable_scope("adam_vars"):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
    gradients = optimizer.compute_gradients(loss=cross_entropy_sum)
    
    for grad_var_pair in gradients:
        
        current_variable = grad_var_pair[1]
        current_gradient = grad_var_pair[0]
        
        # Replace some characters in the original variable name;
        # tensorboard doesn't accept the ':' symbol
        gradient_name_to_save = current_variable.name.replace(":", "_")
        
        # Let's get histogram of gradients for each layer and
        # visualize them later in tensorboard
        tf.summary.histogram(gradient_name_to_save, current_gradient) 
    
    train_step = optimizer.apply_gradients(grads_and_vars=gradients)
    
# Now we define a function that will load the weights from VGG checkpoint
# into our variables when we call it. We exclude the weights from the last layer
# which is responsible for class predictions. We do this because 
# we will have different number of classes to predict and we can't
# use the old ones as an initialization.
vgg_except_fc8_weights = slim.get_variables_to_restore(exclude=['vgg_16/fc8', 'adam_vars'])

# Here we get variables that belong to the last layer of network.
# As we saw, the number of classes that VGG was originally trained on
# is different from ours -- in our case it is only 2 classes.
vgg_fc8_weights = slim.get_variables_to_restore(include=['vgg_16/fc8'])

adam_optimizer_variables = slim.get_variables_to_restore(include=['adam_vars'])

# Add summary op for the loss -- to be able to see it in
# tensorboard.
tf.summary.scalar('cross_entropy_loss', cross_entropy_sum)

# Put all summary ops into one op. Produces string when
# you run it.
merged_summary_op = tf.summary.merge_all()

# Create the log folder if it doesn't exist yet
if not os.path.exists(log_folder):
    os.makedirs(log_folder)

# Create the summary writer, which writes all the logs
# to the specified folder. These logs can later be read
# by tensorboard.
summary_string_writer = tf.summary.FileWriter(log_folder)

# Create an OP that performs the initialization of
# values of variables to the values from VGG.
read_vgg_weights_except_fc8_func = slim.assign_from_checkpoint_fn(
                                   vgg_checkpoint_path,
                                   vgg_except_fc8_weights)

# Initializer for new fc8 weights -- for two classes.
vgg_fc8_weights_initializer = tf.variables_initializer(vgg_fc8_weights)

# Initializer for adam variables
optimization_variables_initializer = tf.variables_initializer(adam_optimizer_variables)
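
The cell above splits the variables into three groups: those restored from the VGG checkpoint, the freshly initialized `fc8` layer, and the optimizer state. The idea can be sketched in plain Python as prefix filtering. This is an approximation for illustration, not how TF-Slim implements `get_variables_to_restore`; `filter_by_scope` and the variable names are hypothetical.

```python
def filter_by_scope(names, include=None, exclude=None):
    """Keep names under an `include` prefix and under no `exclude` prefix."""
    include = include or [""]
    exclude = exclude or []
    kept = []
    for name in names:
        included = any(name.startswith(prefix) for prefix in include)
        excluded = any(name.startswith(prefix) for prefix in exclude)
        if included and not excluded:
            kept.append(name)
    return kept


# Hypothetical variable names, for illustration only.
names = [
    "vgg_16/conv1/conv1_1/weights",
    "vgg_16/fc7/weights",
    "vgg_16/fc8/weights",
    "adam_vars/beta1_power",
]

restore_from_checkpoint = filter_by_scope(names, exclude=["vgg_16/fc8", "adam_vars"])
fresh_fc8 = filter_by_scope(names, include=["vgg_16/fc8"])
adam_state = filter_by_scope(names, include=["adam_vars"])

print(restore_from_checkpoint)
print(fresh_fc8)
print(adam_state)
```

The three groups do not overlap and together cover every variable, so each variable is initialized exactly once: from the checkpoint, by the `fc8` initializer, or by the optimizer initializer.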


In [15]:
with tf.Session() as sess:
    
    # Run the initializers.
    read_vgg_weights_except_fc8_func(sess)
    sess.run(vgg_fc8_weights_initializer)
    sess.run(optimization_variables_initializer)
    
    train_image, train_annotation = sess.run([image_tensor, annotation_tensor],
                                              feed_dict=feed_dict_to_use)
    
    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    ax1.imshow(train_image)
    ax1.set_title('Input image')
    probability_graph = ax2.imshow(np.dstack((train_annotation,)*3)*100)
    ax2.set_title('Input Ground-Truth Annotation')
    plt.show()
    
    # Let's perform 10 iterations
    for i in range(10):
        
        loss, summary_string = sess.run([cross_entropy_sum, merged_summary_op],
                                        feed_dict=feed_dict_to_use)
        
        sess.run(train_step, feed_dict=feed_dict_to_use)
        
        pred_np, probabilities_np = sess.run([pred, probabilities],
                                              feed_dict=feed_dict_to_use)
        
        summary_string_writer.add_summary(summary_string, i)
        
        cmap = plt.get_cmap('bwr')
        
        f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
        ax1.imshow(np.uint8(pred_np.squeeze() != 1), vmax=1.5, vmin=-0.4, cmap=cmap)
        ax1.set_title('Argmax. Iteration # ' + str(i))
        probability_graph = ax2.imshow(probabilities_np.squeeze()[:, :, 0])
        ax2.set_title('Probability of the Class. Iteration # ' + str(i))
        
        plt.colorbar(probability_graph)
        plt.show()
        
        print("Current Loss: " +  str(loss))
    
    feed_dict_to_use[is_training_placeholder] = False
    
    final_predictions, final_probabilities, final_loss = sess.run([pred,
                                                                   probabilities,
                                                                   cross_entropy_sum],
                                                         feed_dict=feed_dict_to_use)
    
    
    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    
    ax1.imshow(np.uint8(final_predictions.squeeze() != 1),
               vmax=1.5,
               vmin=-0.4,
               cmap=cmap)
    
    ax1.set_title('Final Argmax')
    
    
    probability_graph = ax2.imshow(final_probabilities.squeeze()[:, :, 0])
    ax2.set_title('Final Probability of the Class')
    plt.colorbar(probability_graph)
    
    plt.show()
    
    print("Final Loss: " +  str(final_loss))   

summary_string_writer.close()

Instructions for updating:
Use standard file APIs to check for files with this prefix.


ValueError: The passed save_path is not a valid checkpoint: C:\Projects\(Deep_Learning)Deep_Learning_Based_Semantic_Segmentation_to_Enhance_Local_Surrogate_Models\Model\checkpoint\vgg_16.ckpt
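
This error is raised when no usable checkpoint is found at the given path. One plausible cause, stated here as an assumption, is that `vgg_16.ckpt` was never placed in the checkpoint folder. A small check before building the restore function makes that failure easier to diagnose. The helper `checkpoint_files_present` below is hypothetical and only tests that files exist, not that they are valid checkpoints; the demonstration uses a temporary folder and a placeholder file instead of the real one.

```python
import os
import tempfile


def checkpoint_files_present(checkpoint_path):
    """Return True if files that look like a checkpoint exist at this path."""
    # Single-file checkpoints are stored under the path itself; multi-file
    # checkpoints use the path as a prefix (for example '<path>.index').
    return (os.path.isfile(checkpoint_path)
            or os.path.isfile(checkpoint_path + ".index"))


# Demonstration in a temporary folder instead of the real checkpoint folder.
with tempfile.TemporaryDirectory() as checkpoints_dir:
    vgg_checkpoint_path = os.path.join(checkpoints_dir, "vgg_16.ckpt")
    before = checkpoint_files_present(vgg_checkpoint_path)

    with open(vgg_checkpoint_path, "wb") as handle:
        handle.write(b"placeholder")  # stands in for the downloaded file
    after = checkpoint_files_present(vgg_checkpoint_path)

print(before, after)  # False True
```

In the notebook, the same check could be applied to `vgg_checkpoint_path` right after it is defined, so that a missing file is reported before the graph is built.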

### Reference:  
- https://nanonets.com/blog/how-to-do-semantic-segmentation-using-deep-learning/  
- https://blog.goodaudience.com/using-convolutional-neural-networks-for-image-segmentation-a-quick-intro-75bd68779225  
- https://towardsdatascience.com/image-segmentation-using-pythons-scikit-image-module-533a61ecc980
- http://www.cs.toronto.edu/~frossard/post/vgg16/  
- https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
- https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/  
- http://warmspringwinds.github.io/tensorflow/tf-slim/2016/12/18/image-segmentation-with-tensorflow-using-cnns-and-conditional-random-fields/
- https://nbviewer.jupyter.org/github/warmspringwinds/tensorflow_notes/blob/master/image_segmentation_conditional_random_fields.ipynb