<a href="https://colab.research.google.com/github/yaoshiang/MobileNetV2-CIFAR-Cleverhans/blob/master/MobileNetV2_for_CIFAR_for_PGD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Applying Cleverhans to a version of MobileNetV2.

We deliberately chose a modern architecture to analyze. A CNN built solely from convolutions and pooling is by now far out of date, so we wanted a CNN with batchnorm, bottlenecks, and residual blocks.

We can't afford to train on ImageNet, so we decided to build for CIFAR10 instead. This required us to shorten MobileNetV2 so that its repeated striding does not shrink the 32x32 feature maps all the way down to degenerate 1x1 maps.
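
The striding problem can be checked with a little arithmetic. The sketch below assumes that each stride-2 layer maps a side length n to ceil(n / 2), which is how the padded stride-2 convolutions in the Keras implementation behave, and that the full MobileNetV2 has five stride-2 layers while the shortened version used here keeps three. The helper name `spatial_size` is ours and purely illustrative.

```python
import math


def spatial_size(input_size, num_stride2_layers):
    """Side length of the feature map after a number of stride-2 layers."""
    size = input_size
    for _ in range(num_stride2_layers):
        size = math.ceil(size / 2)  # each stride-2 layer halves, rounding up
    return size


FULL_STRIDE2_LAYERS = 5       # Conv1 plus four stride-2 inverted residual blocks
TRUNCATED_STRIDE2_LAYERS = 3  # Conv1 plus two stride-2 inverted residual blocks

print(spatial_size(224, FULL_STRIDE2_LAYERS))      # 7  (ImageNet-sized input)
print(spatial_size(32, FULL_STRIDE2_LAYERS))       # 1  (degenerate on CIFAR10)
print(spatial_size(32, TRUNCATED_STRIDE2_LAYERS))  # 4  (shortened network)
```

With only three stride-2 layers, the final feature map on CIFAR10 is 4x4 rather than 1x1, so the global average pooling at the top still has some spatial content to pool.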

https://arxiv.org/abs/1911.09665

How Does Batch Normalization Help Optimization?
https://arxiv.org/pdf/1805.11604.pdf



### Set up libraries.


In [0]:
import os
import numpy as np

%tensorflow_version 1.14
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow
print(tensorflow.__version__)

`%tensorflow_version` only switches the major version: `1.x` or `2.x`.
You set: `1.14`. This will be interpreted as: `1.x`.


TensorFlow 1.x selected.
1.15.0


In [0]:
# import keras
# print(keras.__version__)
print(tensorflow.keras.__version__)

2.2.4-tf


In [0]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16914428360906805188
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 13257041665067616516
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8111800764023474361
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 14912199066
locality {
  bus_id: 1
  links {
  }
}
incarnation: 17830433992289389595
physical_device_desc: "device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5"
]


In [0]:
!pip install -qq -e git+http://github.com/yaoshiang/cleverhans.git#egg=cleverhans
import sys
sys.path.append('/content/src/cleverhans')
import cleverhans

In [0]:
!nvidia-smi

Tue Jan 21 21:36:58 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P0    30W /  70W |    111MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

In [0]:
import cleverhans_tutorials as ct
import cleverhans_tutorials.cifar10_tutorial_tf as ctc




### Create a cleverhans wrapper for the Keras version of MobileNet v2.

We use MobileNetV2 as an example of a "modern" image classifier architecture, one that includes residual blocks and batchnorm. Architectures of the next generation, such as MobileNetV3, are generally discovered via AutoML searches.

### First, lift the relevant Keras source code and modify the Keras implementation of MobileNetV2.

In [0]:
def correct_pad(backend, inputs, kernel_size):
    """Returns a tuple for zero-padding for 2D convolution with downsampling.
    # Arguments
        backend: The Keras backend module.
        inputs: Input tensor whose spatial size determines the padding.
        kernel_size: An integer or tuple/list of 2 integers.
    # Returns
        A tuple of two (before, after) padding pairs, one per spatial axis.
    """
    img_dim = 2 if backend.image_data_format() == 'channels_first' else 1
    input_size = backend.int_shape(inputs)[img_dim:(img_dim + 2)]

    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)

    if input_size[0] is None:
        adjust = (1, 1)
    else:
        adjust = (1 - input_size[0] % 2, 1 - input_size[1] % 2)

    correct = (kernel_size[0] // 2, kernel_size[1] // 2)

    return ((correct[0] - adjust[0], correct[0]),
            (correct[1] - adjust[1], correct[1]))


"""MobileNet v2 models for Keras.
MobileNetV2 is a general architecture and can be used for multiple use cases.
Depending on the use case, it can use different input layer size and
different width factors. This allows different width models to reduce
the number of multiply-adds and thereby
reduce inference cost on mobile devices.
MobileNetV2 is very similar to the original MobileNet,
except that it uses inverted residual blocks with
bottlenecking features. It has a drastically lower
parameter count than the original MobileNet.
MobileNets support any input size greater
than 32 x 32, with larger image sizes
offering better performance.
The number of parameters and number of multiply-adds
can be modified by using the `alpha` parameter,
which increases/decreases the number of filters in each layer.
By altering the image size and `alpha` parameter,
all 22 models from the paper can be built, with ImageNet weights provided.
The paper demonstrates the performance of MobileNets using `alpha` values of
0.35, 0.5, 0.75, 1.0 (also called 100 % MobileNet), 1.3, and 1.4.
For each of these `alpha` values, weights for 5 different input image sizes
are provided (224, 192, 160, 128, and 96).
The following table describes the performance of
MobileNet on various input sizes:
------------------------------------------------------------------------
MACs stands for Multiply Adds
| Classification Checkpoint | MACs (M) | Parameters (M) | Top 1 Accuracy | Top 5 Accuracy |
|---------------------------|----------|----------------|----------------|----------------|
| [mobilenet_v2_1.4_224]  | 582 | 6.06 |          75.0 | 92.5 |
| [mobilenet_v2_1.3_224]  | 509 | 5.34 |          74.4 | 92.1 |
| [mobilenet_v2_1.0_224]  | 300 | 3.47 |          71.8 | 91.0 |
| [mobilenet_v2_1.0_192]  | 221 | 3.47 |          70.7 | 90.1 |
| [mobilenet_v2_1.0_160]  | 154 | 3.47 |          68.8 | 89.0 |
| [mobilenet_v2_1.0_128]  | 99  | 3.47 |          65.3 | 86.9 |
| [mobilenet_v2_1.0_96]   | 56  | 3.47 |          60.3 | 83.2 |
| [mobilenet_v2_0.75_224] | 209 | 2.61 |          69.8 | 89.6 |
| [mobilenet_v2_0.75_192] | 153 | 2.61 |          68.7 | 88.9 |
| [mobilenet_v2_0.75_160] | 107 | 2.61 |          66.4 | 87.3 |
| [mobilenet_v2_0.75_128] | 69  | 2.61 |          63.2 | 85.3 |
| [mobilenet_v2_0.75_96]  | 39  | 2.61 |          58.8 | 81.6 |
| [mobilenet_v2_0.5_224]  | 97  | 1.95 |          65.4 | 86.4 |
| [mobilenet_v2_0.5_192]  | 71  | 1.95 |          63.9 | 85.4 |
| [mobilenet_v2_0.5_160]  | 50  | 1.95 |          61.0 | 83.2 |
| [mobilenet_v2_0.5_128]  | 32  | 1.95 |          57.7 | 80.8 |
| [mobilenet_v2_0.5_96]   | 18  | 1.95 |          51.2 | 75.8 |
| [mobilenet_v2_0.35_224] | 59  | 1.66 |          60.3 | 82.9 |
| [mobilenet_v2_0.35_192] | 43  | 1.66 |          58.2 | 81.2 |
| [mobilenet_v2_0.35_160] | 30  | 1.66 |          55.7 | 79.1 |
| [mobilenet_v2_0.35_128] | 20  | 1.66 |          50.8 | 75.0 |
| [mobilenet_v2_0.35_96]  | 11  | 1.66 |          45.5 | 70.4 |
The weights for all 22 models are obtained and
translated from the TensorFlow checkpoints found [here]
(https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md).
# Reference
This file contains building code for MobileNetV2, based on
[MobileNetV2: Inverted Residuals and Linear Bottlenecks]
(https://arxiv.org/abs/1801.04381) (CVPR 2018)
Tests comparing this model to the existing Tensorflow model can be
found at [mobilenet_v2_keras]
(https://github.com/JonathanCMitchell/mobilenet_v2_keras)
"""
# Python 3 only: no `from __future__` imports are needed here.

import os
import warnings
import numpy as np

# TODO Change path to v1.1
BASE_WEIGHT_PATH = ('https://github.com/JonathanCMitchell/mobilenet_v2_keras/'
                    'releases/download/v1.1/')

backend = None
layers = None
models = None
keras_utils = None


_KERAS_BACKEND = None
_KERAS_LAYERS = None
_KERAS_MODELS = None
_KERAS_UTILS = None


def get_submodules_from_kwargs(kwargs):
    backend = kwargs.get('backend', _KERAS_BACKEND)
    layers = kwargs.get('layers', _KERAS_LAYERS)
    models = kwargs.get('models', _KERAS_MODELS)
    utils = kwargs.get('utils', _KERAS_UTILS)
    for key in kwargs.keys():
        if key not in ['backend', 'layers', 'models', 'utils']:
            raise TypeError('Invalid keyword argument: %s' % key)
    return backend, layers, models, utils



# This function is taken from the original tf repo.
# It ensures that all layers have a channel number that is divisible by 8
# It can be seen here:
# https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py


def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

from tensorflow.keras import backend
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import utils

def keras_modules_injection(base_fun):

    def wrapper(*args, **kwargs):
        kwargs['backend'] = backend
        kwargs['layers'] = layers
        kwargs['models'] = models
        kwargs['utils'] = utils
        return base_fun(*args, **kwargs)

    return wrapper

@keras_modules_injection
def MobileNetV2(input_shape=None,
                alpha=1.0,
                include_top=True,
                weights='imagenet',
                input_tensor=None,
                pooling=None,
                classes=1000,
                **kwargs):
    """Instantiates the MobileNetV2 architecture.
    # Arguments
        input_shape: optional shape tuple, to be specified if you would
            like to use a model with an input img resolution that is not
            (224, 224, 3).
            It should have exactly 3 inputs channels (224, 224, 3).
            You can also omit this option if you would like
            to infer input_shape from an input_tensor.
            If you choose to include both input_tensor and input_shape then
            input_shape will be used if they match, if the shapes
            do not match then we will throw an error.
            E.g. `(160, 160, 3)` would be one valid value.
        alpha: controls the width of the network. This is known as the
        width multiplier in the MobileNetV2 paper, but the name is kept for
        consistency with MobileNetV1 in Keras.
            - If `alpha` < 1.0, proportionally decreases the number
                of filters in each layer.
            - If `alpha` > 1.0, proportionally increases the number
                of filters in each layer.
            - If `alpha` = 1, default number of filters from the paper
                 are used at each layer.
        include_top: whether to include the fully-connected
            layer at the top of the network.
        weights: one of `None` (random initialization),
              'imagenet' (pre-training on ImageNet),
              or the path to the weights file to be loaded.
        input_tensor: optional Keras tensor (i.e. output of
            `layers.Input()`)
            to use as image input for the model.
        pooling: Optional pooling mode for feature extraction
            when `include_top` is `False`.
            - `None` means that the output of the model
                will be the 4D tensor output of the
                last convolutional block.
            - `avg` means that global average pooling
                will be applied to the output of the
                last convolutional block, and thus
                the output of the model will be a
                2D tensor.
            - `max` means that global max pooling will
                be applied.
        classes: optional number of classes to classify images
            into, only to be specified if `include_top` is True, and
            if no `weights` argument is specified.
    # Returns
        A Keras model instance.
    # Raises
        ValueError: in case of invalid argument for `weights`,
            or invalid input shape or invalid alpha, rows when
            weights='imagenet'
    """
    global backend, layers, models, keras_utils
    backend, layers, models, keras_utils = get_submodules_from_kwargs(kwargs)

    if not (weights in {'imagenet', None} or os.path.exists(weights)):
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization), `imagenet` '
                         '(pre-training on ImageNet), '
                         'or the path to the weights file to be loaded.')

    if weights == 'imagenet' and include_top and classes != 1000:
        raise ValueError('If using `weights` as `"imagenet"` with `include_top` '
                         'as true, `classes` should be 1000')

    # Determine proper input shape and default size.
    # If both input_shape and input_tensor are used, they should match
    if input_shape is not None and input_tensor is not None:
        try:
            is_input_t_tensor = backend.is_keras_tensor(input_tensor)
        except ValueError:
            try:
                is_input_t_tensor = backend.is_keras_tensor(
                    keras_utils.get_source_inputs(input_tensor))
            except ValueError:
                raise ValueError('input_tensor: ', input_tensor,
                                 'is not type input_tensor')
        if is_input_t_tensor:
            if backend.image_data_format() == 'channels_first':
                # NCHW tensor: index 2 is rows; input_shape is (C, H, W).
                if backend.int_shape(input_tensor)[2] != input_shape[1]:
                    raise ValueError('input_shape: ', input_shape,
                                     'and input_tensor: ', input_tensor,
                                     'do not meet the same shape requirements')
            else:
                if backend.int_shape(input_tensor)[2] != input_shape[1]:
                    raise ValueError('input_shape: ', input_shape,
                                     'and input_tensor: ', input_tensor,
                                     'do not meet the same shape requirements')
        else:
            raise ValueError('input_tensor specified: ', input_tensor,
                             'is not a keras tensor')

    # If input_shape is None, infer shape from input_tensor
    if input_shape is None and input_tensor is not None:

        try:
            backend.is_keras_tensor(input_tensor)
        except ValueError:
            raise ValueError('input_tensor: ', input_tensor,
                             'is type: ', type(input_tensor),
                             'which is not a valid type')

        if input_shape is None and not backend.is_keras_tensor(input_tensor):
            default_size = 224
        elif input_shape is None and backend.is_keras_tensor(input_tensor):
            if backend.image_data_format() == 'channels_first':
                rows = backend.int_shape(input_tensor)[2]
                cols = backend.int_shape(input_tensor)[3]
            else:
                rows = backend.int_shape(input_tensor)[1]
                cols = backend.int_shape(input_tensor)[2]

            if rows == cols and rows in [96, 128, 160, 192, 224]:
                default_size = rows
            else:
                default_size = 224

    # If input_shape is None and no input_tensor
    elif input_shape is None:
        default_size = 224

    # If input_shape is not None, assume default size
    else:
        if backend.image_data_format() == 'channels_first':
            rows = input_shape[1]
            cols = input_shape[2]
        else:
            rows = input_shape[0]
            cols = input_shape[1]

        if rows == cols and rows in [96, 128, 160, 192, 224]:
            default_size = rows
        else:
            default_size = 224

    input_shape = _obtain_input_shape(input_shape,
                                      default_size=default_size,
                                      min_size=32,
                                      data_format=backend.image_data_format(),
                                      require_flatten=include_top,
                                      weights=weights)

    if backend.image_data_format() == 'channels_last':
        row_axis, col_axis = (0, 1)
    else:
        row_axis, col_axis = (1, 2)
    rows = input_shape[row_axis]
    cols = input_shape[col_axis]

    if weights == 'imagenet':
        if alpha not in [0.35, 0.50, 0.75, 1.0, 1.3, 1.4]:
            raise ValueError('If imagenet weights are being loaded, '
                             'alpha can be one of `0.35`, `0.50`, `0.75`, '
                             '`1.0`, `1.3` or `1.4` only.')

        if rows != cols or rows not in [96, 128, 160, 192, 224]:
            rows = 224
            warnings.warn('`input_shape` is undefined or non-square, '
                          'or `rows` is not in [96, 128, 160, 192, 224].'
                          ' Weights for input shape (224, 224) will be'
                          ' loaded as the default.')

    if input_tensor is None:
        img_input = layers.Input(shape=input_shape)
    else:
        if not backend.is_keras_tensor(input_tensor):
            img_input = layers.Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    channel_axis = 1 if backend.image_data_format() == 'channels_first' else -1

    first_block_filters = _make_divisible(32 * alpha, 8)
    x = layers.ZeroPadding2D(padding=correct_pad(backend, img_input, 3),
                             name='Conv1_pad')(img_input)
    x = layers.Conv2D(first_block_filters,
                      kernel_size=3,
                      strides=(2, 2),
                      padding='valid',
                      use_bias=False,
                      name='Conv1')(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  epsilon=1e-3,
                                  momentum=0.999,
                                  name='bn_Conv1')(x)
    x = layers.ReLU(6., name='Conv1_relu')(x)

    x = _inverted_res_block(x, filters=16, alpha=alpha, stride=1,
                            expansion=1, block_id=0)

    x = _inverted_res_block(x, filters=24, alpha=alpha, stride=2,
                            expansion=6, block_id=1)
    x = _inverted_res_block(x, filters=24, alpha=alpha, stride=1,
                            expansion=6, block_id=2)

    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=2,
                            expansion=6, block_id=3)
    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
                            expansion=6, block_id=4)
    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
                            expansion=6, block_id=5)

    # x = _inverted_res_block(x, filters=64, alpha=alpha, stride=2,
    #                         expansion=6, block_id=6)
    # x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
    #                         expansion=6, block_id=7)
    # x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
    #                         expansion=6, block_id=8)
    # x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
    #                         expansion=6, block_id=9)

    # x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
    #                         expansion=6, block_id=10)
    # x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
    #                         expansion=6, block_id=11)
    # x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
    #                         expansion=6, block_id=12)

    # x = _inverted_res_block(x, filters=160, alpha=alpha, stride=2,
    #                         expansion=6, block_id=13)
    # x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
    #                         expansion=6, block_id=14)
    # x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
    #                         expansion=6, block_id=15)

    # x = _inverted_res_block(x, filters=320, alpha=alpha, stride=1,
    #                         expansion=6, block_id=16)

    # no alpha applied to last conv as stated in the paper:
    # if the width multiplier is greater than 1 we
    # increase the number of output channels
    if alpha > 1.0:
        last_block_filters = _make_divisible(1280 * alpha, 8)
    else:
        last_block_filters = 1280

    x = layers.Conv2D(last_block_filters,
                      kernel_size=1,
                      use_bias=False,
                      name='Conv_1')(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  epsilon=1e-3,
                                  momentum=0.999,
                                  name='Conv_1_bn')(x)
    x = layers.ReLU(6., name='out_relu')(x)

    if include_top:
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dense(classes, activation='softmax',
                         use_bias=True, name='Logits')(x)
    else:
        if pooling == 'avg':
            x = layers.GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = layers.GlobalMaxPooling2D()(x)

    # Ensure that the model takes into account
    # any potential predecessors of `input_tensor`.
    if input_tensor is not None:
        inputs = keras_utils.get_source_inputs(input_tensor)
    else:
        inputs = img_input

    # Create model.
    model = models.Model(inputs, x,
                         name='mobilenetv2_%0.2f_%s' % (alpha, rows))

    # Load weights.
    if weights == 'imagenet':
        if include_top:
            model_name = ('mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' +
                          str(alpha) + '_' + str(rows) + '.h5')
            weight_path = BASE_WEIGHT_PATH + model_name
            weights_path = keras_utils.get_file(
                model_name, weight_path, cache_subdir='models')
        else:
            model_name = ('mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' +
                          str(alpha) + '_' + str(rows) + '_no_top' + '.h5')
            weight_path = BASE_WEIGHT_PATH + model_name
            weights_path = keras_utils.get_file(
                model_name, weight_path, cache_subdir='models')
        model.load_weights(weights_path)
    elif weights is not None:
        model.load_weights(weights)

    return model


def _inverted_res_block(inputs, expansion, stride, alpha, filters, block_id):
    channel_axis = 1 if backend.image_data_format() == 'channels_first' else -1

    in_channels = backend.int_shape(inputs)[channel_axis]
    pointwise_conv_filters = int(filters * alpha)
    pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
    x = inputs
    prefix = 'block_{}_'.format(block_id)

    if block_id:
        # Expand
        x = layers.Conv2D(expansion * in_channels,
                          kernel_size=1,
                          padding='same',
                          use_bias=False,
                          activation=None,
                          name=prefix + 'expand')(x)
        x = layers.BatchNormalization(axis=channel_axis,
                                      epsilon=1e-3,
                                      momentum=0.999,
                                      name=prefix + 'expand_BN')(x)
        x = layers.ReLU(6., name=prefix + 'expand_relu')(x)
    else:
        prefix = 'expanded_conv_'

    # Depthwise
    if stride == 2:
        x = layers.ZeroPadding2D(padding=correct_pad(backend, x, 3),
                                 name=prefix + 'pad')(x)
    x = layers.DepthwiseConv2D(kernel_size=3,
                               strides=stride,
                               activation=None,
                               use_bias=False,
                               padding='same' if stride == 1 else 'valid',
                               name=prefix + 'depthwise')(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  epsilon=1e-3,
                                  momentum=0.999,
                                  name=prefix + 'depthwise_BN')(x)

    x = layers.ReLU(6., name=prefix + 'depthwise_relu')(x)

    # Project
    x = layers.Conv2D(pointwise_filters,
                      kernel_size=1,
                      padding='same',
                      use_bias=False,
                      activation=None,
                      name=prefix + 'project')(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  epsilon=1e-3,
                                  momentum=0.999,
                                  name=prefix + 'project_BN')(x)

    if in_channels == pointwise_filters and stride == 1:
        return layers.Add(name=prefix + 'add')([inputs, x])
    return x



def _obtain_input_shape(input_shape,
                        default_size,
                        min_size,
                        data_format,
                        require_flatten,
                        weights=None):
    """Internal utility to compute/validate a model's input shape.
    # Arguments
        input_shape: Either None (will return the default network input shape),
            or a user-provided shape to be validated.
        default_size: Default input width/height for the model.
        min_size: Minimum input width/height accepted by the model.
        data_format: Image data format to use.
        require_flatten: Whether the model is expected to
            be linked to a classifier via a Flatten layer.
        weights: One of `None` (random initialization)
            or 'imagenet' (pre-training on ImageNet).
            If weights='imagenet' input channels must be equal to 3.
    # Returns
        An integer shape tuple (may include None entries).
    # Raises
        ValueError: In case of invalid argument values.
    """
    if weights != 'imagenet' and input_shape and len(input_shape) == 3:
        if data_format == 'channels_first':
            if input_shape[0] not in {1, 3}:
                warnings.warn(
                    'This model usually expects 1 or 3 input channels. '
                    'However, it was passed an input_shape with ' +
                    str(input_shape[0]) + ' input channels.')
            default_shape = (input_shape[0], default_size, default_size)
        else:
            if input_shape[-1] not in {1, 3}:
                warnings.warn(
                    'This model usually expects 1 or 3 input channels. '
                    'However, it was passed an input_shape with ' +
                    str(input_shape[-1]) + ' input channels.')
            default_shape = (default_size, default_size, input_shape[-1])
    else:
        if data_format == 'channels_first':
            default_shape = (3, default_size, default_size)
        else:
            default_shape = (default_size, default_size, 3)
    if weights == 'imagenet' and require_flatten:
        if input_shape is not None:
            if input_shape != default_shape:
                raise ValueError('When setting `include_top=True` '
                                 'and loading `imagenet` weights, '
                                 '`input_shape` should be ' +
                                 str(default_shape) + '.')
        return default_shape
    if input_shape:
        if data_format == 'channels_first':
            if input_shape is not None:
                if len(input_shape) != 3:
                    raise ValueError(
                        '`input_shape` must be a tuple of three integers.')
                if input_shape[0] != 3 and weights == 'imagenet':
                    raise ValueError('The input must have 3 channels; got '
                                     '`input_shape=' + str(input_shape) + '`')
                if ((input_shape[1] is not None and input_shape[1] < min_size) or
                   (input_shape[2] is not None and input_shape[2] < min_size)):
                    raise ValueError('Input size must be at least ' +
                                     str(min_size) + 'x' + str(min_size) +
                                     '; got `input_shape=' +
                                     str(input_shape) + '`')
        else:
            if input_shape is not None:
                if len(input_shape) != 3:
                    raise ValueError(
                        '`input_shape` must be a tuple of three integers.')
                if input_shape[-1] != 3 and weights == 'imagenet':
                    raise ValueError('The input must have 3 channels; got '
                                     '`input_shape=' + str(input_shape) + '`')
                if ((input_shape[0] is not None and input_shape[0] < min_size) or
                   (input_shape[1] is not None and input_shape[1] < min_size)):
                    raise ValueError('Input size must be at least ' +
                                     str(min_size) + 'x' + str(min_size) +
                                     '; got `input_shape=' +
                                     str(input_shape) + '`')
    else:
        if require_flatten:
            input_shape = default_shape
        else:
            if data_format == 'channels_first':
                input_shape = (3, None, None)
            else:
                input_shape = (None, None, 3)
    if require_flatten:
        if None in input_shape:
            raise ValueError('If `include_top` is True, '
                             'you should specify a static `input_shape`. '
                             'Got `input_shape=' + str(input_shape) + '`')
    return input_shape
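
To see what `correct_pad` does for CIFAR-sized inputs, the following sketch repeats its arithmetic on plain integers, without the Keras backend. The helper names `pad_amounts` and `strided_output_size` are hypothetical, and the kernel size of 3 and stride of 2 mirror the `Conv1` layer above.

```python
def pad_amounts(input_size, kernel_size=3):
    """Same arithmetic as `correct_pad`, for a known (rows, cols) input size."""
    adjust = (1 - input_size[0] % 2, 1 - input_size[1] % 2)
    correct = (kernel_size // 2, kernel_size // 2)
    return ((correct[0] - adjust[0], correct[0]),
            (correct[1] - adjust[1], correct[1]))


def strided_output_size(size, kernel_size=3, stride=2):
    """Padded side length and side length after a 'valid' strided convolution."""
    before, after = pad_amounts((size, size), kernel_size)[0]
    padded = size + before + after
    return padded, (padded - kernel_size) // stride + 1


print(pad_amounts((32, 32)))    # ((0, 1), (0, 1))
print(strided_output_size(32))  # (33, 16)
print(strided_output_size(15))  # (17, 8)
```

For a 32x32 input, the padding is asymmetric (one extra row and column at the bottom and right), giving a 33x33 padded map and a 16x16 output. These are the shapes reported for `Conv1_pad` and `Conv1` in the model summary.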

### Generate model.

We need to compile this model and eventually train it for one loop to work around what may be a bug in weight initialization.

In [0]:
# from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from cleverhans.utils_keras import KerasModelWrapper

def get_mnv2(alpha=.35):
  # Non-default arguments: `weights=None` because we train on CIFAR-10 ourselves
  # rather than loading ImageNet weights, `classes=10`, a 32x32x3 input, and
  # `alpha`. From the docs:
  # https://keras.io/applications/#mobilenetv2

  # x = tensorflow.keras.layers.Input(shape=(96,96,3))
  model = MobileNetV2(input_shape=(32,32,3), alpha=alpha, include_top=True, weights=None, input_tensor=None, pooling=None, classes=10)

  # Compile with Keras's default RMSprop settings.
  model.compile(optimizer='RMSProp',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

  # for layer in model.layers:
  # print(layer.trainable)

  # x = tensorflow.keras.layers.Input(shape=(32,32,3))
  # y = tensorflow.keras.layers.Flatten()(x)
  # y = tensorflow.keras.layers.Dense(10, activation='softmax')(y)
  
  # model = tensorflow.keras.models.Model(inputs=x, outputs=y)
  return model

model = get_mnv2()
model.summary()  # summary() prints the table itself and returns None


Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "mobilenetv2_0.35_32"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 33, 33, 3)    0           input_1[0][0]                    
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 16, 16, 16)   432         Conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 16, 16, 16)   64          Conv1[0][0]                      
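
The first rows of the summary can be checked by hand. The sketch below is an illustrative calculation, not part of the notebook's pipeline. It assumes `Conv1` is a 3x3, stride-2, unpadded ("valid") convolution with no bias term, applied to the zero-padded 33x33 input, and that batch normalization stores four values per channel (gamma, beta, moving mean, moving variance). The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, out_channels, use_bias=False):
    """Number of weights in a square 2D convolution."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


# Shapes taken from the summary: 33x33x3 after Conv1_pad, 16 output channels.
conv1_size = conv_output_size(33, kernel=3, stride=2)
conv1_param_count = conv_params(3, in_channels=3, out_channels=16)
bn_param_count = batchnorm_params(16)

print(conv1_size, conv1_param_count, bn_param_count)  # 16 432 64
```

The first layer alone already halves the 32x32 input to 16x16. Every further stride-2 stage halves it again, which is why the full-depth architecture had to be shortened for CIFAR-10.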

### Train MobileNetV2 and run PGD on it. 

This is mostly from the CleverHans tutorial on Keras models. 

There is one unexpected trick here. The model needs to be trained for one epoch with Keras `fit` before it is handed over to the CleverHans training loop. It may be that the weights are not initialized properly otherwise, or that an initial run is needed for the TF variables to be created and listed properly. 

The truncated MobileNetV2 model has these stats:

<pre>
Total params: 54,074
Trainable params: 49,962
Non-trainable params: 4,112
</pre>

It reaches about 65% top-1 accuracy on CIFAR-10 after about 15 epochs. 

<pre>
[INFO 2020-01-21 22:06:11,473 cleverhans] Epoch 14 took 11.071139812469482 seconds
Test accuracy on legitimate examples: 0.644500
[INFO 2020-01-21 22:06:23,392 cleverhans] Epoch 15 took 11.085895776748657 seconds
Test accuracy on legitimate examples: 0.646700
[INFO 2020-01-21 22:06:35,024 cleverhans] Epoch 16 took 10.808518648147583 seconds
Test accuracy on legitimate examples: 0.653600
[INFO 2020-01-21 22:06:47,575 cleverhans] Epoch 17 took 11.663419008255005 seconds
Test accuracy on legitimate examples: 0.646000
[INFO 2020-01-21 22:06:59,908 cleverhans] Epoch 18 took 11.490140199661255 seconds
Test accuracy on legitimate examples: 0.643300
[INFO 2020-01-21 22:07:11,687 cleverhans] Epoch 19 took 10.94323444366455 seconds
Test accuracy on legitimate examples: 0.658200
</pre>

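Adversarial retraining with Madry et al. PGD at eps = 8/255 raises adversarial test accuracy (from about 12% to about 36% in the run below), but clean test accuracy ends lower than with standard training (about 60% versus about 65%). Both accuracies rise over the course of adversarial training.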

<pre>
[INFO 2020-01-22 01:08:49,993 cleverhans] Epoch 0 took 112.98240733146667 seconds
Test accuracy on legitimate examples: 0.4187
Test accuracy on adversarial examples: 0.2815
[INFO 2020-01-22 01:11:07,334 cleverhans] Epoch 1 took 110.0533185005188 seconds
Test accuracy on legitimate examples: 0.4538
Test accuracy on adversarial examples: 0.3147
...
[INFO 2020-01-22 01:47:44,696 cleverhans] Epoch 18 took 109.45361185073853 seconds
Test accuracy on legitimate examples: 0.5932
Test accuracy on adversarial examples: 0.3619
[INFO 2020-01-22 01:49:53,164 cleverhans] Epoch 19 took 109.46263790130615 seconds
Test accuracy on legitimate examples: 0.6000
Test accuracy on adversarial examples: 0.3624
<cleverhans.utils.AccuracyReport at 0x7f5ec5400eb8>
</pre>
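
For intuition about what the Madry et al. attack does with `eps = 8/255`, here is a minimal NumPy sketch of L-infinity projected gradient descent against a toy linear softmax classifier. It is not the CleverHans implementation. The classifier, the `pgd_linf` name, the random start and `nb_iter=10` are assumptions made for illustration. Only `eps`, `eps_iter`, `clip_min` and `clip_max` mirror the values used in this notebook.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_grad_wrt_input(x, y, W):
    """Gradient of softmax cross-entropy w.r.t. x for logits = x @ W."""
    return (softmax(x @ W) - y) @ W.T


def pgd_linf(x, y, W, eps=8 / 255, eps_iter=2 / 255, nb_iter=10,
             clip_min=0.0, clip_max=1.0, rng=None):
    rng = np.random.RandomState(0) if rng is None else rng
    # Random start inside the eps-ball (an assumption of this sketch).
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    x_adv = np.clip(x_adv, clip_min, clip_max)
    for _ in range(nb_iter):
        grad = loss_grad_wrt_input(x_adv, y, W)
        # Ascend the loss with a signed-gradient step of size eps_iter.
        x_adv = x_adv + eps_iter * np.sign(grad)
        # Project back into the eps-ball around x, then into the valid range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, clip_min, clip_max)
    return x_adv


# Toy data shaped like flattened CIFAR-10 images, with one-hot labels.
data_rng = np.random.RandomState(1234)
x = data_rng.uniform(0, 1, size=(4, 32 * 32 * 3))
y = np.eye(10)[data_rng.randint(0, 10, size=4)]
W = data_rng.normal(scale=0.01, size=(32 * 32 * 3, 10))

x_adv = pgd_linf(x, y, W)
print("max perturbation:", np.abs(x_adv - x).max())
```

Each pixel moves by at most `eps`, so the adversarial image stays within 8/255 of the original in every channel and remains a valid image in `[0, 1]`.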



In [0]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import os

import tensorflow as tf
from tensorflow import keras
import numpy as np

# from cleverhans.attacks import FastGradientMethod
from cleverhans.attacks import MadryEtAl
from cleverhans.compat import flags
# from cleverhans.dataset import MNIST
from cleverhans.dataset import CIFAR10
from cleverhans.loss import CrossEntropy
from cleverhans.train import train
from cleverhans.utils import AccuracyReport
from cleverhans.utils_keras import cnn_model
from cleverhans.utils_keras import KerasModelWrapper
from cleverhans.utils_tf import model_eval

def cifar_pgd(     train_start=   0, 
                   train_end=     50000, 
                   test_start=    0,
                   test_end=      10000, 
                   nb_epochs=     20,
                   batch_size=    64,
                   learning_rate= 0.001, 
                   train_dir=     ".",
                   filename=      "./cleverhansout",
                   load_model=    False,
                   testing=       True, 
                   label_smoothing=0.0):
  """
  Derived from MNIST CleverHans tutorial

  Adjusted to use Keras MobileNet V2 model, CIFAR, and PGD.

  :param train_start: index of first training set example
  :param train_end: index of last training set example
  :param test_start: index of first test set example
  :param test_end: index of last test set example
  :param nb_epochs: number of epochs to train model
  :param batch_size: size of training batches
  :param learning_rate: learning rate for training
  :param train_dir: Directory storing the saved model
  :param filename: Filename to save model under
  :param load_model: True for load, False for not load
  :param testing: if true, test error is calculated
  :param label_smoothing: float, amount of label smoothing for cross entropy
  :return: an AccuracyReport object
  """
  tf.keras.backend.set_learning_phase(0)

  # Object used to keep track of (and return) key accuracies
  report = AccuracyReport()

  # Set TF random seed to improve reproducibility
  tf.set_random_seed(1234)

  if keras.backend.image_data_format() != 'channels_last':
    raise NotImplementedError("this tutorial requires keras to be configured to channels_last format")

  # Create TF session and set as Keras backend session
  with tf.Session() as sess:

  # sess = tf.Session()
    keras.backend.set_session(sess)

    # Get CIFAR-10 train and test data
    cifar = CIFAR10(train_start=train_start, train_end=train_end,
                  test_start=test_start, test_end=test_end)
    x_train, y_train = cifar.get_set('train')
    x_test, y_test = cifar.get_set('test')

    # Obtain Image Parameters
    img_rows, img_cols, nchannels = x_train.shape[1:4]
    nb_classes = y_train.shape[1]

    # Define input TF placeholder
    x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols,
                                          nchannels))
    y = tf.placeholder(tf.float32, shape=(None, nb_classes))

    # Define TF model graph
    # model = cnn_model(img_rows=img_rows, img_cols=img_cols,
    #                   channels=nchannels, nb_filters=64,
    #                   nb_classes=nb_classes)
    model = get_mnv2()
    preds = model(x)
    print("Defined TensorFlow model graph.")

    def evaluate():
      # Evaluate the accuracy of the model on legitimate test examples
      eval_params = {'batch_size': batch_size}
      acc = model_eval(sess, x, y, preds, x_test, y_test, args=eval_params)
      report.clean_train_clean_eval = acc
    #        assert X_test.shape[0] == test_end - test_start, X_test.shape
      print('Test accuracy on legitimate examples: %0.6f' % acc)

    # Train model
    train_params = {
        'nb_epochs': nb_epochs,
        'batch_size': batch_size,
        'learning_rate': learning_rate,
        'train_dir': train_dir,
        'filename': filename
    }

    rng = np.random.RandomState([2017, 8, 30])
    if not os.path.exists(train_dir):
      os.mkdir(train_dir)

    ckpt = tf.train.get_checkpoint_state(train_dir)
    print(train_dir, ckpt)
    ckpt_path = False if ckpt is None else ckpt.model_checkpoint_path
    wrap = KerasModelWrapper(model)

    if load_model and ckpt_path:
      saver = tf.train.Saver()
      print(ckpt_path)
      saver.restore(sess, ckpt_path)
      print("Model loaded from: {}".format(ckpt_path))
      evaluate()
    else:
      print("Model was not loaded, training from scratch.")
      loss = CrossEntropy(wrap, smoothing=label_smoothing)

      # Do one epoch of Keras training first. This appears to work around a
      # possible bug where weights are otherwise left uninitialized.
      #
      model.fit(x=x_train, y=y_train, batch_size=batch_size, epochs=1, verbose=1, 
                callbacks=None, validation_split=0.0, validation_data=None, 
                shuffle=True, class_weight=None, sample_weight=None, 
                initial_epoch=0, steps_per_epoch=None, validation_steps=None, 
                validation_freq=1, max_queue_size=10, workers=1, 
                use_multiprocessing=False)

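      # Note: `rms` is not passed to train() below (the `optimizer` argument is
      # commented out), so CleverHans uses its own default optimizer.
      rms = tf.train.RMSPropOptimizer(learning_rate=learning_rate)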

      train(sess, loss, x_train, y_train, evaluate=evaluate,
            # optimizer=rms,
            args=train_params, rng=rng)

    # Calculate training error
    if testing:
      eval_params = {'batch_size': batch_size}
      acc = model_eval(sess, x, y, preds, x_train, y_train, args=eval_params)
      report.train_clean_train_clean_eval = acc

    # Initialize the attack object and graph
    madry = MadryEtAl(wrap, sess=sess)
    # TODO: Check that these params are applicable to Madry. These were lifted from FGSM.
    madry_params = {'eps': 8./255., # 0.3,
                    'eps_iter': 2./255.,
                    'clip_min': 0.,
                    'clip_max': 1.}
    adv_x = madry.generate(x, **madry_params)
    # Consider the attack to be constant
    adv_x = tf.stop_gradient(adv_x)
    preds_adv = model(adv_x)

    # Evaluate the accuracy of the CIFAR-10 model on adversarial examples
    eval_par = {'batch_size': batch_size}
    acc = model_eval(sess, x, y, preds_adv, x_test, y_test, args=eval_par)
    print('Test accuracy on adversarial examples: %0.6f\n' % acc)
    report.clean_train_adv_eval = acc

    # Calculating train error
    if testing:
      eval_par = {'batch_size': batch_size}
      acc = model_eval(sess, x, y, preds_adv, x_train,
                        y_train, args=eval_par)
      report.train_clean_train_adv_eval = acc

    print("Repeating the process, using adversarial training")
    # Redefine TF model graph
    # Ensure that the model is new. 
    model_2 = get_mnv2()
    wrap_2 = KerasModelWrapper(model_2)
    preds_2 = model_2(x)
    madry2 = MadryEtAl(wrap_2, sess=sess)

    model_2.fit(x=x_train, y=y_train, batch_size=batch_size, epochs=1, verbose=1, 
              callbacks=None, validation_split=0.0, validation_data=None, 
              shuffle=True, class_weight=None, sample_weight=None, 
              initial_epoch=0, steps_per_epoch=None, validation_steps=None, 
              validation_freq=1, max_queue_size=10, workers=1, 
              use_multiprocessing=False)

    def attack(x):
      return madry2.generate(x, **madry_params)

    preds_2_adv = model_2(attack(x))
    loss_2 = CrossEntropy(wrap_2, smoothing=label_smoothing, attack=attack)

    def evaluate_2():
      # Accuracy of adversarially trained model on legitimate test inputs
      eval_params = {'batch_size': batch_size}
      accuracy = model_eval(sess, x, y, preds_2, x_test, y_test,
                            args=eval_params)
      print('Test accuracy on legitimate examples: %0.4f' % accuracy)
      report.adv_train_clean_eval = accuracy

      # Accuracy of the adversarially trained model on adversarial examples
      accuracy = model_eval(sess, x, y, preds_2_adv, x_test,
                            y_test, args=eval_params)
      print('Test accuracy on adversarial examples: %0.4f' % accuracy)
      report.adv_train_adv_eval = accuracy

    # Perform and evaluate adversarial training
    train(sess, loss_2, x_train, y_train, evaluate=evaluate_2,
          args=train_params, rng=rng)

    # Calculate training errors
    if testing:
      eval_params = {'batch_size': batch_size}
      accuracy = model_eval(sess, x, y, preds_2, x_train, y_train,
                            args=eval_params)
      report.train_adv_train_clean_eval = accuracy
      accuracy = model_eval(sess, x, y, preds_2_adv, x_train,
                            y_train, args=eval_params)
      report.train_adv_train_adv_eval = accuracy
    # end with tf.Session():

  return report

cifar_pgd(batch_size=32, nb_epochs=20)

x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Defined TensorFlow model graph.
. None
Model was not loaded, training from scratch.
Train on 50000 samples
num_devices:  1


[INFO 2020-01-22 01:00:06,494 cleverhans] Epoch 0 took 13.329843997955322 seconds


Test accuracy on legitimate examples: 0.409200


[INFO 2020-01-22 01:00:22,064 cleverhans] Epoch 1 took 11.162869930267334 seconds


Test accuracy on legitimate examples: 0.436500


[INFO 2020-01-22 01:00:34,220 cleverhans] Epoch 2 took 11.305679559707642 seconds


Test accuracy on legitimate examples: 0.504700


[INFO 2020-01-22 01:00:46,403 cleverhans] Epoch 3 took 11.287597894668579 seconds


Test accuracy on legitimate examples: 0.521500


[INFO 2020-01-22 01:00:59,029 cleverhans] Epoch 4 took 11.679455757141113 seconds


Test accuracy on legitimate examples: 0.551100


[INFO 2020-01-22 01:01:11,470 cleverhans] Epoch 5 took 11.579076051712036 seconds


Test accuracy on legitimate examples: 0.566100


[INFO 2020-01-22 01:01:23,558 cleverhans] Epoch 6 took 11.23734450340271 seconds


Test accuracy on legitimate examples: 0.570100


[INFO 2020-01-22 01:01:35,985 cleverhans] Epoch 7 took 11.573017358779907 seconds


Test accuracy on legitimate examples: 0.583000


[INFO 2020-01-22 01:01:48,313 cleverhans] Epoch 8 took 11.442841053009033 seconds


Test accuracy on legitimate examples: 0.600100


[INFO 2020-01-22 01:02:00,694 cleverhans] Epoch 9 took 11.335247993469238 seconds


Test accuracy on legitimate examples: 0.604100


[INFO 2020-01-22 01:02:12,878 cleverhans] Epoch 10 took 11.326795816421509 seconds


Test accuracy on legitimate examples: 0.610800


[INFO 2020-01-22 01:02:25,159 cleverhans] Epoch 11 took 11.423813104629517 seconds


Test accuracy on legitimate examples: 0.617900


[INFO 2020-01-22 01:02:37,542 cleverhans] Epoch 12 took 11.50550389289856 seconds


Test accuracy on legitimate examples: 0.627000


[INFO 2020-01-22 01:02:50,778 cleverhans] Epoch 13 took 12.32450246810913 seconds


Test accuracy on legitimate examples: 0.623100


[INFO 2020-01-22 01:03:03,126 cleverhans] Epoch 14 took 11.471078395843506 seconds


Test accuracy on legitimate examples: 0.639000


[INFO 2020-01-22 01:03:15,640 cleverhans] Epoch 15 took 11.653642654418945 seconds


Test accuracy on legitimate examples: 0.640100


[INFO 2020-01-22 01:03:27,849 cleverhans] Epoch 16 took 11.315950632095337 seconds


Test accuracy on legitimate examples: 0.643800


[INFO 2020-01-22 01:03:40,263 cleverhans] Epoch 17 took 11.558930158615112 seconds


Test accuracy on legitimate examples: 0.648400


[INFO 2020-01-22 01:03:52,851 cleverhans] Epoch 18 took 11.73654317855835 seconds


Test accuracy on legitimate examples: 0.650400


[INFO 2020-01-22 01:04:05,492 cleverhans] Epoch 19 took 11.744518280029297 seconds


Test accuracy on legitimate examples: 0.655100




Test accuracy on adversarial examples: 0.122500

Repeating the process, using adversarial training
Train on 50000 samples




num_devices:  1


[INFO 2020-01-22 01:08:49,993 cleverhans] Epoch 0 took 112.98240733146667 seconds


Test accuracy on legitimate examples: 0.4187
Test accuracy on adversarial examples: 0.2815


[INFO 2020-01-22 01:11:07,334 cleverhans] Epoch 1 took 110.0533185005188 seconds


Test accuracy on legitimate examples: 0.4538
Test accuracy on adversarial examples: 0.3147


[INFO 2020-01-22 01:13:16,218 cleverhans] Epoch 2 took 110.19014978408813 seconds


Test accuracy on legitimate examples: 0.4923
Test accuracy on adversarial examples: 0.3319


[INFO 2020-01-22 01:15:26,400 cleverhans] Epoch 3 took 111.28938484191895 seconds


Test accuracy on legitimate examples: 0.5096
Test accuracy on adversarial examples: 0.3456


[INFO 2020-01-22 01:17:39,408 cleverhans] Epoch 4 took 113.27765417098999 seconds


Test accuracy on legitimate examples: 0.5219
Test accuracy on adversarial examples: 0.3402


[INFO 2020-01-22 01:19:50,991 cleverhans] Epoch 5 took 111.40648746490479 seconds


Test accuracy on legitimate examples: 0.5369
Test accuracy on adversarial examples: 0.3417


[INFO 2020-01-22 01:22:00,382 cleverhans] Epoch 6 took 110.04383587837219 seconds


Test accuracy on legitimate examples: 0.5235
Test accuracy on adversarial examples: 0.3455


[INFO 2020-01-22 01:24:09,576 cleverhans] Epoch 7 took 110.36229038238525 seconds


Test accuracy on legitimate examples: 0.5437
Test accuracy on adversarial examples: 0.3628


[INFO 2020-01-22 01:26:17,250 cleverhans] Epoch 8 took 109.18047666549683 seconds


Test accuracy on legitimate examples: 0.5330
Test accuracy on adversarial examples: 0.3532


[INFO 2020-01-22 01:28:27,253 cleverhans] Epoch 9 took 110.86108469963074 seconds


Test accuracy on legitimate examples: 0.5364
Test accuracy on adversarial examples: 0.3605


[INFO 2020-01-22 01:30:36,888 cleverhans] Epoch 10 took 109.8416965007782 seconds


Test accuracy on legitimate examples: 0.5592
Test accuracy on adversarial examples: 0.3617


[INFO 2020-01-22 01:32:45,251 cleverhans] Epoch 11 took 109.24990940093994 seconds


Test accuracy on legitimate examples: 0.5813
Test accuracy on adversarial examples: 0.3538


[INFO 2020-01-22 01:34:53,952 cleverhans] Epoch 12 took 109.58411264419556 seconds


Test accuracy on legitimate examples: 0.5782
Test accuracy on adversarial examples: 0.3481


[INFO 2020-01-22 01:37:01,865 cleverhans] Epoch 13 took 109.00561594963074 seconds


Test accuracy on legitimate examples: 0.5888
Test accuracy on adversarial examples: 0.3629


[INFO 2020-01-22 01:39:10,675 cleverhans] Epoch 14 took 110.08999133110046 seconds


Test accuracy on legitimate examples: 0.5782
Test accuracy on adversarial examples: 0.3642


[INFO 2020-01-22 01:41:18,472 cleverhans] Epoch 15 took 109.10942268371582 seconds


Test accuracy on legitimate examples: 0.5840
Test accuracy on adversarial examples: 0.3563


[INFO 2020-01-22 01:43:26,587 cleverhans] Epoch 16 took 108.9611828327179 seconds


Test accuracy on legitimate examples: 0.5950
Test accuracy on adversarial examples: 0.3529


[INFO 2020-01-22 01:45:36,246 cleverhans] Epoch 17 took 110.32626533508301 seconds


Test accuracy on legitimate examples: 0.5838
Test accuracy on adversarial examples: 0.3578


[INFO 2020-01-22 01:47:44,696 cleverhans] Epoch 18 took 109.45361185073853 seconds


Test accuracy on legitimate examples: 0.5932
Test accuracy on adversarial examples: 0.3619


[INFO 2020-01-22 01:49:53,164 cleverhans] Epoch 19 took 109.46263790130615 seconds


Test accuracy on legitimate examples: 0.6000
Test accuracy on adversarial examples: 0.3624


<cleverhans.utils.AccuracyReport at 0x7f5ec5400eb8>
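
The final-epoch numbers printed above can be put side by side. The snippet below is a small sketch that hard-codes the four test accuracies from this run's log. It does not read the `AccuracyReport` object, and the dictionary keys were chosen here for readability.

```python
# Final test accuracies copied from the log output of this run.
final_test_accuracy = {
    "clean_training": {"clean": 0.6551, "adversarial": 0.1225},
    "adversarial_training": {"clean": 0.6000, "adversarial": 0.3624},
}

deltas = {}
for kind in ("clean", "adversarial"):
    before = final_test_accuracy["clean_training"][kind]
    after = final_test_accuracy["adversarial_training"][kind]
    deltas[kind] = after - before
    print("%-12s %.4f -> %.4f (%+.4f)" % (kind, before, after, deltas[kind]))
```

On this run, adversarial training trades about 5.5 points of clean test accuracy for about 24 points of adversarial test accuracy.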