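```python
import sys

import numpy as np
import scipy
from scipy.linalg import fractional_matrix_power as fracpow
from numpy.linalg import matrix_power as matpow
from numpy.linalg import eig as eigvv

# Print arrays in full; recent NumPy rejects threshold=np.nan, so use sys.maxsize
np.set_printoptions(threshold=sys.maxsize)
```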

# Brief Overview of MagNet <br> Adversarial Defense by Meng & Chen
## Zigfried Hampel-Arias

### Ideas for NIPS-2018 Competition
15 October, 2018

### Slides
https://zhampel.github.io/magnet-brief-overview

### Contact

Find me on:

<a href="https://zhampel.github.io/">
<img src="images/octocat.png" alt="Go to my GitHub" width="60" height="60" border="0"> </a>

<a href="https://www.linkedin.com/in/zhampel-arias/">
<img src="images/LinkedIn-InBug-2C.png" alt="Go to my LinkedIn" width="60" height="60" border="0">
</a>


## Overview

- Brief Terminology
- MagNet Architecture & Functionality
- Results on MNIST & CIFAR-10 by Meng & Chen (2017)

## Notation

- $\mathcal{S}$ is the set of all images
- $\mathcal{N}_t$ is the set of natural training images ($|\mathcal{N}_t| \ll |\mathcal{S}|$)
- $\mathcal{S} \setminus \mathcal{N}_t$ is the set of images outside the natural training set, which contains the adversarial images

<table>
    <td> <img src="images/Set_full.png" width="200px" height="200px"> </td>
    <td> <img src="images/Set_S_exc_N.png" width="200px" height="200px"> </td>
</table>


## MagNet Architecture
![magnet_arch](images/MagNet_arch.png)

## Detector

The **detector** determines whether an image is *clearly* adversarial.
- Autoencoder $ae(x)$ that measures how similar an input is to the training data
- Trained **only** on the natural *training* dataset (no adversarial examples)

- User-defined threshold on one of
    - **reconstruction error** $\|x-ae(x)\|_p$
    - **probability divergence**: Jensen-Shannon divergence between the classifier's temperature-scaled softmax outputs on $x$ and on $ae(x)$
- Most effective on examples far from $\mathcal{N}_t$

- Decision action:
    - Yes -> attack identified
    - No -> feed to *reformer*
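The two detector scores and the threshold decision can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions and is not the MagNet code. `ae` and `classifier_logits` are assumed to be callables that operate on batches. The temperature `T` and false-positive rate `fpr` are illustrative defaults. Setting the user-defined threshold as a quantile of scores on natural validation images is also an assumption about how one might pick it.

```python
import numpy as np


def reconstruction_error(x, ae, p=1):
    """Per-example ||x - ae(x)||_p, with each image flattened to a vector."""
    diff = (x - ae(x)).reshape(len(x), -1)
    return np.linalg.norm(diff, ord=p, axis=1)


def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence between matching rows of two probability arrays."""
    m = 0.5 * (p + q)

    def kl(a, b):
        # eps guards against log(0) when a distribution has zero entries
        return np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergence_score(x, ae, classifier_logits, T=10.0):
    """JSD between classifier outputs on x and on its reconstruction ae(x)."""
    p = softmax(classifier_logits(x), T)
    q = softmax(classifier_logits(ae(x)), T)
    return jensen_shannon(p, q)


def fit_threshold(natural_scores, fpr=0.005):
    """Assumed rule: flag roughly a fraction `fpr` of natural validation images."""
    return np.quantile(natural_scores, 1.0 - fpr)


def is_adversarial(scores, threshold):
    """True -> attack identified; False -> pass the input on to the reformer."""
    return scores > threshold
```

Each detector uses one score function together with its own threshold. A high score means the input is poorly explained by an autoencoder trained only on natural data.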

## Reformer

The **reformer** *smooths* an image into the space of admissible inputs to the classifier.
- Autoencoder $ae(x)$ producing a re-representation of $x$
- If $x \notin \mathcal{N}_t$, can we move the image closer to the set of natural training images?
$$
ae : \mathcal{S} \to \mathcal{N}_t
$$

- Trained only on the natural *training* dataset
- Ideally leaves normal examples (nearly) unchanged, while moving an adversarial example towards a nearby normal one
- Effectively a denoiser for examples near $\mathcal{N}_t$
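The idea of projecting an input back towards $\mathcal{N}_t$ can be shown with a toy stand-in. The sketch below replaces the trained autoencoder with a *linear* autoencoder: a projection onto the top-$k$ principal components of the training data. The optimal linear autoencoder spans the same subspace. This is not MagNet's convolutional autoencoder, and `fit_linear_reformer` and `k` are hypothetical names. When natural data lie on a low-dimensional subspace, this projection leaves natural points unchanged and removes any perturbation orthogonal to that subspace.

```python
import numpy as np


def fit_linear_reformer(X_train, k):
    """Toy linear 'autoencoder': project onto the top-k principal directions.

    Stand-in for a trained reformer; X_train has shape (n_samples, n_features).
    """
    mu = X_train.mean(axis=0)
    # Right singular vectors of the centered data = principal directions
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:k]  # shape (k, n_features), orthonormal rows

    def ae(x):
        z = (x - mu) @ W.T  # encode: coordinates in the principal subspace
        return mu + z @ W   # decode: map back to input space

    return ae


# Natural data: points on a random 2-D subspace of R^10
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]  # (10, 2), orthonormal columns
X = rng.normal(size=(500, 2)) @ basis.T
ae = fit_linear_reformer(X, k=2)

x = X[0]
delta = rng.normal(size=10)
delta -= basis @ (basis.T @ delta)  # keep only the off-subspace component
x_adv = x + 0.5 * delta             # perturbed ("adversarial") input
```

In this toy setting `ae(x)` returns `x` itself and `ae(x_adv)` returns the clean `x`, which mirrors the reformer's intended behavior for examples near $\mathcal{N}_t$. Real images are not confined to a linear subspace, which is why MagNet uses learned nonlinear autoencoders.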

## Teamwork

![set_det_ref_regions_zoom](images/Set_det_ref_regions_zoom.png)

## Teamwork
- Green circles = images s.t. $x \in \mathcal{N}_t$
- Red crosses = images s.t. $x \in \mathcal{S} \backslash \mathcal{N}_t$
<img src="images/det_ref_func.png" width="400px">

## MagNet Architecture

- Feed the input through a set of detectors and check whether any of them fire
- If it is not clearly an attack, feed it through the reformer
    - If $x \in \mathcal{N}_t$, $ae(x) \approx x$
    - If $x \notin \mathcal{N}_t$, $ae(x) \approx x' \in \mathcal{N}_t$
- Classify the reformed image $ae(x)$

<img src="images/MagNet_arch.png" width="500px">
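The inference flow above can be sketched as plain Python. The interfaces are assumptions for illustration: `magnet_predict` is a hypothetical name, each detector is assumed to be a `(score_fn, threshold)` pair, and the reformer and classifier are assumed to be plain callables.

```python
def magnet_predict(x, detectors, reformer, classifier):
    """Sketch of the MagNet inference flow for a single input x.

    detectors:  list of (score_fn, threshold) pairs; score_fn(x) -> float
    reformer:   callable returning the re-representation ae(x)
    classifier: callable returning a class label
    Returns (rejected, label); label is None when a detector fires.
    """
    for score_fn, threshold in detectors:
        if score_fn(x) > threshold:
            return True, None  # at least one detector flags x as adversarial
    return False, classifier(reformer(x))  # classify the reformed input
```

For example, `magnet_predict(img, [(recon_score, t1), (jsd_score, t2)], ae, model)` would reject `img` if either hypothetical score exceeds its threshold. Otherwise it returns the label assigned to `ae(img)`.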

## Meng & Chen Results

Example adversarial MNIST images

<img src="images/mnist_examples.png" width="400px">

## Classification Accuracies

- Classifiers used in the paper (without MagNet) have accuracies of
    - 99.4% on MNIST
    - 90.6% on CIFAR-10

| ![mnist_table](images/mnist_table.png)  | ![cifar_table](images/cifar_table.png) |
|:---:|:---:|
| MNIST | CIFAR-10 |


## Classification Accuracies

Testing with Carlini's $L_2$ attack

![acc_figure](images/acc_figures.png)


## Graybox Attack

Graybox attack: the attacker knows the defense architecture, but not its trained parameters.
- Not quite sure why the 'Random' row is not simply the average...

<img src="images/graybox_attack.png" width="400px">


## My MNIST Results
- Ran their test code
- Results differ slightly from the paper, but are better
- Had access only to coarser pre-trained attacks

<img src="images/my_defense_performance.pdf" width="400px">

## Updating MagNet

- Submitted a PR making the code compatible with newer TensorFlow versions
- Specifically, `load_model` must be passed the custom loss function through `custom_objects` when loading the saved models:<br>
    `load_model(classifier_path, custom_objects={'custom_loss': custom_loss})`
- Also fixed some of their spelling mistakes

# Summary

- Agnostic to both the **classifier model** & the **adversarial attack**

- For NIPS purposes, the reformer alone is probably sufficient
    - We already know there is an attack, so detection adds little
    - Potentially improve upon the reformer structure
![magnet_arch_truncated](images/MagNet_NIPS_Truncated_arch.png)

# Summary

- Agnostic to both the **classifier model** & the **adversarial attack**
- For NIPS purposes, the reformer alone is probably sufficient
    - We already know there is an attack, so detection adds little
    - Potentially improve upon the reformer structure
- Could propose a second reformer, chosen at random from architectures other than the first reformer's (subject to inference-time limits)
- Need to measure the inference time of reformer + classifier
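Both ideas above are easy to prototype. The sketch below makes three assumptions:

- Each candidate reformer is a callable stored under a hypothetical architecture name.
- The second reformer is drawn uniformly at random from the architectures other than the first one.
- Latency is the mean wall-clock time per call of a composed function such as `classifier(reformer(x))`, measured with `time.perf_counter`.

`pick_second_reformer` and `mean_inference_time` are illustrative names and are not part of MagNet.

```python
import random
import time


def pick_second_reformer(reformers, first_arch, rng=random):
    """Randomly pick a reformer whose architecture differs from the first one.

    `reformers` maps a (hypothetical) architecture name to a reformer callable.
    """
    candidates = [name for name in reformers if name != first_arch]
    name = rng.choice(candidates)
    return name, reformers[name]


def mean_inference_time(fn, x, n_runs=20):
    """Average wall-clock seconds per call of fn(x), e.g. classifier(reformer(x))."""
    fn(x)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(x)
    return (time.perf_counter() - start) / n_runs
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible for testing. At competition time you would leave it unseeded so the attacker cannot predict which reformer is used.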

- Made changes to update the MagNet code

- Example attacks in MagNet are for MNIST only!
- Could work on getting Carlini's code into a more general adversarial attack package for us to use

# Thank you for your attention!

<br>

![final](images/sunset-gsl.jpg)

## Any Further Questions?

# Backups