In [None]:
from deepview import DeepView
import matplotlib.pyplot as plt
import numpy as np
import time
# ---------------------------
import demo_utils as demo

%load_ext autoreload
%autoreload 2
%matplotlib qt

In [None]:
# The Qt backend does not always activate on the first call in a notebook, so we run the magic a second time
%matplotlib qt

# Table of contents

<br>

<font size="+1"><b>
    
 0. [Usage Instructions](#DeepView-Usage-Instructions)
 0. [Setting up DeepView](#Demo-with-Torch-model)
 0. [Tuning $\lambda$ Hyperparameter](#Tuning-the-$\lambda$-Hyperparameter)
 
</b></font>

## DeepView Usage Instructions

 1. Create a wrapper function like ```pred_wrapper``` that receives a NumPy array of samples and returns the classifier's class probabilities for these samples as a NumPy array.
 2. Initialize a DeepView object and pass the wrapper function to the constructor.
 3. Run your code and call ```add_samples(samples, labels)``` at any time to add samples, together with their ground truth labels, to the visualization.
    * The ground truth labels are visualized along with the predicted labels.
    * The object keeps track of at most ```max_samples``` samples and discards the oldest samples first.
 4. Call the ```show``` method to render the plot.
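As an illustration of step 1, here is a minimal sketch of the contract `pred_wrapper` is expected to satisfy: a batch of samples goes in as a NumPy array, and an `(N, n_classes)` array of probabilities comes out, one row per sample. The random linear "model" and the helper `check_pred_wrapper` are hypothetical stand-ins for a real classifier; they are not part of DeepView.

```python
import numpy as np

# Hypothetical stand-in classifier: a fixed random linear layer on flattened inputs.
rng = np.random.default_rng(0)
DATA_SHAPE = (3, 32, 32)   # per-sample shape, excluding the batch dimension
N_CLASSES = 10
W = rng.normal(scale=0.01, size=(int(np.prod(DATA_SHAPE)), N_CLASSES))

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pred_wrapper(x):
    # Accept any array-like batch, cast to float32, and return class probabilities.
    x = np.asarray(x, dtype=np.float32)
    logits = x.reshape(len(x), -1) @ W
    return softmax(logits)

def check_pred_wrapper(fn, data_shape, n_classes, batch=4):
    # Verify the (N, n_classes) output shape and that each row sums to 1.
    x = rng.normal(size=(batch, *data_shape)).astype(np.float32)
    probs = fn(x)
    assert probs.shape == (batch, n_classes)
    assert np.allclose(probs.sum(axis=1), 1.0, atol=1e-5)
    return probs

probs = check_pred_wrapper(pred_wrapper, DATA_SHAPE, N_CLASSES)
```

The same shape and normalization checks can be run against a real wrapper before handing it to DeepView.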

Parameters marked with "(!)" must be specified on initialization:

| <p align="left">Variable               | <p align="left">Meaning           |
|------------------------|-------------------|
| <p align="left">(!)```pred_wrapper```     | <p align="left">Wrapper function allowing DeepView to use your model. Expects a single argument, which should be a batch of samples to classify. Returns (valid / softmaxed) prediction probabilities for this batch of samples. |
| <p align="left">(!)```classes```          | <p align="left">Names of all different classes in the data. |
| <p align="left">(!)```max_samples```      | <p align="left">The maximum number of samples that DeepView keeps track of. When more samples are added, the oldest samples are removed from DeepView. |
| <p align="left">(!)```batch_size```       | <p align="left">The batch size used for classification |
| <p align="left">(!)```data_shape```       | <p align="left">Shape of the input data (complete shape; excluding the batch dimension) |
| <p align="left">```resolution```       | <p align="left">x- and y-resolution of the decision boundary plot. Higher resolutions take significantly longer to compute, since every grid point must be classified. Default 100. |
| <p align="left">```cmap```             | <p align="left">Name of the colormap that should be used in the plots, default 'tab10'. |
| <p align="left">```interactive```      | <p align="left">When ```interactive``` is True, ```show``` is non-blocking to allow plot updates. When ```interactive``` is False, ```show``` blocks to prevent Python scripts from terminating. Default True. |
| <p align="left">```title```            | <p align="left">Title of the deepview-plot. |
| <p align="left">```data_viz```         | <p align="left">The DeepView plot is interactive: clicking on it shows the corresponding data sample. You can pass a custom visualization function; if ```data_viz``` is None, DeepView tries to display each sample as an image, if possible. (optional, default None)  |
| <p align="left">```mapper```           | <p align="left">An object that maps samples from the data space to 2D space. Normally UMAP is used for this, but you can pass a custom mapper as well. (optional)  |
| <p align="left">```inv_mapper```       | <p align="left">An object that maps samples from the 2D space to the data space. Normally ```deepview.embeddings.InvMapper``` is used for this, but you can pass a custom inverse mapper as well. (optional)  |
| <p align="left">```kwargs```       | <p align="left">Configuration for the embeddings in case they are not specifically given in mapper and inv_mapper. Defaults to ```deepview.config.py```.  (optional)  |
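Two of these parameters have simple memory and cost semantics that are easy to sketch. ```max_samples``` behaves like a first-in, first-out buffer (the oldest samples are dropped first), and the decision boundary plot requires classifying every point of a ```resolution``` × ```resolution``` grid, which the classifier processes in batches of ```batch_size```. The class `SampleBuffer` and the function `grid_batches` below are hypothetical illustrations of these two rules, not DeepView's internal code.

```python
import math
import numpy as np

class SampleBuffer:
    """Hypothetical FIFO store that keeps at most `max_samples` samples (assumes max_samples > 0)."""

    def __init__(self, max_samples, data_shape):
        self.max_samples = max_samples
        self.samples = np.empty((0, *data_shape), dtype=np.float32)
        self.labels = np.empty((0,), dtype=np.int64)

    def add_samples(self, samples, labels):
        # Append the new batch, then keep only the newest `max_samples` entries.
        self.samples = np.concatenate([self.samples, samples])[-self.max_samples:]
        self.labels = np.concatenate([self.labels, labels])[-self.max_samples:]

def grid_batches(resolution, batch_size):
    # Every grid point is classified once: resolution**2 points in total,
    # split into ceil(points / batch_size) classifier calls.
    n_points = resolution ** 2
    return n_points, math.ceil(n_points / batch_size)

buf = SampleBuffer(max_samples=500, data_shape=(3, 32, 32))
for start in range(0, 600, 200):
    ids = np.arange(start, start + 200)
    buf.add_samples(np.zeros((200, 3, 32, 32), dtype=np.float32), ids)

print(len(buf.labels), buf.labels[0])
print(grid_batches(100, 512))
```

After adding 600 samples with ```max_samples = 500```, only samples 100 to 599 remain. With the default resolution of 100, the grid has 10,000 points, so doubling the resolution quadruples the number of points to classify.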
    
    
The hyperparameters that influence the visualization are:

    
|Variable            |Meaning          |
|--------------------|------------------|
| ```lam```          | (Fisher metric parameter) Controls the amount of Euclidean regularization of the Fisher metric; larger values mean more regularization. Between 0 and 1, default 0.65. |
| ```n_neighbors```  | (UMAP parameter) Number of neighbors used in UMAP. Determines how many points are considered to be close for each data point, roughly speaking. Default is 30. |
| ```a```            | (inverse mapping parameter) Determines the nonlinearity of the inverse mapping; larger values correspond to more nonlinear functions. Compared to the definition in the paper, we additionally divide by the range of the data, so that the latter can be ignored when setting this parameter. Default is 500. |
| ```min_dist```     | (UMAP parameter) The minimum distance between embedded points. Smaller values cause more clustered or clumped visualizations. Interacts with ```spread```. Default is 0.1. |
| ```spread```       | (UMAP parameter) The scale of embedded points. Together with ```min_dist``` causes more/less clustering. Default is 1.0.|
|```n```             | (Fisher metric parameter) Number of interpolation steps for distance calculation between two points. In the paper, this is also called n, default 5. |
| ```random_state``` | (UMAP parameter) Seed used by UMAP. |
    
  
In this notebook, we focus on the most important parameter, ```lam``` (corresponding to $\lambda$ in the paper).
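To build intuition for what ```lam``` and ```n``` control, here is a minimal sketch of a $\lambda$-weighted distance between two samples. It assumes the combination is a convex mix, $\lambda$ times the Euclidean distance plus $(1 - \lambda)$ times a model-based distance, which matches the limiting cases discussed in this notebook ($\lambda = 0$: model only, $\lambda = 1$: Euclidean only); the exact weighting and any normalization DeepView applies internally may differ. The model-based part approximates the length of the straight line between the two inputs, as seen by the classifier, by summing the Fisher-Rao distance between predicted class distributions at ```n``` interpolation steps. The toy `predict` function and all helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy classifier on 2-D inputs with 3 classes.
W = np.array([[2.0, -1.0, 0.0],
              [0.0, 1.5, -2.0]])

def predict(x):
    return softmax(x @ W)

def fisher_rao(p, q):
    # Geodesic distance between two categorical distributions under the Fisher metric.
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)  # Bhattacharyya coefficient
    return 2.0 * np.arccos(bc)

def discriminative_dist(x, y, n=5):
    # Approximate the classifier-induced path length along the straight line x -> y
    # by summing Fisher-Rao distances between predictions at n + 1 evenly spaced points.
    ts = np.linspace(0.0, 1.0, n + 1)
    preds = [predict((1 - t) * x + t * y) for t in ts]
    return sum(fisher_rao(a, b) for a, b in zip(preds[:-1], preds[1:]))

def combined_dist(x, y, lam=0.65, n=5):
    # Assumed convex mix: lam = 0 -> purely model-based, lam = 1 -> purely Euclidean.
    # No rescaling of the two terms is done here.
    return lam * np.linalg.norm(x - y) + (1 - lam) * discriminative_dist(x, y, n)

x, y = np.array([0.0, 0.0]), np.array([1.0, 1.0])
for lam in (0.0, 0.65, 1.0):
    print(lam, round(combined_dist(x, y, lam), 3))
```

Increasing ```n``` refines the path approximation at the cost of more classifier calls per pair of points.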
    

## Demo with Torch model

1. Initialize a pretrained torch model
2. Create CIFAR10 dataset
3. Write the wrapper function ```pred_wrapper```
    1. (optional) Create a visualization function if you want to visualize single examples by clicking in the DeepView plot.
4. Initialize a ```DeepView``` object
5. Add samples to DeepView and call ```deepview.show```

In [None]:
import torch

# device will be detected automatically
# Set to 'cpu' or 'cuda:0' to set the device manually
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# get torch model
torch_model = demo.create_torch_model(device)
# get CIFAR-10 data
testset = demo.make_cifar_dataset()

print('\nUsing device:', device)

In [None]:
# softmax operation to use in pred_wrapper
softmax = torch.nn.Softmax(dim=-1)

# this is the prediction wrapper, it encapsulates the call to the model
# and does all the casting to the appropriate datatypes
def pred_wrapper(x):
    with torch.no_grad():
        x = np.array(x, dtype=np.float32)
        tensor = torch.from_numpy(x).to(device)
        logits = torch_model(tensor)
        probabilities = softmax(logits).cpu().numpy()
    return probabilities

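def visualization(image, point2d, pred, label=None, title=None):
    # Intended as DeepView's data_viz callback: displays a clicked sample.
    # CIFAR-10 tensors are channel-first (C, H, W); imshow expects (H, W, C).
    f, a = plt.subplots()
    a.set_title(title)
    a.imshow(image.transpose([1, 2, 0]))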

# the classes in the dataset to be used as labels in the plots
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# --- Deep View Parameters ----
batch_size = 512
max_samples = 500
data_shape = (3, 32, 32)
lam = .65 # default parameter
title = 'ResNet-20 - CIFAR10'

deepview = DeepView(pred_wrapper, classes, max_samples, batch_size,
                    data_shape, lam=lam, title=title, data_viz=visualization)

In [None]:
n_samples = 200
# sample without replacement so that no test image is added twice
sample_ids = np.random.choice(len(testset), n_samples, replace=False)
X = np.array([ testset[i][0].numpy() for i in sample_ids ])
Y = np.array([ testset[i][1] for i in sample_ids ])

t0 = time.time()
deepview.add_samples(X, Y)
deepview.show()

print('Time to calculate visualization for %d samples: %.2f sec' % (n_samples, time.time() - t0))

> The visualization seems reasonable with the default $\lambda$ = 0.65. But how can we evaluate which value is good for our data set?


## Tuning the $\lambda$ Hyperparameter

For $\lambda = 0$, the DeepView plot is organized purely according to the Fisher metric induced by the model's predicted class probabilities. This tends to produce collapsed clusters, since variation within a region that the classifier assigns to a single class is strongly reduced.

<img alt="img_collapsed_clusters" width=400px src="https://user-images.githubusercontent.com/30961397/90417945-be51fa00-e0b4-11ea-8951-d0183432c90b.png">

For $\lambda = 1$, the DeepView plot is organized according to structural properties of the data points as measured by the Euclidean metric, ignoring the model's predictions. For more complex datasets, the points are therefore scattered across the whole plot, since Euclidean distance is an insufficient indicator of class membership.

<img alt="img_diffused_clusters" width=400px src="https://user-images.githubusercontent.com/30961397/90418110-fc4f1e00-e0b4-11ea-8c85-fa3b53a8e531.png">

A balance must be found here in order to obtain a discriminative visualization (a visualization that is dominated by the class structure as induced by the classifier) that also locally incorporates as much structure of the original data as possible.

**The [DeepView-paper](https://www.ijcai.org/Proceedings/2020/0319.pdf) suggests the following tuning method for $\lambda$:**

As a measure of how well the embedding $\pi$ represents the model's view of the data, one can use the consistency of the point clusters in the embedding with the model's predictions:
> [...] points that are close to each other should be points that are classified similarly by the classification model.

To evaluate this, a kNN classifier is trained on the embedded samples, using the model's predictions as labels.
The leave-one-out error $Q_{kNN}$ of this classifier then serves as the metric. To tune $\lambda$,

> [...] we evaluate $\pi$ for $Q_{kNN}$ with $\lambda \in [0.2;0.8]$ and choose the largest one that does not degrade $Q_{kNN}$ significantly.[...]

Here, this procedure is applied to 6 evenly spaced values in $[0, 1]$, followed by a finer search over 6 values in $[0.6, 0.8]$.
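The $Q_{kNN}$ criterion can be sketched with NumPy alone: each embedded point is classified by a majority vote of its $k$ nearest other embedded points, with the model's predicted labels as targets, and the fraction of disagreements is the leave-one-out error. The function name `loo_knn_error`, the choice $k = 5$, and the Euclidean distance in the 2D embedding are assumptions for illustration; `deepview.evaluate.evaluate_umap` may differ in details such as $k$ or whether it reports an error or an accuracy.

```python
import numpy as np

def loo_knn_error(embedding, labels, k=5):
    """Leave-one-out kNN error of `labels` on a 2D `embedding` (hypothetical helper)."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances in the embedding space.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point from its own neighborhood (the "leave-one-out" part).
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    errors = 0
    for i, nbr in enumerate(neighbors):
        # Majority vote among the k nearest neighbors; ties go to the smallest label.
        values, counts = np.unique(labels[nbr], return_counts=True)
        if values[np.argmax(counts)] != labels[i]:
            errors += 1
    return errors / len(labels)

# Toy example: two well-separated clusters whose points share a predicted label.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
pred_labels = np.array([0] * 20 + [1] * 20)
print(loo_knn_error(emb, pred_labels))
```

In the DeepView setting, `embedding` would be the 2D positions of the added samples and `labels` the classifier's predicted classes, so a low error means that neighborhoods in the plot agree with the model.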

In [None]:
from deepview.evaluate import evaluate_umap

print('Evaluation of DeepView: %s\n' % deepview.title)

for l in np.linspace(0., 1., 6):
    deepview.verbose = False
    deepview.set_lambda(l)
    q_knn = evaluate_umap(deepview, True)
    print('Lambda: %.2f - Q_kNN: %.3f' % (l, q_knn))

In [None]:
for l in np.linspace(0.6, 0.8, 6):
    deepview.verbose = False
    deepview.set_lambda(l)
    q_knn = evaluate_umap(deepview, True)
    print('Lambda: %.2f - Q_kNN: %.3f' % (l, q_knn))

**Results**

These values show that a jump seems to occur around $\lambda = 0.64$ (its exact position varies slightly with the randomly selected data subset). A value close to this is therefore reasonable here.
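The selection rule quoted from the paper (take the largest $\lambda$ that does not degrade $Q_{kNN}$ significantly) can also be made explicit. The sketch below treats $Q_{kNN}$ as an error (lower is better), in line with the leave-one-out error described above, and interprets "significantly" as a hypothetical absolute tolerance `tol` relative to the best observed value; if your evaluation reports an accuracy instead, the comparison must be flipped. The tolerance and the numbers in the usage example are made up for illustration and are not results from this notebook.

```python
import numpy as np

def choose_lambda(lams, q_knn, tol=0.02):
    """Largest lambda whose Q_kNN error is within `tol` of the best (lowest) error.

    Both the absolute tolerance and the 'lower is better' convention are assumptions.
    """
    lams = np.asarray(lams, dtype=float)
    q_knn = np.asarray(q_knn, dtype=float)
    acceptable = q_knn <= q_knn.min() + tol
    return float(lams[acceptable].max())

# Made-up values for illustration only.
toy_lams = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
toy_q = [0.04, 0.04, 0.05, 0.05, 0.12, 0.30]
print(choose_lambda(toy_lams, toy_q))
```

A relative tolerance, or a check for the largest jump between consecutive values, would be equally reasonable readings of "significantly".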

**Plot after $\lambda$-tuning**

<img alt="img_good_clusters" width=500px src="https://user-images.githubusercontent.com/30961397/90418547-8a2b0900-e0b5-11ea-8ae2-2b4a76c2e6ce.png">

In [None]:
deepview.close()