
## Visual debugging
### How to understand and improve your model

Visual debugging is an important way to find out what is and isn't working in your model. A confusion matrix is just the first step, and you can do much more. For example, project the features into 2D space with a method such as PCA, tSNE, or UMAP, and color the points by class label. Show the actual image for the current point in a tooltip. Change your mind and color each point by its position in the confusion matrix. Show the actual image for this point together with an image of the [LIME](https://lime-ml.readthedocs.io/en/latest/) explanation of the result, and so on. Below I am going to show some of these steps.


Just a short reminder: we already have a trained model and the data in the file "features_backup.csv" (see [Step 2](step2.ipynb)). Each row in this file corresponds to one image and consists of numerical features (2048 values), the path to that image, and a class label, i.e. "positive" or "negative".

To run the code below without errors, you need the directory __"./datasets/positive/"__ with example images that show amber mining, and __"./datasets/negative/"__ with images without such patterns.



In [2]:
import tensorflow as tf
import numpy as np

import pandas as pd

import time
from os import listdir, environ
from os.path import isfile, join

from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, f1_score, recall_score
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

from xgboost import XGBClassifier


from bokeh.plotting import figure, output_file, output_notebook, show, ColumnDataSource
from bokeh.models import HoverTool


### Load features & trained classifier

In [6]:
df = pd.read_csv("features_backup.csv")
clf = joblib.load( 'versions/xgb_model.pkl')

### Time for the modelling!


In [7]:
X = df.iloc[:,0:2048].values  # numeric feature values for each image
Y = df.iloc[:, 2048].values   # labels for each image
tiles = df["path"].values     # path to image files 

# split each array into train and test parts
X_train, X_test, Y_train, Y_test, tiles_train, tiles_test = train_test_split(X, Y, tiles, test_size = 0.3, 
                                                                             random_state = 43)


### Model evaluation

In [8]:
# Classify
Y_pred = clf.predict(X_test) 
# Let's look at the confusion matrix
print(confusion_matrix(Y_test, Y_pred))

[[277   2]
 [ 13  53]]


So we have 277 true negatives, 53 true positives, 13 false negatives, and 2 false positives. Let's look at them on a tSNE scatterplot. The main question: which images were classified incorrectly (the false negatives and false positives)?
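
The same four counts also give precision, recall, and F1 for the "positive" class. This is a minimal sketch, assuming the matrix layout printed above: rows are true labels, columns are predicted labels, in the sorted order "negative", "positive".

```python
import numpy as np

# Confusion matrix as printed above: rows = true label, columns = predicted label.
# Assumed label order is ["negative", "positive"] (scikit-learn sorts labels).
cm = np.array([[277, 2],
               [13, 53]])

tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)          # 53 / 55
recall = tp / (tp + fn)             # 53 / 66
f1 = 2 * tp / (2 * tp + fp + fn)    # 106 / 121
accuracy = (tp + tn) / cm.sum()     # 330 / 345

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

Recall is the weakest of these numbers, which matches the 13 false negatives.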

In [9]:
# Helper function for bokeh charts

def colors(labels_true, labels_predicted):
    """ Color of each point (one per image) depends on its position in the confusion matrix """
    colors = list() 
    labels = list()
    for l_true, l_predicted in zip(labels_true, labels_predicted):
        if l_true == "positive":
            if l_predicted == l_true:
                color = "#f4b760"
                class_result = "True positive"
            else: 
                color = "black"
                class_result = "False negative"
        else:
            if l_predicted == l_true: 
                color = "#1E90FF"
                class_result = "True negative"
            else:
                color = "red" 
                class_result = "False positive"
                
        colors.append(color)
        labels.append(class_result)
        
    return (colors, labels)    
 
             
 

# helper to show interactive 2d scatterplot
def model_scatterplot( df, Y_predicted):
           
    point_colors, point_labels = colors(df['label'], Y_predicted)
    source = ColumnDataSource(data=dict(
        x = df['x'],
        y = df['y'],
        color = point_colors, 
        label = point_labels,
        imgs = df['path']

    ))

    hover = HoverTool( tooltips="""
        <div>
            <div>
                <img
                    src="@imgs" height="200" alt="@imgs" width="200"
                    style="float: left; margin: 0px 15px 15px 0px;"
                    border="0"
                ></img>
            </div>

        </div>
        """
    )

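    # Bokeh 3.x names: width/height replace plot_width/plot_height
    p = figure(width=1000, height=1000, tools=[hover],
           title="tSNE plot of image features. Mouse over the dots to see corresponding images", toolbar_location="above")

    # scatter() draws circles by default; legend_group replaces the legend= keyword removed in Bokeh 3.0
    p.scatter('x', 'y', size=8, fill_color = 'color', alpha=0.5, line_color = None, source = source, legend_group='label')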
    return p


#### We use tSNE as a method to show high-dimensional vectors on a 2D scatterplot

So each point on the scatterplot below represents an image. You can see which image by moving the mouse over that point. Points are colored according to the classification results. Let's calculate the tSNE projection first.

In [12]:
RS = 2018

#time_start = time.time()

tsne = TSNE(n_components=2, verbose=0, perplexity=20, n_iter=1000, random_state=RS)
tsne_results = tsne.fit_transform(X_test)

#print('t-SNE done! Time elapsed: {} seconds'.format(time.time()-time_start))


d = {'x':tsne_results[:,0],
     'y':tsne_results[:,1],
     'label':Y_test,
     'path':tiles_test
    }

df_tsne = pd.DataFrame(d)
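
tSNE is stochastic and relatively slow, so PCA (mentioned in the introduction) can serve as a quick sanity check before running it. The following is a minimal sketch on synthetic data. `X_demo` is a hypothetical stand-in for `X_test`, not the real features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for X_test: 100 "images" with 2048 features each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2048))

# Project onto the two directions of largest variance.
pca = PCA(n_components=2, random_state=0)
pca_results = pca.fit_transform(X_demo)

print(pca_results.shape)
print(pca.explained_variance_ratio_)
```

The two columns of `pca_results` could be used as the `'x'` and `'y'` values in place of the tSNE coordinates.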

#### Make scatterplot

In [13]:
html_name = None

if html_name:
    output_file(html_name)
else:
    output_notebook()


show(model_scatterplot( df_tsne, Y_pred))

#### Interpretation of plot

If you hover over a group of closely located points, you can see that the images in the tooltips are indeed very similar. The most interesting for us are the black points (false negatives) and the red points (false positives). Looking at the image for the upper point, you can see that it is a cemetery (with graves dug in the ground).

The only difference between graves and holes dug for amber is that the amber pattern is less regular than the cemetery pattern. In such cases, while working on the production version of the model, I would add more examples of cemeteries to the negative set.

Indeed, if you remake the plot above with the production version of the classifier from xgb_model_v003.pkl, the cemetery image is correctly classified as "negative", i.e. without amber mining.

Thanks to visual debugging (updating the positive and negative data sets), I increased the F1 score by 3%. But the more important point of such charts is understanding how the model works: by which visual features were the images classified?
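
To review the misclassified images without hovering over every point, their paths can also be listed directly. This is a minimal sketch with made-up values. `y_true`, `y_pred`, and `paths` are hypothetical stand-ins for `Y_test`, `Y_pred`, and `tiles_test`, and it assumes the classifier returns the string labels "positive" and "negative".

```python
import numpy as np

# Hypothetical stand-ins for Y_test, Y_pred and tiles_test from the cells above.
y_true = np.array(["positive", "negative", "positive", "negative"])
y_pred = np.array(["positive", "positive", "negative", "negative"])
paths = np.array(["a.jpg", "b.jpg", "c.jpg", "d.jpg"])

# Boolean masks pick out the two kinds of error.
false_negatives = paths[(y_true == "positive") & (y_pred == "negative")]
false_positives = paths[(y_true == "negative") & (y_pred == "positive")]

print("False negatives:", list(false_negatives))
print("False positives:", list(false_positives))
```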




#### ... add a little LIME to the recipe
Besides showing the original image, it's better to look at the parts of the image that have the most impact on our classifier's decision. We can do this with the [LIME](https://lime-ml.readthedocs.io/en/latest/) library (one of the methods for interpreting machine learning models).
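
This notebook gives no code for this step, and the feature extractor from Step 2 is not shown here, so the sketch below is not LIME itself. It is a much simpler occlusion approach to the same question: hide one patch of the image at a time and record how much the score drops. `score_fn` is a hypothetical stand-in for "extract features, then take the classifier's score for the positive class", and the toy model exists only to make the example runnable.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8, fill=0.0):
    """Return a grid of score drops, one per occluded patch (larger = more important)."""
    base = score_fn(image)
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    drops = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            # Blank out one patch and measure how much the score falls.
            occluded[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = fill
            drops[i, j] = base - score_fn(occluded)
    return drops

# Toy "model" (an assumption): it only looks at the brightness of the top-left 8x8 corner.
def toy_score(img):
    return float(img[:8, :8].mean())

image = np.ones((32, 32))
heat = occlusion_map(image, toy_score)
print(heat)
```

Only the top-left cell of `heat` is non-zero here, because that is the only patch the toy model looks at. With a real model, the largest cells mark the image regions that drive the decision.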

#### Further development
One can imagine a chart similar to the one above for each step of model development, i.e. a sequence of scatterplot frames in which the point colors change as we change the model parameters or the data set.
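
A first step toward such a sequence is finding which points change category between two model versions. This is a minimal sketch with made-up predictions. `pred_old` and `pred_new` are hypothetical stand-ins for the outputs of two classifier versions on the same test set.

```python
import numpy as np

def outcome(y_true, y_pred, positive="positive"):
    """Label each sample as TP, FN, TN or FP."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_pos = y_true == positive
    correct = y_true == y_pred
    return np.where(is_pos,
                    np.where(correct, "TP", "FN"),
                    np.where(correct, "TN", "FP"))

y_true = ["positive", "negative", "negative", "positive"]
pred_old = ["negative", "positive", "negative", "positive"]
pred_new = ["positive", "negative", "negative", "positive"]

before = outcome(y_true, pred_old)
after = outcome(y_true, pred_new)

# Indices of the points whose color would change between the two frames.
changed = np.flatnonzero(before != after)
for i in changed:
    print(i, before[i], "->", after[i])
```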