## Label image data in plate 180528_Plate3
*Vladislav Kim*


* [Introduction](#1)
* [Initial training set](#2)

<a id="1"></a> 
## Introduction
The idea of this notebook series is to train a pseudo-online random forest (RF) classifier for AML vs. stroma cell classification. From selected plates we (for now) sample the 6 DMSO wells with the highest Calcein cell count, generate predictions, correct misclassified instances, and check in live ("online") mode how the predictions improve as we add more data. Note that this is not a true online classifier: we do not update the model incrementally, but completely retrain the RF classifier in multicore mode each time.
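
The well-selection step could be sketched with pandas as follows. The table layout (columns `well`, `drug`, `calcein_count`), the helper `select_target_wells` and all values are made up for illustration; the actual plate metadata may be organized differently.

```python
import pandas as pd

# Hypothetical per-well summary table; column names and counts are
# illustrative assumptions, not the real plate metadata.
wells = pd.DataFrame({
    "well": ["r02c14", "r03c05", "r04c09", "r05c11", "r06c02",
             "r07c20", "r08c13", "r09c07", "r10c03"],
    "drug": ["DMSO", "DMSO", "DrugA", "DMSO", "DMSO",
             "DMSO", "DrugB", "DMSO", "DMSO"],
    "calcein_count": [950, 720, 400, 810, 640, 880, 300, 590, 770],
})

def select_target_wells(df, drug="DMSO", n_wells=6):
    """Return the n_wells wells of a given drug with the highest Calcein count."""
    subset = df[df["drug"] == drug]
    return subset.nlargest(n_wells, "calcein_count")["well"].tolist()

target_wells = select_target_wells(wells)
print(target_wells)
```

Keeping the drug name and the number of wells as parameters makes it easy to switch from DMSO controls to other target wells later.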

More generally, we can implement a targeted online learning strategy: we select a number of wells of interest (target wells), e.g. DMSO control wells or wells with certain high-priority drugs, whose classification accuracy we want to improve first. We sample from these target wells on selected plates and evaluate the classification accuracy as we go (pseudo-online learning).
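
A minimal sketch of such a pseudo-online loop is shown below. It uses synthetic data from scikit-learn instead of real cell features, and the batch layout (one batch per target well) is an assumption made only for illustration. The key point is that the forest is refit from scratch on the growing training pool after every batch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for per-cell feature vectors (assumption: the real
# features come from the image pipeline of this notebook series).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_test, y_test = X[500:], y[500:]

# Pretend the first 500 cells arrive in 5 batches, one batch per target well.
batches = np.array_split(np.arange(500), 5)

X_train = np.empty((0, X.shape[1]))
y_train = np.empty(0, dtype=int)
accuracy = []

for idx in batches:
    # Add the newly labelled cells to the training pool ...
    X_train = np.vstack([X_train, X[idx]])
    y_train = np.concatenate([y_train, y[idx]])
    # ... and retrain from scratch on all cores (no incremental update).
    clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)
    accuracy.append(clf.score(X_test, y_test))

print(accuracy)
```

Tracking the held-out accuracy after each batch gives the "as we go" evaluation described above without requiring a model that supports true incremental updates.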


<a id="2"></a>
## Initial Training Set: 180528_Plate3
First we re-train the classifier on plate `180528_Plate3`, as it shows a very striking contrast between mono- and co-cultures. We want to rule out the possibility that this contrast is a segmentation (in this case, classification) artefact.

In [None]:
# load third-party Python modules
import javabridge
import bioformats as bf
import skimage
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
import os
import sys
sys.path.append('../../..')

javabridge.start_vm(class_path=bf.JARS)

In [None]:
from base.utils import load_imgstack
imgstack = load_imgstack(fname="../../data/AML_trainset/180528_Plate3/r02c14.tiff")

# remove a 'dummy' z-axis
img = np.squeeze(imgstack)

# nuclei
hoechst = img[:,:,0]**0.3

In [None]:
df = pd.read_csv('../../data/AML_trainset/180528_Plate3/r02c14.csv')

In [None]:
from segment.tools import read_bbox
rmax, cmax = hoechst.shape

bbox = read_bbox(df=df, rmax=rmax, cmax=cmax)

In [None]:
from base.plot import show_bbox
#show_bbox(hoechst, bbox)

**Plotly visualization works!**

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

In [None]:
from extra.viz import plotly_viz

In [None]:
from skimage.exposure import equalize_adapthist
gamma = 0.3
img_g = img**gamma
mip_rgb = equalize_adapthist(np.dstack((img_g[:,:,1],
                                        img_g[:,:,2],
                                        img_g[:,:,0])))

In [None]:
layout, cells = plotly_viz(mip_rgb, bb=bbox)

In [None]:
from extra.viz import plotly_predictions
# use the builtin int: the np.int alias has been removed from recent NumPy
ypred = np.zeros(len(bbox), dtype=int)
labels = ['cells']
layout, cells = plotly_predictions(img=mip_rgb, bb=bbox,
                                  ypred=ypred, labels=labels)

In [None]:
#iplot(dict(data=cells, layout=layout))

In [None]:
def get_train_instance(path, fname, pad=0):
    imgstack = load_imgstack(fname=os.path.join(path, fname + ".tiff"),
                            verbose=False)
    img = np.squeeze(imgstack)
    df = pd.read_csv(os.path.join(path, fname + ".csv"))
    rmax, cmax, _ = img.shape
    bbox = read_bbox(df=df, rmax=rmax,
                     cmax=cmax, pad=pad)
    return img, bbox

In [None]:
img, bbox = get_train_instance(path='../../data/AML_trainset/180528_Plate3',
                              fname='r02c14', pad=0)

In [None]:
cellbb = [img[x[2]:x[3], x[0]:x[1],0] for x in bbox]
cell = cellbb[5]

In [None]:
from transform.process import threshold_img
bw = threshold_img(cell, method='otsu', binary=True)

In [None]:
from base.future_versions import regionprops_table

In [None]:
keys = ['area',
 'bbox',
 'bbox_area',
 'centroid',
 'convex_area',
 'eccentricity',
 'equivalent_diameter',
 'euler_number',
 'filled_area',
 'major_axis_length',
 'minor_axis_length',
 'moments',
 'moments_central',
 'moments_hu',
 'moments_normalized',
 'orientation',
 'perimeter',
 'solidity']

In [None]:
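# newer scikit-image releases rename these functions to graycomatrix/graycoprops;
# fall back to the old names on older installations
try:
    from skimage.feature import graycomatrix as greycomatrix
    from skimage.feature import graycoprops as greycoprops
except ImportError:
    from skimage.feature import greycomatrix, greycoprops
from skimage.util import img_as_ubyte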


## create a data frame with texture features
glcm_props = ['contrast', 'dissimilarity',
              'ASM', 'energy', 'correlation']

def glcm_to_dataframe(glcm, prop):
    '''Compute GLCM property
    '''
    mat = greycoprops(glcm, prop=prop)
    columns = ['-'.join([prop, str(i)]) for i in range(len(mat.ravel()))]
    return pd.DataFrame(mat.ravel().reshape(1,-1),
                       columns=columns)


def compute_region_props(cell, keys,
                         distances=[3, 5, 7],
                         angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]):
    bw = threshold_img(cell, method='otsu', binary=True)
    df = pd.DataFrame(regionprops_table(bw.astype('int'),
                                        cell, properties=keys))
    
    glcm = greycomatrix(img_as_ubyte(cell),
                        distances=distances,
                        angles=angles)
    texture_df = pd.concat([glcm_to_dataframe(glcm, prop=p) for p in glcm_props],axis=1)
    
    
    return pd.concat([df, texture_df], axis=1)

In [None]:
df = compute_region_props(cell=cell, keys=keys)

Here we assume that `img` is a 3D array whose last axis holds the color channels; for our leukemia data each image array has shape `(2160, 2160, 3)`. **Features need to be computed carefully for this case:** should the properties be computed for each channel separately? **The functions should also work for greyscale images.**
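
One way to handle both cases is sketched below, using only NumPy. The helper names `per_channel_features` and `toy_features` are hypothetical, and the mean/standard-deviation features are placeholders for the real region and texture properties.

```python
import numpy as np

def per_channel_features(img, feature_fn, split=True):
    """Apply feature_fn to each channel (split=True) or to the channel mean."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        # Greyscale input: treat it as a single channel.
        channels = [img]
    elif split:
        channels = [img[:, :, c] for c in range(img.shape[2])]
    else:
        # Simple unweighted channel average as a greyscale stand-in.
        channels = [img.mean(axis=2)]
    return np.concatenate([np.atleast_1d(feature_fn(ch)) for ch in channels])

def toy_features(channel):
    # Placeholder features: mean and standard deviation of the intensities.
    return np.array([channel.mean(), channel.std()])

rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3))
grey = rng.random((32, 32))

feats_split = per_channel_features(rgb, toy_features)
feats_merged = per_channel_features(rgb, toy_features, split=False)
feats_grey = per_channel_features(grey, toy_features)
print(feats_split.shape, feats_merged.shape, feats_grey.shape)
```

Computing features per channel triples the length of the feature vector for a three-channel image, while the greyscale path keeps the same code usable for single-channel data.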

In [None]:
from skimage.color import rgb2gray

class ImgX:
    def __init__(self, img, bbox, y=None):
        self.img = img
        self.bbox = bbox
        self.y = y
        
        self.data = None
        self.target_names = None
        
    def __setattr__(self, name, value):
        self.__dict__[name] = value

    def set_param(self, **kwargs):
        for k in kwargs.keys():
            self.__setattr__(k, kwargs[k])
        return self
    
    def _get_features(self, img):
        # compute features for all the bboxes
        return self
        
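    def compute_props(self, n_chan, split=True):
        # split=True means that the color channels will be split and the
        # properties will be computed for each channel separately
        if n_chan > 1 and split:
            for c in range(n_chan):
                # use the image stored on the instance, not a global `img`
                self._get_features(img=self.img[:,:,c])
        elif self.img.ndim == 2:
            # greyscale image: nothing to convert
            self._get_features(img=self.img)
        else:
            img_gray = rgb2gray(self.img)
            self._get_features(img=img_gray)
        return self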

In [None]:
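# old version of the function (kept for reference; OT, exclude,
# cellbb_train, w and h must be defined elsewhere for it to run)
from skimage.transform import resize
from sklearn.decomposition import PCA

def compute_props(self):
    # region-property features for every cell bounding box
    X_prop_list = [OT.get_regionprop_feats(mip_rgb=cbb,
                                           exclude=exclude) for cbb in cellbb_train]
    X_train_prop = np.vstack(X_prop_list)

    # resize all crops to a common shape and flatten them
    cellbb_norm = [resize(cb, (w, h), anti_aliasing=True) for cb in cellbb_train]
    X_train_norm = np.array([cbn.ravel() for cbn in cellbb_norm])
    # compute PCA of the image data set
    n_components = 150
    pca = PCA(n_components=n_components, svd_solver='randomized',
              whiten=True).fit(X_train_norm)

    # project the train data
    X_train_pca = pca.transform(X_train_norm)

    # combine PCA scores and region properties into one feature matrix
    X_train_all = np.append(X_train_pca, X_train_prop, axis=1)

    return self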

**Modify `IncrementalClassifier` class to adapt to our use**

In [None]:
# incremental ("online") classifier
clf_incr = OT.IncrementalClassifier(path=path, featdir=featdir,
                                 select_well=select_inst[0],
                                 target_names=target_names,
                                 X_train_norm=X_train_norm,
                                 X_train_prop=X_train_prop,
                                 y_train=y_train
                                )

In [None]:
clf_incr = (clf_incr
            .load_img()
            .train_classifier()
            .generate_predictions()
            .set_scene())

In [None]:
#clf_incr.plot()

In [None]:
newlabels = np.array([[45,2], [91,5], [85,0], [2,2]])

In [None]:
clf_incr = (clf_incr
            .add_instances(newlabels=newlabels)
            .train_classifier()
            .generate_predictions()
            .update_scene())
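
The correction step might work roughly as sketched below. This assumes that each row of `newlabels` is a `(cell index, corrected class label)` pair, which the notebook does not state explicitly; `apply_corrections` is a hypothetical helper and not part of `IncrementalClassifier`.

```python
import numpy as np

def apply_corrections(ypred, newlabels):
    """Return a copy of ypred with the corrected labels written in.

    Assumption: each row of newlabels is (cell index, corrected label).
    """
    newlabels = np.asarray(newlabels).reshape(-1, 2)
    corrected = np.array(ypred, copy=True)
    corrected[newlabels[:, 0]] = newlabels[:, 1]
    return corrected

# Dummy predictions: every cell assigned to class 1.
ypred = np.ones(100, dtype=int)
newlabels = np.array([[45, 2], [91, 5], [85, 0], [2, 2]])
y_corrected = apply_corrections(ypred, newlabels)
print(y_corrected[[45, 91, 85, 2]])
```

The corrected labels would then be appended to the training set before the classifier is retrained.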