## Label image data in plate 181109_Plate3
*Vladislav Kim*


* [Introduction](#1)
* [Initial training set](#2)

<a id="1"></a> 
## Introduction
The idea of this notebook series is to train a pseudo-online random forest (RF) classifier for AML vs. stroma cell classification. From selected plates we (for now) sample 6 DMSO wells with the highest Calcein cell count, generate predictions, correct misclassified instances, and check in live ("online") mode how the predictions improve as we add more data. Note that the classifier is not truly online: instead of updating the model incrementally, we retrain the RF classifier from scratch (in multicore mode) after each batch of corrections.
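The difference between true online learning and the retrain-from-scratch scheme used here can be sketched in a few lines. This is an illustration only: a nearest-centroid classifier (the hypothetical `fit_centroids` and `predict`) stands in for the random forest, and the feature batches are synthetic. The point is that every round refits on *all* labels accumulated so far rather than updating the previous model.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Stand-in for RF training: one mean feature vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def predict(centroids, X):
    """Assign each row of X to the class with the nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
# initial training set: 2 features, 3 classes (0, 1, 2), 5 instances each
y_acc = np.repeat(np.arange(3), 5)
X_acc = rng.normal(size=(15, 2)) + 3 * y_acc[:, None]

history = []
for _ in range(3):
    # corrected labels arriving from the next well (synthetic here)
    y_new = rng.integers(0, 3, size=4)
    X_new = rng.normal(size=(4, 2)) + 3 * y_new[:, None]
    # pseudo-online: append the new labels ...
    X_acc = np.vstack((X_acc, X_new))
    y_acc = np.concatenate((y_acc, y_new))
    # ... and refit from scratch on everything seen so far
    model = fit_centroids(X_acc, y_acc, n_classes=3)
    history.append(len(y_acc))

print(history)  # [19, 23, 27]
```

A truly online learner would instead update `model` using only `X_new` and `y_new`. Full refitting costs more per round but keeps the model identical to one trained on the whole labelled set.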

More generally, we can implement a targeted online learning strategy: we select a number of wells that are of interest to us (target wells), e.g. DMSO control wells or wells treated with certain high-priority drugs, whose classification accuracy we want to improve first. We sample from these target wells on selected plates and evaluate the classification accuracy as we go (pseudo-online learning).
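Selecting target wells can be as simple as filtering the plate layout by treatment and keeping the wells with the most Calcein-positive cells. Below is a minimal sketch. It assumes a hypothetical per-well table of `(well, drug, calcein_count)` records. The well IDs follow the `rXXcYY` naming used in this notebook, but the drugs and counts are made up for illustration.

```python
from heapq import nlargest

def select_target_wells(wells, targets, n=6):
    """Return up to n well IDs treated with a target drug, ranked by
    descending Calcein cell count."""
    candidates = [w for w in wells if w["drug"] in targets]
    top = nlargest(n, candidates, key=lambda w: w["calcein_count"])
    return [w["well"] for w in top]

# hypothetical plate layout (toy numbers, not real data)
wells = [
    {"well": "r01c04", "drug": "DMSO", "calcein_count": 410},
    {"well": "r05c14", "drug": "DMSO", "calcein_count": 385},
    {"well": "r02c07", "drug": "DrugA", "calcein_count": 500},
    {"well": "r08c02", "drug": "DMSO", "calcein_count": 120},
    {"well": "r03c03", "drug": "DrugB", "calcein_count": 90},
]
print(select_target_wells(wells, targets={"DMSO"}, n=2))
# ['r01c04', 'r05c14']
print(select_target_wells(wells, targets={"DMSO", "DrugA"}, n=2))
# ['r02c07', 'r01c04']
```

Adding high-priority drugs to `targets` widens the pool without changing the ranking rule.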


<a id="2"></a>
## Initial Training Set: 181109_Plate3
First, we retrain the classifier on plate `181109_Plate3`, as it shows a very striking contrast between mono- and co-cultures. We want to rule out the possibility that this contrast is a segmentation (in this case, classification) artefact.

In [None]:
# load third-party Python modules
import javabridge
import bioformats as bf
import skimage
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
import os
import sys
import h5py
sys.path.append('../../..')

# start the Java VM required by python-bioformats (bf.JARS lists its jar files)
javabridge.start_vm(class_path=bf.JARS)

In [None]:
# load the existing training set for plate 3 from the HDF5 file
with h5py.File('trainset.h5', 'r') as hf:
    ytrain = hf['plate3/ytrain'][()]
    Xtrain = hf['plate3/Xtrain'][()]
    columns = hf['plate3/columns'][()]
Xtrain_df = pd.DataFrame(Xtrain, columns=columns)
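Depending on how the column names were written, h5py may return them as `bytes` (e.g. `b'area'`) rather than `str`. In that case, label-based access such as `Xtrain_df['area']` fails. Decoding the names first fixes this. A sketch, assuming UTF-8-encoded names; the feature names below are hypothetical.

```python
import numpy as np

def decode_columns(columns):
    """Convert byte-string column names (as h5py may return them) to str."""
    return [c.decode("utf-8") if isinstance(c, bytes) else str(c)
            for c in columns]

# simulate what reading a fixed-length string dataset can return
columns = np.array([b"area", b"calcein_mean", b"hoechst_mean"])
print(decode_columns(columns))  # ['area', 'calcein_mean', 'hoechst_mean']
```

With this helper, the DataFrame would be built as `pd.DataFrame(Xtrain, columns=decode_columns(columns))`.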

In [None]:
from base.utils import load_imgstack
from segment.tools import read_bbox
from sklearn.preprocessing import label_binarize

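def get_train_instance(path, fname, pad=0):
    """Load image `fname`.tiff and its object table `fname`.csv from `path`;
    return the image and the (padded) bounding boxes of its objects."""
    imgstack = load_imgstack(fname=os.path.join(path, fname + ".tiff"),
                             verbose=False)
    img = np.squeeze(imgstack)
    df = pd.read_csv(os.path.join(path, fname + ".csv"))
    # image height and width, passed to read_bbox along with the padding
    rmax, cmax, _ = img.shape
    bbox = read_bbox(df=df, rmax=rmax,
                     cmax=cmax, pad=pad)
    return img, bbox

def make_labels(arr, label=1):
    """Pair each object ID in `arr` with `label`; returns an (n, 2) int array."""
    return np.vstack((arr, label * np.ones(arr.shape, dtype=int))).T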
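`make_labels` turns a list of object IDs into `(object ID, class label)` pairs, one row per object. For the first correction made further down (objects 82 and 117 relabelled as class 2), the result looks like this. The function is repeated so that the snippet runs on its own.

```python
import numpy as np

def make_labels(arr, label=1):
    # stack the IDs and a constant label row, then transpose to shape (n, 2)
    return np.vstack((arr, label * np.ones(arr.shape, dtype=int))).T

newlabels = make_labels(np.array([82, 117]), label=2)
print(newlabels.tolist())  # [[82, 2], [117, 2]]
print(newlabels.shape)     # (2, 2)
```

An equivalent one-liner is `np.column_stack((arr, np.full(arr.shape, label)))`.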

In [None]:
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

In [None]:
path = '../../data/AML_trainset/181109_Plate3'

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r01c04', pad=20)
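The `pad` argument presumably enlarges each bounding box so that crops include some context around each cell. The image size (`rmax`, `cmax`) is passed along so that the boxes can be kept inside the image. `read_bbox` itself is not shown in this notebook, so `pad_bbox` below is a hypothetical sketch. It assumes boxes are stored as `(min_row, min_col, max_row, max_col)` with end-exclusive maxima.

```python
import numpy as np

def pad_bbox(bbox, pad, rmax, cmax):
    """Grow each (min_row, min_col, max_row, max_col) box by `pad` pixels
    and clip it to the image (rows 0..rmax, cols 0..cmax, end-exclusive)."""
    bbox = np.asarray(bbox, dtype=int)
    out = bbox + np.array([-pad, -pad, pad, pad])
    out[:, [0, 1]] = np.maximum(out[:, [0, 1]], 0)  # no negative corners
    out[:, 2] = np.minimum(out[:, 2], rmax)         # stay within the rows
    out[:, 3] = np.minimum(out[:, 3], cmax)         # stay within the columns
    return out

boxes = [[5, 100, 40, 130],        # near the top edge
         [500, 1000, 530, 1070]]   # near the bottom-right corner
print(pad_bbox(boxes, pad=20, rmax=540, cmax=1080).tolist())
# [[0, 80, 60, 150], [480, 980, 540, 1080]]
```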

**Load `ImgX` and `IncrementalClassifier`.** `IncrementalClassifier` is a class that holds an `ImgX` instance and can accumulate training data.

In [None]:
sys.path.append('../../../../')
from bioimg.classify import ImgX, IncrementalClassifier

In [None]:
clf_incr = IncrementalClassifier()
clf_incr.ytrain = label_binarize(ytrain, classes=range(3))
clf_incr.Xtrain = Xtrain_df
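With three classes, `label_binarize(ytrain, classes=range(3))` one-hot encodes the integer labels into an `(n, 3)` indicator matrix. Assuming `ytrain` holds integer labels 0 to 2, indexing an identity matrix gives the same encoding in plain NumPy. This is a handy way to check what `clf_incr.ytrain` contains.

```python
import numpy as np

# toy labels: 0 = apoptotic, 1 = viable, 2 = other (order of clf_incr.classes)
ytrain = np.array([0, 1, 2, 1])
onehot = np.eye(3, dtype=int)[ytrain]  # row i is the indicator vector of ytrain[i]
print(onehot.tolist())
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

# argmax along each row recovers the original labels
print(onehot.argmax(axis=1).tolist())  # [0, 1, 2, 1]
```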

In [None]:
gamma = 0.4
# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])
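Raising intensities to a power `gamma < 1` brightens dim pixels much more than bright ones, which makes faint cells easier to see in the RGB composite. This holds for intensities scaled to [0, 1]. Whether `load_imgstack` returns normalized floats is an assumption here. A quick numeric check with `gamma = 0.4`:

```python
import numpy as np

gamma = 0.4
x = np.array([0.0, 0.01, 0.25, 1.0])  # dark, dim, mid, saturated pixels
y = x ** gamma
print(np.round(y, 3).tolist())  # [0.0, 0.158, 0.574, 1.0]
```

The endpoints 0 and 1 are unchanged. A pixel at 0.01 is boosted about 16-fold, while one at 0.25 is boosted only about 2.3-fold.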

In [None]:
clf_incr.imgx = imgx
clf_incr.classes = ['apoptotic', 'viable', 'other']

In [None]:
clf_incr.set_classifier().train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

**Try loading a new image and replacing `imgx` in `IncrementalClassifier`**

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r05c14', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])
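The load, gamma-correct, reorder, wrap sequence is repeated for every well in this notebook. Stacking slices 1, 2, 0 with `np.dstack` is equivalent to fancy indexing on the last axis, so the preprocessing could be factored into one helper. A sketch: `to_rgb` is a hypothetical name, and the `ImgX` call is left as a comment because its module is project-specific.

```python
import numpy as np

def to_rgb(img, gamma=0.4):
    """Gamma-correct a (rows, cols, 3) image and move channels 1, 2, 0
    into R, G, B, matching the dstack-based cells in this notebook."""
    return (img ** gamma)[:, :, [1, 2, 0]]

# check equivalence with the dstack version on a random image
img = np.random.default_rng(1).random((4, 5, 3))
img_g = img ** 0.4
expected = np.dstack((img_g[:, :, 1], img_g[:, :, 2], img_g[:, :, 0]))
print(np.array_equal(to_rgb(img), expected))  # True

# in the notebook, each well would then reduce to something like:
# img, bbox = get_train_instance(path=path, fname='r05c14', pad=20)
# clf_incr.imgx = ImgX(img=to_rgb(img), bbox=bbox,
#                      n_chan=['Lysosomal', 'Calcein', 'Hoechst'])
```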

Update the `imgx` in `clf_incr`:

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

Load the next image:

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r08c02', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

Update the `imgx` in `clf_incr`:

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
# object IDs in r08c02 to relabel as 'other' (label 2)
other = np.array([82, 117])

In [None]:
newlabels = make_labels(other, label=2)

In [None]:
clf_incr = clf_incr.add_instances(newlabels=newlabels)
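The internals of `add_instances` are not shown in this notebook. One plausible reading is that it looks up the feature vectors of the listed object IDs in the current image and appends them, together with their labels, to `Xtrain`/`ytrain`. The sketch below assumes that object IDs index rows of a per-image feature table and that `ytrain` is one-hot encoded. `append_instances` and the feature values are hypothetical.

```python
import numpy as np

def append_instances(Xtrain, ytrain, features, newlabels, n_classes=3):
    """Append the feature rows of relabelled objects to the training set.
    `newlabels` is an (n, 2) array of (object ID, class label) pairs as
    produced by make_labels; `ytrain` is one-hot encoded."""
    ids, labels = newlabels[:, 0], newlabels[:, 1]
    X_new = features[ids]                         # feature rows of listed objects
    y_new = np.eye(n_classes, dtype=int)[labels]  # one-hot, like label_binarize
    return np.vstack((Xtrain, X_new)), np.vstack((ytrain, y_new))

# toy per-object feature table: 200 objects x 4 features
features = np.arange(200 * 4, dtype=float).reshape(200, 4)
Xtrain = np.zeros((10, 4))
ytrain = np.eye(3, dtype=int)[np.zeros(10, dtype=int)]
newlabels = np.array([[82, 2], [117, 2]])

Xtrain, ytrain = append_instances(Xtrain, ytrain, features, newlabels)
print(Xtrain.shape, ytrain.shape)  # (12, 4) (12, 3)
print(ytrain[-2:].tolist())        # [[0, 0, 1], [0, 0, 1]]
```

If the object IDs in the CSV files are 1-based labels rather than row positions, the lookup would need an offset or an explicit ID column.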

In [None]:
clf_incr.train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

Load the next image:

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r08c04', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
# object IDs in r08c04 to relabel as 'other' (label 2) and 'apoptotic' (label 0)
other = np.array([11, 29, 70])
apoptotic = np.array([47, 38, 51, 44])

In [None]:
newlabels = np.concatenate((make_labels(apoptotic, label=0),
                            make_labels(other, label=2)), axis=0)
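When several classes are corrected at once, the integer codes must match the order of `clf_incr.classes` (`0 = 'apoptotic'`, `1 = 'viable'`, `2 = 'other'`). A small helper that builds `newlabels` from class names avoids mixing up the codes. The helper `labels_from_names` is hypothetical. It produces the same `(object ID, label)` layout as `make_labels`.

```python
import numpy as np

def labels_from_names(corrections, classes):
    """Build an (n, 2) array of (object ID, class index) pairs from a
    dict mapping class name -> list of object IDs."""
    rows = [(obj_id, classes.index(name))
            for name, ids in corrections.items() for obj_id in ids]
    return np.array(rows, dtype=int).reshape(-1, 2)

classes = ['apoptotic', 'viable', 'other']
newlabels = labels_from_names({'apoptotic': [47, 38, 51, 44],
                               'other': [11, 29, 70]}, classes)
print(newlabels.tolist())
# [[47, 0], [38, 0], [51, 0], [44, 0], [11, 2], [29, 2], [70, 2]]
```

A misspelled class name raises a `ValueError` from `classes.index`, rather than silently producing a wrong label.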

In [None]:
clf_incr = clf_incr.add_instances(newlabels=newlabels)

In [None]:
clf_incr.train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r10c18', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
# object IDs in r10c18 to relabel as 'other' (label 2)
other = np.array([44, 76])

In [None]:
newlabels = make_labels(other, label=2)

In [None]:
clf_incr = clf_incr.add_instances(newlabels=newlabels)

In [None]:
clf_incr.train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r11c02', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
# object IDs in r11c02 to relabel as 'other' (label 2)
other = np.array([40, 50])

In [None]:
newlabels = make_labels(other, label=2)

In [None]:
clf_incr = clf_incr.add_instances(newlabels=newlabels)

In [None]:
clf_incr.train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r11c08', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

Load the last image:

In [None]:
img, bbox = get_train_instance(path=path,
                               fname='r12c24', pad=20)

# adjust brightness by gamma correction
img_g = img**gamma
# sort color channels in 'RGB' order
img_rgb = np.dstack((img_g[:,:,1],
                     img_g[:,:,2],
                     img_g[:,:,0]))
# create an 'ImgX' instance
imgx = ImgX(img=img_rgb, bbox=bbox, n_chan=['Lysosomal', 'Calcein', 'Hoechst'])

In [None]:
clf_incr.imgx = imgx

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
# object IDs in r12c24 to relabel as 'viable' (label 1) and 'other' (label 2)
viable = np.array([55])
other = np.array([3])

In [None]:
newlabels = np.concatenate((make_labels(viable, label=1),
                            make_labels(other, label=2)),
                           axis=0)

In [None]:
clf_incr = clf_incr.add_instances(newlabels=newlabels)

In [None]:
clf_incr.train_classifier()

In [None]:
clf_incr.generate_predictions()

In [None]:
#clf_incr.plot_predictions()

In [None]:
clf_incr.Xtrain.shape
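As a sanity check, `Xtrain.shape[0]` should have grown by the number of objects relabelled in this notebook. This assumes that `add_instances` appends one row per label and does not deduplicate. Tallying the corrections made above:

```python
from collections import Counter

# (class name, object IDs) for every correction made in this notebook
corrections = [
    ('other', [82, 117]),             # r08c02
    ('apoptotic', [47, 38, 51, 44]),  # r08c04
    ('other', [11, 29, 70]),          # r08c04
    ('other', [44, 76]),              # r10c18
    ('other', [40, 50]),              # r11c02
    ('viable', [55]),                 # r12c24
    ('other', [3]),                   # r12c24
]
per_class = Counter()
for name, ids in corrections:
    per_class[name] += len(ids)

print(dict(per_class))          # {'other': 10, 'apoptotic': 4, 'viable': 1}
print(sum(per_class.values()))  # 15
```

Under that assumption, the final training set should have 15 more rows than the `plate3` training set loaded at the start.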

**Write out this training set. The training set for each plate is saved in a separate group of the HDF5 file.**

In [None]:
clf_incr.h5_write(fname='trainset.h5', group='final')