# Project: MEMA organization-pattern automation
Script: Chun-Han Lin  
Data collection: Tiina Jokela  
Email: walkon302@gmail.com  
Update: 4/23/2018  
Update: 5/9/2018  
Update: 5/11/2018

# Purpose and the use cases

MEMA can be used to study the impact of the microenvironment on many cellular responses, including morphological changes, drug responses, and cellular organization. MEMA images are processed with either CellProfiler or Python to generate a spreadsheet containing all the measurements. Most of these measurements are single-cell data that can be analyzed directly. Cellular organization patterns, however, are multi-cell information and cannot be readily included in the results of the MEMA image analysis.

Manually classifying MEMA images into organization patterns and labeling each image as either well-organized or disorganized is time-consuming. The goal of this project is therefore to develop a method that automates this labeling to facilitate further MEMA analysis.

This automation can be integrated into the MEMA analysis pipeline or used on its own to assess cellular organization patterns in other experiments.

## Use case 1: For the MEMA experiments
After applying this method, the MEMA images are classified into different organization patterns. We can then use this information to study which microenvironmental components play important roles in cellular organization.

## Use case 2: For other experiments
After applying this method, we can quickly quantify the ratio of organization patterns in the experiment.
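
As a minimal sketch of that quantification, assuming the classifier yields one label string per image. The label names `organized` and `disorganized` and the helper `pattern_ratios` are illustrative and not part of the script:

```python
from collections import Counter


def pattern_ratios(labels):
    """Return the fraction of images assigned to each organization pattern."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # float() keeps the division exact on Python 2.7 as well as Python 3.
    return {label: count / float(total) for label, count in counts.items()}


# Hypothetical labels for four images.
labels = ["organized", "disorganized", "organized", "organized"]
ratios = pattern_ratios(labels)
print("organized: %.2f, disorganized: %.2f"
      % (ratios["organized"], ratios["disorganized"]))
```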

# Overview
There are two major goals:
1. Image classification:  
Let users classify images with a trained model.
2. Model fine-tuning:  
Let users fine-tune the model for better results on different cell strains.

## Image classification
Once the model has been trained on well-defined images, it can be used directly to classify organization patterns. This part of the scripts includes the following functions:  

Data loading:  
It takes one folder containing MEMA images and pre-processes them for prediction.  

Prediction:  
It predicts the label of each image in the folder from the previous step.  

Report:  
It generates a csv file or spreadsheet for downstream analysis and a summary html report.
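
The three steps above can be sketched roughly as follows. This is not the code in the repository: `list_images`, `build_report_rows`, the extension list, and the `predict_fn` callable (a stand-in for the trained model) are all assumptions made for illustration.

```python
import os

# Assumed set of image extensions; adjust to the actual MEMA image format.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def list_images(folder):
    """Return the sorted paths of image files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def build_report_rows(image_paths, predict_fn):
    """Pair each image file name with the label returned by `predict_fn`."""
    return [(os.path.basename(path), predict_fn(path)) for path in image_paths]
```

The resulting `(file name, label)` rows are the kind of table that the report step would then write out for downstream analysis.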

## Model fine-tuning
Since the model is trained on certain cell strains, and organization patterns may vary across cell strains, the model needs the flexibility to be fine-tuned with images from a new cell strain so that it classifies them better. This part of the scripts includes the following functions:  

Data loading:  
It takes two folders, one with well-organized cell images and another with disorganized cell images, and pre-processes them for tuning the model.

Fine-tuning:  
It feeds the trained model with data from the previous step to fine-tune the parameters of the model.

Output the model and report:  
It outputs the tuned model for image classification and a summary html report with the cross-validation accuracy before and after model tuning.
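
For the cross-validation accuracy in the report, a plain k-fold split is one common choice. The sketch below is an assumption about how such a split could look, not the script's actual procedure; `k_fold_indices` and `accuracy` are hypothetical helpers, and in practice the samples would usually be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / float(len(true_labels))
```

Computing `accuracy` on each validation fold with the model before and after tuning, then averaging over the folds, gives the two numbers to compare.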

# Plan and deliverable
1. I will draft the scripts and use Jennifer's previous data to build a simple model. The first goal is to build a working pipeline that Tiina can run locally without technical issues, including loading images, processing them, and outputting a spreadsheet as well as an html summary report. I aim to deliver this by next Friday (5/4).

# How to use

* Open a terminal. All of the following steps are carried out in the terminal.

## Install Anaconda
* Download and install Anaconda from https://repo.anaconda.com/archive/Anaconda2-5.1.0-MacOSX-x86_64.pkg  
-- open the downloaded .pkg file and follow the installer  

## Type in the following commands in the terminal directly
* conda create -n mema python=2.7 anaconda  
-- create a new virtual environment called mema for running the script without
altering the original system  

* source activate mema  
-- activate the virtual environment  

* conda install -c https://conda.anaconda.org/menpo opencv3  
-- install proper version of opencv  

* pip install grpcio==1.9.1 https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py2-none-any.whl  
-- install proper version of tensorflow  

## Download the script
* Download the script from
https://github.com/walkon302/MEMA_organization/archive/master.zip  
-- download the file and unzip it on the desktop. Rename the unzipped folder
to something shorter, e.g., 'mema', since the original name
'MEMA_organization-master' is long.  

## Type in the commands in the terminal directly
* cd ~/Desktop/mema  
-- enter the main folder of the script  

* mkdir input  
-- create a folder named 'input'  

* cd input  
-- enter the subfolder 'input'  

* mkdir train_organized  
* mkdir train_disorganized  
* mkdir eval_organized  
* mkdir eval_disorganized  
* mkdir predict  
-- create five folders: two for training, two for evaluation, and one for new
images that need to be classified.  
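
The same folder layout can also be created from Python instead of typing each `mkdir`. This is a sketch only: the helper `make_input_folders` is illustrative and not part of the script, and the evaluation pair is named to mirror the training pair.

```python
import os

# Two folders for training, two for evaluation, one for prediction.
SUBFOLDERS = [
    "train_organized",
    "train_disorganized",
    "eval_organized",
    "eval_disorganized",
    "predict",
]


def make_input_folders(base_dir):
    """Create the `input` folder and its five subfolders under `base_dir`."""
    paths = []
    for name in SUBFOLDERS:
        path = os.path.join(base_dir, "input", name)
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates `input` itself when missing
        paths.append(path)
    return paths
```

Calling `make_input_folders(".")` from inside the mema folder would produce the layout described above, and calling it again leaves existing folders untouched.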

## Then move images and model into this mema folder
* Put the training and evaluation images into the corresponding train and eval folders.  
* Put the images that need to be classified into the predict folder.  
* If there is a pre-trained model, put it in the mema folder.

## Type in the commands
* cd ..  
-- go back to the main folder, mema  

* cd src  
-- enter src folder  


## Ready to use, type in the commands.
### For training the model with new sets of data.
* python main.py train 10000  
-- Train the model with images from train_organized and train_disorganized
folders for 10000 iterations.  
-- The number of iterations, 10000, can be changed.  

### For evaluating the accuracy of the model.
* python main.py eval  
-- Evaluate the performance.  

### For classifying the new set of images.
* python main.py predict result  
-- Classify the images in the predict folder and output a file named result in
the output folder.  
-- The name of the output file, result, can be changed.  
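
For readers curious how a command line of this shape can be parsed, here is a sketch using `argparse` subcommands. It mirrors the three commands above but is not the actual `main.py`; the argument names `iterations` and `output_name` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with train, eval, and predict subcommands."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command")

    train = subparsers.add_parser("train", help="train the model")
    train.add_argument("iterations", type=int,
                       help="number of training iterations")

    subparsers.add_parser("eval", help="evaluate the model")

    predict = subparsers.add_parser("predict", help="classify new images")
    predict.add_argument("output_name", help="name of the output file")
    return parser


# Parse the same arguments as `python main.py train 10000`.
args = build_parser().parse_args(["train", "10000"])
print("%s %d" % (args.command, args.iterations))
```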

# The trained model
Running the training command generates a new folder named MEMA_model. This folder contains the model, which consists of several files, and holds several versions if training is run multiple times. All versions except the latest can be deleted to save storage space.  
Make sure the model contains the following files:  
* model.ckpt-10120.data-00000-of-00001 (10120 is the latest iteration number in this example)
* model.ckpt-10120.index
* model.ckpt-10120.meta
* checkpoint
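
A quick way to verify this is a small check like the one below. It is a sketch that assumes the file naming shown in the list above (a single data shard); the helpers `latest_checkpoint_step` and `missing_model_files` are illustrative and not part of the script.

```python
import os
import re


def latest_checkpoint_step(model_dir):
    """Return the highest iteration number with an .index file, or None."""
    steps = []
    for name in os.listdir(model_dir):
        match = re.match(r"model\.ckpt-(\d+)\.index$", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def missing_model_files(model_dir):
    """List the expected files that are absent for the latest checkpoint."""
    step = latest_checkpoint_step(model_dir)
    if step is None:
        return ["model.ckpt-*.index"]
    prefix = "model.ckpt-%d" % step
    expected = [
        prefix + ".data-00000-of-00001",
        prefix + ".index",
        prefix + ".meta",
        "checkpoint",
    ]
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]
```

An empty list from `missing_model_files("MEMA_model")` means all four files for the latest iteration are present.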

# Future direction of modification
## Model can be improved.
This version only uses the model from the TensorFlow MNIST example. A pre-trained or better-built model can be used for better results in the future. The new model can be plugged into the model.py file.  

## Image augmentation can be improved.
This version uses only a few techniques to augment the training data. It can be expanded with more methods to generate more data so that the model generalizes better. New methods can be added to the ImageAugmentation class in data_prepared.py. Also, this version rotates each image only once, by 90 degrees. More rotated images can be generated to augment the training data.
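
As an illustration of the extra rotations, the sketch below uses NumPy's `np.rot90` to produce the 90, 180, and 270 degree versions of an image array. It is not the code of the ImageAugmentation class; `rotated_copies` is a hypothetical helper, and it assumes images are held as NumPy arrays with height and width as the first two axes.

```python
import numpy as np


def rotated_copies(image):
    """Return the image rotated by 90, 180, and 270 degrees (counterclockwise)."""
    return [np.rot90(image, k) for k in (1, 2, 3)]


# A tiny 2x2 array stands in for a real image.
image = np.array([[1, 2],
                  [3, 4]])
rotations = rotated_copies(image)
```

Together with the original, this gives four training images per source image instead of two.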

## Extensibility of the script.
This version only takes two folders and hides many arguments from users. An extensible version of this script could take an arbitrary number of folders as training sets and use them for training the model and making the classification.  
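
One possible starting point, sketched under the assumption that each training folder represents one class: derive the class labels from the folder names. The helper `labels_from_folders` and the third folder `train_partial` are hypothetical.

```python
import os


def labels_from_folders(folders):
    """Map each training folder to an integer class label, in sorted order."""
    names = sorted(os.path.basename(os.path.normpath(f)) for f in folders)
    if len(set(names)) != len(names):
        raise ValueError("folder names must be unique to serve as class names")
    return {name: index for index, name in enumerate(names)}


classes = labels_from_folders([
    "input/train_organized",
    "input/train_disorganized",
    "input/train_partial",  # hypothetical third pattern
])
```

Sorting the names keeps the label assignment stable no matter in which order the folders are passed.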

## Standalone version
PyInstaller did not work on this script. A web app version might be built and deployed on the cloud for internal use.  