# Data Prep for Crowd Annotation Pipeline

1. Collect raw data 
2. Adjust contrast of images
3. Chop up images into manageable pieces
4. Make into montages
5. Upload to Figure Eight

Files are named by these scripts such that the code blocks can run back-to-back with minimal input. For this reason, it is recommended that users run through the whole pipeline before processing another set of images.
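As a preview of how the folder names chain together, the sketch below rebuilds the paths for the overlay route using the example values from later cells. It only shows the naming pattern this notebook relies on; it does not call any `dcde` function, and the names `adjusted_dir` and `montage_dir` are illustrative.

```python
import os

# Example values copied from the cells later in this notebook.
base_dir = "/home/gnv/data/example"
raw_folder = "FITC"
overlay_folder = "phase"
identifier = "overlay_example"
num_x_segments = 4
num_y_segments = 4

# Step 2 (overlay route): adjusted images are saved to {raw}_overlay_{overlay}.
adjusted_dir = os.path.join(base_dir, raw_folder + "_overlay_" + overlay_folder)

# Step 3: the chopper output folder, as the montage cell reconstructs it.
chopped_dir = adjusted_dir + "_chopped_" + str(num_x_segments) + "_" + str(num_y_segments)

# Step 4: the montage folder is named from the identifier and segment counts.
montage_dir = os.path.join(
    base_dir, identifier + "_montages_" + str(num_x_segments) + "_" + str(num_y_segments)
)

for path in (adjusted_dir, chopped_dir, montage_dir):
    print(path)
```

Because each step derives its input folder from the previous step's variables, changing `identifier` or the segment counts partway through would point later cells at folders that do not exist.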

In [None]:
# import statements
from __future__ import absolute_import

import os

from ipywidgets import fixed, interactive
from skimage.io import imread

%matplotlib inline

from dcde.pre_annotation.montage_makers import montage_maker, multiple_montage_maker
from dcde.pre_annotation.overlapping_chopper import overlapping_crop_dir
from dcde.pre_annotation.aws_upload import aws_upload, upload
from dcde.pre_annotation.montage_to_csv import csv_maker
from dcde.pre_annotation.fig_eight_upload import fig_eight
from dcde.pre_annotation.contrast_adjustment import adjust_folder, adjust_overlay

from dcde.utils.io_utils import get_img_names
from dcde.utils.widget_utils import choose_img, edit_image, choose_img_pair, overlay_images

In [None]:
#sometimes raw images are in .tif stacks, not individual .tif files
#optional code block for turning into individual slices

## 2. Adjust contrast of images
Before doing anything else, we need to adjust the contrast of the raw data. The following section of this notebook allows the user to interactively choose how the raw images will be processed. The user should adjust the images to make them as clear as possible for annotators; these contrast-adjusted images will be used only for annotation.

### Option 1: Annotations of images, no overlays
Some images, such as those of fluorescent nuclei, are relatively easy to annotate. Use the following code blocks to adjust the contrast of those images and save them. For more difficult data, such as cytoplasmic images, you may overlay two images (such as phase and fluorescence) to help guide annotators. To overlay images for annotation, skip to Option 2.

This widget will allow the user to adjust the following settings, then apply them to a directory of images:
 - "blur" sets the sigma of the Gaussian filter applied to the image; larger values blur the image more, smaller values keep it sharper
 - "sobel_toggle" determines if a Sobel filter is applied on top of the original image; if on, the edges of objects in the image will have the highest contrast
 - "sobel_factor" changes how heavily the Sobel filter is applied to the original image, if "sobel_toggle" is on
 - "invert_img" inverts the intensity range of the image, so that the maximum value becomes the minimum, and vice versa
 - "gamma_adjust" changes the overall brightness of the image without interfering with histogram normalization of the image
 - "equalize_hist" uses histogram equalization of the whole image to rescale pixel values
 - "equalize_adapthist" uses histogram equalization applied to local regions of the image to rescale pixel values
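To make two of these settings concrete, here is a minimal sketch in plain Python that operates on one row of intensities scaled to [0, 1]. It is not the code behind `edit_image`: the function names are hypothetical, and treating "gamma_adjust" as a power-law transform is an assumption based on the usual definition of gamma correction.

```python
def invert(pixels):
    """Flip the intensity range so the maximum becomes the minimum."""
    lo, hi = min(pixels), max(pixels)
    return [hi + lo - p for p in pixels]


def gamma_adjust(pixels, gamma):
    """Power-law brightness change for intensities scaled to [0, 1].

    Assumption: gamma < 1 brightens mid-tones, gamma > 1 darkens them.
    """
    return [p ** gamma for p in pixels]


row = [0.0, 0.25, 0.5, 1.0]
inverted = invert(row)            # brightest pixel becomes darkest
brighter = gamma_adjust(row, 0.5)
darker = gamma_adjust(row, 2.0)
print(inverted, brighter, darker)
```

Note that the endpoints 0 and 1 are unchanged by the power law, so only the mid-tones shift.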

In [None]:
# Define path to desired raw directory
base_dir = "/home/gnv/data/example"
raw_folder = "raw"
identifier = "notebook_example"

raw_dir = os.path.join(base_dir, raw_folder)

In [None]:
# Choose a raw image to test the contrast adjustment on
choose_raw = interactive(choose_img, name=get_img_names(raw_dir), dirpath=fixed(raw_dir));
choose_raw

In [None]:
# Test with the chosen image to settle on adjustment parameters
img = imread(choose_raw.result)
edit_raw = interactive(edit_image, image=fixed(img), blur=(0.0,4,0.1), gamma_adjust=(0.1,4,0.1), sobel_factor=(10,10000,100));
edit_raw

In [None]:
# With the chosen parameters, process all the raw data in the folder
sigma = edit_raw.kwargs['blur']
hist = edit_raw.kwargs['equalize_hist']
adapthist = edit_raw.kwargs['equalize_adapthist']
gamma = edit_raw.kwargs['gamma_adjust']
sobel_option = edit_raw.kwargs['sobel_toggle']
sobel = edit_raw.kwargs['sobel_factor']
invert = edit_raw.kwargs['invert_img']

adjust_folder(base_dir, raw_folder, identifier, sigma, hist, adapthist, gamma, sobel_option, sobel, invert)

### Option 2: Overlay two image types for annotation
First, define the folders where your images can be found. This assumes that the images you want to overlay are in separate subfolders. The directory that contains these subfolders, "base_dir", is where contrast-adjusted images and subsequent processed images will be saved (each in an appropriate subfolder). The subfolders should contain the same number of images; they are expected to be different channels of the same original image.

Next, a widget will load that allows you to scroll through the images contained in the source subfolders. The user can select a pair of images that are representative of the data set.

Next, a widget will load that allows the user to adjust image processing settings for the first image in the pair (the "raw" image). After you are happy with the image, move on to the next code block; the settings you have determined will be saved.
 
Next, a similar widget will load that allows the user to adjust the image that will be overlaid on the "raw" image. Once you are satisfied with this image, move on to the next code block; the settings you have determined will be saved.
 
Next, a widget will load that allows the user to adjust how the images are overlaid. The user can specify the weighting of the overlay, and change the brightness settings of the final image to increase contrast. The two images to be overlaid can be readjusted individually if they need to be, by going back to the previous widgets and changing the settings. Just re-run the overlay widget and the new settings will be loaded.
 
Finally, when you are satisfied with the adjustments made to the individual and overlaid images, running "adjust_overlay" will take the last-used settings from each widget, apply them to each image in the subfolders specified, and create an overlaid image. The adjusted images will be saved in a new folder; the original images will not be modified. The folder for the adjusted images will be named {raw}\_overlay\_{overlay} to indicate which source folders were combined.
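The overlay weighting can be pictured with a small sketch on one row of pixel values. This is not the `overlay_images` implementation: the function names are hypothetical, the linear blend is an assumption about what "prop_raw" means, and treating "v_min" and "v_max" as a display window that is stretched to 0-255 is likewise an assumption.

```python
def overlay_pixels(raw, overlay, prop_raw):
    """Blend two equal-length intensity lists; prop_raw is the raw image's weight."""
    if len(raw) != len(overlay):
        raise ValueError("raw and overlay must have the same number of pixels")
    return [prop_raw * r + (1.0 - prop_raw) * o for r, o in zip(raw, overlay)]


def rescale(pixels, v_min, v_max):
    """Clip to [v_min, v_max], then stretch that window to the 0-255 range."""
    span = float(v_max - v_min)
    return [255.0 * (min(max(p, v_min), v_max) - v_min) / span for p in pixels]


raw_row = [0.0, 100.0, 200.0]
overlay_row = [200.0, 100.0, 0.0]

# Weight the raw image at 75% and the overlay at 25%.
blended = overlay_pixels(raw_row, overlay_row, prop_raw=0.75)

# Narrowing the window increases contrast in the blended result.
stretched = rescale(blended, v_min=50, v_max=150)
print(blended, stretched)
```

Under these assumptions, a "prop_raw" of 1.0 would show only the raw image and 0 would show only the overlay.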

In [None]:
# Define path to desired raw and overlay directories
base_dir = "/home/gnv/data/example"
raw_folder = "FITC"
overlay_folder = "phase"
identifier = "overlay_example"

raw_path = os.path.join(base_dir, raw_folder)
overlay_path = os.path.join(base_dir, overlay_folder)

In [None]:
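# Pick a matched pair of images to adjust contrast;
# choose representative images for best results.
# The slider starts at 0, so the last valid frame is one less than the image count
# (assumes choose_img_pair indexes the list of image names directly).
max_frame = len(get_img_names(raw_path)) - 1

choose_pair = interactive(choose_img_pair, frame=(0, max_frame, 1), raw_dir=fixed(raw_path), overlay_dir=fixed(overlay_path), continuous_update=False);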
choose_pair

In [None]:
#adjust raw image
raw_img = imread(choose_pair.result[0])
edit_raw = interactive(edit_image, image=fixed(raw_img), blur=(0.0,4,0.1), gamma_adjust=(0.1,4,0.1), sobel_factor=(10,10000,100));
edit_raw

In [None]:
#adjust overlay image
overlay_img = imread(choose_pair.result[1])
edit_overlay = interactive(edit_image, image=fixed(overlay_img), blur=(0.0,4,0.1), gamma_adjust=(0.1,4,0.1), sobel_factor=(10,10000,100));
edit_overlay

In [None]:
#overlay images
raw_adjusted = edit_raw.result
overlay_adjusted = edit_overlay.result
edit_combination = interactive(overlay_images, raw_img=fixed(raw_adjusted), overlay_img=fixed(overlay_adjusted), prop_raw=(0, 1.0, 0.1), v_min=(0, 255, 1), v_max=(0, 255, 1))
edit_combination

In [None]:
# Collect the last-used settings from each widget; adjust_overlay (next cell) applies them to every image in the folders.
# Modified images are saved to a new folder and do not overwrite the originals.
raw_settings = edit_raw.kwargs
overlay_settings = edit_overlay.kwargs
combined_settings = edit_combination.kwargs

In [None]:
adjust_overlay(base_dir, raw_folder, overlay_folder, identifier, raw_settings, overlay_settings, combined_settings)

## 3. Chop up images into manageable pieces

Each full-size image usually has many cells in it. This makes them difficult to fully annotate! For ease of annotation (and better results), each frame is chopped up into smaller, overlapping frames, ultimately creating a set of movies. 

These smaller movies can be made with overlapping edges, making it easier to stitch annotations together into one large annotated movie (in the post-annotation pipeline). A large overlap will result in redundant annotations.

Even if you want to process the full-sized image, run the chopper with num_x_segments and num_y_segments both set to 1. The montage makers are written to run on the output of the chopper.
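To illustrate how overlapping segments tile an image, the sketch below computes crop boxes for a grid of segments. It is not the logic inside `overlapping_crop_dir`: the function name `crop_boxes` is hypothetical, and it assumes that "overlap_perc" is a percentage of the segment size added on each side and clipped at the image border.

```python
def crop_boxes(width, height, num_x_segments, num_y_segments, overlap_perc):
    """Return (x0, y0, x1, y1) boxes tiling an image with overlapping edges.

    Assumes width and height divide evenly by the segment counts.
    """
    seg_w = width // num_x_segments
    seg_h = height // num_y_segments
    pad_x = seg_w * overlap_perc // 100
    pad_y = seg_h * overlap_perc // 100
    boxes = []
    for j in range(num_y_segments):
        for i in range(num_x_segments):
            # Extend each segment by the padding, but never past the image edge.
            x0 = max(i * seg_w - pad_x, 0)
            y0 = max(j * seg_h - pad_y, 0)
            x1 = min((i + 1) * seg_w + pad_x, width)
            y1 = min((j + 1) * seg_h + pad_y, height)
            boxes.append((x0, y0, x1, y1))
    return boxes


boxes = crop_boxes(1000, 1000, 4, 4, 10)
single = crop_boxes(1000, 1000, 1, 1, 10)
print(len(boxes), boxes[0], boxes[1], single)
```

In this sketch, neighboring boxes share a 50-pixel strip (25 pixels of padding from each side), and a 1 x 1 grid returns the full image unchanged.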

In [None]:
image_input_folder = "FITC_overlay_phase"
image_input_dir = os.path.join(base_dir, image_input_folder)

num_x_segments = 4
num_y_segments = 4
overlap_perc = 10

In [None]:
overlapping_crop_dir(image_input_dir, identifier, num_x_segments, num_y_segments, overlap_perc)

## 4. Make into montages
multiple_montage_maker is written to run on the output of the chopper, i.e., the folder where each chopped movie folder is saved. It makes montages of each subfolder according to the variables specified, and makes more than one montage per subfolder if there are enough frames to do so.

 - "chopped_dir" is the path to the folder where chopped images were saved; this folder was created by overlapping_crop_dir, so the user should not change this variable if they are going straight through the pipeline
 - "save_dir" is the path to a folder where the montages will be saved; this folder will be created by multiple_montage_maker, and the user generally will not need to change this path
 - "log_dir" is where the JSON log containing information about the inputs to the montage maker will be saved
 - "montage_len" specifies how many frames the montage will hold
 - "row_length" specifies how many frames will be in each row of the montage
 - "x_buffer" specifies how many pixels of padding will separate each column of images
 - "y_buffer" specifies how many pixels of padding will separate each row of images

The variables used in multiple_montage_maker are saved in a JSON file so they can be reused in post-annotation processing.
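The layout variables can be pictured with a short sketch that computes the montage canvas size and where each frame would sit. This is not the `multiple_montage_maker` implementation: the function name is hypothetical, and it assumes the buffers also pad the outer border of the montage. The JSON keys at the end are illustrative, not the actual log format.

```python
import json


def montage_layout(frame_w, frame_h, montage_len, row_length, x_buffer, y_buffer):
    """Return the canvas size and the top-left corner of every frame."""
    rows = -(-montage_len // row_length)  # ceiling division
    canvas_w = row_length * frame_w + (row_length + 1) * x_buffer
    canvas_h = rows * frame_h + (rows + 1) * y_buffer
    corners = []
    for k in range(montage_len):
        row, col = divmod(k, row_length)
        corners.append((x_buffer + col * (frame_w + x_buffer),
                        y_buffer + row * (frame_h + y_buffer)))
    return (canvas_w, canvas_h), corners


# Ten 275 x 275 frames in a single row, with 20 pixels of padding.
size, corners = montage_layout(275, 275, montage_len=10, row_length=10,
                               x_buffer=20, y_buffer=20)

# Saving the inputs lets a later step recover the same layout.
log_text = json.dumps({"montage_len": 10, "row_length": 10,
                       "x_buffer": 20, "y_buffer": 20}, sort_keys=True)
print(size, corners[0], corners[-1])
print(log_text)
```

Recording the same values that produced the layout is what would let post-annotation processing cut the annotated montage back into individual frames.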

In [None]:
chopped_dir = image_input_dir + "_chopped_" + str(num_x_segments) + "_" + str(num_y_segments)
#chopped_dir = "/home/gnv/data/example/FITC_overlay_phase_chopped_4_4"

save_dir = os.path.join(base_dir, identifier + "_montages_" + str(num_x_segments) + "_" + str(num_y_segments))
#save_dir = "/home/gnv/data/example/montages"

log_dir = os.path.join(base_dir, "json_logs")

montage_len = 10
row_length = 10
x_buffer = 20
y_buffer = 20

In [None]:
multiple_montage_maker(montage_len, chopped_dir, save_dir, identifier, 
                       num_x_segments, num_y_segments, row_length, x_buffer, y_buffer, log_dir)

## 5. Upload to Figure Eight
Now that the images are processed into montages, they need to be uploaded to an AWS bucket and submitted to Figure Eight. This involves uploading the files to AWS, making a CSV file with the links to the uploaded images, and using that CSV file to create a Figure Eight job.

### Upload files to AWS
aws_upload will look for image files in the specified directory (folder_to_upload, set by default to be wherever the output of multiple_montage_maker was saved) and upload them into a bucket. If you don't want to include all of the montages you have made in the Figure Eight job, move the montages of interest to a new folder and upload that.

For the Van Valen lab, the default bucket is "figure-eight-deepcell" and keys (aws_folder + file names) correspond to the file structure of our data server.

aws_upload returns a list of the URLs to which images were uploaded.
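As a rough picture of how a key and URL relate to the bucket and folder, the sketch below builds both for one file without contacting AWS. The function name and the file name `montage_0.png` are hypothetical, and the virtual-hosted-style URL shown is one standard S3 URL form; whether `aws_upload` returns URLs in exactly this form is an assumption.

```python
import posixpath


def montage_url(bucket_name, aws_folder, filename):
    """Build the object key and a virtual-hosted-style S3 URL for one file."""
    # S3 keys always use forward slashes, regardless of the local OS.
    key = posixpath.join(aws_folder, filename)
    return key, "https://{}.s3.amazonaws.com/{}".format(bucket_name, key)


key, url = montage_url("figure-eight-deepcell", "gnv/data/example", "montage_0.png")
print(key)
print(url)
```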

In [None]:
bucket_name = "figure-eight-deepcell" #default
aws_folder = "gnv/data/example"
folder_to_upload = save_dir
#folder_to_upload = "/home/gnv/data/example/only_some_of_the_montages"

uploaded_montages = aws_upload(bucket_name, aws_folder, folder_to_upload)

### Make CSV file
Figure Eight jobs can be created easily by using a CSV file where each row contains information about one task. For our jobs, each row has the link to the location of one montage, and information about that montage (currently, just the "identifier" specified at the beginning of the pipeline). The CSV file is saved as "identifier".csv in a folder that only holds CSVs. CSV folders are usually in cell-type directories, so identifiers should be able to distinguish between sets, parts, etc.
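The one-row-per-montage layout can be sketched with the standard `csv` module. This is not `csv_maker` itself: the function name, the column names `image_url` and `identifier`, and the example URLs are assumptions for illustration, and the sketch writes to an in-memory buffer rather than to the CSV folder.

```python
import csv
import io


def montage_rows_to_csv(urls, identifier, handle):
    """Write a header plus one row per montage URL to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["image_url", "identifier"])  # hypothetical column names
    for url in urls:
        writer.writerow([url, identifier])


example_urls = [
    "https://figure-eight-deepcell.s3.amazonaws.com/gnv/data/example/montage_0.png",
    "https://figure-eight-deepcell.s3.amazonaws.com/gnv/data/example/montage_1.png",
]

buffer = io.StringIO()
montage_rows_to_csv(example_urls, "overlay_example", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Each data row corresponds to one task in the job, which is why every montage gets its own line.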

In [None]:
csv_dir = os.path.join(base_dir, "CSV")

In [None]:
csv_maker(uploaded_montages, identifier, csv_dir)

### Create Figure Eight job
The Figure Eight API allows us to create a new job and upload data to it from this notebook. However, since our jobs don't include required test questions, editing job information such as the title of the job must be done via the website. This section of the notebook uses the API to create a job and upload data to it, then reminds the user to finish editing the job on the website.

Some sample job IDs to copy are provided below.

In [None]:
#job_id_to_copy = 1344258 #Elowitz timelapse RFP pilot
job_id_to_copy = 1346216 #Deepcell MouseBrain 3x5
#job_id_to_copy = 1306431 #Deepcell overlapping Mibi
#job_id_to_copy = 1292179 #Deepcell HEK
#job_id_to_copy = 1363594 #3T3 cytoplasm

In [None]:
fig_eight(csv_dir, identifier, job_id_to_copy)