# Data Processing for Crowd Annotation Pipeline

1. Download job report from Figure Eight
2. Download annotated images from report
3. Clean up montages (remove small objects, fill small holes, sequentially label the annotations)
4. Process montages into movies
5. Process raw data into corresponding movies
6. Combine raw and annotated movies into npz format
7. Detect divisions (?)

The scripts name their output files so that the code blocks can run back-to-back with minimal input. For this reason, we recommend running one set of images through the whole pipeline before processing another. Specify a few directory names and the "identifier" used in pre-annotation, then run all cells in the notebook. Alternate folder names can be used, but this is not recommended.

To function properly, your working folder should contain subfolders:
- json_logs  
    - log from overlapping_chopper ({identifier}_overlapping_chopper_log.json)
    - log from montage_maker ({identifier}_montage_maker.json)
- raw images (can be named "raw" or something else)
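Before running the cells, it can help to confirm that these inputs are in place. The following is a minimal sketch, not part of the dcde package: the helper name `missing_inputs` is hypothetical, and the sketch assumes the two logs sit directly inside `json_logs` under the file names listed above.

```python
import os


def missing_inputs(base_dir, raw_dir, identifier):
    """Return the expected input paths that do not exist yet.

    Hypothetical helper: assumes both logs live directly in
    base_dir/json_logs and follow the naming listed above.
    """
    log_dir = os.path.join(base_dir, "json_logs")
    expected = [
        raw_dir,
        os.path.join(log_dir, f"{identifier}_overlapping_chopper_log.json"),
        os.path.join(log_dir, f"{identifier}_montage_maker.json"),
    ]
    # Keep only the paths that are absent from disk
    return [path for path in expected if not os.path.exists(path)]
```

An empty list from `missing_inputs(base_dir, raw_dir, identifier)` means the folder layout matches what the pipeline expects.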

The user will also need to supply:
- job ID for the data to download from Figure Eight
- API key for Figure Eight
- "identifier" to access the correct json logs and name files correctly

If the default folder names are used, by the end of this pipeline, the working folder (base_dir) will contain subfolders named:

- CSV  
    - data that was uploaded to Figure Eight in the pre-annotation notebook  
    - job report downloaded from Figure Eight

- annotations  
    - downloaded montages from Figure Eight, cleaned

- movies  
    - subfolders for different parts  
        - subfolders for each subsection of image  
            - subfolders for holding raw and annotated data  
                - images

In [None]:
#import statements
import os

from dcde.post_annotation.download_csv import download_and_unzip, save_annotations_from_csv
from dcde.post_annotation.clean_montages import clean_montages, relabel_montages, convert_grayscale_all
from dcde.post_annotation.montages_to_movies import raw_movie_maker, all_montages_chopper

In [None]:
#set working directory
#base_dir = "/base/directory/path/here"
#raw_dir = "/base/directory/path/here/folder_with_fullsize_raw_images"

base_dir = "/home/geneva/Desktop/Files_to_work_through/Data_for_crowd"
raw_dir = "/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/raw_slices_s7"

#identifier given during pre-annotation pipeline; if you're not sure, it's also in the job report csv
identifier = "MouseBrain_s7_nuc"

## 1. Download job report from Figure Eight
By default, this script downloads, unzips, and renames the full report from Figure Eight as a .csv file. The user can change the report type if one of the other report options is more suitable, but support for other report types is not guaranteed in version 0 of this notebook.

The user can specify where the zip file is downloaded and where the .csv is extracted. By default, the .csv file is put into a subfolder named CSV. This is likely the same folder that contains the input data, so the CSV files are named to prevent confusion: the report CSV is renamed "job_{job_number}_{type of report}_report.csv".

#### From Figure Eight website:
- full - Returns the Full report containing every judgment
- aggregated - Returns the Aggregated report containing the aggregated response for each row
- json - Returns the JSON report containing the aggregated response, as well as the individual judgments
- gold_report - Returns the Test Question report
- workset - Returns the Contributor report
- source - Returns a CSV of the source data uploaded to the job

In [None]:
job_id_to_download = 1352262
job_type = "full"  #report type; also used below to build the path to the report csv

In [None]:
download_and_unzip(job_id_to_download, base_dir)

## 2. Use report to download annotations
This script uses the information in the report to download each annotation. Montage annotations will be saved in the "annotations" subfolder (it will be created for you by the script).

Possible future update: more informative .csv files to facilitate multi-dataset downloads. This is not supported in this notebook.

In [None]:
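#folder where the job report was extracted in step 1
csv_dir = os.path.join(base_dir, "CSV")
#report name follows the "job_{job_number}_{type of report}_report.csv" pattern
csv_path = os.path.join(csv_dir, f"job_{job_id_to_download}_{job_type}_report.csv")

#csv_path = "/example/path/CSV/job_number_full_report.csv"

#folder where the downloaded montage annotations will be saved
annotation_save = os.path.join(base_dir, "annotations")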

In [None]:
save_annotations_from_csv(csv_path, annotation_save)

## 3. Clean up the montages
First, the RGB montage annotation is converted into grayscale, simplifying downstream use of the annotation.

Next, small changes are made to the morphology of the image. Annotators sometimes submit small holes or stray pixels as artifacts of the annotation process. These do not correspond to anything that should be annotated, so this step uses scikit-image to fix them.

This step currently uses the old "clean_montage" function; this may change in future versions of the notebook.
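To illustrate the idea behind this cleanup, here is a minimal sketch on a binary mask. It is not the package's implementation: it swaps in `scipy.ndimage.label` for the scikit-image routines, the function names and the `min_size` threshold are hypothetical, and it assumes a single foreground class rather than a multi-label annotation.

```python
import numpy as np
from scipy import ndimage


def drop_small_components(mask, min_size):
    """Remove connected True regions smaller than min_size pixels."""
    # Default structuring element: 4-connectivity for a 2D mask
    labeled, _ = ndimage.label(mask)
    sizes = np.bincount(labeled.ravel())
    keep = sizes >= min_size
    keep[0] = False  # component 0 is everything that was False in `mask`
    return keep[labeled]


def clean_mask(mask, min_size=4):
    """Drop stray specks, then fill small holes (illustrative only)."""
    mask = np.asarray(mask, dtype=bool)
    no_specks = drop_small_components(mask, min_size)
    # A small hole is a small connected region of background, so filling
    # holes is the same operation applied to the inverted mask.
    return ~drop_small_components(~no_specks, min_size)
```

A real annotation carries one integer label per cell, so a faithful version would apply this kind of cleanup per label; the sketch only shows the two morphological operations themselves.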

After cleaning the montages, the user can optionally run the "relabel_montages" block, which relabels the annotations sequentially. For example, if the annotator used the labels 3, 5, and 7 to label cells, this code block would remake the image with labels 1, 2, and 3.

The cleaned and relabeled annotations will overwrite the downloaded annotations.
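Sequential relabeling can be sketched in a few lines of NumPy. This is an illustration of the idea rather than the code behind `relabel_montages`; the function name is hypothetical, and the sketch assumes that 0 marks the background and should stay 0.

```python
import numpy as np


def relabel_sequentially(annotation):
    """Map the nonzero labels of an integer image onto 1..N, in sorted order."""
    annotation = np.asarray(annotation)
    old_labels = np.unique(annotation)        # sorted unique values
    old_labels = old_labels[old_labels != 0]  # assumption: 0 is background
    relabeled = np.zeros_like(annotation)
    for new_label, old_label in enumerate(old_labels, start=1):
        relabeled[annotation == old_label] = new_label
    return relabeled
```

With the labels from the example above, an image containing 3, 5, and 7 comes back containing 1, 2, and 3, with background pixels unchanged.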

In [None]:
annotations_folder = annotation_save
#annotations_folder = "/base/directory/path/here/wherever_you_moved_the_annotations"

In [None]:
convert_grayscale_all(annotations_folder)

In [None]:
clean_montages(annotations_folder)

In [None]:
#optional
relabel_montages(annotations_folder)

## 4. Process montages into movies

Each montage is composed of frames of a timelapse (or sometimes a z-stack) that have been placed next to each other. This is useful for annotators, but downstream we want to use these images frame by frame, as movies. This section of the notebook takes the montages, along with the parameters used to make them (such as the spacing between frames), and chops each montage into its constituent frames. The sequential frames are then saved together in subfolders.

By default, the frames are saved in a "movies" folder containing subfolders that correspond to the crop location of each montage (e.g., x_1_y_0). Each of those subfolders contains a folder for the annotations of that position, which holds the image files for each frame.
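The chopping step comes down to computing where each frame sits inside the montage. The sketch below shows one way to do that under stated assumptions: frames are laid out in a regular grid in row-major order, `spacing` pixels separate neighbouring frames, and there is no border around the outside of the montage. The function and parameter names are hypothetical; the actual layout parameters are read from the montage_maker log by `all_montages_chopper`.

```python
def frame_slices(frame_height, frame_width, rows, cols, spacing):
    """Return (row_slice, col_slice) for each frame, in row-major order.

    Assumes a regular grid with `spacing` pixels between neighbouring
    frames and no outer border.
    """
    slices = []
    for row in range(rows):
        for col in range(cols):
            top = row * (frame_height + spacing)
            left = col * (frame_width + spacing)
            slices.append((slice(top, top + frame_height),
                           slice(left, left + frame_width)))
    return slices
```

If the montage is loaded as a NumPy array, the frames could then be extracted with `[montage[r, c] for r, c in frame_slices(...)]` and written out one file per frame.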

In [None]:
all_montages_chopper(base_dir, identifier)

In [None]:
raw_movie_maker(base_dir, raw_dir, identifier)