# Caliban Fig8 Upload Pipeline
This pipeline creates a Figure Eight job and uploads data files to an S3 bucket for data curation.

- Resize npz into smaller pieces (image dimensions) if needed
- Shorten length of npz if needed
- Upload files to AWS
- Create the job CSV and upload it to Figure Eight

**Note: if you need to start a Caliban job to correct the results of a previous Caliban job, please set "base_dir" then skip to the end of the notebook.**

In [1]:
# import statements
from __future__ import absolute_import

import math
import numpy as np
import os
import skimage.io as io

from deepcell_toolbox.pre_annotation.npz_preprocessing import reshape_npz, slice_npz_batches, slice_npz_folder, relabel_npzs_folder, crop_npz
from deepcell_toolbox.post_annotation.npz_postprocessing import reconstruct_npz
from deepcell_toolbox.pre_annotation.aws_upload import aws_caliban_upload
from deepcell_toolbox.pre_annotation.caliban_csv import initial_csv_maker, create_next_CSV
from deepcell_toolbox.pre_annotation.fig_eight_upload import fig_eight

## Load data for model training
We'll specify which channels are used to generate preliminary labels from the model.


In [None]:
# load data code here

## Run the data through the network to produce labels

In [None]:
# deepcell upload code here

## Postprocess the deepcell labels

In [2]:
# code to postprocess labels, select appropriate parameters

## Create combined image stack

Combine the labels with the imaging channels.

In [1]:
base_dir = '../example_data/'

In [None]:
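import xarray as xr

# NOTE: data_utils is not imported anywhere in this notebook; import the module
# that provides combine_xarrays before running this cell.

# if the segmentation masks and channel TIFs are in different xarrays, we'll combine them together
channel_xr = xr.open_dataarray(os.path.join(base_dir, "segmentation_channels.xr"))
labels_xr = xr.open_dataarray(os.path.join(base_dir, "segmentation_labels.xr"))

# concatenate along the last axis and save the combined stack next to the inputs
combined_xr = data_utils.combine_xarrays((channel_xr, labels_xr), axis=-1)
combined_xr.to_netcdf(os.path.join(base_dir, "combined_xr.xr"))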

## Split npz into pieces
The idea is that you'd be starting from one huge npz and breaking it into manageable pieces. This part will probably include:
- reshape npz (size of each piece is smaller, but same number of frames)
- break up each npz into fewer frames (annotator does not necessarily need to do all 30+ frames of a movie at once)
- save these reshaped pieces as individual npz files, so they can be uploaded and worked on separately
- relabel the npzs as needed (choose between no relabel, relabel each cell in each frame to have unique label, or relabel to have unique labels but preserve 3D relationships)

These pieces will need specific names so that we can put them back together again if needed (especially putting frames back into longer contiguous movies).
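
One way to make the pieces re-assemblable is to encode the tile position and frame range in each filename. The sketch below is only an illustration: the pattern, `piece_name`, and `parse_piece_name` are hypothetical and are not necessarily the naming scheme that deepcell_toolbox uses.

```python
import re

# Hypothetical naming scheme (an assumption, not the deepcell_toolbox convention):
#   <base>_row_<RR>_col_<CC>_frames_<start>-<end>.npz
PIECE_PATTERN = re.compile(
    r"^(?P<base>.+)_row_(?P<row>\d+)_col_(?P<col>\d+)"
    r"_frames_(?P<start>\d+)-(?P<end>\d+)\.npz$"
)


def piece_name(base, row, col, start, end):
    """Build a filename that records where a piece came from."""
    return "{0}_row_{1:02d}_col_{2:02d}_frames_{3}-{4}.npz".format(
        base, row, col, start, end)


def parse_piece_name(filename):
    """Recover the tile position and frame range from a piece filename."""
    match = PIECE_PATTERN.match(filename)
    if match is None:
        raise ValueError("not a piece filename: {0}".format(filename))
    info = match.groupdict()
    return {
        "base": info["base"],
        "row": int(info["row"]),
        "col": int(info["col"]),
        "start": int(info["start"]),
        "end": int(info["end"]),
    }


example_name = piece_name("HeLa_movie_s7", 1, 2, 0, 5)
example_info = parse_piece_name(example_name)
```

Sorting parsed pieces by `(row, col, start)` then gives the order needed to stitch frames back into a longer contiguous movie.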

In [2]:
base_dir = '../example_data/'

### Reshape npz (y and x dimensions) if needed

In [None]:
full_npz_path = '/data/figure_eight/HeLa-S3_cyto_movies/set7/HeLa_movie_s7_uncorrected_fullsize_all_channels.npz'
x_size = 320
y_size = 270
reshaped_save_dir = os.path.join(base_dir, 'reshaped')

In [None]:
reshape_npz(full_npz_path, x_size, y_size, save_dir = reshaped_save_dir)
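
To make the idea of reshaping concrete, here is a minimal NumPy sketch of splitting a stack into smaller y/x tiles while keeping every frame. It assumes `x_size` and `y_size` are the width and height of each piece and that the array is laid out as (frames, y, x, channels); both are assumptions, and `tile_frames` is a hypothetical helper, not the `reshape_npz` implementation.

```python
import numpy as np


def tile_frames(stack, y_size, x_size):
    """Split a (frames, y, x, channels) array into non-overlapping tiles.

    Assumes the y and x dimensions divide evenly by y_size and x_size.
    Returns a dict mapping (row, col) to a view of the original array.
    """
    _, height, width, _ = stack.shape
    if height % y_size or width % x_size:
        raise ValueError("image dimensions must be multiples of the tile size")
    tiles = {}
    for row in range(height // y_size):
        for col in range(width // x_size):
            tiles[(row, col)] = stack[
                :,
                row * y_size:(row + 1) * y_size,
                col * x_size:(col + 1) * x_size,
                :,
            ]
    return tiles


# Made-up example data: 4 frames of 540 x 640 pixels with 1 channel.
example_stack = np.zeros((4, 540, 640, 1), dtype=np.uint16)
example_tiles = tile_frames(example_stack, y_size=270, x_size=320)
```

Each tile keeps all frames, so the number of frames per piece is unchanged and only the image dimensions shrink.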

### Crop npz into smaller x and y pieces
Optionally takes an npz file and splits it into many smaller npzs, each of which can be submitted as a separate Figure Eight job.

### Slice (t or z dimension) single npz if needed
#### Use this option if you do not need to reshape y and x in your npz before beginning work.
This option may be useful in the future for something like curation of 2D npzs. In that case, each slice is independent, so only one slice would be launched at a time. With only one frame to annotate, the annotation time per image allows for a larger image (annotating a larger image would also reduce the number of annotations on the edge, which can be tougher since there is less context to judge cell boundaries).

However, this option has limited use until Caliban supports zooming in and out.

In [None]:
# full_npz_path = ''
# batch_size = 1
# sliced_save_dir = ''

In [None]:
# slice_npz_batches(full_npz_path, batch_size, sliced_save_dir)

### Slice (t or z dimension) folder of reshaped npzs if needed

In [None]:
batch_size = 5
sliced_save_dir = os.path.join(base_dir, 'reshaped_resized')

In [None]:
slice_npz_folder(src_folder = reshaped_save_dir, batch_size = batch_size, save_dir = sliced_save_dir)
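
The effect of `batch_size` on the frame axis can be sketched as follows. `frame_batches` is a hypothetical helper written for illustration; it assumes the last batch is simply shorter when the frame count is not a multiple of `batch_size`, which may differ from what `slice_npz_folder` does.

```python
def frame_batches(num_frames, batch_size):
    """Return (start, stop) frame ranges, with stop exclusive.

    The final range is shorter if num_frames is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size, num_frames))
        for start in range(0, num_frames, batch_size)
    ]


even_batches = frame_batches(30, 5)
ragged_batches = frame_batches(12, 5)
```

Each `(start, stop)` pair can be used to slice the frame axis, for example `stack[start:stop]`, and saved as its own npz.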

### Relabel npzs -- recommended for fig8 first pass jobs
"Predict" relabeling is recommended (unless 3D segmentation models are being used), since this relabeling strategy will perform decently on most 3D data to reduce the human labor involved in correction.

In [None]:
relabel_npzs_folder(npz_dir = sliced_save_dir, relabel_type = 'predict')
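
For comparison, the simplest relabeling option listed earlier (every cell in every frame gets its own unique label) can be sketched in NumPy as below. This is not the "predict" strategy and not the `relabel_npzs_folder` implementation; `relabel_unique_per_frame` is a hypothetical helper that assumes an integer (frames, y, x) array with 0 as background.

```python
import numpy as np


def relabel_unique_per_frame(labels):
    """Give every cell in every frame a label that is unique across the stack.

    labels: integer array of shape (frames, y, x); 0 is treated as background.
    """
    relabeled = np.zeros_like(labels)
    next_label = 1
    for frame_idx in range(labels.shape[0]):
        frame = labels[frame_idx]
        for old_label in np.unique(frame):
            if old_label == 0:
                continue  # leave background untouched
            relabeled[frame_idx][frame == old_label] = next_label
            next_label += 1
    return relabeled


# Tiny made-up stack: two 2x2 frames that reuse labels 5 and 7.
example_labels = np.array([
    [[0, 5], [5, 7]],
    [[5, 0], [7, 7]],
])
example_relabeled = relabel_unique_per_frame(example_labels)
```

This variant discards any relationship between frames, which is why a strategy that preserves 3D relationships reduces the correction work for movies.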

## Upload pieces to AWS

### Select directory

"upload_dir" is the directory that holds the .npz or .trk files for data curation. The cells below go through this folder and upload each file to the chosen S3 input bucket.

In [None]:
upload_dir = sliced_save_dir

In [None]:
# bucket to load uncurated files
input_bucket = "caliban-input"

# bucket for curated files after submission
output_bucket = "caliban-output"

# subfolders in input/output bucket
aws_folder = "HeLa-S3/cyto/Stanford_movies/set7"

# stage becomes another subfolder so that data from subsequent jobs are grouped nearby
stage = 'firstpass'

filenames, filepaths = aws_caliban_upload(input_bucket, 
                                          output_bucket, 
                                          aws_folder, 
                                          stage, 
                                          folder_to_upload = upload_dir)
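
As a rough picture of where the files end up, the sketch below builds one S3 key per file. It assumes the layout is `<aws_folder>/<stage>/<filename>` (inferred from the comment that the stage becomes another subfolder) and that only .npz and .trk files are uploaded; `s3_keys` is a hypothetical helper, not part of `aws_caliban_upload`.

```python
import posixpath


def s3_keys(filenames, aws_folder, stage):
    """Build one key per .npz/.trk file, assuming <aws_folder>/<stage>/<filename>."""
    keys = []
    for name in sorted(filenames):
        if name.endswith((".npz", ".trk")):
            # posixpath keeps forward slashes regardless of the local OS
            keys.append(posixpath.join(aws_folder, stage, name))
    return keys


example_keys = s3_keys(
    ["b.npz", "notes.txt", "a.npz"],
    "HeLa-S3/cyto/Stanford_movies/set7",
    "firstpass",
)
```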

## Create CSV file for Figure Eight


Figure Eight jobs can be created easily from a CSV file in which each row describes one task. To create jobs for Caliban, each row contains a unique URL that directs users to the Caliban tool with the correct data to curate. The CSV file is saved as {identifier}\_{stage}\_upload.csv in a folder that only holds CSVs.

In [None]:
csv_dir = os.path.join(base_dir, "CSV")
identifier = "HeLa-S3_cyto_movies_s7"

initial_csv_maker(csv_dir = csv_dir, 
                  identifier = identifier,
                  stage = stage,
                  input_bucket = input_bucket,
                  output_bucket = output_bucket,
                  subfolders = aws_folder,
                  filenames = filenames,
                  filepaths = filepaths)
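
The one-row-per-task structure can be sketched with the standard `csv` module. The column names, the URL format, and the `https://caliban.example.com` host are placeholders chosen for illustration; only the `{identifier}_{stage}_upload.csv` filename pattern comes from the description above, and this is not the `initial_csv_maker` implementation.

```python
import csv
import io


def csv_filename(identifier, stage):
    """Filename pattern described above: {identifier}_{stage}_upload.csv."""
    return "{0}_{1}_upload.csv".format(identifier, stage)


def write_task_csv(handle, filenames, identifier, stage, base_url):
    """Write one row per file; columns and URL layout are hypothetical."""
    writer = csv.writer(handle)
    writer.writerow(["identifier", "stage", "filename", "project_url"])
    for name in filenames:
        url = "{0}/{1}/{2}".format(base_url, stage, name)
        writer.writerow([identifier, stage, name, url])


# Write to an in-memory buffer so the sketch does not touch the filesystem.
buffer = io.StringIO()
write_task_csv(
    buffer,
    ["a.npz", "b.npz"],
    "HeLa-S3_cyto_movies_s7",
    "firstpass",
    "https://caliban.example.com",  # placeholder host, not the real Caliban URL
)
csv_lines = buffer.getvalue().splitlines()
example_csv_name = csv_filename("HeLa-S3_cyto_movies_s7", "firstpass")
```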

## Create Figure Eight job

The Figure Eight API allows us to create a new job and upload data to it from this notebook. However, since our jobs don't include required test questions, editing job information such as the title of the job must be done via the website. This section of the notebook uses the API to create a job and upload data to it, then reminds the user to finish editing the job on the website.


In [None]:
job_id_to_copy = 1463619  # Caliban first-pass job
# job_id_to_copy = _  # Caliban foreground/background correction job
# job_id_to_copy = _  # Caliban inter-cell fixes job
# job_id_to_copy = _  # Caliban tracking/lineage correction job
fig_eight(csv_dir, "{0}_{1}".format(identifier, stage), job_id_to_copy)

## Start Figure Eight job on results of previous job
Due to the complexity of Caliban jobs, full annotation correction takes place over several jobs, which each focus on correcting a different aspect of the file. To begin the next job in a sequence, the files must be moved from one bucket location (their output location) to the input location for the next job. A new CSV file must also be created for the next job.

In this step, the job report from the finished job is downloaded to the CSV folder. The user must set "next_stage" to the stage that the next Figure Eight job will focus on; this supplements the information in the job report so that the appropriate files can be moved and the CSV file created. After this step has finished, use the "Create Figure Eight job" section to create a new job from the new CSV.

In [None]:
csv_dir = os.path.join(base_dir, "CSV")
results_job_id = 1468933 #job ID for the results we need
next_stage = 'fgbg'
job_id_to_copy = 1472963 # instructions to copy for the 'next_stage' job we're creating now

In [None]:
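# Uncomment the next line to build the CSV for the next stage; it returns the identifier.
# identifier = create_next_CSV(csv_dir, results_job_id, next_stage)
# While that call is commented out, the identifier is set by hand.
identifier = 'HeLa-S3_cyto_movies_s7'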
fig_eight(csv_dir, "{0}_{1}".format(identifier, next_stage), job_id_to_copy)