# Caliban Fig8 Upload Pipeline
This pipeline creates a Figure Eight job and uploads data files to an S3 bucket for data curation.

- Create npz/trk file from image+annotations, if needed
- Resize npz into smaller pieces (image dimensions) if needed
- Shorten length of npz if needed
- Upload files to AWS
- Create the job CSV and upload it to Figure Eight

In [None]:
# import statements
from __future__ import absolute_import

import math
import numpy as np
import os

from deepcell_toolbox.pre_annotation.npz_preprocessing import (
    reshape_npz,
    slice_npz_batches,
    slice_npz_folder,
    relabel_npzs_folder,
)
from deepcell_toolbox.pre_annotation.aws_upload import aws_caliban_upload
from deepcell_toolbox.pre_annotation.caliban_csv import csv_maker
from deepcell_toolbox.pre_annotation.fig_eight_upload import fig_eight

## Create npz (if needed)

We'll skip this for now since we already have a lot of npzs created. Also, the fig8 image download pipeline includes create_training_data. This part would mostly be for creating npz files from model outputs.

## Create trk (if needed)
Similar to creating npz. This part could probably take in an npz or the raw images + annotations. Adding an existing lineage should be optional. Currently, deepcell.org returns raw, annotations, and lineage zipped together (so that would be the lineage that goes into the trk), but this part could also just make a lineage analogous to Caliban's save_as_trk option.

## Split npz into pieces (semi-optional)
The idea is that you'd be starting from one huge npz and breaking it into manageable pieces. This part will probably include:
- reshape npz (size of each piece is smaller, but same number of frames)
- break up each npz into fewer frames (annotator does not necessarily need to do all 30+ frames of a movie at once)
- save these reshaped pieces as individual npz files, so they can be uploaded and worked on separately
- relabel the npzs as needed (choose between no relabel, relabel each cell in each frame to have unique label, or relabel to have unique labels but preserve 3D relationships)

These pieces will need specific names so that we can put them back together again if needed (especially putting frames back into longer contiguous movies).
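One way to make the pieces reassemblable is to encode the tile position and the frame range in each filename. The sketch below is only an illustration: the pattern and the helper names `piece_name` and `parse_piece_name` are hypothetical, and the naming used by `deepcell_toolbox` may differ.

```python
import re

# Hypothetical naming pattern: <prefix>_x_<XX>_y_<YY>_frames_<start>-<end>.npz
PIECE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_x_(?P<x>\d+)_y_(?P<y>\d+)"
    r"_frames_(?P<start>\d+)-(?P<end>\d+)\.npz$"
)


def piece_name(prefix, x_index, y_index, first_frame, last_frame):
    """Build a filename that records tile position and frame range."""
    return "{}_x_{:02d}_y_{:02d}_frames_{:03d}-{:03d}.npz".format(
        prefix, x_index, y_index, first_frame, last_frame)


def parse_piece_name(filename):
    """Recover tile position and frame range from a piece filename."""
    match = PIECE_PATTERN.match(filename)
    if match is None:
        raise ValueError("not a piece filename: {}".format(filename))
    info = match.groupdict()
    return {
        "prefix": info["prefix"],
        "x": int(info["x"]),
        "y": int(info["y"]),
        "start": int(info["start"]),
        "end": int(info["end"]),
    }
```

Sorting the parsed names by `(x, y, start)` would then give the order needed to stitch frames back into a contiguous movie for each tile.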

In [None]:
base_dir = '/gnv_home/data/caliban_data_testing'

### Reshape npz (y and x dimensions) if needed

In [None]:
full_npz_path = os.path.join(base_dir, 'test_unmodified.npz')
x_size = 160
y_size = 145
reshaped_save_dir = os.path.join(base_dir, 'reshaped')

In [None]:
reshape_npz(full_npz_path, x_size, y_size, save_dir=reshaped_save_dir)
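To make the reshaping step concrete, here is a minimal NumPy sketch of cutting a movie into y/x tiles while keeping every frame. It is not the `reshape_npz` implementation. It assumes the array is laid out as `(frames, y, x, channels)` and that `y_size` and `x_size` are the pixel dimensions of each piece. `tile_frames` is a hypothetical helper.

```python
import numpy as np


def tile_frames(array, y_size, x_size):
    """Split a (frames, y, x, channels) array into spatial tiles.

    Returns a dict mapping (y_index, x_index) to the tile. Tiles on the
    bottom or right edge are smaller when the image size is not an exact
    multiple of the tile size.
    """
    _, height, width, _ = array.shape
    tiles = {}
    for y_index, y_start in enumerate(range(0, height, y_size)):
        for x_index, x_start in enumerate(range(0, width, x_size)):
            tiles[(y_index, x_index)] = array[
                :, y_start:y_start + y_size, x_start:x_start + x_size, :
            ]
    return tiles
```

With the values above (`x_size = 160`, `y_size = 145`), a 290 x 320 pixel movie would yield four tiles, each with the full number of frames.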

### Slice (t or z dimension) single npz if needed
#### Use this option if you do not need to reshape y and x in your npz before beginning work.
This option may be useful in the future for something like curation of 2D npzs. In that case, each slice is independent, so only one slice would be launched at a time. With only one frame to annotate, the annotation time per image allows for a larger image (annotating a larger image would also reduce the number of annotations on the edge, which can be tougher since there is less context to judge cell boundaries).

However, this option has limited use until Caliban supports zooming in and out.

In [None]:
# full_npz_path = ''
# batch_size = 1
# sliced_save_dir = ''

In [None]:
# slice_npz_batches(full_npz_path, batch_size, sliced_save_dir)

### Slice (t or z dimension) folder of reshaped npzs if needed

In [None]:
batch_size = 5
sliced_save_dir = os.path.join(base_dir, 'reshaped_resized')

In [None]:
slice_npz_folder(src_folder=reshaped_save_dir, batch_size=batch_size, save_dir=sliced_save_dir)
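The slicing along t or z can be pictured as cutting the first axis into consecutive batches. The sketch below is an illustration under the assumption that frames are on axis 0. `split_into_batches` is a hypothetical helper, not a `deepcell_toolbox` function.

```python
import numpy as np


def split_into_batches(array, batch_size):
    """Cut an array into consecutive batches along its first axis.

    Returns a list of (first_frame, last_frame, chunk) tuples with
    inclusive frame indices. The last batch may hold fewer frames.
    """
    n_frames = array.shape[0]
    batches = []
    for start in range(0, n_frames, batch_size):
        stop = min(start + batch_size, n_frames)
        batches.append((start, stop - 1, array[start:stop]))
    return batches
```

Keeping the first and last frame index alongside each chunk is what lets the pieces be put back in order later.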

### Relabel npzs so that each cell is unique -- recommended for fig8 first pass jobs

In [None]:
relabel_npzs_folder(npz_dir = sliced_save_dir)
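As an illustration of the "unique label for each cell in each frame" option, the following sketch renumbers a label movie so that no label value is shared between frames. It assumes an integer array of shape `(frames, y, x)` with 0 as background. `relabel_unique_per_frame` is a hypothetical name and this is not necessarily how `relabel_npzs_folder` works internally.

```python
import numpy as np


def relabel_unique_per_frame(labels):
    """Give every cell in every frame its own label, counting from 1.

    labels: integer array of shape (frames, y, x); 0 is background.
    """
    relabeled = np.zeros_like(labels)
    next_label = 1
    for frame_index in range(labels.shape[0]):
        frame = labels[frame_index]
        out_frame = relabeled[frame_index]  # view into the output array
        for cell in np.unique(frame):
            if cell == 0:
                continue  # leave background untouched
            out_frame[frame == cell] = next_label
            next_label += 1
    return relabeled
```

Because labels are no longer shared across frames, this variant deliberately discards any tracking or 3D relationship between frames.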

## Upload pieces to AWS

### Select directory

"upload_dir" is the directory that holds the .npz or .trk files to be curated. The upload step goes through this folder and uploads each file to the S3 input bucket.

In [None]:
upload_dir = sliced_save_dir

In [None]:
# bucket to load uncurated files
input_bucket = "caliban-input"

# bucket for curated files after submission
output_bucket = "caliban-output"

# subfolders in input/output bucket
aws_folder = "f8test/gnv0"

uploaded_images = aws_caliban_upload(input_bucket, output_bucket, aws_folder, folder_to_upload=upload_dir)
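For readers curious what a folder upload to S3 involves, here is a rough sketch. It is not the `aws_caliban_upload` implementation. `upload_folder` is a hypothetical helper that expects an S3 client object exposing `upload_file(Filename, Bucket, Key)`, as a boto3 S3 client does, and the assumption that only `.npz` and `.trk` files are uploaded comes from the description above.

```python
import os


def upload_folder(s3_client, folder, bucket, prefix, extensions=(".npz", ".trk")):
    """Upload every matching file in `folder` to `bucket` under `prefix`.

    Returns the list of S3 keys that were uploaded, in sorted order.
    """
    uploaded_keys = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extensions):
            continue  # skip anything that is not an npz or trk file
        key = "{}/{}".format(prefix.strip("/"), name)
        s3_client.upload_file(os.path.join(folder, name), bucket, key)
        uploaded_keys.append(key)
    return uploaded_keys


# Possible usage, assuming boto3 is installed and AWS credentials are configured:
# import boto3
# keys = upload_folder(boto3.client("s3"), upload_dir, input_bucket, aws_folder)
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before touching a real bucket.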

## Create CSV file for Figure Eight


Figure Eight jobs can be created easily from a CSV file in which each row contains the information for one task. For Caliban jobs, each row has a unique URL that directs the user to the Caliban tool with the correct data to curate. The CSV file is saved as {identifier}_upload.csv in a folder that only holds CSVs.

In [None]:
csv_dir = os.path.join(base_dir, "CSV")
identifier = "gnv_test_set0"
csv_maker(uploaded_images, csv_dir, identifier)
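The sketch below shows the general shape of such a CSV: one row per uploaded file, each with its own URL. The column names, the `base_url` argument, and the helper `write_job_csv` are all hypothetical. The real URL format and columns produced by `csv_maker` are not shown in this notebook and may differ.

```python
import csv
import os
from urllib.parse import quote


def write_job_csv(filenames, csv_dir, identifier, base_url):
    """Write {identifier}_upload.csv with one task row per file.

    Column names and URL layout are placeholders for illustration.
    """
    os.makedirs(csv_dir, exist_ok=True)
    csv_path = os.path.join(csv_dir, "{}_upload.csv".format(identifier))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["project_url", "identifier"])
        for name in filenames:
            # quote() escapes characters that are not safe in a URL path
            url = "{}/{}".format(base_url.rstrip("/"), quote(name))
            writer.writerow([url, identifier])
    return csv_path
```

Keeping the CSVs in their own folder, as described above, means the job-creation step can locate the file from `csv_dir` and `identifier` alone.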

## Create Figure Eight job

The Figure Eight API allows us to create a new job and upload data to it from this notebook. However, since our jobs don't include required test questions, editing job information such as the title of the job must be done via the website. This section of the notebook uses the API to create a job and upload data to it, then reminds the user to finish editing the job on the website.


In [None]:
job_id_to_copy = 1432466  # Caliban test job
fig_eight(csv_dir, identifier, job_id_to_copy)