# Caliban Fig8 Upload Pipeline
This pipeline uploads data files to an S3 bucket and creates a Figure Eight job so that the files can be curated with Caliban.

```python
# import statements
from __future__ import absolute_import

import os

from ipywidgets import fixed, interactive
from imageio import imread

%matplotlib inline

from deepcell_toolbox.pre_annotation.aws_upload import aws_caliban_upload, caliban_upload
from deepcell_toolbox.pre_annotation.caliban_csv import csv_maker
from deepcell_toolbox.pre_annotation.fig_eight_upload import fig_eight
```

## Create npz (if needed)

We'll skip this step for now, since many npz files have already been created and the Figure Eight image download pipeline already includes `create_training_data`. This step would mostly be for creating npz files from model outputs.
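
If this step is added later, it could be as small as bundling a model's raw input and predicted labels into one file. The sketch below assumes `(frames, y, x, channels)` arrays and stores them under the keys `X` and `y`; both the key names and the helper `save_training_npz` are hypothetical and should be matched to whatever Caliban and `create_training_data` actually expect.

```python
import os
import tempfile

import numpy as np


def save_training_npz(path, raw, annotations):
    """Save raw images and label masks together in one compressed .npz file.

    Hypothetical helper: the keys "X" (raw) and "y" (labels) are an assumption.
    """
    raw = np.asarray(raw)
    annotations = np.asarray(annotations)
    # Raw data and labels must line up frame-for-frame and pixel-for-pixel.
    if raw.shape[:-1] != annotations.shape[:-1]:
        raise ValueError("raw and annotations must share (frames, y, x) dimensions")
    np.savez_compressed(path, X=raw, y=annotations)


# Example: a fake 3-frame, 64x64 movie with one channel and one square "cell".
raw = np.random.rand(3, 64, 64, 1).astype("float32")
labels = np.zeros((3, 64, 64, 1), dtype="int32")
labels[:, 10:20, 10:20, 0] = 1

out_path = os.path.join(tempfile.mkdtemp(), "example_model_output.npz")
save_training_npz(out_path, raw, labels)
with np.load(out_path) as data:
    loaded_shapes = (data["X"].shape, data["y"].shape)
```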

## Create trk (if needed)
Like the npz step, this step is only needed in some cases. It could take either an npz or the raw images plus annotations as input, and adding an existing lineage should be optional. Currently, deepcell.org returns the raw images, annotations, and lineage zipped together, so that lineage would be the one that goes into the trk; alternatively, this step could build a new lineage, analogous to Caliban's `save_as_trk` option.
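
A minimal sketch of this step, assuming a .trk is a tar archive holding `raw.npy`, `tracked.npy` and `lineage.json` (check this layout against the Caliban version in use). The hypothetical `build_default_lineage` makes a division-free lineage from the label movie, roughly in the spirit of `save_as_trk`; its per-cell fields are assumptions, and an existing lineage (such as the one returned by deepcell.org) can be passed in instead.

```python
import io
import json
import os
import tarfile
import tempfile

import numpy as np


def build_default_lineage(tracked):
    """Build a lineage with no divisions from a (frames, y, x, 1) label movie.

    The per-cell fields are an assumed, Caliban-style layout, not copied from
    Caliban's source; adjust them to what your Caliban version expects.
    """
    lineage = {}
    for frame_idx in range(tracked.shape[0]):
        for label in np.unique(tracked[frame_idx]):
            if label == 0:  # 0 is background, not a cell
                continue
            cell = lineage.setdefault(int(label), {
                "label": int(label),
                "frames": [],
                "daughters": [],
                "parent": None,
                "frame_div": None,
                "capped": False,
            })
            cell["frames"].append(frame_idx)
    return lineage


def save_trk(path, raw, tracked, lineage=None):
    """Pack raw, tracked and lineage into one tar archive (assumed .trk layout)."""
    if lineage is None:
        lineage = build_default_lineage(tracked)
    members = {"lineage.json": json.dumps(lineage, indent=1).encode("utf-8")}
    for name, array in (("raw.npy", raw), ("tracked.npy", tracked)):
        buf = io.BytesIO()
        np.save(buf, array)
        members[name] = buf.getvalue()
    with tarfile.open(path, "w") as trk:
        for name, payload in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            trk.addfile(info, io.BytesIO(payload))


# Example: cell 1 is present in both frames, cell 2 only in frame 1.
tracked = np.zeros((2, 8, 8, 1), dtype="int32")
tracked[:, :4, :4, 0] = 1
tracked[1, 4:, 4:, 0] = 2
raw = np.random.rand(2, 8, 8, 1).astype("float32")

trk_path = os.path.join(tempfile.mkdtemp(), "example.trk")
save_trk(trk_path, raw, tracked)
with tarfile.open(trk_path) as trk:
    member_names = sorted(trk.getnames())
    saved_lineage = json.load(trk.extractfile("lineage.json"))
    saved_tracked = np.load(io.BytesIO(trk.extractfile("tracked.npy").read()))
```

Note that JSON turns the integer label keys into strings (`"1"`, `"2"`), so code reading the lineage back should expect string keys.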

## Split npz into pieces (semi-optional)
The idea is to start from one huge npz and break it into manageable pieces. This step will probably include:
- reshaping the npz (each piece is smaller in x and y, but keeps the same number of frames)
- breaking each npz into fewer frames (an annotator does not necessarily need to work through all 30+ frames of a movie at once)
- saving these reshaped pieces as individual npz files, so they can be uploaded and worked on separately
- relabeling the npzs as needed (choose between no relabeling, relabeling each cell in each frame to have a unique label, or relabeling to have unique labels while preserving relationships across frames)
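
The reshaping and frame-splitting steps above could look roughly like this. `split_movie` is a hypothetical helper; it assumes a `(frames, y, x, channels)` array whose height and width divide evenly by the tile size, and it records each piece's tile position and first frame so the piece can be put back later.

```python
import numpy as np


def split_movie(movie, tile_size, frames_per_piece):
    """Split a (frames, y, x, channels) array into smaller pieces.

    Returns a list of (row, col, start_frame, piece) tuples. Assumes y and x
    are divisible by tile_size; if the frame count is not a multiple of
    frames_per_piece, the last chunk of each tile is shorter.
    """
    frames, height, width = movie.shape[:3]
    if height % tile_size or width % tile_size:
        raise ValueError("image height and width must be divisible by tile_size")
    pieces = []
    for row in range(height // tile_size):
        for col in range(width // tile_size):
            y0, x0 = row * tile_size, col * tile_size
            # Spatial crop: every frame, one tile_size x tile_size window.
            tile = movie[:, y0:y0 + tile_size, x0:x0 + tile_size]
            # Temporal split: consecutive chunks of frames from that tile.
            for start in range(0, frames, frames_per_piece):
                pieces.append((row, col, start, tile[start:start + frames_per_piece]))
    return pieces


movie = np.zeros((30, 256, 256, 1), dtype="int32")
pieces = split_movie(movie, tile_size=128, frames_per_piece=10)
# 2 x 2 tiles, each cut into 3 chunks of 10 frames -> 12 pieces
```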

These pieces will need specific names so that we can put them back together again if needed (especially putting frames back into longer contiguous movies).
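
One possible naming scheme, sketched under the assumption that each piece is identified by its source movie, tile position and frame range. The existing files already carry a batch index (for example `HeLa-S3_cyto_movie_s1_batch_00_first_pass.npz`); the `_rowRR_colCC_framesSSS-EEE` suffix and the helper names below are hypothetical. Parsing the numbers back as integers, rather than sorting raw strings, keeps frames in order when reassembling movies.

```python
import re

# Hypothetical scheme: <base>_row<RR>_col<CC>_frames<start>-<end>.npz
# (frame range inclusive; numbers zero-padded for readability).
PIECE_PATTERN = re.compile(
    r"^(?P<base>.+)_row(?P<row>\d+)_col(?P<col>\d+)"
    r"_frames(?P<start>\d+)-(?P<end>\d+)\.npz$"
)


def piece_name(base, row, col, start, end):
    """Encode where a piece came from in its filename."""
    return f"{base}_row{row:02d}_col{col:02d}_frames{start:03d}-{end:03d}.npz"


def parse_piece_name(filename):
    """Recover (base, row, col, start, end) from a piece filename."""
    match = PIECE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a piece filename: {filename}")
    return (match["base"], int(match["row"]), int(match["col"]),
            int(match["start"]), int(match["end"]))


def order_for_reassembly(filenames):
    """Sort pieces by movie, tile position, then first frame (numerically)."""
    return sorted(filenames, key=parse_piece_name)


names = [
    piece_name("HeLa-S3_movie", 0, 1, 10, 19),
    piece_name("HeLa-S3_movie", 0, 1, 0, 9),
    piece_name("HeLa-S3_movie", 0, 0, 0, 9),
]
ordered = order_for_reassembly(names)
```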

(parts of this are already written - Geneva)
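
The relabeling choices in the list above differ only in how labels map across frames. A sketch of the two non-trivial options, assuming a `(frames, y, x)` integer label array with 0 as background; the function names are hypothetical and are not existing deepcell-toolbox functions.

```python
import numpy as np


def relabel_unique_per_frame(y):
    """Give every cell in every frame its own label; identity across frames is lost."""
    out = np.zeros_like(y)
    next_label = 1
    for t in range(y.shape[0]):
        for old in np.unique(y[t]):
            if old == 0:  # keep background as 0
                continue
            out[t][y[t] == old] = next_label
            next_label += 1
    return out


def relabel_sequential(y):
    """Map labels to 1..N with one mapping for the whole movie.

    Because the same mapping is used in every frame, a cell keeps the same
    (new) label in every frame it appears in.
    """
    out = np.zeros_like(y)
    labels = np.unique(y)
    for new, old in enumerate(labels[labels != 0], start=1):
        out[y == old] = new
    return out


y = np.array([[[0, 5], [5, 9]],
              [[9, 0], [0, 5]]])
per_frame = relabel_unique_per_frame(y)
sequential = relabel_sequential(y)
```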

## Upload pieces to AWS

### Select directory

`base_dir` is the directory that holds the folder of .npz or .trk files to curate. The upload step goes through that folder and uploads each file to the chosen S3 input bucket.

```python
base_dir = "/Users/jannie/Desktop/data"

# folder of files to curate with Figure Eight
data_folder = "first_pass_npz"

data_dir = os.path.join(base_dir, data_folder)
```
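
The upload step only needs the curation files inside `data_dir`. A hypothetical helper for collecting them might look like this; it is not taken from `aws_caliban_upload`. It keeps regular .npz and .trk files and sorts them so uploads happen in a predictable order.

```python
import os
import tempfile


def list_curation_files(folder, extensions=(".npz", ".trk")):
    """Return the sorted names of the .npz/.trk files directly inside folder."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(folder, name))
    )


# Example with a throwaway folder containing one file that should be skipped.
demo_dir = tempfile.mkdtemp()
for name in ("b_first_pass.npz", "a_first_pass.npz", "notes.txt", "movie.trk"):
    open(os.path.join(demo_dir, name), "w").close()
files = list_curation_files(demo_dir)
```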

```python
# bucket to load uncurated files
input_bucket = "caliban-input"

# bucket for curated files after submission
output_bucket = "caliban-output"

# subfolder in the input/output buckets
aws_folder = "f8test"

# upload every file in data_dir (defined above) to the input bucket
uploaded_images = aws_caliban_upload(input_bucket, output_bucket, aws_folder, data_dir)
```

```
Connected to AWS
f8test
/Users/jannie/Desktop/data/first_pass_npz
['HeLa-S3_cyto_movie_s1_batch_00_first_pass.npz', 'HeLa-S3_cyto_movie_s1_batch_01_first_pass.npz']
HeLa-S3_cyto_movie_s1_batch_00_first_pass.npz
/Users/jannie/Desktop/data/first_pass_npz/HeLa-S3_cyto_movie_s1_batch_00_first_pass.npz  31104450 / 31104450.0  (100.00%)

HeLa-S3_cyto_movie_s1_batch_01_first_pass.npz
/Users/jannie/Desktop/data/first_pass_npz/HeLa-S3_cyto_movie_s1_batch_01_first_pass.npz  31104450 / 31104450.0  (100.00%)
```
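
For reference, the core of an S3 upload like the one above can be sketched with any client that has boto3's `upload_file(Filename, Bucket, Key)` method. The `aws_folder/filename` key layout and the `upload_folder` helper are assumptions, not a description of how `aws_caliban_upload` works; a recording stand-in client is used so the sketch runs without AWS credentials.

```python
import os
import tempfile


def upload_folder(client, folder, bucket, aws_folder):
    """Upload every .npz/.trk file in folder to s3://<bucket>/<aws_folder>/<name>.

    client is expected to behave like a boto3 S3 client, i.e. to provide
    upload_file(Filename, Bucket, Key). The key layout is an assumption.
    Returns the S3 keys in upload order.
    """
    keys = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith((".npz", ".trk")):
            continue
        key = f"{aws_folder}/{name}"
        client.upload_file(os.path.join(folder, name), bucket, key)
        keys.append(key)
    return keys


class RecordingClient:
    """Stand-in for a boto3 S3 client: records calls instead of uploading."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))


# With real credentials this might instead be:
#   import boto3
#   upload_folder(boto3.client("s3"), data_dir, "caliban-input", "f8test")
demo_dir = tempfile.mkdtemp()
for name in ("batch_01.npz", "batch_00.npz", "readme.txt"):
    open(os.path.join(demo_dir, name), "w").close()
client = RecordingClient()
keys = upload_folder(client, demo_dir, "caliban-input", "f8test")
```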



## Create CSV file for Figure Eight


Figure Eight jobs can be created from a CSV file in which each row describes one task. For Caliban jobs, each row contains a unique URL that opens the Caliban tool with the correct file to curate. The CSV file is saved as `{identifier}_upload.csv` in a folder that holds only CSVs.
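
A sketch of such a CSV writer using only the standard library. The Caliban base URL, the column names and the URL-encoding of the S3 key are placeholders chosen for illustration; `csv_maker` may build its rows differently.

```python
import csv
import os
import tempfile
from urllib.parse import quote

# Placeholder base URL (hypothetical); the real Caliban URL format may differ.
CALIBAN_BASE_URL = "https://caliban.example.org/tool"


def write_upload_csv(uploaded_keys, csv_dir, identifier):
    """Write {identifier}_upload.csv with one task (one Caliban URL) per row."""
    os.makedirs(csv_dir, exist_ok=True)
    csv_path = os.path.join(csv_dir, f"{identifier}_upload.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["identifier", "filename", "url"])  # hypothetical columns
        for key in uploaded_keys:
            # Encode the whole S3 key (including "/") into one URL path segment.
            url = f"{CALIBAN_BASE_URL}/{quote(key, safe='')}"
            writer.writerow([identifier, os.path.basename(key), url])
    return csv_path


csv_path = write_upload_csv(["f8test/a.npz", "f8test/b.npz"],
                            tempfile.mkdtemp(), "test_set")
with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))
```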

```python
csv_dir = os.path.join(base_dir, "CSV")
identifier = "test_set"
csv_maker(uploaded_images, csv_dir, identifier)
```

## Create Figure Eight job

The Figure Eight API allows us to create a new job and upload data to it from this notebook. However, since our jobs don't include required test questions, editing job information such as the title of the job must be done via the website. This section of the notebook uses the API to create a job and upload data to it, then reminds the user to finish editing the job on the website.
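
To make the API part concrete, here is a sketch that only builds the two HTTP requests (copy an existing job, then upload the CSV to the copy) without sending them. The endpoint paths, the `key` and `force` query parameters and the PUT-with-`text/csv` upload are assumptions based on the older CrowdFlower-style v1 API, not confirmed details of `fig_eight`; check them against the Figure Eight API documentation before use.

```python
import os
import tempfile
import urllib.parse
import urllib.request

# Assumed API root and endpoint layout (CrowdFlower-style v1); verify before use.
API_ROOT = "https://api.figure-eight.com/v1"


def copy_job_request(api_key, job_id_to_copy):
    """Build (but do not send) a request that copies an existing job's design."""
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{API_ROOT}/jobs/{job_id_to_copy}/copy.json?{query}"
    return urllib.request.Request(url, method="GET")


def upload_csv_request(api_key, new_job_id, csv_path):
    """Build (but do not send) a request that uploads a CSV of rows to a job."""
    query = urllib.parse.urlencode({"key": api_key, "force": "true"})
    url = f"{API_ROOT}/jobs/{new_job_id}/upload.json?{query}"
    with open(csv_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "text/csv"})


# In practice the key could be read with getpass.getpass(...) and each request
# sent with urllib.request.urlopen(req); here they are only constructed.
copy_req = copy_job_request("not-a-real-key", 1432466)
csv_path = os.path.join(tempfile.mkdtemp(), "test_set_upload.csv")
with open(csv_path, "w", newline="") as f:
    f.write("identifier,filename,url\n")
upload_req = upload_csv_request("not-a-real-key", 1451551, csv_path)
```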


```python
job_id_to_copy = 1432466  # Caliban test job
fig_eight(csv_dir, identifier, job_id_to_copy)
```

```
Figure eight api key? ········
New job ID is: 1451551
Added data
Now that the data is added, you should go to the Figure Eight website to: 
-change the job title 
-review the job design 
-confirm pricing 
-launch the job (or contact success manager)
```
