# Data Prep for Crowd Annotation Pipeline

1. Collect raw data 
2. Adjust contrast of images
3. Chop up images into manageable pieces
4. Make into montages
5. Upload to Figure Eight

These scripts name their output files so that the code blocks can run back-to-back with minimal input. For this reason, we recommend running one set of images through the whole pipeline before starting on another set.
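
The naming convention can be sketched as a small helper. The function name `derived_paths` and the `/data` example path are hypothetical; the suffix pattern mirrors how later cells in this notebook build `direc` and `save_direc`.

```python
import posixpath


def derived_paths(raw_direc, identifier, num_x_segments, num_y_segments):
    """Return (chopped_dir, montage_dir) following this notebook's naming pattern."""
    suffix = "{}_{}".format(num_x_segments, num_y_segments)
    # The chopper output sits next to the raw folder, with a "_chopped_X_Y" suffix.
    chopped_dir = raw_direc + "_chopped_" + suffix
    # Montages are saved in a sibling folder named after the identifier.
    montage_dir = posixpath.join(
        posixpath.dirname(chopped_dir), identifier + "_montages_" + suffix
    )
    return chopped_dir, montage_dir


# Hypothetical example path, not one from the lab's data server.
chopped, montages = derived_paths("/data/MouseBrain_s7_nuclear", "MouseBrain_s7_nuc", 5, 5)
print(chopped)
print(montages)
```

Because each step derives its input folder from the previous step's parameters, changing `raw_direc` or `identifier` midway would point later cells at the wrong folders.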

In [3]:
# import statements
from __future__ import absolute_import

import os

from montage_makers import montage_maker, multiple_montage_maker
from overlapping_chopper import overlapping_crop_dir
from aws_upload import aws_upload, upload
from montage_to_csv import csv_maker
from fig_eight_upload import fig_eight

In [None]:
# Sometimes raw images are stored as .tif stacks rather than individual .tif files.
# Optional code block for splitting stacks into individual slices.

## 2. Adjust contrast of images
description of contrast_adjustment.py

In [None]:
#contrast_adjustment

## 3. Chop up images into manageable pieces

Each full-size image usually contains many cells, which makes it difficult to annotate completely. For ease of annotation (and better results), each frame is chopped into smaller, overlapping frames, ultimately creating a set of smaller movies.

The overlapping edges make it easier to stitch annotations back together into one large annotated movie (in the post-annotation pipeline). Keep the overlap modest, though: a large overlap results in redundant annotations, because cells in the shared regions are annotated more than once.

Even if you want to process the full-sized image, run the chopper with num_x_segments and num_y_segments both set to 1. The montage makers are written to run on the output of the chopper.
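
To make the overlap idea concrete, here is a minimal sketch of how crop windows along one axis could be computed. It is an illustration under stated assumptions, not the actual logic of overlapping_crop_dir: the helper `crop_windows` is hypothetical, and it assumes the overlap is a percentage of the nominal segment size added on each interior edge.

```python
def crop_windows(image_size, num_segments, overlap_perc):
    """Return (start, end) pixel ranges for overlapping segments along one axis.

    Assumption: the overlap is overlap_perc percent of the nominal segment
    size and is added on both sides of each segment, clipped to the image.
    """
    base = image_size // num_segments
    overlap = int(round(base * overlap_perc / 100.0))
    windows = []
    for i in range(num_segments):
        start = max(0, i * base - overlap)
        if i == num_segments - 1:
            # The last segment always reaches the image edge.
            end = image_size
        else:
            end = min(image_size, (i + 1) * base + overlap)
        windows.append((start, end))
    return windows


windows = crop_windows(1024, 5, 10)
print(windows)
# [(0, 224), (184, 428), (388, 632), (592, 836), (796, 1024)]

full_frame = crop_windows(1024, 1, 10)
print(full_frame)
# [(0, 1024)]
```

With a single segment the window covers the whole axis, which matches the advice to run the chopper even when no chopping is wanted.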

In [4]:
raw_direc = "/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuclear"
identifier = "MouseBrain_s7_nuc"
num_x_segments = 5
num_y_segments = 5
overlap_perc = 10

In [4]:
overlapping_crop_dir(raw_direc, identifier, num_x_segments, num_y_segments, overlap_perc)

Current Image Size:  (1024, 1024)
Correct? (y/n): y
Processing...
Cropped files saved to /home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuclear_chopped_5_5


## 4. Make into montages
multiple_montage_maker is written to run on the output of the chopper, i.e., the folder where each chopped movie folder is saved. It makes montages of each subfolder according to the variables specified, and it makes more than one montage per subfolder if there are enough frames to do so.
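
The number of montages per subfolder is simple integer arithmetic. The sketch below uses a hypothetical helper, `montage_plan`, and a hypothetical 38-frame movie; it assumes leftover frames that do not fill a montage are skipped, and that frames are laid out in rows of row_length.

```python
def montage_plan(num_frames, montage_len, row_length):
    """Return (num_montages, unused_frames, rows_per_montage)."""
    # Only complete montages are made; the remainder is left out.
    num_montages, unused_frames = divmod(num_frames, montage_len)
    # Ceiling division: a partly filled last row still needs a row.
    rows_per_montage = -(-montage_len // row_length)
    return num_montages, unused_frames, rows_per_montage


plan = montage_plan(num_frames=38, montage_len=15, row_length=5)
print(plan)
# (2, 8, 3)
```

Choosing a montage_len that divides the movie length evenly avoids discarding frames.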

The variables used in multiple_montage_maker are saved in a JSON file so they can be reused in post-annotation processing.
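
As an illustration of the idea, the parameters could be round-tripped through JSON as follows. The file name `montage_params.json`, the helper names, and the key names are assumptions; the actual file written by multiple_montage_maker may be laid out differently.

```python
import json
import os
import tempfile


def save_montage_params(path, **params):
    """Write the montage parameters to a JSON file."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)


def load_montage_params(path):
    """Read the montage parameters back as a dict."""
    with open(path) as f:
        return json.load(f)


# A temporary folder stands in for the montage save directory.
params_path = os.path.join(tempfile.mkdtemp(), "montage_params.json")
save_montage_params(
    params_path,
    montage_len=15,
    row_length=5,
    x_buffer=5,
    y_buffer=5,
    num_x_segments=5,
    num_y_segments=5,
)
restored = load_montage_params(params_path)
print(restored["montage_len"], restored["row_length"])
```

Storing the values next to the montages means the post-annotation step does not rely on anyone remembering which settings were used.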

In [5]:
montage_len = 15

direc = raw_direc + "_chopped_" + str(num_x_segments) + "_" + str(num_y_segments)
#direc = "/home/geneva/Desktop/Nb_testing/nuclear_test_chopped_4_4"

save_direc = os.path.join(os.path.dirname(direc), identifier + "_montages_" + str(num_x_segments) + "_" + str(num_y_segments))
#save_direc = "/home/geneva/Desktop/Nb_testing/montages"

row_length = 5
x_buffer = 5
y_buffer = 5

In [12]:
multiple_montage_maker(montage_len, direc, save_direc, identifier, 
                       num_x_segments, num_y_segments, row_length, x_buffer, y_buffer)

Now montaging images from: MouseBrain_s7_nuc_x_00_y_00
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x_01_y_00
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x_02_y_00
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x_03_y_00
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x_04_y_00
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x_00_y_01
You will be able to make 2 montages from this movie.
The last 8 frames will not be used in a montage. 

Now montaging images from: MouseBrain_s7_nuc_x

## 5. Upload to Figure Eight
Now that the images are processed into montages, they need to be uploaded to an AWS bucket and submitted to Figure Eight. This involves uploading the files to AWS, making a CSV file with the links to the uploaded images, and using that CSV file to create a Figure Eight job.

### Upload files to AWS
aws_upload looks for image files in the specified directory (folder_to_upload, which defaults to wherever the output of multiple_montage_maker was saved) and uploads them to the specified bucket.

For the Van Valen lab, the default bucket is "figure-eight-deepcell" and keys (aws_folder + file names) correspond to the file structure of our data server.

aws_upload returns a list of the URLs to which the images were uploaded.
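
The relationship between bucket, key, and URL can be sketched without touching AWS. The helper `montage_urls` is hypothetical, and the path-style endpoint is taken from a commented-out line in the upload cell of this notebook; the URLs that aws_upload actually returns may be formatted differently.

```python
import posixpath
from urllib.parse import quote


def montage_urls(bucket_name, aws_folder, filenames,
                 endpoint="https://s3.us-east-2.amazonaws.com"):
    """Build one URL per file, where the S3 key is aws_folder + file name."""
    urls = []
    for name in filenames:
        # S3 keys always use forward slashes, regardless of the local OS.
        key = posixpath.join(aws_folder, name)
        urls.append("{}/{}/{}".format(endpoint, bucket_name, quote(key)))
    return urls


example_urls = montage_urls(
    "figure-eight-deepcell",
    "MouseBrain/set7",
    ["MouseBrain_s7_nuc_x_0_y_0_montage_0.png"],
)
print(example_urls[0])
```

This only builds strings; it makes no network request and needs no credentials.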

In [7]:
#import os

bucket_name = "figure-eight-deepcell" #default
aws_folder = "MouseBrain/set7"
folder_to_upload = save_direc #usually .../montages
#data_to_upload = "/home/geneva/Desktop/Nb_testing/montages/"

uploaded_montages = aws_upload(bucket_name, aws_folder, folder_to_upload)

#os.path.join("https://s3.us-east-2.amazonaws.com", bucket_name, aws_folder)
#print(uploaded_montages)
#from io_utils import get_img_names
#imgs_to_upload = get_img_names(folder_to_upload)
#for index, img in enumerate(imgs_to_upload):
#    print(img)
#    print(os.path.join(folder_to_upload, img))

What is your AWS access key id? ········
What is your AWS secret access key id? ········
Connected to AWS
/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/MouseBrain_s7_nuc_x_0_y_0_montage_0.png  849622 / 849622.0  (100.00%)

/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/MouseBrain_s7_nuc_x_0_y_0_montage_1.png  890794 / 890794.0  (100.00%)

/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/MouseBrain_s7_nuc_x_0_y_1_montage_0.png  967973 / 967973.0  (100.00%)

/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/MouseBrain_s7_nuc_x_0_y_1_montage_1.png  975484 / 975484.0  (100.00%)

/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/MouseBrain_s7_nuc_x_0_y_2_montage_0.png  969750 / 969750.0  (100.00%)

/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/MouseBrain_s7_nuc_montages_5_5/Mouse

### Make CSV file
Figure Eight jobs can be created easily from a CSV file in which each row describes one task. For our jobs, each row holds the link to one montage plus information about that montage (currently, just the "identifier" specified at the beginning of the pipeline). The CSV file is saved as "identifier".csv in a folder that holds only CSVs. CSV folders usually live in cell-type directories, so identifiers should be specific enough to distinguish between sets, parts, etc.
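
A minimal sketch of such a file, written with the standard csv module, is shown below. The helper `write_task_csv`, the column names `image_url` and `identifier`, and the example URLs are assumptions for illustration; the columns csv_maker writes must match the fields that the copied Figure Eight job expects.

```python
import csv
import os
import tempfile


def write_task_csv(urls, identifier, csv_direc):
    """Write one row per montage URL and return the path of the CSV file."""
    os.makedirs(csv_direc, exist_ok=True)
    csv_path = os.path.join(csv_direc, identifier + ".csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_url", "identifier"])  # hypothetical header
        for url in urls:
            writer.writerow([url, identifier])
    return csv_path


# Placeholder URLs and a temporary folder, so nothing real is overwritten.
demo_urls = ["https://example.com/montage_0.png", "https://example.com/montage_1.png"]
demo_csv = write_task_csv(demo_urls, "demo_set", tempfile.mkdtemp())

with open(demo_csv, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```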

In [8]:
#identifier = "test"
csv_direc = "/home/geneva/Desktop/Files_to_work_through/Data_for_crowd/CSV"

In [9]:
csv_maker(uploaded_montages, identifier, csv_direc)

### Create Figure Eight job

In [10]:
#job_id_to_copy = 1344258 #Elowitz timelapse RFP pilot
job_id_to_copy = 1346216 #Deepcell MouseBrain 3x5
#job_id_to_copy = 1306431 #Deepcell overlapping Mibi
#job_id_to_copy = 1292179 #Deepcell HEK
#job_id_to_copy =

'https://api.figure-eight.com/v1/jobs/1346216.json?'

Job not successful. Status code:  400


{'error': {'message': 'Your request is missing required parameters.'}}

In [11]:
from fig_eight_upload import fig_eight

fig_eight(csv_direc, identifier, job_id_to_copy)

Figure eight api key? ········
200
New job ID is: 1352262
Added data
Head over to the Figure Eight website to change the name of the job, review it, then contact the success manager so they can launch this job.
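
For reference, the job-copy request URL can be assembled as sketched below. The endpoint pattern comes from the curl comment in the scratch cell of this notebook; the helper `copy_job_url` is hypothetical, the sketch sends no request, and whether fig_eight builds its request this way is an assumption.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.figure-eight.com/v1"


def copy_job_url(job_id, api_key):
    """Return the URL for copying an existing job, with the key as a query parameter."""
    query = urlencode({"key": api_key})
    return "{}/jobs/{}/copy.json?{}".format(API_ROOT, job_id, query)


# "YOUR_API_KEY" is a placeholder; never hard-code a real key in a notebook.
url = copy_job_url(1346216, "YOUR_API_KEY")
print(url)
```

Prompting for the key with getpass, as the cells in this notebook do, keeps it out of the saved notebook.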


In [6]:
# curl -X GET https://api.figure-eight.com/v1/jobs/{job_id}/copy.json?key={api_key}
#import requests
#from getpass import getpass

#API_key = getpass("enter your API key")
#inputs = {"key" : API_key}
#inputs
#url = "https://api.figure-eight.com/v1/jobs/1344258/copy.json?"

enter your API key········


In [7]:
#r = requests.get(url, params=inputs)
#r.json()


{'id': 1351927,
 'options': {'mail_to': 'gemiller@caltech.edu',
  'flag_on_rate_limit': True,
  'include_unfinished': True,
  'logical_aggregation': True,
  'critical_webhook': False,
  'req_ttl_in_seconds': 7200,
  'front_load': False},
 'title': '(Shilpa Only) Deepcell Rfp Timelapse S0 - Segmentation Of Cells In Microscope Images Over Time (Pilot)',
 'secret': 'HYSPT+E/FFZcsmglmfOHdDcUEGki8VzEs0XHHhnqwnxz',
 'project_number': 'PN2135',
 'alias': None,
 'judgments_per_unit': 1,
 'units_per_assignment': 1,
 'pages_per_assignment': 1,
 'max_judgments_per_worker': None,
 'gold_per_assignment': 1,
 'minimum_account_age_seconds': None,
 'execution_mode': 'worker_ui_remix',
 'payment_cents': 100,
 'design_verified': True,
 'public_data': True,
 'variable_judgments_mode': 'none',
 'max_judgments_per_unit': None,
 'expected_judgments_per_unit': None,
 'min_unit_confidence': None,
 'units_remain_finalized': None,
 'auto_order_timeout': None,
 'auto_order_threshold': 0,
 'completed_at': None,
 

In [10]:
r.json()['id']

1351927