Update README.md #89

Merged Jun 5, 2020 · 3 commits
36 changes: 19 additions & 17 deletions README.md
@@ -3,16 +3,29 @@
[![Build Status](https://travis-ci.com/vanvalenlab/caliban-toolbox.svg?branch=master)](https://travis-ci.com/vanvalenlab/caliban-toolbox)
[![Coverage Status](https://coveralls.io/repos/github/vanvalenlab/caliban-toolbox/badge.svg?branch=master)](https://coveralls.io/github/vanvalenlab/caliban-toolbox?branch=master)

DeepCell Toolbox is a collection of data engineering tools for processing, annotating, and packaging optical microscopy images. The framework enables crowdsourced annotations and creates training data for [DeepCell](https://github.com/vanvalenlab/deepcell-tf).
Caliban Toolbox is a collection of data engineering tools to process and curate crowdsourced image annotations using [Caliban](https://github.com/vanvalenlab/caliban), our data annotation tool. The Toolbox and Caliban work together to generate annotations for training [DeepCell](https://github.com/vanvalenlab/deepcell-tf).

The process is as follows:
![flow](./docs/flowchart.png)

Read the documentation at
1. Raw data is imported using the data loader, which allows the user to select data based on imaging platform, cell type, and marker of interest.

## Getting Started
2. The raw data can then be run through deepcell-tf to produce predicted labels.

3. After making predictions with deepcell, the raw data is processed to make it easier for annotators to view: filters are applied, the contrast is adjusted, and multiple channels can be summed together. The user then selects which of these channels the annotators will see (a minimal contrast sketch appears after this list).

4. The size of the images is then modified to make annotation easier. To get high-quality annotations, the images must not be so large that annotators miss errors, so large FOVs can be cropped into overlapping 2D regions. For image stacks, the stack can be sliced into smaller, more manageable pieces (steps 4 and 5 are sketched in code after this list).

5. Once the image dimensions have been set, each unique crop or slice is saved as an NPZ file. During this process, a JSON file is created which stores the necessary data to reconstruct the original image after annotation.

DeepCell Data Engineering uses `nvidia-docker` and `tensorflow` to enable GPU processing.
6. The NPZ files are then uploaded to a cloud bucket, where they can be accessed by Figure8. During the upload process, the user specifies an existing job to use as a template, which populates the instructions for the annotators and the job settings. A log file is also created with the information needed to download the annotations once the job is completed.

7. Annotations are stored in an AWS bucket as the job progresses; once the job is complete, the corrected annotations are downloaded from that bucket.

8. These annotations are then stitched back together and saved as full-size NPZ files to be manually inspected for errors (see the stitching sketch after this list).

9. Following correction, the individual Caliban NPZ files are combined into a single training data NPZ and saved in the appropriate location in the training data ontology.
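
Step 3's contrast adjustment is easiest to see in code. Below is a minimal numpy sketch, not the toolbox's actual API: the helper `stretch_contrast`, the percentile cutoffs, and the stand-in channels are all hypothetical.

```python
import numpy as np

def stretch_contrast(channel, low_pct=1.0, high_pct=99.0):
    """Rescale a channel so its 1st-99th percentile range spans [0, 1]."""
    lo, hi = np.percentile(channel, [low_pct, high_pct])
    return np.clip((channel - lo) / (hi - lo + 1e-8), 0.0, 1.0)

# Stand-in channels; real inputs would be microscopy images.
membrane = np.random.rand(512, 512)
nuclear = np.random.rand(512, 512)

# Adjust each channel, then sum the selected channels into a single
# view for the annotators.
display = stretch_contrast(membrane) + stretch_contrast(nuclear)
```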
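
Steps 4 and 5 amount to cropping with overlap and recording enough metadata to undo the crop. Here is a rough illustration in plain numpy and JSON; `crop_with_overlap`, the file names, and the manifest fields are invented for this sketch and do not reflect the toolbox's real functions or its NPZ/JSON schema.

```python
import json
import numpy as np

def crop_with_overlap(image, crop_size=256, overlap=32):
    """Split a 2D image into overlapping crops, returning each crop's origin."""
    stride = crop_size - overlap
    crops, origins = [], []
    for row in range(0, max(image.shape[0] - overlap, 1), stride):
        for col in range(0, max(image.shape[1] - overlap, 1), stride):
            # Crops at the right/bottom edges may be smaller than crop_size.
            crops.append(image[row:row + crop_size, col:col + crop_size])
            origins.append((row, col))
    return crops, origins

image = np.random.rand(1024, 1024)  # stand-in for a real FOV
crops, origins = crop_with_overlap(image)

# Save each crop as its own NPZ file (step 5).
for i, crop in enumerate(crops):
    np.savez(f"crop_{i:03d}.npz", X=crop)

# Record the metadata needed to reconstruct the original image.
manifest = {
    "original_shape": list(image.shape),
    "crop_size": 256,
    "overlap": 32,
    "origins": [list(origin) for origin in origins],
}
with open("crop_manifest.json", "w") as f:
    json.dump(manifest, f)
```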
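
And the reverse direction for step 8, under the same assumed file layout. Note that stitching label images in earnest also requires reconciling object IDs across crop boundaries, which this sketch does not attempt.

```python
import json
import numpy as np

with open("crop_manifest.json") as f:
    manifest = json.load(f)

# Place each crop back at its recorded origin; overlapping regions are
# simply overwritten, which is acceptable for raw images.
stitched = np.zeros(manifest["original_shape"])
for i, (row, col) in enumerate(manifest["origins"]):
    crop = np.load(f"crop_{i:03d}.npz")["X"]
    stitched[row:row + crop.shape[0], col:col + crop.shape[1]] = crop

assert stitched.shape == tuple(manifest["original_shape"])
```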

## Getting Started

### Build a local docker container

@@ -23,20 +36,9 @@

```bash
docker build -t $USER/caliban_toolbox .

```

The tensorflow version can be overridden with the build-arg `TF_VERSION`.

```bash
docker build --build-arg TF_VERSION=1.9.0-gpu -t $USER/caliban_toolbox .
```

### Run the new docker image

```bash
# NV_GPU refers to the specific GPU to run DeepCell Toolbox on, and is not required

# Mounting the codebase, scripts and data to the container is also optional
# but can be handy for local development

NV_GPU='0' nvidia-docker run -it \
-p 8888:8888 \
$USER/caliban_toolbox:latest
```

@@ -47,7 +49,7 @@

It can also be helpful to mount the local copy of the repository and the scripts
```bash
NV_GPU='0' nvidia-docker run -it \
-p 8888:8888 \
-v $PWD/caliban_toolbox:/usr/local/lib/python3.5/dist-packages/caliban_toolbox/ \
-v $PWD/caliban_toolbox:/usr/local/lib/python3.7/site-packages/caliban_toolbox/ \
-v $PWD/notebooks:/notebooks \
-v /data:/data \
$USER/caliban_toolbox:latest
```