Yeast cells synthetic training data toolkit and segmentation and tracking pipeline

ymzayek/yeastcells-detection-maskrcnn

Deep learning pipeline for yeast cell segmentation and tracking

In this pipeline, we created synthetic brightfield images of yeast cells and trained a Mask R-CNN model on them. We then applied the trained network to real time-series brightfield microscopy data to automatically segment and track budding yeast cells.

Participants

Project description

Goals

  • To create synthetic image data to train a deep convolutional neural network
  • To implement an automatic segmentation pipeline using this network
  • To track cells across time frames

Get started on Google Colab

We have tried to make our experiments accessible to outsiders, in particular by setting up the detectron2 installation in Google Colab and by downloading all external resources when needed. In these notebooks the first cells install all dependencies, which should work without restarting. If you do run into errors, try restarting via the Colab Runtime menu, since inappropriate versions may have been imported into the runtime before the appropriate ones were installed. The Train model on synthetic data notebook in particular might require this. For the other notebooks, not restarting is unlikely to cause issues.

The two notebooks below allow you to create synthetic data and train a model. For a proof of concept, set the sets and max_iter parameters, respectively, to the lower values suggested. For a realistic use case, be aware that these scripts take several hours to complete, and that Google Colab is not intended for this. The results are large (~0.5 - 2 GB), and on Colab you can easily fail to safeguard them when Google Colab shuts down the machine due to inactivity.

Implementation

For creating the synthetic data set and training the network see the notebooks create_synthetic_dataset_for_training and train_mask_rcnn_network.

For segmentation and tracking on real data, see the example pipeline notebook.

All the notebooks can be run on Google Colab and automatically install and download all needed dependencies and data (see links above).

(To run the Mask R-CNN locally, you will need to install the Detectron2 library. For a guide to a Windows installation, see these instructions. You also need to download the trained model file from https://datascience.web.rug.nl/models/yeast-cells/mask-rcnn/v1/model_final.pth)
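For a local setup, the download step can be scripted. The following is a minimal sketch using only the Python standard library; the helper name fetch_model and the default destination file name are illustrative assumptions, not part of this repository.

```python
import os
import urllib.request

# URL of the trained model file, as given above.
MODEL_URL = "https://datascience.web.rug.nl/models/yeast-cells/mask-rcnn/v1/model_final.pth"


def fetch_model(dest="model_final.pth", url=MODEL_URL):
    """Download the trained weights unless a local copy already exists.

    `fetch_model` and `dest` are hypothetical names used for illustration.
    Returns the absolute path of the local file.
    """
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return os.path.abspath(dest)
```

Calling fetch_model() once downloads the file into the working directory; later calls reuse the existing copy.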

Segmentation: get_segmentation, get_model

  • Input: Brightfield time-lapse images. The source file is either a tiff stack or multiple tiff files forming the time-series.

  • Output: A dataframe with one row per detection, and a numpy.ndarray of boolean segmentation masks with shape (# detections, height, width). The dataframe and the masks array have the same length, and the mask column of the dataframe indexes the first dimension of the masks array. The dataframe also has the columns frame, x and y, which give the frame of the source image and the centroid of the detection.
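To make the relation between the dataframe and the masks array concrete, here is a small sketch built on toy data. The variable names and the 8x8 frames are assumptions for illustration; only the column names frame, x, y and mask come from the description above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the segmentation output: 3 detections on 8x8 frames.
masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 1:3, 1:3] = True  # first detection in frame 0
masks[1, 5:7, 4:8] = True  # second detection in frame 0
masks[2, 2:4, 2:4] = True  # one detection in frame 1
frame_of_detection = [0, 0, 1]

rows = []
for i, mask in enumerate(masks):
    ys, xs = np.nonzero(mask)  # pixel coordinates covered by the mask
    rows.append({"frame": frame_of_detection[i], "mask": i,
                 "x": xs.mean(), "y": ys.mean()})  # centroid of the mask
detections = pd.DataFrame(rows)

# The mask column indexes the first dimension of the masks array,
# so the masks of one frame can be selected through the dataframe.
frame0 = detections[detections["frame"] == 0]
frame0_masks = masks[frame0["mask"].to_numpy()]
```

Here frame0_masks has shape (2, 8, 8), one boolean mask per detection in frame 0.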

Example of 512x512 brightfield images and their detections. Detected yeast cells are highlighted by a magenta border. A) and B) show the segmentations in one frame of time-series agarpad experiments, C) shows segmentations in a microfluidic experiment and D) shows a close-up of the boundaries of detected cells.

Tracking: track_cells

  • Input: Besides the dataframe and masks from segmentation, tracking needs the hyperparameters for the DBSCAN clustering and the maximum frame distance dmax used when computing distances between detections. With a maximum frame distance of dmax, the algorithm calculates the distances between all detections in the current frame and all detections in the previous and following frames, up to dmax frames away. A higher dmax can compensate for intermittent false negatives: if a cell is missed in an adjacent frame but picked up again 2 frames ahead, the cell will still be tracked. However, looking too far ahead also increases the probability of misclassification due to cell growth and movement over time. The min_samples and eps variables are required arguments for the DBSCAN algorithm. For further explanation see sklearn.cluster.DBSCAN.

  • Output: A cell column is added to the dataframe of detections. Its value is -1 if the tracking algorithm marked the detection as an outlier and hence did not track it.
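The role of dmax, eps and min_samples can be illustrated with a simplified sketch. It is not the track_cells implementation: as an assumption, it uses the Euclidean distance between centroids as the distance between detections, and the function name track_by_centroid is hypothetical. Detections more than dmax frames apart, or in the same frame, are given a very large distance so that DBSCAN never links them directly.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def track_by_centroid(frames, xs, ys, dmax=1, eps=5.0, min_samples=2):
    """Cluster detections into tracks with DBSCAN on a precomputed matrix.

    Simplified stand-in: centroid distance is an assumed metric, not
    necessarily the one used by the pipeline. Returns one label per
    detection, with -1 for outliers.
    """
    frames = np.asarray(frames)
    coords = np.column_stack([xs, ys]).astype(float)
    # Pairwise Euclidean distances between all centroids.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Only pairs that are 1..dmax frames apart may be linked.
    gap = np.abs(frames[:, None] - frames[None, :])
    dist[(gap > dmax) | (gap == 0)] = 1e9  # large finite value, never within eps
    np.fill_diagonal(dist, 0.0)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(dist)


# Two cells followed over three frames, plus one spurious detection.
frames = [0, 0, 1, 1, 2, 2, 1]
xs = [10, 50, 11, 51, 12, 52, 200]
ys = [10, 50, 10, 50, 10, 50, 200]
labels = track_by_centroid(frames, xs, ys, dmax=1, eps=5.0, min_samples=2)
```

In this toy example the detections of each cell share a label, and the isolated detection is labelled -1, matching the outlier convention of the cell column.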

Segmented and tracked yeast cells from Mask R-CNN. The interval between frames of these time-series images is 180 seconds.

You can visualize the segmentations and tracks in a movie using visualize.create_scene and visualize.show_animation. Further, you can use visualize.select_cell to select a particular cell by label and zoom in on it to observe it better in the movie. With default options, the movie gives each cell a unique color that stays the same throughout the movie if the cell is tracked correctly. You also have the option to display the label number by setting the parameter labelnum to True.

Information and feature extraction

This pipeline allows you to extract information about the detected yeast cells in the time-series. The features.extract_contours function gives the contour points [x,y] for each segmentation. The masks for all detections can be extracted and their areas can be calculated as shown in the example pipeline notebook.
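As a rough illustration of the area calculation, the area of each detection in pixels is the number of True pixels in its mask. The sketch below assumes a masks array of shape (# detections, height, width) as described in the segmentation output; the function name mask_areas is hypothetical, and the notebook may compute areas differently.

```python
import numpy as np


def mask_areas(masks):
    """Area in pixels of each boolean mask in a (n, height, width) array."""
    return np.asarray(masks, dtype=bool).sum(axis=(1, 2))


# Toy example: two detections on 6x6 frames.
example_masks = np.zeros((2, 6, 6), dtype=bool)
example_masks[0, 1:3, 1:3] = True  # 2 x 2 pixels
example_masks[1, 0:2, 0:3] = True  # 2 x 3 pixels
areas = mask_areas(example_masks)
```

Converting pixel areas to physical units requires the pixel size of the microscope, which is not given here.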


A mother/daughter pair of masks overlaid on the original brightfield image.

Further, if a fluorescent channel is available, the pixel intensity within each cell can also be calculated using the masks segmented on the brightfield images.
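A minimal sketch of this calculation, assuming the fluorescence channel is available as a (frames, height, width) array aligned pixel by pixel with the brightfield frames. The function name and argument layout are illustrative assumptions.

```python
import numpy as np


def mean_intensity_per_detection(masks, frames, fluorescence):
    """Mean fluorescence inside each mask.

    masks: (n, height, width) boolean masks from the brightfield images.
    frames: frame index of each detection (length n).
    fluorescence: (T, height, width) stack aligned with the brightfield frames.
    An empty mask yields nan.
    """
    masks = np.asarray(masks, dtype=bool)
    fluorescence = np.asarray(fluorescence, dtype=float)
    return np.array([fluorescence[f][m].mean() for m, f in zip(masks, frames)])


# Toy data: two 4x4 frames and the same mask region in each frame.
fluo = np.zeros((2, 4, 4))
fluo[0] = 10.0
fluo[1] = 20.0
fluo[1, 0, 0] = 40.0
cell_masks = np.zeros((2, 4, 4), dtype=bool)
cell_masks[:, 0:2, 0:2] = True
intensities = mean_intensity_per_detection(cell_masks, [0, 1], fluo)
```

The same pattern works for other statistics, such as the total or median intensity, by replacing the mean.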

Example of Mask R-CNN pipeline output.

Evaluation

We evaluated our pipeline using benchmark data from the Yeast Image Toolkit (YIT) (Versari et al., 2017). On this platform, several existing pipelines have been evaluated for their segmentation and tracking performance. We tested our pipeline and those of YeaZ (Dietler et al., 2020) and YeastNet2 (Salem et al., 2021) on several test sets from this platform.


We chose to compare our pipeline with YeaZ and YeastNet2 because they also use a deep learning CNN, unlike the other pipelines evaluated on YIT.

The YeaZ segmentation and tracking implementation is based on YeaZ-GUI with optimized parameters obtained in this notebook. Additionally, our implementation allows for the use of GPU for the YeaZ pipeline.

The YeastNet2 segmentation and tracking were implemented using the YeaZ-GUI.

We matched the centroids provided in the benchmark ground truth data to the mask outputs of each model. This differs slightly from the approach of the YIT evaluation platform (EP) but is comparable: they matched the centroids of the predictions to the centroids of the ground truth, using a maximum distance threshold to count a match as a true positive (see their EP for more detail). We then calculated precision, recall, accuracy, and the F1-score.
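One way to picture this matching is sketched below for a single frame. The rule used here is an assumption made for illustration: a ground truth centroid counts as a true positive if it falls inside a predicted mask that has not been matched yet. The actual evaluation code may handle ties and edge cases differently, and the function name is hypothetical.

```python
import numpy as np


def match_centroids_to_masks(gt_xy, masks):
    """Count TP, FP and FN for one frame.

    gt_xy: list of (x, y) ground truth centroids, assumed inside the image.
    masks: (n, height, width) boolean predicted masks of the same frame.
    Assumed rule: each mask can be matched to at most one centroid.
    """
    masks = np.asarray(masks, dtype=bool)
    used = np.zeros(len(masks), dtype=bool)
    tp = 0
    for x, y in gt_xy:
        row, col = int(round(y)), int(round(x))
        # Unmatched masks that contain the ground truth centroid.
        hits = np.flatnonzero(masks[:, row, col] & ~used)
        if hits.size:
            used[hits[0]] = True
            tp += 1
    fn = len(gt_xy) - tp      # ground truth cells without a mask
    fp = int((~used).sum())   # masks without a ground truth cell
    return tp, fp, fn


predicted = np.zeros((3, 8, 8), dtype=bool)
predicted[0, 1:3, 1:3] = True
predicted[1, 5:7, 4:8] = True
predicted[2, 0:2, 6:8] = True
counts = match_centroids_to_masks([(2, 2), (5, 5), (4, 0)], predicted)
```

In this toy frame two centroids fall inside masks, one centroid is missed, and one mask has no matching centroid.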

In the tables below, we report the performance metrics on each test set for our pipeline, YeaZ, and YeastNet2 for comparison.


Segmentation evaluation results from 7 test sets from the YIT. Precision, recall, accuracy, and the F1-score of the performance of our pipeline, YeaZ, and YeastNet2 are reported.
Tracking evaluation results from 7 test sets from the YIT. Precision, recall, accuracy, and the F1-score of the performance of our pipeline, YeaZ, and YeastNet2 are reported.

We further quantitatively evaluated our segmentation accuracy based on IOU and compared it to YeaZ using publicly available annotated ground truth data from the YeaZ group.

Average IOU is calculated for true positives using annotated brightfield images of wild-type cells from the YeaZ dataset
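For reference, the IOU (intersection over union) of a predicted mask and a ground truth mask is the number of pixels they share divided by the number of pixels covered by either. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty; defined as 0 here by convention
    return float(np.logical_and(a, b).sum() / union)


pred = np.zeros((4, 4), dtype=bool)
truth = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True   # 4 pixels
truth[1:3, 1:3] = True  # 4 pixels, 1 shared with pred
iou = mask_iou(pred, truth)  # 1 shared pixel / 7 covered pixels
```

Averaging this value over all true positive detections gives an average IOU of the kind reported above.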

Hyperparameters

For our pipeline, we used calibration curves to set the segmentation threshold score, the minimum probability assigned by the Mask R-CNN for an instance to be accepted as a yeast cell. For tracking, we used them to tune the epsilon of DBSCAN and dmax, the maximum number of frames allowed between two detections for them to be tracked as the same cell.
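The idea behind such a calibration curve can be sketched as a sweep over candidate thresholds. The example below is a simplification under stated assumptions: each detection carries a score and a fixed flag saying whether it matches a ground truth cell, and the threshold with the highest F1-score is selected. The function name and the selection criterion are illustrative, not the procedure used to produce the curves.

```python
def best_threshold(detections, n_ground_truth, thresholds):
    """Return (threshold, F1) with the highest F1 over the candidates.

    detections: list of (score, is_true_positive) pairs.
    n_ground_truth: number of annotated cells.
    Assumes the true positive flag does not depend on the threshold.
    """
    best = None
    for t in thresholds:
        kept = [is_tp for score, is_tp in detections if score >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_ground_truth - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if best is None or f1 > best[1]:
            best = (t, f1)
    return best


scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
threshold, f1 = best_threshold(scored, n_ground_truth=3,
                               thresholds=[0.1, 0.3, 0.5, 0.7, 0.9])
```

Plotting the metric for every candidate threshold, instead of keeping only the best one, gives a curve of the kind shown for each test set.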


Calibration curves for each of the YIT test sets 1 to 7, showing the 4 different metrics against the segmentation threshold score.

Metrics

TP: true positive detections
FP: false positive detections
FN: false negatives
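
The four reported metrics can be computed from these counts. The sketch below uses the standard definitions of precision, recall and the F1-score. The accuracy formula TP / (TP + FP + FN) is an assumption, chosen because true negatives are not defined for detections; the definition used for the reported numbers may differ.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, accuracy and F1-score from detection counts.

    Accuracy is assumed to be TP / (TP + FP + FN), since no true
    negatives exist for detections. Undefined ratios are returned as 0.0.
    """
    total = tp + fp + fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": tp / total if total else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if total else 0.0,
    }


metrics = detection_metrics(tp=8, fp=2, fn=2)
```

With 8 true positives, 2 false positives and 2 false negatives, precision, recall and F1 are all 0.8, and the assumed accuracy is 8/12.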

Calibration curves for tracking performance and hyperparameter tuning for each of the YIT test sets 1 to 7.
