24 November 2020
Hugo M. Botelho, Biosystems and Integrative Sciences Institute, University of Lisboa
hmbotelho@fc.ul.pt
We will use CellProfiler as a tool to batch analyze a 2D and a 3D imaging dataset. We will introduce the recently released CellProfiler 4 and present the major changes in this version. Then, we will design a custom analysis pipeline to segment objects, extract quantitative features and export a results table. We will also show how the analysis pipeline can be customized to address advanced analytical requirements and allow annotation with experimental metadata.
CellProfiler is free, open-source, public domain software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically (batch processing). It runs on Windows and macOS.
Advanced image analysis algorithms are available as individual modules that can be placed in sequence to form a pipeline. The pipeline is then used to identify and measure biological objects and features in images, particularly those obtained by fluorescence microscopy.
The basic functionality of CellProfiler can be extended with custom modules written in Python.
- Goal: Provide powerful image analysis methods with a user-friendly interface.
- Application: Batch analysis of object-based features.
- Philosophy: Measure everything, ask questions later.
- Data analysis: Based on individual cells (high content).
- Migrates CellProfiler from Python 2 to Python 3.
- Improved compatibility with older pipelines (v2, v3).
- Bug fixes (⚠ outputs may vary from previous versions).
- Morphometry now uses scikit-image (⚠ outputs may vary from previous versions).
- UI improvements.
- Improved contrast display options.
- Improved 3D viewer.
- Performance improvements.
- Measure colocalization in 16-bit images.
- Improved customization and automation in MeasureTexture, MeasureGranularity and MeasureImageIntensity.
- Removed LoadImages and LoadSingleImages.
- ...
Current version: 4.0.6
Extract features for image classification.
Images provided by Nicola Griti and Vikas Trivedi, EMBL Barcelona
2D images of single gastruloids expressing a GFP-fused gene, imaged with a PerkinElmer Opera Phenix system.
- Number of images: 96
- Image size: 270 × 270 pixels
- Number of channels: 2 (brightfield + GFP)
- Dataset size: 13.6 MB
- Experimental treatments: see metadata.csv or the table below
Images | Treatment |
---|---|
01A - 01H ; 02A - 02H ; 03A - 03H | control |
04A - 04H ; 05A - 05H ; 06A - 06H | matrigel5 |
07A - 07H ; 08A - 08H ; 09A - 09H | pre2i |
10A - 10H ; 11A - 11H ; 12A - 12H | pre2i_cleaned |
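The plate layout in the table above can also be generated programmatically, for example to rebuild or check an annotation table. The following is a minimal base R sketch; the column names `colrow` and `treatment` are assumptions made for illustration and may not match the actual header of metadata.csv.

```r
# Plate layout from the table above: 12 columns x 8 rows (A-H),
# with three consecutive columns per treatment.
treatments <- c("control", "matrigel5", "pre2i", "pre2i_cleaned")
cols <- sprintf("%02d", 1:12)   # "01" ... "12"
rows <- LETTERS[1:8]            # "A" ... "H"

# One line per well; the column names below are hypothetical.
plate <- expand.grid(row = rows, col = cols, stringsAsFactors = FALSE)
plate$colrow <- paste0(plate$col, plate$row)   # e.g. "01A"
plate$treatment <- treatments[(as.integer(plate$col) - 1) %/% 3 + 1]
plate <- plate[, c("colrow", "treatment")]

# Number of wells assigned to each treatment
table(plate$treatment)
```

A table built this way could be compared against metadata.csv to catch annotation mistakes before running the pipeline.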
Sample images
We will implement a CellProfiler 4 analysis pipeline to extract object features which may be used to implement an image classifier.
Two CellProfiler pipeline files are available:
- 2Danalysis_simple.cppipe: basic analysis
- 2Danalysis_full.cppipe: complete analysis, with additional steps
Analysis of the dataset with 2Danalysis_full.cppipe will produce the data stored in the outputs folder.
The simple pipeline implements the following steps:
- Annotate dataset with experimental treatments provided in metadata.csv;
- Split channels;
- Preprocess brightfield image for gastruloid segmentation;
- Gastruloid segmentation;
- Measure GFP fluorescence intensity at each gastruloid;
- Export features and metadata.
The full pipeline implements the following steps:
- Annotate dataset with experimental treatments provided in metadata.csv;
- Split channels;
- Preprocess brightfield image for gastruloid segmentation;
- Gastruloid segmentation;
- Exclude images without gastruloids;
- Background correction in the GFP channel;
- Measure GFP fluorescence intensity at each gastruloid;
- Measure object morphometric features;
- Save segmentation results as images;
- Export features and metadata.
Within each CellProfiler pipeline, every module contains notes explaining what it does in this analysis.
Examples from gastruloid segmentation with these pipelines:
One can process the numerical features to build an image classifier. This is illustrated below with an R script which performs hierarchical clustering (unsupervised machine learning).
# Short R script
library(ggplot2)
library(heatmaply)
library(dplyr)
# Load CellProfiler data
cpdata <- read.csv("./dataset2D/output/objects.csv")
# Select relevant data
okcols <- c("AreaShape_Area", "AreaShape_Eccentricity", "AreaShape_EquivalentDiameter", "AreaShape_Extent",
"AreaShape_FormFactor", "AreaShape_MajorAxisLength", "AreaShape_MaxFeretDiameter",
"AreaShape_MaximumRadius", "AreaShape_MeanRadius", "AreaShape_MinFeretDiameter",
"AreaShape_MinorAxisLength", "AreaShape_Solidity")
cpdata_relevant <- cpdata[,okcols]
colnames(cpdata_relevant) <- sub("AreaShape_", "", colnames(cpdata_relevant))
rownames(cpdata_relevant) <- paste0(cpdata$Metadata_colrow, " - ", cpdata$Metadata_treatment)
# Scale each feature to the [0, 1] range (min-max normalization)
for (i in seq_len(ncol(cpdata_relevant))) {
  x_data <- cpdata_relevant[[i]]
  x_min <- min(x_data, na.rm = TRUE)
  x_max <- max(x_data, na.rm = TRUE)
  cpdata_relevant[[i]] <- (x_data - x_min) / (x_max - x_min)
}
# Heatmap with a row dendrogram; sample rows are split into 3 clusters
fig <- heatmaply(cpdata_relevant,
                 xlab = "Features",
                 ylab = "Samples",
                 margins = c(100, 140, 40, 0),
                 show_dendrogram = c(TRUE, FALSE),
                 fontsize_row = 7,
                 k_row = 3)
fig %>% layout(autosize = FALSE, width = 500, height = 1000)
One can identify three major clusters:
- Cluster 1: mostly control and matrigel5 samples
- Cluster 2: mostly pre2i and pre2i_cleaned samples
- Cluster 3: outlier
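Cluster composition can also be checked numerically by cutting a dendrogram into three groups and cross-tabulating cluster membership against treatment. The sketch below uses base R with Euclidean distance and complete linkage (the `dist` and `hclust` defaults), which is an assumption and may differ from the settings used by heatmaply. The `toy` data frame and its values are made up for illustration and are not results from the dataset.

```r
# Toy stand-in for the scaled feature table (values invented for illustration).
toy <- data.frame(
  Area         = c(0.10, 0.15, 0.12, 0.80, 0.85, 0.82, 0.50),
  Eccentricity = c(0.20, 0.25, 0.22, 0.70, 0.75, 0.72, 1.00)
)
treatment <- c("control", "control", "matrigel5",
               "pre2i", "pre2i_cleaned", "pre2i", "control")

# Hierarchical clustering: Euclidean distance, complete linkage (base R defaults).
tree <- hclust(dist(toy))

# Cut the dendrogram into 3 clusters, matching k_row = 3 in the heatmap.
cluster <- cutree(tree, k = 3)

# Contingency table: samples of each treatment falling in each cluster.
composition <- table(cluster, treatment)
composition
```

With the real data, `toy` would be replaced by `cpdata_relevant` and `treatment` by `cpdata$Metadata_treatment`.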
Quantify GFP fluorescence in different cell populations (3D).
Images provided by Nicola Griti and Vikas Trivedi, EMBL Barcelona
One gastruloid imaged with light sheet fluorescence microscopy after fixation and staining to measure the expression level of a certain marker gene.
C01 (Red): nuclei
C02 (Green): GFP-tagged protein
To compare cell populations with different GFP expression levels, four substacks were extracted from the original volume:
Stack | Size in pixels (x × y × z) |
---|---|
1 | 57 × 57 × 30 |
2 | 57 × 57 × 137 |
3 | 57 × 57 × 64 |
4 | 57 × 57 × 66 |
- Number of stacks: 4
- Number of channels: 2 (DNA + GFP)
- Number of images: 8
- Dataset size: 6.5 MB
- Relative voxel dimensions (x, y, z): (0.76, 0.76, 2)
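Because the voxels are anisotropic, sizes measured in voxel units need rescaling before they are compared across axes. The arithmetic can be sketched in R using the relative dimensions listed above; the variable names and the 500-voxel object are hypothetical and serve only as an example.

```r
# Relative voxel dimensions (x, y, z) from the dataset description.
voxel <- c(x = 0.76, y = 0.76, z = 2)

# Anisotropy: how many xy pixel widths one z step spans (about 2.63).
z_to_xy <- voxel[["z"]] / voxel[["x"]]

# Relative volume of one voxel: 0.76 * 0.76 * 2 = 1.1552.
voxel_volume <- prod(voxel)

# Relative volume of a hypothetical object made of 500 voxels.
object_volume <- 500 * voxel_volume
```

The values are in relative units, since the dataset description gives relative rather than absolute voxel dimensions.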
We will implement a CellProfiler 4 analysis pipeline to quantify GFP fluorescence at every nucleus in the dataset.
A preconfigured CellProfiler pipeline file is available: 3Danalysis.cppipe
Analysis of the dataset will produce the data stored in the outputs folder.
The pipeline implements the following steps:
- Preprocess DNA image for nuclear segmentation;
- Nuclei segmentation;
- Measure GFP fluorescence intensity in each nucleus;
- Save segmentation results as images;
- Export features and metadata.
Within the CellProfiler pipeline, every module contains notes explaining what it does in this analysis.
Example of nuclei segmentation with this pipeline:
One can summarize the numerical features by calculating the average GFP fluorescence intensity across the cells in each substack. This is illustrated below with an R script:
# Short R script
options(warn=-1)
library(dplyr)
library(plotly)
cpdata <- read.csv("./dataset3D/output/objects.csv")
# Mean and standard deviation of nuclear GFP intensity per image set (one per substack)
cpsummary <- cpdata %>%
  group_by(ImageNumber) %>%
  summarise(mean = mean(Intensity_MeanIntensity_GFP), sd = sd(Intensity_MeanIntensity_GFP))
# Bar plot of the mean, with error bars showing the standard deviation
fig <- plot_ly(
  data = cpsummary,
  x = ~ImageNumber,
  y = ~mean,
  error_y = list(
    type = "data",
    array = cpsummary$sd,
    color = "black"
  ),
  name = "CellProfiler results summary",
  type = "bar"
)
fig <- fig %>% layout(xaxis = list(title = "Substack"), yaxis = list(title = "GFP intensity (mean ± sd)"))
fig
This result agrees with what visual inspection of the raw data suggests.