contributed by:

Tobias Rasse
Max Planck Institute for Heart and Lung Research
61231 Bad Nauheim, Germany
tobias.rasse@mpi-bn.mpg.de

Images recorded by:
Tobias Rasse with a Samsung Galaxy S6 Active Smartphone,
CC-BY 4.0 Licence

In [1]:
import pickle
import numpy as np

## General description of OpSeF

The analysis pipeline consists of four principal sets of functions: to import and reshape the data, to pre-process it, to segment it, and to analyze and classify the results.

<img src="./Figures_Demo/Fig_M1.jpg" alt = "Paper Fig M1" style = "width: 500px;"/>

OpSeF is designed such that the parameters for pre-processing and the selection of the ideal model for segmentation form one functional unit.

We recommend an iterative optimization that starts with a large number of models and relatively few, but conceptually distinct, pre-processing pipelines. The number of models to be explored is then lowered while the most promising pre-processing pipelines are fine-tuned, e.g. by optimizing the filter kernel or the way histograms are equalized.

## Description of the parameter tuning process for the cobblestones demo dataset

The parameter tuning was performed as described below:

First, minor smoothing (median filter with a 3x3 kernel) and moderate background subtraction were applied (Input_000). Next, the effect of additional histogram equalization (Input_001), edge enhancement (Input_002), and image inversion (Input_003) was tested.

All available models were tested, and the Cellpose scale factors 0.2, 0.4 and 0.6 were explored.

### Run 1:

The Cellpose Cyto model with scale factor = 0.4 and Input_003 produced, without further optimization, a good result in both test images. Outlines are well defined, no objects were missed, and none were oversegmented (Fig 1 A & B).


<img src="./Figures_Demo/Fig_R1_AB.jpg" alt = "Result Run001" style = "width: 700px;"/>

Input_000 gave similarly good results (data not shown), but one stone was missed in one of the two images.

#### => Check how well the CP cyto model with scale factor = 0.4 & Input_003 performs on all images

### Run 2:
The model works well. Outlines are well defined. Cellpose has been trained on a large variety of images and has been reported to perform well on objects of similar shape (Fig S1, adapted from Fig 4 of Stringer et al.).

<img src="./Figures_Demo/cobblestones_S1_CP_Fig4.jpg" alt = "Fig S1" style = "width: 700px;"/>

In only one image, three objects were missed and one was oversegmented. The borders around these stones are very hard to see, and even further training might not resolve such extremely difficult segmentation tasks (Fig 1 E, F).

<img src="./Figures_Demo/Fig_R1_CF.jpg" alt = "Result Run002" style = "width: 700px;"/>

## Load Core-Settings that shall not be changed
Please use OpSef_Configure_XXX to change these global settings.
Changes in this file are only necessary to integrate new models
or to change the auto-generated folder structure.
Changes to the folder structure might cause errors.

In [2]:
file_path = "./my_runs/main_settings.pkl"
# the context manager closes the file even if unpickling fails
with open(file_path,'rb') as infile:
    parameter = pickle.load(infile)
print("Loading processing pipeline from",file_path)
model_dic,folder_structure = parameter

Loading processing pipeline from ./my_runs/main_settings.pkl


## Define General Parameters
Most parameters define the overall processing and likely do not change between runs.

In [3]:
# Define variables that determine the processing pipeline and (generally) do not change between runs

pc = {}

#################
## Basic ########
#################

pc["sub_f"] = folder_structure # these folders will be auto-generated
pc["batch_size"] = 2 # the number of images to be quantified must be a multiple of the batch size (for segmentation)
# extract the properties (below) from region_props
pc["get_property"] = ["label","area","centroid", "eccentricity", 
                      "equivalent_diameter","mean_intensity","max_intensity",
                      "min_intensity","perimeter"]
pc["naming_scheme"] = "Simple"        # "Simple", or "Export_ZSplit" to create substacks

###############################
# Define use of second channel
###############################

pc["export_another_channel"] = False  # export another channel (to create a mask or for quantification)?
# defaults, so that later cells can read these keys even when no second channel is exported
pc["create_filter_mask_from_channel"] = False
pc["Quantify_2ndCh"] = False
if pc["export_another_channel"]:
    pc["create_filter_mask_from_channel"] = False # use second channel to make a mask?
    pc["Quantify_2ndCh"] = False      # shall this channel be quantified?
    if pc["Quantify_2ndCh"]:
        pc["merge_results"] = True    # shall the results of the two intensity quantifications be merged
                                     # (needed for advanced plotting)
        pc["plot_merged"] = True     # plot head of dataframe in notebook?

################################
# Define Analysis & Plotting ###
################################

pc["Export_to_CSV"] = False # shall results be exported to CSV (usually only true for the final run)
if pc["Export_to_CSV"]:
    pc["Intensity_Ch"] = 999 # put 999 if data contains only one channel
    pc["Plot_Results"] = True # Do you want to plot results ?
    pc["Plot_xy"] = [["area","mean_intensity"],["area","circularity"]] # Define what you want to plot (x/y)
    pc["plot_head_main"] = True # plot head of dataframe in notebook ?
    pc["Do_ClusterAnalysis"] = False # shall cluster analysis be performed? 
    if pc["Do_ClusterAnalysis"]: # Define (below) which values will be included in the TSNE:
        pc["include_in_tsne"] = ["area","eccentricity","equivalent_diameter",
                                 "mean_intensity","max_intensity","min_intensity","perimeter"]
        pc["cluster_expected"] = 4 # How many groups/classes do you expect?
        pc["tSNE_learning_rate"] = 100 # Define learning rate
        pc["link_method"] = "ward" # or "average", or "complete", or "single" details see below
else:
    pc["Plot_Results"] = False
    pc["Do_ClusterAnalysis"] = False 
    
pc["toFiji"] = False

In [4]:
# Define here input & basic processing that (generally) does not change between runs

input_def = {}
input_def["root"] = "/home/trasse/Desktop/MLTestData/cobblestones_test" # define folder where images are located
input_def["dataset"] = "cobble_stones_512" # give the dataset a common name
input_def["mydtype"] = np.uint8 # bit depth of input images

input_def["input_type"] = ".tif" # or .lif
if input_def["input_type"] == ".tif":
    input_def["is3D"] = False # is the data 3D or 2D ???
elif input_def["input_type"] == ".lif":
    input_def["rigth_size"] = (2048,2048) 
    input_def["export_single_ch"] = 99   # which channel to extract from the lif file (if only one)

input_def["split_z"] = False # choose here whether to split the z-stack into multiple substacks to avoid that cells fuse after projection
if input_def["split_z"]: # if chosen, define:
    input_def["z_step"] = 3 # size of substacks
    if input_def["input_type"] == ".lif":
        input_def["export_multiple_ch"] = [0,1] # channels to be exported
    

#########################################################################
## the following options are only implemented with Tiff files as input ##
#########################################################################

input_def["toTiles"] = False
if input_def["toTiles"]:
    input_def["patch_size"] = (15,512,512)

input_def["bin"] = False 
if input_def["bin"]:
    input_def["bin_factor"] = 2 # same for x/y

# coming soon...
input_def["n2v"] = False
input_def["CARE"] = False

In [5]:
# Define parameter for export (if needed)

if pc["export_another_channel"]:
    input_def["post_export_single_ch"] = 0                   # which channel to extract from the lif file
    input_def["post_subset"] = ["09_Ch_000_CS985"]          # analyse these intensity images
    if pc["Quantify_2ndCh"]:
        pc["Intensity_2ndCh"] = input_def["post_export_single_ch"]        

In [6]:
# Define model

# in this dictionary all settings for the model are stored
initModelSettings = {}
# Variables U-Net Cellprofiler
initModelSettings["UNet_model_file_CP01"] = "./model_Unet/UNet_CP001.h5"
initModelSettings["UNetShape"] = (512,512)
initModelSettings["UNetSettings"] = [{"activation": "relu", "padding": "same"},{ "momentum": 0.9}]
# Variables StarDist
initModelSettings["basedir_StarDist"] = "./model_stardist"
# Variables Cellpose
initModelSettings["Cell_Channels"] = [[0,0],[0,0]]

## Define Runs

The parameters listed below are likely to change between runs.

Preprocessing is mainly based on scikit-image.

Segmentation incorporates the pre-trained U-Net implementation used in CellProfiler 3.0, the StarDist 2D model, and Cellpose.

Importantly, OpSeF is designed such that the parameters for pre-processing and the selection of the ideal model for segmentation form one functional unit.

<img src="./Figures_Demo/Fig_M4.jpg" alt = "Paper Fig M4" style = "width: 500px;"/>

The figure above illustrates this concept with a processing pipeline in which three different models are each applied to the output of four different pre-processing pipelines. Next, the resulting images are classified into results that are largely correct and results that suffer from failure to detect objects, under-segmentation, or over-segmentation. In the given example, pre-processing pipeline three and model two seem to give the best overall result.
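
The grid of combinations described here can be enumerated explicitly. The following is a minimal sketch, not OpSeF's own code: the helper `enumerate_combinations` is hypothetical, the model names and scale factors mirror the Run 1 settings of this notebook, and it assumes that the scale factor applies only to Cellpose models (names starting with `CP_`).

```python
from itertools import product

def enumerate_combinations(n_pipelines, models, rescale_list):
    """List every (input ID, model, scale factor) combination to be tested.

    Hypothetical helper for illustration; assumes that only Cellpose models
    (names starting with "CP_") use a scale factor.
    """
    combinations = []
    for pre_idx, model in product(range(n_pipelines), models):
        input_id = "Input_{:03d}".format(pre_idx)
        if model.startswith("CP_"):
            for rescale in rescale_list:
                combinations.append((input_id, model, rescale))
        else:
            combinations.append((input_id, model, None))
    return combinations

models = ["CP_nuclei", "CP_cyto", "SD_2D_dsb2018", "UNet_CP001"]
combos = enumerate_combinations(4, models, [0.2, 0.4, 0.6])
# 4 pipelines x (2 Cellpose models x 3 scale factors + 2 other models)
print(len(combos))
```

Each tuple corresponds to one segmentation result that has to be inspected, which is why reducing the number of models early keeps the later fine-tuning manageable.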

In [7]:

# Define variable that might change in each run

run_def = {}
run_def["display_base"] = "000" # defines the image used as basis for the overlay. See documentation for details.
run_def["run_ID"] = "002" #give each run a new ID (unless you want to overwrite the old data)
run_def["clahe_prm"] = [(18,18),3] # Parameter for CLAHE

# Run 1
input_def["subset"] = ["Train"] # filter by name

# Run 2
input_def["subset"] = ["All"] # filter by name

#########################
# Define preprocessing ##
#########################


# Run1
run_def["pre_list"] = [["Median",3,50,"Max",False,run_def["clahe_prm"],"no",False],
                       ["Median",3,50,"Max",True,run_def["clahe_prm"],"no",False],
                       ["Median",3,50,"Max",False,run_def["clahe_prm"],"sobel",False],
                       ["Median",3,50,"Max",False,run_def["clahe_prm"],"no",True]]

# Run2
run_def["pre_list"] = [["Median",3,50,"Max",False,run_def["clahe_prm"],"no",True]]


# For Cellpose

run_def["rescale_list"] = [0.2,0.4,0.6] # run1
run_def["rescale_list"] = [0.4] # run2

# Define model

run_def["ModelType"] = ["CP_nuclei","CP_cyto","SD_2D_dsb2018","UNet_CP001"] # run1
run_def["ModelType"] = ["CP_cyto"]  # run2


############################################################
# Define postprocessing & filtering                       #
# keep only the objects within the defined ranges         #
###########################################################

# (same for all runs) 
run_def["filter_para"] = {}
run_def["filter_para"]["area"] = [0,10000]
run_def["filter_para"]["perimeter"] = [0,99999999]
run_def["filter_para"]["circularity"] = [0,1] # (equivalent_diameter * math.pi) / perimeter
run_def["filter_para"]["mean_intensity"] = [0,65535]
run_def["filter_para"]["sum_intensity"] = [0,100000000000000]
run_def["filter_para"]["eccentricity"] = [0,10]

############################################################################
# settings that are only needed if condition below is met which means you  #
# plan to use a mask from a second channel to filter results               #
############################################################################

if pc["create_filter_mask_from_channel"]:
    run_def["binary_filter_mp"] = [["open",5,2,1,"Morphology"],["close",5,3,1,"Morphology"],["erode",5,1,1,"Morphology"]]
    run_def["para_mp"] = [["Mean",5],[0.6],run_def["binary_filter_mp"]]
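
The range filter defined above can be sketched as follows. This is an illustration under stated assumptions, not the OpSeF implementation: the helper names `circularity` and `passes_filter` are hypothetical, bounds are assumed to be inclusive, and properties that an object does not carry are skipped. The circularity formula is the one given in the comment on `run_def["filter_para"]["circularity"]`.

```python
import math

def circularity(equivalent_diameter, perimeter):
    """Circularity as noted in the filter settings: (equivalent_diameter * pi) / perimeter."""
    return (equivalent_diameter * math.pi) / perimeter

def passes_filter(obj, filter_para):
    """Return True if every filtered property of obj lies within [low, high].

    Assumption: bounds are inclusive; properties missing from obj are skipped.
    """
    return all(low <= obj[key] <= high
               for key, (low, high) in filter_para.items()
               if key in obj)

# A perfect disk with a radius of 10 px (illustrative values, not measured data)
disk = {"area": math.pi * 10 ** 2, "perimeter": 20 * math.pi, "eccentricity": 0.0}
disk["circularity"] = circularity(equivalent_diameter=20, perimeter=disk["perimeter"])

filter_para = {"area": [0, 10000], "perimeter": [0, 99999999],
               "circularity": [0, 1], "eccentricity": [0, 10]}

keep = passes_filter(disk, filter_para)                  # all properties within range
too_small = passes_filter(disk, {"area": [500, 10000]})  # area of about 314 px is below 500
print(keep, too_small)
```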


## How to reproduce these results?

The notebook is set up to reproduce the results of the last run (Run 2).

To execute previous runs, please delete or comment out the settings used for Run 2.

Settings are saved in a .pkl file.

The next cell prints the filepath & name of this file.

OpSef_Run_XXX loads the settings specified above and processes all images.
The only change you have to make within OpSef_Run_XXX is to specify the location
of this settings file.
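
As a minimal sketch of this round trip, the settings can be written to a .pkl file and read back in the same order in which they were stored. The dictionaries below are small placeholders standing in for the real settings, a temporary folder replaces `./my_runs`, and how OpSef_Run_XXX unpacks the file internally is an assumption based on the list saved in this notebook.

```python
import os
import pickle
import tempfile

# Placeholder dictionaries standing in for the settings defined in this notebook
pc = {"batch_size": 2}
input_def = {"dataset": "cobble_stones_512"}
run_def = {"run_ID": "002"}
initModelSettings = {"Cell_Channels": [[0, 0], [0, 0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(
        tmp_dir,
        "Parameter_{}_Run_{}.pkl".format(input_def["dataset"], run_def["run_ID"]))

    # save the four settings objects as one list
    with open(file_name, "wb") as outfile:
        pickle.dump([pc, input_def, run_def, initModelSettings], outfile)

    # load them again and unpack in the same order
    with open(file_name, "rb") as infile:
        loaded_pc, loaded_input_def, loaded_run_def, loaded_model_settings = pickle.load(infile)

print(loaded_input_def["dataset"], loaded_run_def["run_ID"])
```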

## Save Parameter

In [8]:
#  auto-create parameter set from input above
run_def["run_now_list"] = [model_dic[x] for x in run_def["ModelType"]]
parameter = [pc,input_def,run_def,initModelSettings]

# save it
file_name = "./my_runs/Parameter_{}_Run_{}.pkl".format(input_def["dataset"],run_def["run_ID"])
file_name_load = "./Demo_Notebooks/my_runs/Parameter_{}_Run_{}.pkl".format(input_def["dataset"],run_def["run_ID"])
print("Please execute this file with OPsef_Run_XXX",file_name_load)
# the context manager closes the file even if pickling fails
with open(file_name,'wb') as outfile:
    pickle.dump(parameter,outfile)

Please execute this file with OPsef_Run_XXX ./Demo_Notebooks/my_runs/Parameter_cobble_stones_512_Run_002.pkl


## Documentation

In [None]:
##########################
## Folderstructure    ####

input_def["root"] = "/home/trasse/Desktop/MLTestData/leaves" # defines the main folder 

# Put files in these subfolders
# .lif
# root/myimage_container.lif
# root/tiff/myimage1.tif (in case this folder is the direct input to the pre-processing pipeline)
#          /myimage2.tif ...
# or
# root/tiff_raw_2D/myimage1.tif (if you want to make patches in 2D)
# root/tiff_to_split/myimage1.tif (if you ONLY want to create substacks, but not bin or patch before)
# root/tiff_raw/myimage1.tif (for all pipelines that start with patching or binning and use stacks)


######################################
### What is a display base image ????
######################################

run_def["display_base"]
''' 
display base is ideally set to "same".
In this case the visualization of the segmentation borders will be
drawn on top of the input image to the segmentation.
If this behavior is not desired, a three-digit number
that refers to the position in run_def["pre_list"] may be entered.
example:
run_def["pre_list"] = [["Median",3,8,"Sum",True,clahe_prm],["Mean",5,3,"Max",True,clahe_prm]]
& the image resulting from:
["Mean",5,3,"Max",True,clahe_prm]
shall be used as basis for display:
then set: 
run_def["display_base"] = "001"
'''

##########################
# Parameter for Cellpose:
##########################

# Define: 
# initModelSettings["Cell_Channels"] 
# to run segmentation on grayscale=0, R=1, G=2, B=3
# initModelSettings["Cell_Channels"] = [cytoplasm, nucleus]
# if NUCLEUS channel does not exist, set the second channel to 0
initModelSettings["Cell_Channels"] = [[0,0],[0,0]]

# IF ALL YOUR IMAGES ARE THE SAME TYPE, you can give a list with 2 elements
initModelSettings["Cell_Channels"] = [0,0] # IF YOU HAVE GRAYSCALE
initModelSettings["Cell_Channels"] = [2,3] # IF YOU HAVE G=cytoplasm and B=nucleus
initModelSettings["Cell_Channels"] = [2,1] # IF YOU HAVE G=cytoplasm and R=nucleus

# if rescale is set to None, the size of the cells is estimated on a per image basis
# if you want to set the scale factor yourself, set it to 30. / average_cell_diameter
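
The rule stated in the comment above (scale factor = 30. / average_cell_diameter) can be written as a small helper. This is a sketch for illustration: the function name `rescale_from_diameter` is hypothetical, and the diameter of 75 px is an example value, not a measurement from the cobblestone dataset.

```python
def rescale_from_diameter(average_cell_diameter):
    """Scale factor following the rule 30. / average_cell_diameter (diameter in pixels)."""
    if average_cell_diameter <= 0:
        raise ValueError("average_cell_diameter must be positive")
    return 30.0 / average_cell_diameter

# objects with an average diameter of about 75 px (example value)
rescale = rescale_from_diameter(75)
print(rescale)
```

Larger objects therefore call for a smaller scale factor, so exploring a short list of values such as `run_def["rescale_list"]` amounts to testing several assumed object diameters.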



Preprocessing is mainly based on scikit-image. It consists of a linear pipeline:

<img src="./Figures_Demo/Fig_M3.jpg" alt = "Paper Fig M3" style = "width: 800px;"/>

In [10]:
#####################################
## Variables for Preprocessing
#####################################

## The list 

run_def["pre_list"]

# is organized (as illustrated above) in the following way:

# (1) Filter type
# (2) Kernel
# (3) subtract fixed value
# (4) projection type
# (5) CLAHE enhancement (Yes/No) as defined above
# (6) CLAHE parameter
# (7) enhance edges (and how) (no, roberts, sobel)
# (8) invert image

# It is a list of lists, each entry defines one pre-processing pipeline:

# e.g.
# Run1
run_def["pre_list"] = [["Median",3,50,"Max",False,run_def["clahe_prm"],"no",False],
                       ["Median",3,50,"Max",True,run_def["clahe_prm"],"no",False],
                       ["Median",3,50,"Max",False,run_def["clahe_prm"],"sobel",False],
                       ["Median",3,50,"Max",False,run_def["clahe_prm"],"no",True]]
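
Because each entry is a positional list, it can help to give the eight positions names while inspecting or debugging a run. The sketch below is an assumption-laden illustration: the field names of `PrePipeline` are hypothetical labels for the positions listed above and are not identifiers used by OpSeF.

```python
from collections import namedtuple

# Field names are hypothetical labels for the eight positions of a pre_list entry
PrePipeline = namedtuple("PrePipeline", [
    "filter_type", "kernel", "subtract_value", "projection",
    "use_clahe", "clahe_prm", "edge_enhancement", "invert"])

clahe_prm = [(18, 18), 3]
pre_list = [["Median", 3, 50, "Max", False, clahe_prm, "no", False],
            ["Median", 3, 50, "Max", False, clahe_prm, "no", True]]

# unpack every positional entry into a named record
pipelines = [PrePipeline(*entry) for entry in pre_list]

for idx, pipeline in enumerate(pipelines):
    # the three-digit index matches the numbering used for display_base
    print("{:03d}".format(idx), pipeline.filter_type, pipeline.kernel,
          "invert:", pipeline.invert)
```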

### Link analysis (Settings from t-SNE)

from https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html

linkage{“ward”, “complete”, “average”, “single”}

Which linkage criterion to use:

The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion.

- ward minimizes the variance of the clusters being merged.
- average uses the average of the distances of each observation of the two sets.
- complete or maximum linkage uses the maximum distances between all observations of the two sets.
- single uses the minimum of the distances between all observations of the two sets.
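
The distance-based criteria can be made concrete with a few lines of plain Python. This is a minimal sketch and not the scikit-learn implementation: the helper `linkage_distance` is hypothetical, the two clusters are made-up points, and ward is left out because it is defined via cluster variance rather than pairwise distances.

```python
import math
from itertools import product

def linkage_distance(set_a, set_b, method="single"):
    """Distance between two sets of observations for a given linkage criterion."""
    pairwise = [math.dist(a, b) for a, b in product(set_a, set_b)]
    if method == "single":      # minimum of all pairwise distances
        return min(pairwise)
    if method == "complete":    # maximum of all pairwise distances
        return max(pairwise)
    if method == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError("unsupported linkage method: {}".format(method))

# Two illustrative clusters of 2D points (made-up values)
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

for method in ("single", "complete", "average"):
    print(method, round(linkage_distance(cluster_a, cluster_b, method), 3))
```

For any pair of sets, single linkage gives the smallest and complete linkage the largest of the three values, which is why single linkage tends to chain neighbouring groups together while complete linkage favours compact clusters.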