MEF module cleanup #165

Merged

castillohair merged 31 commits into develop from mef-improvements on Dec 8, 2015
Conversation

castillohair (Collaborator)

No description provided.

library, with full covariance matrices for each cluster and a fixed,
uniform set of weights. This means that `clustering_gmm` implicitly
assumes that all bead subpopulations have roughly the same number of
events. For more information, consult ``scikit-learn``'s documentation.
Contributor

Throwing away samples 50% away from cluster centers seems pretty arbitrary and unjustified. I assume this is all part of the scikit-learn GMM bug? In which case this is kind of a hack to get around it? I've already raised my concerns regarding that bug in #159, and it may not be worth fixing at this point in time. In general, though, it would be nice to get rid of this strange initialization procedure if the scikit-learn GMM bug were ever fixed. In my previous experience (MATLAB implementation), a vanilla GMM fit would work fine with a reasonable K-means initialization.

I would also prefer not to hardcode the weights equally. If the GMM algorithm works as advertised, that should be a pretty straightforward parameter to fit, and leaving it unbounded allows for edge cases where you have non-uniform subpopulation sizes (again, the devil's advocate argument that immediately comes to mind is if you mixed two different bead samples and were unable to guarantee equal ratio between the two; just feels like an unnecessary assumption). I feel like this also relates to the GMM bug, though, in which case it may not be worth fixing now and would warrant a deeper investigation into the inner workings of scikit-learn's GMM implementation.

If this works well enough now, then I probably wouldn't change these things.
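
For concreteness, here is a minimal sketch of the kind of trimmed initialization being discussed, assuming "50% away from cluster centers" means keeping only the half of each cluster's events closest to its K-means centroid; the function name and the use of scipy's `kmeans2` are illustrative, not FlowCal's actual implementation:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def trimmed_kmeans_init(data, n_clusters):
    # Run k-means, then estimate each cluster's initial mean and covariance
    # from only the 50% of its events closest to the centroid, so that
    # outliers do not inflate the initial covariance estimates.
    centroids, labels = kmeans2(data, n_clusters, minit='points')
    means, covars = [], []
    for k in range(n_clusters):
        cluster = data[labels == k]
        dist = np.linalg.norm(cluster - centroids[k], axis=1)
        keep = cluster[dist <= np.median(dist)]  # closest half only
        means.append(keep.mean(axis=0))
        covars.append(np.cov(keep, rowvar=False))
    return np.array(means), np.array(covars)
```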

castillohair (Collaborator, Author)

Both using 50% of the samples for each cluster and forcing the weights to be equal are what currently makes GMM work. This was tested with a bunch of bead files taken by almost everybody in the lab.

K-means initialization is actually performed by default (see line 486 in https://github.com/scikit-learn/scikit-learn/blob/c957249/sklearn/mixture/gmm.py), and we saw that this didn't work properly.

Using equal weights is actually less of a hack than you would think. It's a well-supported, documented option in scikit-learn's GMM. And the model I'm using, with full covariance matrices, is arguably less restrictive than the default, which uses diagonal covariance matrices (http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GMM.html).
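
To illustrate the option being referenced: in the scikit-learn GMM API current at the time (`sklearn.mixture.GMM`, later removed in favor of `GaussianMixture`), excluding 'w' from `params` and `init_params` keeps the mixing weights at whatever value is set before fitting. A minimal sketch, assuming uniform weights and full covariance matrices as described above:

```python
import numpy as np
from sklearn.mixture import GMM  # scikit-learn < 0.18 API

def fit_uniform_weight_gmm(data, n_clusters):
    # 'w' is excluded from params/init_params, so EM updates the means and
    # covariances but never touches the mixing weights.
    gmm = GMM(n_components=n_clusters,
              covariance_type='full',  # one full covariance per cluster
              params='mc',
              init_params='mc')
    # Fix the weights to a uniform distribution before fitting.
    gmm.weights_ = np.tile(1.0 / n_clusters, n_clusters)
    gmm.fit(data)
    return gmm.predict(data)
```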

@@ -607,14 +353,11 @@ def get_transform_fxn(data_beads, peaks_mef, mef_channels,
     ----------
     data_beads : FCSData object
         Flow cytometry data, taken from calibration beads.
-    peaks_mef : array
+    mef_values : array
         Known MEF values of the calibration beads' subpopulations, for
         each channel specified in `mef_channels`.
     mef_channels : int, or str, or list of int, or list of str
         Channels for which to generate transformation functions.
Contributor

I still think `mef_values` and `mef_channels` could be more intuitively combined into a dict where the key is the channel (int or str) and the value is a 1D numpy array. See #146.
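
For illustration, the dict-based interface suggested there might look like the following (channel names and MEF values are made up):

```python
import numpy as np

# Hypothetical: a single mapping from channel to known MEF values replaces
# the two parallel arguments `mef_values` and `mef_channels`.
mef_values = {
    'FL1': np.array([792, 2079, 6588, 16471]),    # illustrative values
    'FL3': np.array([1614, 4035, 12025, 31896]),  # illustrative values
}

# Channels and their values can no longer get out of sync.
for channel, values in mef_values.items():
    print(channel, values)
```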

castillohair merged commit 780db36 into develop on Dec 8, 2015
castillohair deleted the mef-improvements branch on December 8, 2015 at 14:29
if hasattr(populations[0], 'domain'):
    high = 0.985*populations[0].domain(0)[-1]
else:
    raise TypeError("argument 'high' not specified")
Contributor

I think referencing `populations[0]` will throw an error if `populations` is an empty list, which is never checked for. Not sure if we care about that edge case, though.
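
One way to guard against that edge case, as a sketch rather than the merged code, would be to validate the list up front:

```python
if not populations:
    raise ValueError("'populations' must contain at least one population")
if hasattr(populations[0], 'domain'):
    high = 0.985*populations[0].domain(0)[-1]
else:
    raise TypeError("argument 'high' not specified")
```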
