Tx compare refactor #54

tleonardi · 2019-03-20T16:35:27Z

I've added unit tests (using pytest) and refactored TxCompare. The GMM option can now run (and report) both the ANOVA and the logit test while fitting the GMM only once. I've also added the corresponding options to the CLI.

tleonardi · 2019-03-20T16:45:10Z

I've just realised that this breaks SampCompDB.save_report(). I'm fixing it.

a-slide

Looks good Tom.

nanocompore/SampComp.py

nanocompore/SampCompDB.py

nanocompore/TxComp.py

a-slide · 2019-03-26T09:42:58Z

nanocompore/TxComp.py

@@ -155,12 +163,43 @@ def gmm_test_anova(data, log_dwell=True, verbose=False):
    # Generate an array of sample labels
    Y = [ k for k,v in data[condition_labels[0]].items() for _ in v['intensity'] ] + [ k for k,v in data[condition_labels[1]].items() for _ in v['intensity'] ]

-    # Loop over multiple cv_types and n_components and for each fit a GMM
+    gmm_fit = fit_best_gmm(X, max_components=2, cv_types=['spherical', 'tied', 'diag', 'full'])


Do you try all of the possible fits? Isn't it enough to use "full" only?

This is a good point, and I'm not sure here. The model that is kept in the end is not always 'full': sometimes the fit with the lowest BIC has a different covariance type. However, using only 'full' would save a lot of time.

I think it would be acceptable to use only 'full'. The others are special cases of 'full' anyway, aren't they?
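To make the trade-off in this thread concrete, here is a minimal sketch of why a constrained covariance type can still win on BIC even though 'full' is the most general model. It only counts free parameters and applies the standard BIC formula; it does not fit anything. The helper names `gmm_free_parameters` and `bic` are hypothetical and are not part of nanocompore, and the two-feature setting (intensity and dwell time) is an assumption based on the diff above.

```python
import math

def gmm_free_parameters(n_components, n_features, covariance_type):
    """Count the free parameters of a Gaussian mixture (hypothetical helper)."""
    k, d = n_components, n_features
    cov_params = {
        "spherical": k,                 # one variance per component
        "diag": k * d,                  # one variance per feature and component
        "tied": d * (d + 1) // 2,       # one full matrix shared by all components
        "full": k * d * (d + 1) // 2,   # one full matrix per component
    }[covariance_type]
    mean_params = k * d                 # one mean vector per component
    weight_params = k - 1               # mixture weights sum to 1
    return cov_params + mean_params + weight_params

def bic(log_likelihood, n_params, n_samples):
    """Standard BIC: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Assumed setting: 2 components, 2 features (intensity, dwell time)
for cv_type in ("spherical", "tied", "diag", "full"):
    print(cv_type, gmm_free_parameters(2, 2, cv_type))
```

Under these assumptions 'full' has 11 free parameters against 7 for 'spherical'. With 100 observations the extra penalty is 4 · ln(100) ≈ 18.4, so 'full' only gets the lower BIC if it improves the log-likelihood by more than about 9.2. That is consistent with the observation that the selected model is not always 'full', and dropping the other types trades that selection step for speed.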

tleonardi added 4 commits March 20, 2019 17:31

Major refactoring of Anova, logit and GMM functions

81d3e91

Renamed force_logit to logit

e472574

Added reporting of log ratios

a3a1dab

Fixes error related to 0 variance within groups

e611ce4

tleonardi requested a review from a-slide as a code owner March 20, 2019 16:35

tleonardi mentioned this pull request Mar 20, 2019

SampCompDB.save_report() broken in TxCompare_refactor #55

Closed

tleonardi added 7 commits March 21, 2019 14:36

Added sum_of_squares()

c2613ca

check for within group variance uses sum_of_squares()

10cb525

Harmonized cluster_counts reporting

cfbfc2a

Added strict=True when testing logit

4dbcc6a

Refactore save_report(). Major performance improvement. Issue #55

7c40cbd

Added test for correct handling of 0 within-group variance. Issue #56

9e5bc93

Warnings are reported even when they are ignored

5a9ef4e
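For readers following the zero-variance fixes in the commits above (e611ce4, c2613ca, 10cb525, 9e5bc93), the sketch below shows one way such a check could look. It is an illustration only: the body of `sum_of_squares()` and the helper `has_zero_within_group_variance()` are assumptions, not the code from this PR. The motivation is that the ANOVA F statistic divides by the within-group mean square, so it is undefined when every group has zero spread.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (assumed definition)."""
    values = list(values)  # assumes a non-empty sequence of numbers
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def has_zero_within_group_variance(groups, tol=1e-12):
    """Hypothetical check: True if the pooled within-group sum of squares is ~0."""
    within_ss = sum(sum_of_squares(g) for g in groups)
    return within_ss <= tol

# Every group is constant, so an ANOVA on these groups would divide by zero
print(has_zero_within_group_variance([[1.0, 1.0], [3.0, 3.0, 3.0]]))
# At least one group varies, so the test can proceed
print(has_zero_within_group_variance([[1.0, 2.0], [3.0, 3.0]]))
```

A caller could use a check like this to skip the ANOVA, or report a placeholder p-value, instead of letting the statistical routine fail or emit a warning.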

a-slide approved these changes Mar 26, 2019


a-slide merged commit a2ce3ad into devel Mar 26, 2019


a-slide Mar 26, 2019

tleonardi Mar 26, 2019

a-slide Mar 26, 2019
