Tx compare refactor #54

Merged
merged 11 commits into from
Mar 26, 2019

Conversation

tleonardi
Owner

I've added unit tests (using pytest) and refactored TxCompare. The GMM option can now run (and report) both the ANOVA and the logit tests while fitting the GMM only once. I've also added the relevant options to the CLI.

@tleonardi tleonardi requested a review from a-slide as a code owner March 20, 2019 16:35
@tleonardi
Owner Author

I've just realised that this breaks SampCompDB.save_report(). I'm fixing it.

Collaborator

@a-slide a-slide left a comment

Looks good Tom.

nanocompore/SampComp.py
nanocompore/SampCompDB.py
nanocompore/TxComp.py
@@ -155,12 +163,43 @@ def gmm_test_anova(data, log_dwell=True, verbose=False):
# Generate an array of sample labels
Y = [ k for k,v in data[condition_labels[0]].items() for _ in v['intensity'] ] + [ k for k,v in data[condition_labels[1]].items() for _ in v['intensity'] ]

# Loop over multiple cv_types and n_components and for each fit a GMM
gmm_fit = fit_best_gmm(X, max_components=2, cv_types=['spherical', 'tied', 'diag', 'full'])
Collaborator

Do you try all of the possible fits ? Isn't it enough to use "full" only ?

Owner Author

This is a good point, and I'm not sure here. The model that is kept in the end is not always the 'full' one: sometimes the lowest BIC comes from a different covariance type. However, using only 'full' would save a lot of time.

Collaborator

I think it would be acceptable to use only full. The others are special cases of full anyway, aren't they?
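
For reference, the trade-off discussed in this thread can be sketched with scikit-learn. This is only an illustration under assumptions, not the `fit_best_gmm` from this PR: the helper name `fit_lowest_bic_gmm`, its return value, and the synthetic data are hypothetical. It fits one `GaussianMixture` per covariance type and component count, keeps the lowest-BIC fit, and compares that against a search restricted to 'full'.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_lowest_bic_gmm(X, max_components=2,
                       cv_types=("spherical", "tied", "diag", "full"),
                       random_state=0):
    """Hypothetical helper: fit one GMM per (covariance type, component count)
    and return (bic, covariance_type, n_components, model) for the lowest BIC."""
    best = None
    for cv_type in cv_types:
        for n_components in range(1, max_components + 1):
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type=cv_type,
                                  random_state=random_state)
            gmm.fit(X)
            bic = gmm.bic(X)  # lower BIC is better
            if best is None or bic < best[0]:
                best = (bic, cv_type, n_components, gmm)
    return best


# Synthetic, well-separated two-cluster data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(6, 1, size=(200, 2))])

# Search over all four covariance types (8 fits) vs. 'full' only (2 fits)
bic_all, cv_all, n_all, _ = fit_lowest_bic_gmm(X)
bic_full, cv_full, n_full, _ = fit_lowest_bic_gmm(X, cv_types=("full",))

print("all types :", cv_all, n_all, round(bic_all, 1))
print("full only :", cv_full, n_full, round(bic_full, 1))
```

The constrained covariance types are indeed restricted forms of 'full', but they have fewer free parameters, and BIC penalises parameter count, so a constrained model can still score lower than 'full' on the same data. Restricting the search to 'full' cuts the number of fits by a factor of four at the cost of possibly missing such a model.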

@a-slide a-slide merged commit a2ce3ad into devel Mar 26, 2019
2 participants