Tx compare refactor #54
I've just realised that this breaks SampCompDB.save_report(). I'm fixing it.
Looks good, Tom.
@@ -155,12 +163,43 @@ def gmm_test_anova(data, log_dwell=True, verbose=False):
# Generate an array of sample labels
Y = [ k for k,v in data[condition_labels[0]].items() for _ in v['intensity'] ] + [ k for k,v in data[condition_labels[1]].items() for _ in v['intensity'] ]

# Loop over multiple cv_types and n_components and for each fit a GMM
gmm_fit = fit_best_gmm(X, max_components=2, cv_types=['spherical', 'tied', 'diag', 'full'])
Do you try all of the possible fits? Isn't it enough to use "full" only?
This is a good point, I'm not sure here. The model that is kept in the end is not always the 'full' one: sometimes the fit with the lowest BIC has a different covariance type. However, using only 'full' would save a lot of time.
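For reference, the selection being discussed can be sketched roughly as below. This is not the actual `fit_best_gmm` from this PR: `select_gmm_by_bic` is a hypothetical stand-in, the loop over 1 to `max_components` components is an assumption, and the data are synthetic. It only relies on scikit-learn's `GaussianMixture` and its `bic` method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, max_components=2,
                      cv_types=("spherical", "tied", "diag", "full"),
                      random_state=0):
    """Hypothetical helper: fit one GMM per (covariance type, n_components)
    pair and keep the fit with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for cv_type in cv_types:
        for n_components in range(1, max_components + 1):
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type=cv_type,
                                  random_state=random_state)
            gmm.fit(X)
            bic = gmm.bic(X)  # lower BIC is better
            if bic < best_bic:
                best_model, best_bic = gmm, bic
    return best_model, best_bic


# Synthetic two-column data standing in for (intensity, dwell) observations
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[6.0, 6.0], scale=1.0, size=(200, 2)),
])

model, bic = select_gmm_by_bic(X)
print(model.covariance_type, model.n_components, round(bic, 1))
```

Passing `cv_types=("full",)` to the same helper would cut the number of fits by a factor of four, which is the time saving mentioned above, at the cost of never selecting a simpler covariance structure.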
I think it would be acceptable to use only full. The others are special cases of full anyway, aren't they?
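A small sketch of why a constrained covariance type can still win on BIC even though it is nested in 'full': BIC penalises the number of free parameters, and 'full' has the most. The helper names below are hypothetical, and the counts assume the usual parameterisation of a Gaussian mixture (means, mixing weights, covariances) with two features and two components.

```python
import math


def gmm_n_parameters(n_components, n_features, covariance_type):
    """Count the free parameters of a Gaussian mixture for one covariance structure."""
    k, d = n_components, n_features
    cov_params = {
        "spherical": k,                  # one variance per component
        "diag": k * d,                   # one variance per feature per component
        "tied": d * (d + 1) // 2,        # one full matrix shared by all components
        "full": k * d * (d + 1) // 2,    # one full matrix per component
    }[covariance_type]
    mean_params = k * d
    weight_params = k - 1                # mixing weights sum to 1
    return cov_params + mean_params + weight_params


def bic(log_likelihood, n_params, n_samples):
    """BIC = -2 * log-likelihood + n_params * ln(n_samples); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


for cv_type in ("spherical", "diag", "tied", "full"):
    print(cv_type, gmm_n_parameters(2, 2, cv_type))
```

At equal log-likelihood, the 'full' model therefore carries a larger penalty than 'spherical', so it only wins when the extra covariance terms improve the fit enough to pay for themselves.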
I've added unit tests (using pytest) and refactored TxCompare. Now the GMM option can do (and report) both the anova and logit while doing the GMM fitting only once. I've also added the relevant options in the CLI.