
Fix t-tests when variance is zero #621

Merged (2 commits, Apr 28, 2019)

Conversation

@ivirshup (Collaborator) commented Apr 25, 2019

Just a quick fix, should probably replace most of this code with scipy.stats.ttest_ind/ scipy.stats.stats.ttest_ind_from_stats with equal_var=False.
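For reference, a minimal sketch of the suggested replacement (not the PR's actual diff): Welch's t-test (`equal_var=False`) computed from per-gene summary statistics with `scipy.stats.ttest_ind_from_stats`, vectorized over all genes at once. The toy expression matrices and group sizes below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: 5 cells x 3 genes per group.
rng = np.random.default_rng(0)
group1 = rng.normal(loc=0.0, scale=1.0, size=(5, 3))
group2 = rng.normal(loc=1.0, scale=2.0, size=(5, 3))

# Welch's t-test from summary statistics, one result per gene.
t_stats, p_vals = stats.ttest_ind_from_stats(
    mean1=group1.mean(axis=0), std1=group1.std(axis=0, ddof=1), nobs1=5,
    mean2=group2.mean(axis=0), std2=group2.std(axis=0, ddof=1), nobs2=5,
    equal_var=False,
)

# Sanity check: matches running the test on the raw observations.
t_ref, p_ref = stats.ttest_ind(group1, group2, axis=0, equal_var=False)
```

This is why computing the test from stats is attractive here: the means and variances are already available per group, so no second pass over the expression matrix is needed.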


@ivirshup (Collaborator, Author) commented Apr 25, 2019

Someone should review this before it's merged. I think this won't cause any problems, but I'm also not too familiar with the DE code.

@LuckyMD (Contributor) commented Apr 26, 2019

I wonder why this wasn't done in the first place. Is scipy not already a dependency of scanpy? Or is this slower than the initial implementation?

@falexwolf (Member) commented Apr 26, 2019

Looks great! I wasn't aware of this high-dimensional version of the t-test in Scipy, which seems to be as efficient as the current implementation. I only investigated this thoroughly for the Wilcoxon rank test and found that Scipy doesn't offer a scalable version of that one.

But yes, this will get merged after 1.4.1.

@falexwolf falexwolf referenced this pull request Apr 26, 2019
@falexwolf (Member) commented Apr 26, 2019

1.4.1 is out. Are we sure this is as scalable as it was before and not a backwards-breaking change? If yes, we can merge immediately.

@ivirshup (Collaborator, Author) commented Apr 27, 2019

I'm not completely sure this doesn't break anything, but the regression tests pass. The internal code is very similar, so I'm not too worried about these changes.
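To illustrate the edge case in the PR title, here is a small hypothetical check (values made up) of how scipy's Welch test behaves when a gene has zero variance in both groups:

```python
import numpy as np
from scipy import stats

# A "gene" with constant expression in each group: both variances are zero.
a = np.full(4, 2.0)
b = np.full(4, 5.0)

# Suppress the divide-by-zero warnings from the zero denominator.
with np.errstate(divide="ignore", invalid="ignore"):
    t, p = stats.ttest_ind(a, b, equal_var=False)

# With a zero denominator and differing means, the statistic comes out
# infinite (sign follows the mean difference) rather than raising an error.
```

If the means were also equal, the statistic would instead be NaN (0/0), which is arguably the desired "no evidence either way" answer.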

It does look like it's (very) slightly slower. Running this a thousand times on the pbmc68k dataset took ~2.3% longer (about 1.4 ms per run) than the previous version. That said, we're very inefficient about mean and variance calculation, so I think that's a better place to optimize.

Edit: I've force-pushed to fix some minor formatting issues (trailing whitespace, a blank line, a typo) that I didn't think deserved their own commit.

@ivirshup ivirshup force-pushed the ivirshup:de_quick_fix branch from d1c4347 to c681343 Apr 27, 2019

@falexwolf (Member) commented Apr 28, 2019

Great! 2.3% is nothing...

@falexwolf falexwolf merged commit 0712168 into theislab:master Apr 28, 2019

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)