Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pca for sparse data #403

Open
wants to merge 11 commits into
base: master
from

Conversation

Projects
None yet
2 participants

Koncopd added some commits Dec 16, 2018

@Koncopd Koncopd requested a review from falexwolf Dec 20, 2018

@falexwolf

This comment has been minimized.

Copy link
Member

falexwolf commented Dec 26, 2018

This is nice! Thank you!

It appears to me that the benchmarks show that this only becomes relevant for very large data. So we need to be mindful to not break backward compatibility for all the small and medium-size datasets that people use (which we do by introducing the tiny difference). Don't you think that in the light of this, it would be better to leave the default as is (densifying) and have an option sparse_pca or something similar?

@Koncopd

This comment has been minimized.

Copy link
Member Author

Koncopd commented Dec 26, 2018

It appears to me that the benchmarks show that this only becomes relevant for very large data.

Hm, even for my example it is 77.14 MiB vs 893.92 MiB, so 10 times difference. This seems large to me, no?

@falexwolf

This comment has been minimized.

Copy link
Member

falexwolf commented Dec 27, 2018

Hm, even for my example it is 77.14 MiB vs 893.92 MiB, so 10 times difference. This seems large to me, no?

Yes, it's definitely large and it's awesome that you solved this problem! I just meant that it's not hitting people's computational resources limits: your example is 60K x 2K, so quite big already, if you densify you need 800MB, which is easily available even on a laptop. That's what I meant.

What do you think?

@Koncopd

This comment has been minimized.

Copy link
Member Author

Koncopd commented Dec 27, 2018

Yeah, this seems important only for large datasets in that sense. So, i will add sparse_pca option with False by default.

@Koncopd

This comment has been minimized.

Copy link
Member Author

Koncopd commented Jan 14, 2019

Not sure what kind of test to add for this...

@falexwolf

This comment has been minimized.

Copy link
Member

falexwolf commented Jan 21, 2019

As discussed, @Koncopd will try to integrate this into scikit-learn itself and not into Scanpy. 😄

@Koncopd

This comment has been minimized.

Copy link
Member Author

Koncopd commented Feb 4, 2019

Similar pull request exists already in sklearn.
scikit-learn/scikit-learn#12841
Will watch.

@flying-sheep flying-sheep force-pushed the master branch 2 times, most recently from 3efb194 to fc84096 Feb 12, 2019

@falexwolf falexwolf referenced this pull request Mar 10, 2019

Open

TODO: Backwards-compat breaking changes #453

0 of 13 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.