Fix pca on sparse data reproducibility #1240
Merged
Bugfix for PCA on sparse data.
It looks like we forgot to pass the random seed through when PCA is run on sparse data. We also never had a test checking that running the function twice with the same random seed returns the same result.
This PR fixes both these issues. The new tests are a bit slow, but are definitely needed.
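As a rough illustration of the kind of check described above, here is a minimal sketch in plain Python. It is not the code from this PR: `toy_solver`, `_power_iteration`, and `DATA` are hypothetical names, and the power iteration is only a stand-in for a randomized solver whose result depends on its seed. The point is that the wrapper has to forward `random_state` to the randomized step, and the test has to call it twice with the same seed and compare.

```python
import random


def _power_iteration(rows, n_iter, rng):
    """Hypothetical randomized step: power iteration on X^T X,
    starting from a random vector drawn from `rng`."""
    n_features = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    for _ in range(n_iter):
        # xv = X v
        xv = [sum(row[j] * v[j] for j in range(n_features)) for row in rows]
        # w = X^T (X v)
        w = [
            sum(rows[i][j] * xv[i] for i in range(len(rows)))
            for j in range(n_features)
        ]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def toy_solver(rows, random_state=0, n_iter=1):
    """Hypothetical wrapper. The seed is forwarded to the randomized
    step; dropping this argument is the class of bug described above."""
    rng = random.Random(random_state)
    return _power_iteration(rows, n_iter, rng)


# Small made-up matrix, used only for this illustration.
DATA = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
]


def test_same_seed_same_result():
    first = toy_solver(DATA, random_state=42)
    second = toy_solver(DATA, random_state=42)
    assert first == second


test_same_seed_same_result()
```

A test like this only catches a dropped seed if the solver's output actually depends on the random start, which is why the sketch uses a single iteration.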
I've also added a fixture for returning a copy of the pbmc3k dataset which has been normalized and had `highly_variable_genes` run on it. Preparation of the object should only happen once per run of the suite, but a new copy will be provided for each test that uses it. This was done to speed up the new tests.