Fix pca on sparse data reproducibility #1240

ivirshup · 2020-05-21T07:01:14Z

Bugfix for the sparse pca.

It looks like we forgot to pass the random seed through when this code path is used. We also never had a test checking that running the function twice with the same random seed returns the same result.

This PR fixes both these issues. The new tests are a bit slow, but are definitely needed.
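The failure mode can be illustrated with a minimal sketch, assuming NumPy and SciPy are available. This is not scanpy's code: `sparse_pca` is a hypothetical helper that runs PCA on a sparse matrix by centering implicitly through a `LinearOperator` and calling `scipy.sparse.linalg.svds` (ARPACK). The starting vector `v0` is drawn from a generator seeded by `random_state`. If `v0` is left out, ARPACK's start vector is not tied to the seed, so two calls with the "same seed" need not agree.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, svds


def sparse_pca(X, n_comps, random_state=0):
    """Hypothetical sketch: PCA scores of a sparse matrix via ARPACK.

    The matrix is centered implicitly (never densified), and the ARPACK
    start vector v0 comes from a generator seeded with random_state, so
    repeated calls with the same seed give the same result.
    """
    X = sparse.csr_matrix(X, dtype=np.float64)
    mu = np.asarray(X.mean(axis=0)).ravel()  # column means, shape (n_features,)
    ones = np.ones(X.shape[0])

    # (X - 1 mu^T) @ v, accepting both (n,) and (n, 1) inputs
    def matvec(v):
        v = np.ravel(v)
        return X @ v - ones * (mu @ v)

    # (X - 1 mu^T)^T @ u
    def rmatvec(u):
        u = np.ravel(u)
        return X.T @ u - mu * (ones @ u)

    centered = LinearOperator(
        X.shape, matvec=matvec, rmatvec=rmatvec, dtype=np.float64
    )

    # The seed actually reaches the solver through v0.
    rng = np.random.default_rng(random_state)
    v0 = rng.uniform(-1.0, 1.0, size=min(X.shape))

    u, s, _ = svds(centered, k=n_comps, v0=v0)
    order = np.argsort(-s)  # sort components by decreasing singular value
    return u[:, order] * s[order]  # PC scores, shape (n_obs, n_comps)


# Reproducibility check in the spirit of the new tests:
X = sparse.random(50, 20, density=0.3, random_state=0, format="csr")
scores_a = sparse_pca(X, n_comps=3, random_state=42)
scores_b = sparse_pca(X, n_comps=3, random_state=42)
```

Comparing `scores_a` and `scores_b` (for example with `np.allclose`) is the kind of "same seed, same result" assertion that was missing; the column norms of the scores should also match the top singular values of the densely centered matrix.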

I've also added a fixture that returns a copy of the pbmc3k dataset after it has been normalized and had highly_variable_genes run on it. The object is prepared only once per run of the test suite, but each test that uses it gets a fresh copy. This was done to speed up the new tests.

ivirshup · 2020-05-21T07:57:29Z

To limit how long we have to wait for rebuilds, I've also included the change from #1236 here.

ivirshup added 3 commits May 21, 2020 17:17

Make sure random seed has an effect for pca

dd557ba

Format test_pca.py

8e4623e

Note fix in changelog

de1d7ec

ivirshup force-pushed the pca-reproducibility branch from a50d1e2 to de1d7ec May 21, 2020 07:22

Add docfix for pca

436570c

ivirshup merged commit ec3a44f into scverse:master May 21, 2020

This was referenced May 21, 2020

Can I make a deep copy of the scanpy object? #1239

Closed

Fix docstring for pca, fixes #1234 #1236

Closed
