Remove frozen umap #576
Some tests should fail as there are probably differences in the neighbor algorithm.
This is also why this is a backwards-compat breaking change. Can you just visually inspect https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html and see what's going on?
This is another notebook that should still do something meaningful after the change: https://nbviewer.jupyter.org/github/theislab/scanpy_usage/blob/master/170502_paul15/paul15.ipynb
And finally, of course, https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html should give somewhat consistent results. But I expect slight variations and no perfect consistency... Actually, I'd expect the associated tests (https://github.com/theislab/scanpy/blob/master/scanpy/tests/notebooks/test_pbmc3k.py) to fail. Can you check?
Yes, I would have expected that the adjacency matrix will differ slightly and hence,
One other thing.
We'd like to have two new convenience functions:
The first maps the new data into the existing neighbor graph based on the chosen latent representation.
The second maps the new data into the existing UMAP embedding.
For the second function, one just needs to find a good way of wrapping
For the first, I'm not quite sure how easy it is. I'm using
Maybe what UMAP does internally is already sufficient, but I don't know.
Can you investigate and if it's easy cover in this PR? If it's tricky, let's wait for another PR.
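For the first function, the simplest baseline would be a brute-force k-nearest-neighbor query of the new observations against the existing ones in the shared latent representation. A minimal NumPy sketch, assuming Euclidean distance; `map_to_reference_neighbors` and its arguments are hypothetical names, not scanpy API, and this is not a claim about what UMAP does internally:

```python
import numpy as np

def map_to_reference_neighbors(rep_ref, rep_new, n_neighbors=3):
    """Hypothetical helper: for each new point, find its nearest reference points.

    rep_ref: (n_ref, n_dims) latent representation of the existing data.
    rep_new: (n_new, n_dims) latent representation of the new data.
    Returns (indices, distances), each of shape (n_new, n_neighbors).
    """
    rep_ref = np.asarray(rep_ref, dtype=float)
    rep_new = np.asarray(rep_new, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_new, n_ref).
    diff = rep_new[:, None, :] - rep_ref[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Indices of the n_neighbors closest reference points per new point.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist

# Toy latent space with four reference points and two new points.
rep_ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
rep_new = np.array([[0.1, 0.0], [4.0, 4.5]])
idx, dist = map_to_reference_neighbors(rep_ref, rep_new, n_neighbors=2)
```

Brute force is quadratic in memory, so it only serves as a correctness baseline for an approximate index such as a stored forest.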
Alright! I've got a little example case I'd probably be using for a test case here (doublet prediction by simulation and projection).
My current thoughts:
Basic PCA projection
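import numpy as np
from scipy import sparse

def pca_update(tgt, src, inplace=True):
    # TODO: Make sure we know the settings (just whether to center?) from src
    if not inplace:
        tgt = tgt.copy()
    # Densify so the data can be centered.
    if sparse.issparse(tgt.X):
        X = tgt.X.toarray()
    else:
        X = tgt.X.copy()
    # Note: this centers with the target's own feature means.
    X -= np.asarray(tgt.X.mean(axis=0))
    # Project onto the principal axes stored in the source.
    tgt_pca = np.dot(X, src.varm["PCs"])
    tgt.obsm["X_pca"] = tgt_pca
    return tgt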
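Regarding the TODO about centering: a minimal NumPy-only sketch of the projection step, assuming the reference PCA was fit on centered data, so new data is centered with the reference's feature means rather than its own. `project_onto_pcs` and `ref_mean` are hypothetical names for illustration, not part of scanpy:

```python
import numpy as np

def project_onto_pcs(X_new, pcs, ref_mean=None):
    """Hypothetical helper: project rows of X_new onto principal axes.

    pcs has shape (n_features, n_components), laid out like varm["PCs"].
    ref_mean is the per-feature mean of the reference data; pass None
    if the reference PCA was computed without centering.
    """
    X = np.asarray(X_new, dtype=float)
    if ref_mean is not None:
        X = X - np.asarray(ref_mean, dtype=float)
    return X @ pcs

# Toy reference data: fit a PCA via SVD of the centered matrix.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(50, 5))
ref_mean = X_ref.mean(axis=0)
U, S, Vt = np.linalg.svd(X_ref - ref_mean, full_matrices=False)
pcs = Vt.T[:, :2]            # (n_features, n_components)
ref_scores = (U * S)[:, :2]  # PCA coordinates of the reference data

# Sanity check: projecting the reference itself reproduces its scores.
projected = project_onto_pcs(X_ref, pcs, ref_mean)
```

Centering with the reference mean keeps both datasets in the same coordinate system; centering each dataset with its own mean shifts the new points relative to the reference whenever the two means differ.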
This looks good!
Storing the forest in the AnnData is good! It should also be compatible with the updates that @tomwhite plans for UMAP and pynndescent (UMAP will depend on pynndescent), as the forest should be the most basic object to store in order to enable queries later on...
But I would not store the "forest" in a default neighbors call. Or do you have any estimate on how large it is?
@Koncopd, can we merge this without the
Is what I wrote in the beginning. I think it turned out tricky and is a case for #562 (comment). So, let's keep this PR really simple and just be about removing the legacy code.
Is your statement that "all tests pass except for the PAGA tests" still true? Did you manually inspect the PAGA notebook, and does it look consistent? Just a few cosmetic things should have changed, I guess.
If yes, we'll merge this, now that