Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap #502

Closed
wants to merge 4 commits into from
Closed

Bootstrap #502

wants to merge 4 commits into from

Conversation

eroell
Copy link
Contributor

@eroell eroell commented Jan 14, 2024

PR Checklist

  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes
Addresses issue #497.
This PR introduces convenience functionality for using (random sampling with replacement) bootstrapping towards error estimation of distances between perturbations.

For the user, no breaking changes are introduced. The changes visible are

  • additional parameters bootstrap: bool = False, n_bootstrap: int = 100, bootstrap_random_state: int = 0 in the methods Distance.one_sided and Distance.pairwise
  • an additional method Distance.bootstrap, a bootsrapping analogon to Distance.__call__

Technical details

Additional context: Examples

__call__ vs bootstrap

import pertpy as pt
import scanpy as sc
import matplotlib.pyplot as plt
from seaborn import clustermap

adata = pt.dt.distance_example()

dist = pt.tl.Distance("euclidean")

print(dist(X, Y))
print(dist.bootstrap(X, Y, n_bootstrap=100))
2.640588
MeanVar(mean=3.3500843, variance=0.06426364)

onesided

import pertpy as pt
import scanpy as sc
import matplotlib.pyplot as plt
from seaborn import clustermap

adata = pt.dt.distance_example()

dist = pt.tl.Distance("euclidean")

dists_one = dist.onesided_distances(adata, groupby="perturbation", selected_group="control")
print(dists_one.head())

dists_one, vars_one = dist.onesided_distances(
    adata, groupby="perturbation", selected_group="control", bootstrap=True
)
print(dists_one.head())
print(vars_one.head())
perturbation
p-INTERGENIC393453     1.823859
p-sgELF1-2             1.992385
p-INTERGENIC216151     1.856206
p-INTERGENIC1144056    1.999334
p-sgELF1-5             2.132977

perturbation
p-INTERGENIC393453     2.194509
p-sgELF1-2             2.335306
p-INTERGENIC216151     2.192096
p-INTERGENIC1144056    2.310208
p-sgELF1-5             2.421595

perturbation
p-INTERGENIC393453     0.133328
p-sgELF1-2             0.150482
p-INTERGENIC216151     0.136785
p-INTERGENIC1144056    0.124505
p-sgELF1-5             0.158694

pairwise

import pertpy as pt
import scanpy as sc
import matplotlib.pyplot as plt
from seaborn import clustermap

adata = pt.dt.distance_example()

dist = pt.tl.Distance("euclidean")

dists = dist.pairwise(adata, groupby="perturbation")

means, vars = dist.pairwise(adata, groupby="perturbation", bootstrap=True)

clustermap(dists, robust=True, figsize=(10, 10))
clustermap(means, robust=True, figsize=(10, 10))
clustermap(vars, robust=True, figsize=(10, 10))

image

image

image

@eroell
Copy link
Contributor Author

eroell commented Jan 14, 2024

There are some tests failing, solving this together with maybe some more overhauling - opened draft PR with examples to raise some suggestions, especially with examples, of what our ideas could materialize to.

Slightly bugged that the bootstrapping mean is typically larger than the single-computed distance, not 100% sure about reason and validity right now

@Zethson
Copy link
Member

Zethson commented Jan 14, 2024

Yeah, smaller test datasets are also fine. We have issues with some distances failing on smaller test datasets so we had to sadly keep them a bit on the larger side.

I'm fixing the scgen test now. The 3.11 pre-release fails due to something numba related.

@eroell
Copy link
Contributor Author

eroell commented Jan 14, 2024

Yeah, smaller test datasets are also fine. We have issues with some distances failing on smaller test datasets so we had to sadly keep them a bit on the larger side.

I'm fixing the scgen test now. The 3.11 pre-release fails due to something numba related.

(3 Failures caused by these changes right now I believe, to be fixed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants