Load AnnData From Zarr #807
Was just thinking about something related, and I think making separate schemas for the different kinds of options makes sense. There is no need for an enum; the options just have to match one of these schemas or be absent altogether. This would allow us to have an OME-TIFF loader directly.
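As a rough illustration of the "one of these schemas or none at all" idea, here is a minimal Python sketch in the spirit of JSON Schema's `oneOf`. The two option shapes (a single `path` object and a list of path strings) and all function names are hypothetical, chosen only for the example; they are not the schemas used in this PR.

```python
from typing import Any, Callable, List


def matches_path_options(options: Any) -> bool:
    # Hypothetical shape A: an object with exactly one string field, "path".
    return (
        isinstance(options, dict)
        and set(options) == {"path"}
        and isinstance(options["path"], str)
    )


def matches_path_list_options(options: Any) -> bool:
    # Hypothetical shape B: a plain list of path strings.
    return isinstance(options, list) and all(isinstance(v, str) for v in options)


# Each "schema" is a predicate; no enum of file types is involved.
OPTION_SCHEMAS: List[Callable[[Any], bool]] = [
    matches_path_options,
    matches_path_list_options,
]


def options_are_valid(options: Any) -> bool:
    """Valid when options is absent (None) or matches exactly one schema."""
    if options is None:
        return True
    return sum(1 for check in OPTION_SCHEMAS if check(options)) == 1
```

With these assumed shapes, `options_are_valid(None)` and `options_are_valid({"path": "obsm/X_umap"})` both pass, while an object with an unknown key is rejected.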
Yes direct
Is there any reason not to use an array of strings instead? Then you would not need to do
Can we change
I would use posix-like ('/') paths relative to the root. All valid keys in a zarr store are posix-like paths, and since these are paths to arrays within the store, it makes sense to provide a path. This is what is done in the multiscale specification. That's the point of being able to do:
await openArray({ store, path }); // path is a posix-like path with "/"
The "store" takes care of translating a zarr path:
# e.g. on windows
import zarr
root = zarr.open('mydataset.zarr') # DirectoryStore('mydataset.zarr')
arr = root.get('my/nested/array/path') # opens an array from within hierarchy
# raw string so the backslashes stay literal instead of becoming escapes such as \n
arr2 = root.get(r'my\nested\array\path') # raises an exception
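If a path might arrive with OS-style separators, one option is to normalize it before handing it to the store. The helper below is a minimal sketch of that idea; `to_zarr_path` is a hypothetical name and is not part of zarr or of this PR.

```python
def to_zarr_path(path: str) -> str:
    """Return a "/"-separated path relative to the store root.

    Hypothetical helper: replaces backslashes with forward slashes and
    drops leading and trailing separators, so the result can be used as
    a posix-like key inside a zarr hierarchy.
    """
    return path.replace("\\", "/").strip("/")


# A raw string keeps the backslashes literal; without the r prefix,
# "\n" and "\a" would be interpreted as escape sequences.
windows_style = r"my\nested\array\path"
normalized = to_zarr_path(windows_style)
```

The normalized value could then be passed to `root.get(...)` as in the comment above; whether normalizing on the caller's side is desirable, rather than requiring posix-like paths in the config, is a design choice this sketch does not settle.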
@manzt Thanks for the comments on the zarr store. @keller-mark I went with slashes on Trevor's recommendations only because I think the nested stuff can get a little hairy, and it is cleaner to not have to |
…sortium/vitessce into ilan-gold/load_anndata
Co-authored-by: Mark Keller <7525285+keller-mark@users.noreply.github.com>
I am starting this out as a draft since I think there are a few open questions:
options in a schema? Do we expect this to work more generally than just anndata-zarr? I think we would want the JSON schema to operate on an enum of file types in this case, but I don't know if this is possible (i.e. for anndata-zarr, use a certain options schema as opposed to another one) - for example this is a schema for visualizing the habib data
expression-matrix.zarr loader so that we can load obsm subsets of the cell x gene matrix. I think the idea here would be allowing people to use the highly variable genes without subsetting X as I do below.
genesFilter, open to other options
X for the expression matrix? My sense is no.
As far as what the PR does do, you can specify different parts of the AnnData store to be mapped to parts of the Vitessce configuration, i.e. X for the expression matrix, or obsm.leiden for cell set labels. This is all accomplished through the view configuration, which has options for all of this. We introduce a new set of loaders to handle this, anndata-cell-sets.zarr etc.
I also committed some example view configs which I will remove but the process for downloading/viewing them is as follows:
slide-seq: I can send you the data if you do not have it. Then I ran before writing to zarr.
habib: I ran before writing to zarr. Demo is here
pbmc: No alterations. I got this dataset from here. Demo is here
pbmc_processed: No alterations. I got this dataset from here. Demo is here