Required changes in xarray to avoid creating indexes #14

Closed
TomNicholas opened this issue Mar 8, 2024 · 6 comments
Labels
xarray Requires changes to xarray upstream

@TomNicholas
Member

TomNicholas commented Mar 8, 2024

Everything mentioned in this issue pydata/xarray#8699 also needs to be done for this library to work.

EDIT: This is a lie, see comment below

@TomNicholas TomNicholas added this to the v1.0 milestone Mar 8, 2024
@TomNicholas TomNicholas added the xarray Requires changes to xarray upstream label Mar 10, 2024
@TomNicholas TomNicholas mentioned this issue Mar 10, 2024
@TomNicholas TomNicholas changed the title from "Required changes in xarray" to "Required changes in xarray to avoid creating indexes" Mar 15, 2024
@TomNicholas
Member Author

TomNicholas commented Mar 15, 2024

The above isn't quite true. We should distinguish here between:

  1. Things that are needed to support creating xarray Datasets with no indexes, which we want in order to avoid loading any data. This means resolving pydata/xarray#8704 ("Currently no way to create a Coordinates object without indexes for 1D variables") so we can pass indexes={} to xarray constructors. This would be useful because a concatenation problem where we know the ordering of all datasets in advance (i.e. combine_nested rather than combine_by_coords) doesn't require creating any indexes. I'll track that in this issue.

  2. Things that are needed to support using the xarray backend entrypoint system to open datasets from disk as ManifestArray-backed Variables just by passing a keyword arg to open_dataset/open_mfdataset. This requires dodging some internal array wrapping that occurs in the depths of xarray's backend machinery. I'll track that in #35 ("Opening via xarray backendentrypoint").
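
To illustrate why (1) needs no indexes, here is a toy sketch in plain Python. It is not xarray code and does not reflect xarray's internals: `LazyCoord`, `VirtualDataset`, `combine_in_given_order` and `combine_by_sorting_coords` are all hypothetical stand-ins. The point is only that when the caller already supplies the ordering (the combine_nested case), no coordinate values have to be read, whereas inferring the ordering from coordinates (the combine_by_coords case) forces a read of every coordinate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LazyCoord:
    """Hypothetical 1D coordinate whose values are only read on .load()."""
    loader: Callable[[], List[float]]
    loads: int = 0  # counts how many times the values were actually read

    def load(self) -> List[float]:
        self.loads += 1
        return self.loader()


@dataclass
class VirtualDataset:
    """Hypothetical stand-in for a dataset that references data on disk."""
    name: str
    time: LazyCoord


def combine_in_given_order(datasets: List[VirtualDataset]) -> List[str]:
    """Analogue of combine_nested: trust the caller's order, read nothing."""
    return [ds.name for ds in datasets]


def combine_by_sorting_coords(datasets: List[VirtualDataset]) -> List[str]:
    """Analogue of combine_by_coords: must read coordinate values to sort."""
    ordered = sorted(datasets, key=lambda ds: ds.time.load()[0])
    return [ds.name for ds in ordered]


def make(name: str, start: float) -> VirtualDataset:
    return VirtualDataset(name, LazyCoord(lambda: [start, start + 1.0]))


# Ordering known in advance: no coordinate is ever loaded.
nested_inputs = [make("a", 0.0), make("b", 2.0), make("c", 4.0)]
nested_order = combine_in_given_order(nested_inputs)
nested_loads = sum(ds.time.loads for ds in nested_inputs)

# Ordering inferred from coordinates: every coordinate is loaded once.
shuffled_inputs = [make("c", 4.0), make("a", 0.0), make("b", 2.0)]
sorted_order = combine_by_sorting_coords(shuffled_inputs)
sorted_loads = sum(ds.time.loads for ds in shuffled_inputs)
```

In this toy model `nested_loads` stays at zero while `sorted_loads` equals the number of datasets, which is the cost that skipping index creation is meant to avoid.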

@TomNicholas
Member Author

TomNicholas commented Mar 20, 2024

(1) also requires #44 ("Test concat of dimension coordinate not backed by an index") to be solved upstream in xarray.

EDIT: Raised pydata/xarray#8871 on xarray to track it.

@TomNicholas
Member Author

TomNicholas commented Mar 25, 2024

I made a branch on my fork of xarray, which contains the (unmerged upstream) PRs pydata/xarray#8711, pydata/xarray#8714 and pydata/xarray#8872. All three of these are needed for (1), i.e. to be able to create virtual datasets not backed by any indexes, and concatenate them without attempting to create any indexes.

Once those three changes are merged upstream and released, I'll be able to close this issue (though #35 will still be unsolved).

@TomNicholas
Member Author

Actually now the only unmerged PR to xarray that is needed for (1) is pydata/xarray#8872, so I've changed the VirtualiZarr build to only point to that in 08774f7.

@jsignell
Contributor

Is this now closable? 👀

@TomNicholas
Member Author

Yes, I think so! But note that additional changes in xarray would be required to support #18.
