This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Opening a datatree from S3 bucket #322

Closed
vlevasseur073 opened this issue Mar 13, 2024 · 3 comments

Comments

@vlevasseur073

Hi all,

It seems that the current version of datatree cannot open stores located on cloud storage (tested with S3 only).
For instance, trying to open a datatree with the same syntax as xarray.open_dataset (using fsspec chained URLs):

store = "zip::s3://bucket/path/product.zarr.zip"
dt = datatree.open_datatree(store, engine="zarr", backend_kwargs={"storage_options": {"s3": secrets["s3input"]}})

where secrets["s3input"] is a dict containing the AWS secret keys and endpoint URLs.

fails with

ClientError                               Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiobotocore/client.py:408, in AioBaseClient._make_api_call(self, operation_name, api_params)
    407     error_class = self.exceptions.from_code(error_code)
--> 408     raise error_class(parsed_response, operation_name)
    409 else:

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
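
For context, fsspec addresses the options of a chained URL by protocol name, which is why the credentials above sit under the "s3" key. The snippet below is only a simplified stand-in for that routing, not fsspec's actual implementation; `split_chain` and the placeholder credentials are hypothetical names introduced for illustration. If the options never reach fsspec, the s3 link is opened without them, which would be consistent with the 403 above.

```python
def split_chain(url, storage_options):
    """Pair each link of a chained URL with the options keyed by its protocol.

    Simplified illustration only; fsspec's real parsing handles more cases.
    """
    links = []
    for part in url.split("::"):
        # "zip" carries no "://", while "s3://bucket/key" does.
        protocol = part.split("://", 1)[0]
        links.append((protocol, storage_options.get(protocol, {})))
    return links


# Placeholder credentials (hypothetical values, standing in for secrets["s3input"]).
options = {"s3": {"key": "HYPOTHETICAL_KEY", "secret": "HYPOTHETICAL_SECRET"}}
links = split_chain("zip::s3://bucket/path/product.zarr.zip", options)

for protocol, opts in links:
    print(protocol, sorted(opts))
```
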

The cause is that _open_datatree_zarr in datatree/io.py does not pass its kwargs on to zarr.open_group, so in this case the storage_options are ignored.
As a workaround in my specific case, replacing line 87 of datatree/io.py (v0.0.14)

zds = zarr.open_group(store, mode="r")

by

# backend_kwargs is {"storage_options": {...}}, so unpacking it passes storage_options=... to zarr
backend_kwargs = kwargs["backend_kwargs"]
zds = zarr.open_group(store, mode="r", **backend_kwargs)

works just fine.
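
A slightly more defensive variant of this workaround could look like the sketch below. It is not the datatree implementation: `open_root_group` and `recording_open_group` are hypothetical names, and the recording function only stands in for zarr.open_group so that the forwarding can be checked without network access. The sketch assumes, as in the call above, that backend_kwargs holds a "storage_options" entry.

```python
def open_root_group(store, open_group, **kwargs):
    """Open the root group, forwarding backend_kwargs to the opener.

    `open_group` is injected so the sketch runs without zarr installed;
    in datatree/io.py it would be zarr.open_group.
    """
    # Tolerate a missing or None backend_kwargs instead of raising KeyError.
    backend_kwargs = kwargs.get("backend_kwargs") or {}
    return open_group(store, mode="r", **backend_kwargs)


def recording_open_group(store, mode="r", storage_options=None):
    # Stand-in for zarr.open_group: returns the arguments it received.
    return {"store": store, "mode": mode, "storage_options": storage_options}


store = "zip::s3://bucket/path/product.zarr.zip"
s3_options = {"key": "HYPOTHETICAL_KEY"}  # placeholder for secrets["s3input"]

result = open_root_group(
    store,
    recording_open_group,
    backend_kwargs={"storage_options": {"s3": s3_options}},
)
print(result["storage_options"])
```

With this shape, a call without backend_kwargs behaves as before, while a call with it hands the per-protocol options to the opener.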

@TomNicholas
Member

Hi @vlevasseur073, sorry for the slow reply here.

We would welcome a PR to fix this!

@vlevasseur073
Author

Hi @TomNicholas, sorry for leaving this issue unanswered for so long. I recently came back to it and, in the meantime, checked the status of the datatree integration into pydata/xarray. I have now opened the equivalent issue pydata/xarray#9197 and proposed a fix in PR pydata/xarray#9198.

Regards,
Vincent

@keewis
Contributor

keewis commented Aug 13, 2024

closing in favor of pydata/xarray#9197

@keewis closed this as completed Aug 13, 2024