S3 anonymous Zarr data examples... #385

Open
DennisHeimbigner opened this issue Jan 7, 2019 · 18 comments

Comments

@DennisHeimbigner

I am in the process of constructing the initial netcdf-c library handler for the Zarr format. As part of this, I need to verify my assumptions about how the storage maps onto S3. Are there any anonymously accessible Zarr datasets that I can access (read-only)?
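For checking the storage mapping, here is a minimal sketch that assumes the Zarr v2 layout. Each array keeps its metadata in a `.zarray` JSON object. Each chunk is stored as one object whose key is the chunk's grid indices joined by `dimension_separator` (default `.`), placed under the array's path. On S3, every key is simply an object key under the store's prefix. The function name `chunk_keys` and the example metadata are hypothetical, not taken from any existing library.

```python
import json
import math
from itertools import product

def chunk_keys(zarray_text, array_path=""):
    """Yield the store key of every chunk of a Zarr v2 array (sketch; assumes ndim >= 1).

    zarray_text: JSON text of the array's .zarray object.
    array_path:  the array's path inside the store, e.g. "foo/bar" (hypothetical).
    """
    meta = json.loads(zarray_text)
    sep = meta.get("dimension_separator", ".")  # "." unless the array opts into "/"
    # Number of chunks along each dimension; edge chunks may be only partly filled.
    grid = [math.ceil(n / c) for n, c in zip(meta["shape"], meta["chunks"])]
    prefix = array_path.strip("/")
    for idx in product(*(range(g) for g in grid)):
        key = sep.join(str(i) for i in idx)
        # If no object exists for a key, the chunk was never written; readers use fill_value.
        yield f"{prefix}/{key}" if prefix else key

# Hypothetical .zarray for a 4x6 float64 array stored in 2x3 chunks.
example = ('{"zarr_format": 2, "shape": [4, 6], "chunks": [2, 3], "dtype": "<f8", '
           '"compressor": null, "fill_value": 0.0, "filters": null, "order": "C"}')
print(list(chunk_keys(example, "foo")))  # ['foo/0.0', 'foo/0.1', 'foo/1.0', 'foo/1.1']
```

Prefixing each key with the bucket's store prefix gives the S3 object key. A reader that gets "not found" for a chunk key should treat that chunk as all `fill_value` rather than raise an error.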

@jhamman
Member

jhamman commented Jan 7, 2019

@DennisHeimbigner - this dataset is on GCS but may work for you: https://storage.googleapis.com/pangeo-data/ecco/eccov4r3/

@DennisHeimbigner
Author

This prefix is readable
https://storage.googleapis.com/pangeo-data
so it may work. Thanks.

@jakirkham
Member

There's also a fixture directory in this repo containing some dummy data, used during testing to validate that the format/spec is still met. It may be useful for seeding your own S3 bucket, though it is quite small.

@DennisHeimbigner
Author

Unfortunately, these do not appear to reside on S3 itself.

@jhamman
Member

jhamman commented Jan 8, 2019

Right. This data is in GCS. Perhaps @jacobtomlinson knows of a public Zarr dataset on S3?

@jacobtomlinson

No, but I can make one if you like?

@DennisHeimbigner
Author

That would be helpful. It does not have to be complex; I am just trying to get the basic access correct.

@alimanfoo
Member

The S3 example in the zarr tutorial uses a very small toy dataset that is publicly accessible. Bucket is here: http://zarr-demo.s3-eu-west-2.amazonaws.com/
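For anonymous, read-only access without an S3 SDK, a public bucket like this can be read with plain HTTP GETs. Each Zarr key is appended to the bucket URL. Below is a minimal standard-library sketch. The bucket URL comes from this thread. The helper names `object_url` and `fetch_key` and the key `store/.zgroup` are hypothetical; check the bucket listing for the real layout.

```python
import urllib.error
import urllib.parse
import urllib.request

def object_url(bucket_url, key):
    """Join a public bucket's base URL and a store key into a plain HTTP(S) object URL."""
    return bucket_url.rstrip("/") + "/" + urllib.parse.quote(key.lstrip("/"), safe="/")

def fetch_key(bucket_url, key, timeout=10.0):
    """Anonymously GET one key; return its bytes, or None if the object appears absent.

    S3 may answer a missing key with 403 rather than 404 when anonymous
    listing is not allowed, so both codes are treated as "absent" here.
    """
    try:
        with urllib.request.urlopen(object_url(bucket_url, key), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code in (403, 404):
            return None
        raise

BUCKET = "http://zarr-demo.s3-eu-west-2.amazonaws.com/"
# Hypothetical key; requires network access:
# print(fetch_key(BUCKET, "store/.zgroup"))
```

Mapping 403 to "absent" keeps missing chunks from being reported as errors. The tradeoff is that a genuine permissions problem is masked too, so a real reader might want to check the metadata objects strictly and relax only for chunk keys.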

@joshmoore
Member

joshmoore commented Jan 21, 2019

Would there be any interest in having a MinIO-based (https://www.minio.io/) setup using Docker within Travis so that S3 tests could be run? This would add an s3fs requirement, at least at the testing scope.

Edit: it looks like gh-293 may either make this unnecessary or be a good template for adding the same for an AWS clone.

@alimanfoo
Copy link
Member

> Would there be any interest in having a MinIO-based (https://www.minio.io/) setup using Docker within Travis so that S3 tests could be run? This would add an s3fs requirement, at least at the testing scope.
>
> Edit: it looks like gh-293 may either make this unnecessary or be a good template for adding the same for an AWS clone.

Sorry for the slow follow-up here. I think this would be excellent. I had been concerned that the cloud storage class implementations that live outside the zarr code base were not being put through the test suite, and this would solve that very nicely. I think #293 provides a template, but a new PR would be needed to add test coverage for AWS S3 via s3fs.S3Map.

Also I noticed recently that GCS has support now for local emulation, so it should be possible to get something for GCS too via gcsfs.GCSMap. That could be done separately from the open PR to implement a GCS storage class via the official Python SDK (#252), which would be nice to finish but is a parallel piece of work.

@martindurant
Member

> GCS has support now for local emulation

How? Where? I'd love to see it. I think I saw this mentioned elsewhere.

@joshmoore, you don't need MinIO; you can more easily use moto, which is what the s3fs tests use.

@alimanfoo
Member

alimanfoo commented Mar 27, 2019

Re emulation: sorry, I think I got confused. I had seen a page about emulation for Google Cloud Datastore, but of course that's something completely different from Google Cloud Storage.

@joshmoore
Member

> you don't need minio, you can more easily use moto, which is what the s3fs tests use.

Thanks, @martindurant. I hadn't seen moto before. I'm happy to have the tests use whatever's appropriate in this repo, especially if mocking is preferred to integration tests. For me, the MinIO setup is also useful for more production-like testing. Would you also suggest using moto in server mode for that?

@martindurant
Member

I don't see why not. Moto lacks some rather specific features, such as file versioning, but it is pretty complete. MinIO also isn't exactly S3...

@meggart
Member

meggart commented Apr 15, 2019

> The S3 example in the zarr tutorial uses a very small toy dataset that is publicly accessible. Bucket is here: http://zarr-demo.s3-eu-west-2.amazonaws.com/

We are currently implementing an S3 backend for our Julia Zarr package (https://github.com/meggart/ZarrNative.jl/commits/S3storage). Is it OK to use the dataset you mention here for our unit tests?

@alimanfoo
Member

alimanfoo commented Apr 15, 2019 via email

@mhearne-usgs

@alimanfoo Regarding this S3 example, what is the file format of the zarr-demo data? I've tried placing a .zarr file (directory) on S3, and I am having issues accessing it.
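My understanding is that a store written locally with zarr's directory storage has no single-file format. On S3 the layout should mirror the directory: every file becomes one object, including the dot-prefixed `.zgroup`/`.zarray`/`.zattrs` metadata. Each object's key is the file's path relative to the store root, with `/` separators, under whatever prefix you choose. A common way for such a store to fail to open is that those dot-files never reach the bucket.

The minimal sketch below lists the key each local file should get, so you can compare it with what was actually uploaded. The function name `s3_keys_for_directory` and the prefix are hypothetical.

```python
import os

def s3_keys_for_directory(local_store, prefix=""):
    """Map each file in a local Zarr directory store to the S3 key it should have (sketch)."""
    root = os.path.abspath(local_store)
    prefix = prefix.strip("/")
    keys = {}
    # os.walk includes dot-files such as .zarray and .zgroup.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Keys always use "/" separators, regardless of the local OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            keys[full] = f"{prefix}/{rel}" if prefix else rel
    return keys
```

For example, a local `example.zarr` containing `.zgroup`, `foo/.zarray` and `foo/0.0`, mapped with prefix `data/example.zarr`, should produce the objects `data/example.zarr/.zgroup`, `data/example.zarr/foo/.zarray` and `data/example.zarr/foo/0.0`. The store root on S3 is then `bucket/data/example.zarr`.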

@joshmoore
Member

@mhearne-usgs : see also https://github.com/martindurant/zarr/pull/1/files for an example of following @martindurant's moto suggestion.


9 participants