
Prototype of object-store-based Store implementation #1661

Open
kylebarron wants to merge 7 commits into main

Conversation

kylebarron

@kylebarron kylebarron commented Feb 8, 2024

Prototype of object-store based store.

object-store is a rust crate for interoperating with remote object stores like S3, GCS, Azure, etc. See the highlights section of its docs. It doesn't even try to implement a filesystem interface, instead focusing on the core atomic operations supported across object stores. This makes it a good candidate for use with a Zarr v3 Store.

object-store-python is a Python binding to object-store. With roeap/object-store-python#6, I added async methods to the library. So the underlying Rust binary will return a Python coroutine that can be awaited.

That and related PRs haven't been merged yet, but you can try this out locally by installing

pip install git+https://github.com/kylebarron/object-store-python.git@dev#subdirectory=object-store

Note that you need the Rust compiler on your computer; install it by following these docs.

TODO:

  • Implement multiple-range support in the underlying object-store-python library
  • Examples
  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@jhamman
Member

jhamman commented Feb 8, 2024

Amazing @kylebarron! I'll spend some time playing with this today.

@kylebarron
Author

With roeap/object-store-python#9 it should be possible to fetch multiple ranges within a file concurrently with range coalescing (using get_ranges_async). Note that this object-store API accepts multiple ranges within one object, which is still not 100% aligned with the Zarr get_partial_values because that allows fetches across multiple objects.

That PR also adds a get_opts function which now supports "offset" and "suffix" ranges, of the sort Range:N- and Range:-N, which would allow removing the raise NotImplementedError on line 37.

@martindurant
Member

martindurant/rfsspec#3

async def get_partial_values(
    self, key_ranges: List[Tuple[str, Tuple[int, int]]]
) -> List[bytes]:
    # TODO: use rust-based concurrency inside object-store

@kylebarron
Author

kylebarron commented Feb 9, 2024

object-store has a built-in function for this: get_ranges, with the caveat that it only handles multiple ranges within a single file.

get_ranges also automatically handles request merging for nearby ranges in a file.
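The merging behavior described here can be sketched in a few lines. This is an illustrative sketch of range coalescing, not object-store's actual implementation, and the `max_gap` threshold is made up:

```python
def coalesce_ranges(ranges, max_gap=1024):
    """Merge (start, stop) byte ranges whose gaps are at most ``max_gap`` bytes.

    Illustrative sketch of the kind of request merging get_ranges performs
    for nearby ranges within a single object.
    """
    merged = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # close enough to the previous range: extend it instead of
            # issuing a separate request
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged
```

For example, `coalesce_ranges([(0, 100), (150, 300), (10000, 10100)])` merges the first two ranges (gap of 50 bytes) but leaves the distant third range as its own request.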

@martindurant
Member

Yes I know, but mine already did the whole thing, so I am showing how I did that.

@normanrz
Contributor

Great work @kylebarron!
What are everybody's thoughts on having this in zarr-python vs. spinning it out as a separate package?

@martindurant
Member

What are everybody's thoughts on having this in zarr-python vs. spinning it out as a separate package?

I suggest we see whether it makes any improvements first, so it's the author's choice for now.

@kylebarron
Author

While @rabernat has seen some impressive performance improvements in settings that make many requests with Rust's tokio runtime, which would possibly also trickle down to a Python binding, the biggest advantage I see is improved ease of installation.

A common hurdle I've seen is dependency management, especially around boto3, aioboto3, and similar dependencies. Versions need to be compatible at runtime with any other libraries the user has in their environment, and Python doesn't allow multiple versions of the same dependency in one environment at the same time. With a Python library wrapping a statically-linked Rust binary, you can drop all Python dependencies and eliminate this class of hardship.

The underlying Rust object-store crate is stable and under open governance via the Apache Arrow project. We'll just have to wait on some discussion in object-store-python for exactly where that should live.

I don't have an opinion myself on where this should live, but it should be on the order of 100 lines of code wherever it is (unless the v3 store API changes dramatically).

@jhamman
Member

jhamman commented Feb 12, 2024

I suggest we see whether it makes any improvements first, so it's author's choice for now.

👍

What are everybody's thoughts on having this in zarr-python vs. spinning it out as a separate package?

I want to keep an open mind about what the core stores provided by Zarr-Python are. My current thinking is that we should just do a MemoryStore and a LocalFilesystemStore. Everything else can be opt-in by installing a 3rd party package. That said, I like having a few additional stores in the mix as we develop the store interface since it helps us think about the design more broadly.

@martindurant
Member

A common hurdle I've seen is handling dependency management, especially around boto3, aioboto3, etc dependencies.

This is no longer an issue; s3fs has much more relaxed dependencies than it used to. Furthermore, it's very likely to already be part of an installation environment.

@normanrz
Contributor

I want to keep an open mind about what the core stores provided by Zarr-Python are. My current thinking is that we should just do a MemoryStore and a LocalFilesystemStore. Everything else can be opt-in by installing a 3rd party package.

I agree with that. I think it is beneficial to keep the number of dependencies of core zarr-python small. But I am open to discussion.

That said, I like having a few additional stores in the mix as we develop the store interface since it helps us think about the design more broadly.

Sure! That is certainly useful.

@jhamman jhamman added the V3 Affects the v3 branch label Feb 13, 2024
@itsgifnotjiff

This is awesome work, thank you all!!!

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@kylebarron
Author

The object-store-python package is not very well maintained (roeap/object-store-python#24), so I took a few days to implement my own wrapper around the Rust object_store crate: https://github.com/developmentseed/object-store-rs

I'd like to update this PR soonish to use that library instead.

@martindurant
Member

If the zarr group prefers object-store-rs, we can move it into the zarr-developers org, if you like. I would like to be involved in developing it, particularly if it can grow more explicit fsspec-compatible functionality.

@kylebarron
Author

kylebarron commented Oct 22, 2024

I have a few questions because the Store API has changed a bit since the spring.

  • There's a new BufferPrototype object. Is the BufferPrototype chosen by the store implementation or the caller? It would be very nice if this prototype could be chosen by the store implementation, because then we could return a RustBuffer object that implements the Python buffer protocol, but doesn't need to copy the buffer into Python memory.
  • Similarly for puts: is Buffer guaranteed to implement the buffer protocol? Contrary to fetching, we can't do zero-copy puts right now with object-store.

I like that list now returns an AsyncGenerator. That aligns well with the underlying object-store rust API, but for technical reasons we can't expose that as an async iterable to Python yet (apache/arrow-rs#6587), even though we do expose the readable stream to Python as an async iterable.
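For anyone unfamiliar with the AsyncGenerator-based list API, it is consumed with `async for`. A self-contained sketch with a stand-in generator (the key names below are made up, not real store output):

```python
import asyncio
from typing import AsyncGenerator


async def list_keys() -> AsyncGenerator[str, None]:
    # Stand-in for a store's ``list`` method; a real implementation would
    # yield keys as pages of results arrive from the object store.
    for key in ["zarr.json", "c/0/0", "c/0/1"]:
        yield key


async def collect() -> list[str]:
    # An AsyncGenerator is consumed lazily with ``async for``; the caller
    # can stop early without listing the entire bucket.
    return [key async for key in list_keys()]


keys = asyncio.run(collect())
```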

@TomAugspurger
Contributor

Is the BufferPrototype chosen by the store implementation or the caller? It would be very nice if this prototype could be chosen by the store implementation, because then we could return a RustBuffer object that implements the Python buffer protocol, but doesn't need to copy the buffer into Python memory.

This came up in the discussion at https://github.com/zarr-developers/zarr-python/pull/2426/files/5e0ffe80d039d9261517d96ce87220ce8d48e4f2#diff-bb6bb03f87fe9491ef78156256160d798369749b4b35c06d4f275425bdb6c4ad. By default, it's passed as default_buffer_prototype, though I think the user can override it at the call site or globally.

Does it look compatible with what you need?

@kylebarron
Author

I think I'm confused about why it's a parameter at all. Why shouldn't it just return a protocol, with the store implementing whatever interface is most convenient for returning data?

Put another way: when the store chooses the return interface, it can ensure no memory copies, and then the caller of the store can decide whether they need to copy the memory elsewhere.

@TomAugspurger
Contributor

TomAugspurger commented Oct 22, 2024

Yeah, I'm not familiar with that. Looks like @madsbk added it in #1910, so presumably it's related to whether or not the data will end up on the GPU? I guess that's one bit of context the Store won't necessarily have, assuming it can place the data in host or device memory, and so it being a parameter might be necessary.

@d-v-b
Contributor

d-v-b commented Oct 22, 2024

I do think making the concrete return type of store.get(key, prototype) depend on its prototype argument is a bit of an API smell. If we had no other constraints, the obvious play would be to let store.get return what the underlying storage medium returns, and ask the caller of store.get to convert that return value into the buffer of choice. I am guessing there is some reason today why we don't do this, but I wonder what abstractions we should add / change to remove this reason.

@kylebarron kylebarron mentioned this pull request Oct 22, 2024
@kylebarron
Author

It makes sense that we'll always need a copy for CPU -> GPU, but I'd like to avoid situations where a store must copy data for CPU -> CPU. Right now that could be unavoidable depending on the buffer class the user passes in. Are we saying that the user needs to know the copy semantics of the underlying store?
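The zero-copy distinction being discussed can be demonstrated with Python's buffer protocol: wrapping existing memory in a memoryview shares it, while `bytes(...)` forces a copy. A minimal self-contained sketch:

```python
# Pretend this bytearray is chunk data loaded by a (hypothetical) store.
data = bytearray(b"chunk bytes from a hypothetical store")

view = memoryview(data)  # zero-copy: shares the bytearray's memory
copied = bytes(data)     # explicit copy into freshly allocated memory

data[0:5] = b"CHUNK"     # mutate the underlying buffer in place
# ``view`` observes the mutation because it shares memory; ``copied`` does not.
```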

@martindurant
Member

It makes sense that we'll always need a copy for CPU -> GPU

Actually no, I fully expect that the RAPIDS team should be able to make a direct object-store/NIC->GPU store class and also do filter decoding on the GPU (https://docs.rapids.ai/api/kvikio/stable/zarr/). Whether any of that ends up here is another matter.

@kylebarron
Author

Sure, I really meant to say "if the store loads data into the CPU, then we'll need to make a copy for CPU to GPU". I'm not surprised that it's possible to make direct to GPU readers.

@jhamman
Member

jhamman commented Oct 22, 2024

@kylebarron - in terms of testing this, you should take a look at how we're doing this for other stores.

Basically, we've created a reusable test harness in zarr.testing.store.StoreTests. You can subclass that and it will run a bunch of store-only tests for you.

class TestRemoteStoreS3(StoreTests[RemoteStore, cpu.Buffer]):
    store_cls = RemoteStore
    buffer_cls = cpu.Buffer

    @pytest.fixture
    def store_kwargs(self, request) -> dict[str, str | bool]:
        fs, path = fsspec.url_to_fs(
            f"s3://{test_bucket_name}", endpoint_url=endpoint_url, anon=False, asynchronous=True
        )
        return {"fs": fs, "path": path, "mode": "r+"}

    @pytest.fixture
    def store(self, store_kwargs: dict[str, str | bool]) -> RemoteStore:
        return self.store_cls(**store_kwargs)

    async def get(self, store: RemoteStore, key: str) -> Buffer:
        # make a new, synchronous instance of the filesystem because this test is run in sync code
        new_fs = fsspec.filesystem(
            "s3", endpoint_url=store.fs.endpoint_url, anon=store.fs.anon, asynchronous=False
        )
        return self.buffer_cls.from_bytes(new_fs.cat(f"{store.path}/{key}"))

    async def set(self, store: RemoteStore, key: str, value: Buffer) -> None:
        # make a new, synchronous instance of the filesystem because this test is run in sync code
        new_fs = fsspec.filesystem(
            "s3", endpoint_url=store.fs.endpoint_url, anon=store.fs.anon, asynchronous=False
        )
        new_fs.write_bytes(f"{store.path}/{key}", value.to_bytes())

@madsbk
Contributor

madsbk commented Oct 23, 2024

The idea with BufferPrototype is to let the user tell the whole stack how they want the data. Stores and codecs can then optimize for a specific buffer type, or just call .as_numpy_array() to get host memory access (zero-copy if the data is already in host memory). Similarly, they can use .from_bytes() to create a new Buffer from a memoryview (zero-copy). See MemoryStore.get().

@kylebarron you should be able to create a Buffer from a RustBuffer zero-copy like:

async def get(
    self,
    key: str,
    prototype: BufferPrototype,
    byte_range: tuple[int | None, int | None] | None = None,
) -> Buffer | None:
    the_rust_buffer: RustBuffer = ...  # load data into a rust buffer
    return prototype.buffer.from_buffer(memoryview(the_rust_buffer))

Now, if the user requests a GPU buffer, a later codec can decide to move the data to the GPU and maybe use nvCOMP to decompress the data, etc.

@kylebarron
Author

Can someone detail the semantics of ByteRangeRequest? It's a type hint of

ByteRangeRequest: TypeAlias = tuple[int | None, int | None]

But that type hint on its own isn't fully descriptive, and I can't find any documentation about it. This is what I think it means:

  • Tuple[int, int]: This is a byte range starting with the first int and ending (exclusive) with the second int. This is a range, not a start and a length, right?
  • Tuple[None, None]: I assume this is invalid?
  • Tuple[int, None]: This is an "offset" request? All the bytes after the first int?
  • Tuple[None, int]: This is what I don't really know. Is this the same as [0, int]? Or is this a suffix request saying the last int bytes of the file?

@kylebarron
Author

The idea with BufferPrototype is to let the user tell the whole stack how they want the data.

I'm not really a fan of this API, but I don't know the GPU side well enough to propose something else.

@martindurant
Member

This is what I think it means:

That is certainly what it means when used in fsspec; None in the first place is the same as 0, and None in the second place is the same as "end"/"size". Note that they can be negative, so a suffix range would be (-N, None).

I can't guarantee that the same convention is used here, but zarr blocks are either whole (None, None) or the exact range (start, stop) is known.

@kylebarron
Author

I can't guarantee if the same convention is used here, but zarr blocks are either whole (None, None) or the exact range (start, stop) is known.

If zarr blocks are either whole or known, then shouldn't the type hint for the store be

ByteRangeRequest: TypeAlias = tuple[int, int] | None

?

@kylebarron
Author

@d-v-b it looks like you added the type hint in #2065, can you shed some light on this?

@martindurant
Member

Or maybe tuple[int, int] | tuple[None, None], to fit fsspec's convention. I don't think there's any plausible use for a suffix request, "the last N bytes" (fsspec uses this for things like parquet footers).

@normanrz
Contributor

The suffix request is required for sharding. In shard files, the index containing the byte ranges of the chunks is, by default, at the end of the file. The size of the index can be statically determined from the array metadata. The size of the shard file can not be inferred in the general case. To avoid a preflight request to determine the file size, the suffix request is required. Most HTTP servers including Object Storage services support suffix requests.
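This access pattern can be simulated end to end: with the index size known statically from the metadata, a single suffix read fetches the shard index without ever knowing the file's size. A toy sketch; the shard layout and index bytes below are made up, not the real sharding format:

```python
def suffix_read(blob: bytes, n: int) -> bytes:
    """Simulate an HTTP suffix range request (``Range: bytes=-n``): return
    the last ``n`` bytes without knowing the blob's total size up front."""
    return blob[-n:]


# Toy "shard": concatenated chunk bytes followed by a fixed-size index.
chunk_bytes = b"AAAA" + b"BBBBBB"
index = b"\x00\x04\x04\x06"  # toy index; real shard indexes encode (offset, nbytes) pairs
shard = chunk_bytes + index

INDEX_SIZE = len(index)  # statically known from array metadata in real sharding
tail = suffix_read(shard, INDEX_SIZE)  # one request, no preflight size lookup
```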

@d-v-b
Contributor

d-v-b commented Oct 24, 2024

@d-v-b it looks like you added the type hint in #2065, can you shed some light on this?

I introduced that type so that we could have exactly this conversation -- prior to its definition, we had various functions across the codebase that were taking a byte range parameter, but the type of that parameter wasn't defined in a central place. I'm not attached to this particular type! We can totally change it to something nicer, provided the semantics of that type covers all required use cases.

@kylebarron
Author

The suffix request is required for sharding.

Is there an example of a suffix request somewhere in this repo, so we can see how the range is passed as an argument to the store?

Most HTTP servers including Object Storage services support suffix requests.

Except for Azure 😢. The default implementation in the object-store crate is to use suffix requests for other stores, but two requests for Azure.
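The two-request fallback described here can be sketched as: one suffix request where supported, otherwise a size lookup (like HEAD) followed by an exact-range read. Illustrative only; the store methods below are hypothetical, not object-store's API:

```python
class FakeStore:
    """In-memory stand-in for a remote object store (hypothetical)."""

    def __init__(self, objects: dict[str, bytes]):
        self._objects = objects

    def size(self, key: str) -> int:
        # analogous to a HEAD request
        return len(self._objects[key])

    def get_range(self, key: str, start: int, stop: int) -> bytes:
        # analogous to a GET with an explicit byte range
        return self._objects[key][start:stop]

    def get_suffix(self, key: str, n: int) -> bytes:
        # analogous to a GET with ``Range: bytes=-n``
        return self._objects[key][-n:]


def read_suffix(store: FakeStore, key: str, n: int, supports_suffix: bool) -> bytes:
    """Read the last ``n`` bytes of an object, falling back to two requests
    (size lookup + exact range) when suffix ranges aren't supported."""
    if supports_suffix:
        return store.get_suffix(key, n)              # single request
    size = store.size(key)                           # request 1: find the size
    return store.get_range(key, size - n, size)      # request 2: exact range
```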

@kylebarron
Author

we had various functions across the codebase that were taking a byte range parameter, but the type of that parameter wasn't defined in a central place.

I think it's worth considering changing it to a dataclass, because the semantics of the tuple are not always clear. And elsewhere in the codebase, a "chunk slice" refers to start and length, not start and end.

@d-v-b
Contributor

d-v-b commented Oct 24, 2024

we had various functions across the codebase that were taking a byte range parameter, but the type of that parameter wasn't defined in a central place.

I think it's worth considering changing it to a dataclass, because the semantics of the tuple are not always clear. And elsewhere in the codebase, a "chunk slice" refers to start and length, not start and end.

Agreed, we definitely need to bump up the literacy of this type. I opened #2437 for this discussion.

@normanrz
Contributor

The suffix request is required for sharding.

Is there an example of a suffix request somewhere in this repo, so we can see how the range is passed as an argument to the store?

https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/codecs/sharding.py#L700-L702
