Make feather stores read incrementally #2805

Merged: 4 commits merged on Dec 19, 2022

@dominiklohmann commented Dec 17, 2022

This changes feather stores to no longer use the official Apache Feather reader, and to instead use its lower-level building blocks to read incrementally. Loading a feather store should therefore no longer incur an up-front read cost; the actual reading happens on demand as batches are accessed. The store caches batches it has already read, so concurrent queries do not read them again.
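
The read-on-demand and caching behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not VAST's actual C++ implementation: `LazyBatchStore`, `read_batch`, and `fake_read` are hypothetical names, and the stand-in reader returns lists of integers rather than Arrow record batches.

```python
import threading
from typing import Callable, Dict, Iterator, List


class LazyBatchStore:
    """Reads batches on first access and caches them for later queries."""

    def __init__(self, num_batches: int, read_batch: Callable[[int], List[int]]):
        # Constructing the store only records metadata; nothing is read yet.
        self._num_batches = num_batches
        self._read_batch = read_batch  # hypothetical low-level batch reader
        self._cache: Dict[int, List[int]] = {}
        self._lock = threading.Lock()
        self.reads = 0  # number of actual reads, kept for illustration only

    def batch(self, index: int) -> List[int]:
        if not 0 <= index < self._num_batches:
            raise IndexError(index)
        # The lock is held during the read, so a concurrent query for the
        # same batch waits for the first read instead of reading it again.
        with self._lock:
            if index not in self._cache:
                self._cache[index] = self._read_batch(index)
                self.reads += 1
            return self._cache[index]

    def batches(self) -> Iterator[List[int]]:
        # Batches are produced one at a time, as the consumer asks for them.
        for index in range(self._num_batches):
            yield self.batch(index)


def fake_read(index: int) -> List[int]:
    # Stand-in for decoding one record batch from disk (assumption).
    return [index * 10 + k for k in range(3)]


store = LazyBatchStore(num_batches=4, read_batch=fake_read)
first = next(store.batches())  # only the first batch is read here
again = store.batch(0)         # served from the cache, no second read
```

Holding a single lock across the read is the simplest way to guarantee that each batch is read only once; a real store would likely use finer-grained synchronization so that reads of different batches can proceed in parallel.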

NB: This technically drops support for reading Apache Feather V1 files, but VAST never wrote them to begin with and we never officially said we support that.

@dominiklohmann added the performance label (Improvements or regressions of performance) on Dec 17, 2022
@dominiklohmann requested a review from a team on December 17, 2022 21:52
@dominiklohmann force-pushed the topic/feather-incremental-read branch from 76c07f3 to 9e261fc on December 18, 2022 15:57
@dominiklohmann requested a review from mavam on December 18, 2022 21:49
@mavam left a comment

Nicely written blog post. I have only minor remarks.

I was left wondering why this "issue" only concerns Feather. Can you add a sentence about the relationship to Parquet? Most readers will be more familiar with this format.

@dominiklohmann

> I was left wondering why this "issue" only concerns Feather. Can you add a sentence about the relationship to Parquet? Most readers will be more familiar with this format.

I added a note at the bottom that we plan to make the same improvement for our Parquet stores.

@dominiklohmann merged commit a66df2d into master on Dec 19, 2022
@dominiklohmann deleted the topic/feather-incremental-read branch on December 19, 2022 17:05