Implement Apache Parquet & Apache Feather V2 stores #2413

Merged: 51 commits, Jul 14, 2022

@dominiklohmann dominiklohmann (Member) commented Jul 6, 2022

NOTE: This is our monthly Hackathon project, co-authored by @patszt, @dispanser, and @dominiklohmann.

This PR implements an Apache Feather V2 store backend, and simplifies the store plugin to remove all the actor logic from it. It builds upon the in-progress Apache Parquet store backend from #2284.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on vast.io, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

Review as a whole. Test extensively on the testbed.

@dominiklohmann dominiklohmann changed the base branch from master to story/sc-17753/parquet-store-plugin July 6, 2022 10:56
@dominiklohmann dominiklohmann force-pushed the topic/hackathon-feather-store branch 3 times, most recently from 971f12c to 8797ad0 Compare July 6, 2022 18:46
@dominiklohmann dominiklohmann added the feature New functionality label Jul 7, 2022
@dispanser dispanser force-pushed the topic/hackathon-feather-store branch from 03cac3b to fbf75f5 Compare July 7, 2022 16:31
@dominiklohmann dominiklohmann marked this pull request as ready for review July 11, 2022 10:10
@dominiklohmann dominiklohmann requested a review from dispanser July 11, 2022 10:11
@dominiklohmann dominiklohmann changed the base branch from story/sc-17753/parquet-store-plugin to master July 11, 2022 13:00
@dispanser dispanser mentioned this pull request Jul 11, 2022
3 tasks
@dominiklohmann dominiklohmann changed the title Implement an Apache Feather V2 store Implement Apache Parquet & Apache Feather V2 stores Jul 11, 2022
@dispanser dispanser (Contributor) left a comment

I reviewed the feather implementation, while @dominiklohmann reviewed and approved the parquet changes in #2284 .

@dispanser dispanser enabled auto-merge July 11, 2022 15:59
@dispanser dispanser force-pushed the topic/hackathon-feather-store branch from 853a9bf to 8a442ef Compare July 12, 2022 08:08
dominiklohmann and others added 22 commits July 12, 2022 17:19
The IPC format of table slices no longer stores the VAST schema, so the command
was not applicable.
This is an important optimization: slices are now persisted to their IPC
backing lazily, i.e., only when they are actually sent across the wire. This
massively speeds up the Feather and Parquet stores in most cases, although
they're not quite comparable to the Segment store yet because of some further
unnecessary materializations.
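
The deferred-persistence idea in the commit message above can be illustrated with a small sketch. This is not VAST's C++ implementation: `LazySlice`, `count_matches`, and `send` are hypothetical names, and JSON stands in for the Arrow IPC format purely to keep the example dependency-free. It only shows the general pattern of serializing on first send and caching the result.

```python
import json
from typing import Any, Dict, List, Optional


class LazySlice:
    """Hypothetical stand-in for a table slice with a lazily built wire format."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self._wire: Optional[bytes] = None
        self.serializations = 0  # counts how often the expensive step ran

    def wire_bytes(self) -> bytes:
        # Serialize on first request only, then reuse the cached buffer.
        # JSON is an assumption here; the real stores use Arrow IPC.
        if self._wire is None:
            self._wire = json.dumps(self.rows).encode("utf-8")
            self.serializations += 1
        return self._wire


def count_matches(slice_: LazySlice, key: str, value: Any) -> int:
    # Local work reads the in-memory rows and never triggers serialization.
    return sum(1 for row in slice_.rows if row.get(key) == value)


def send(slice_: LazySlice) -> int:
    # Sending is the only place that needs the serialized form.
    return len(slice_.wire_bytes())
```

In this sketch, a slice that is only queried locally never pays the serialization cost, and a slice that is sent several times pays it once.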
These changes made the seemingly unrelated integration test `Transforms`
fail, so in the interest of our RC and other priority work we're
delaying looking into this until later.
@dispanser dispanser disabled auto-merge July 13, 2022 09:16
@dispanser dispanser merged commit 224db38 into master Jul 14, 2022
@dispanser dispanser deleted the topic/hackathon-feather-store branch July 14, 2022 07:21