Add Arrow output serialization. #31

Closed
niviksha opened this issue Nov 30, 2020 · 3 comments
Labels
enhancement New feature or request

Comments


niviksha commented Nov 30, 2020

@jheer and others, this is already looking like a great library for us at OmniSci, where we just added the ability to return SQL query results over the wire as Arrow buffers 👍 👍

I'm wondering whether it is possible to serialize the results of an Arquero query to Arrow. That would make for a composable pipeline: for example, you could retrieve a dataset from OmniSci, wrangle it client-side with Arquero, and then visualize it with Vega-based charts or with something like deck.gl for 3D charts.

We're happy to help in any way we can!

jheer added the enhancement (New feature or request) label Nov 30, 2020
jheer changed the title from "Feature request - serialize query results to Arrow?" to "Add Arrow output serialization." Nov 30, 2020
bmschmidt commented

👍 would use for a slightly different case: it would let you more easily do arquero computations in a web worker, which is where I do most of my Arrow computation when not on Observable.

jheer commented Dec 2, 2020

I've started a notebook for prototyping and testing Arrow serialization here:
https://observablehq.com/@jheer/arquero-to-arrow-serialization

Feedback and suggestions are welcome!

Also, @bmschmidt, in case you haven't seen it yet, you might be interested in https://github.com/uwdata/arquero-worker, which implements Arquero worker thread support. Right now the inter-thread communication is done via JSON string serialization, but the plan has always been to eventually augment that with more efficient transfer options.
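
To make the difference between JSON string serialization and a binary transfer concrete, here is a minimal sketch. It uses neither Arquero nor arquero-worker: the `column` data and both helper functions are hypothetical, and `structuredClone` with a transfer list stands in for what `worker.postMessage(data, [buffer])` would do between threads. It assumes a runtime that provides `structuredClone` (recent browsers, Node 17+).

```javascript
// Hypothetical numeric column stored as a typed array, i.e. one
// contiguous buffer, similar in spirit to a columnar format.
const column = Float64Array.from({ length: 1000 }, (_, i) => i * 0.5);

// Option 1: JSON string serialization. Every value is formatted as text
// by the sender and parsed back into a number by the receiver.
function roundTripJSON(values) {
  const message = JSON.stringify(Array.from(values));
  return Float64Array.from(JSON.parse(message));
}

// Option 2: transfer the underlying ArrayBuffer. Ownership of the memory
// moves to the receiver without a per-value encode/decode step.
// After the call, the sender's view is detached (length 0).
function roundTripTransfer(values) {
  return structuredClone(values, { transfer: [values.buffer] });
}

const viaJSON = roundTripJSON(column);         // must run before the transfer
const viaTransfer = roundTripTransfer(column); // detaches `column`

console.log(viaJSON.length, viaTransfer.length, column.length);
```

Note that a transfer invalidates the sender's copy, so this pattern suits pipelines where the sending side no longer needs the data.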

jheer commented Jan 8, 2021

I've released a new package - arquero-arrow - that supports Arrow serialization of Arquero tables as well as arrays of JavaScript objects. This new package provides a simple interface to the apache-arrow JavaScript library (on which it depends), while also using higher-performance encoders for standard integer, float, date, boolean, and string dictionary types.

I also updated the Arquero and Apache Arrow notebook to demonstrate Arrow serialization!

cc @niviksha, @bmschmidt, @ericemc3
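
As background on the "string dictionary" type mentioned above, here is a minimal sketch of the general idea of dictionary encoding. It is not arquero-arrow's encoder and does not use the apache-arrow library: `dictionaryEncode` and `dictionaryDecode` are hypothetical helpers, and null handling is deliberately left out.

```javascript
// Dictionary-encode an array of strings: keep each distinct string once
// and represent the column as integer indices into that list.
function dictionaryEncode(strings) {
  const lookup = new Map(); // string -> index into `values`
  const values = [];        // distinct strings, in first-seen order
  const indices = new Int32Array(strings.length);

  strings.forEach((s, i) => {
    let index = lookup.get(s);
    if (index === undefined) {
      index = values.length;
      values.push(s);
      lookup.set(s, index);
    }
    indices[i] = index;
  });

  return { values, indices };
}

// Rebuild the original strings from the dictionary and the index column.
function dictionaryDecode({ values, indices }) {
  return Array.from(indices, (i) => values[i]);
}

const encoded = dictionaryEncode(['a', 'b', 'a', 'c', 'b', 'a']);
console.log(encoded.values, encoded.indices, dictionaryDecode(encoded));
```

For columns with many repeated strings, the index array is a compact typed array, which is one reason this representation is attractive for serialization.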

jheer closed this as completed Jan 8, 2021