Parquet support #1

Open
dselivanov opened this issue Nov 30, 2015 · 6 comments

dselivanov commented Nov 30, 2015

Any plans to add parquet support?

Contributor

jorgemarsal commented Nov 30, 2015

Hi @dselivanov,

There are no plans to add Parquet support in the short term.
It shouldn't be too hard, though. We just need to add a ParquetRecordParser that delegates to the official parquet-cpp implementation: https://github.com/apache/parquet-cpp.

I can help you if you want to give it a shot.
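
To make the delegation idea concrete, here is a minimal C++ sketch. Everything in it apart from the name ParquetRecordParser is an assumption made for illustration: `RecordParser`, `ColumnSource`, and `InMemoryColumnSource` are hypothetical and are not taken from this project or from parquet-cpp. A real implementation would put the actual parquet-cpp calls behind something like `ColumnSource`; the in-memory stand-in only shows how columnar data could be handed back one record at a time.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: stands in for whatever record-parser base
// class the project actually defines.
class RecordParser {
public:
    virtual ~RecordParser() = default;
    // Fills `out` with the next record; returns false when none are left.
    virtual bool nextRecord(std::vector<std::string>& out) = 0;
};

// Hypothetical abstraction over a columnar reader. A real version would
// wrap parquet-cpp here; no parquet-cpp API is used in this sketch.
class ColumnSource {
public:
    virtual ~ColumnSource() = default;
    virtual std::size_t numColumns() const = 0;
    virtual std::size_t numRows() const = 0;
    virtual std::string valueAt(std::size_t column, std::size_t row) const = 0;
};

// The parser holds no format logic of its own: it only turns the
// column-oriented source into row-oriented records.
class ParquetRecordParser : public RecordParser {
public:
    explicit ParquetRecordParser(std::unique_ptr<ColumnSource> source)
        : source_(std::move(source)) {}

    bool nextRecord(std::vector<std::string>& out) override {
        if (row_ >= source_->numRows()) {
            return false;
        }
        out.clear();
        for (std::size_t c = 0; c < source_->numColumns(); ++c) {
            out.push_back(source_->valueAt(c, row_));
        }
        ++row_;
        return true;
    }

private:
    std::unique_ptr<ColumnSource> source_;
    std::size_t row_ = 0;
};

// In-memory stand-in for a Parquet-backed source, used only to exercise
// the parser without a Parquet file.
class InMemoryColumnSource : public ColumnSource {
public:
    explicit InMemoryColumnSource(std::vector<std::vector<std::string>> columns)
        : columns_(std::move(columns)) {}

    std::size_t numColumns() const override { return columns_.size(); }
    std::size_t numRows() const override {
        return columns_.empty() ? 0 : columns_[0].size();
    }
    std::string valueAt(std::size_t column, std::size_t row) const override {
        return columns_.at(column).at(row);
    }

private:
    std::vector<std::vector<std::string>> columns_;
};
```

With two columns `{"1", "2"}` and `{"a", "b"}`, successive calls to `nextRecord` would yield the records `{"1", "a"}` and `{"2", "b"}`, then return false. Keeping the Parquet-specific code behind a narrow interface is one possible design choice; it lets the row-assembly logic be tested without the library installed.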

Author

dselivanov commented Dec 2, 2015

@jorgemarsal, thanks for the clarification. At the moment I don't have time to port parquet-cpp either, so I will use SparkR in the short term. I will try to come back and take a closer look at parquet-cpp integration.

Author

dselivanov commented Jan 18, 2016

@jorgemarsal, FYI: Parquet reading in SparkR is totally unusable for any real problem because of very inefficient serialization/deserialization. Collecting even a tiny data.frame of 100 MB takes more than 2 minutes...

Contributor

jorgemarsal commented Jan 20, 2016

Parquet support is at the top of my list. I will get to it when I have some free time.

Author

dselivanov commented Feb 19, 2016

FYI introducing-apache-arrow.
Also, it seems Cloudera developers have started actively refactoring parquet-cpp.

DheerajAgarwal commented Oct 10, 2017

Any updates on this one?
