Parquet support #1

Open
dselivanov opened this Issue Nov 30, 2015 · 6 comments

@dselivanov

Any plans to add parquet support?


jorgemarsal Nov 30, 2015

Contributor

Hi @dselivanov,

There are no plans to add Parquet support in the short term.
It shouldn't be too hard, though. We just need to add a ParquetRecordParser that delegates to the official parquet-cpp implementation: https://github.com/apache/parquet-cpp.

I can help you if you want to give it a shot.
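
To make the delegation idea concrete, here is a minimal sketch, not the project's actual code. Only the name `ParquetRecordParser` comes from this thread. The `RecordParser` base class, the `ParquetBackend` seam, `InMemoryBackend`, and `readAll` are hypothetical names introduced for illustration. No parquet-cpp API is reproduced: a real backend would wrap the library's reader behind the same seam. Reading row by row is also a simplification, since Parquet is a columnar format.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One parsed record: a row of string-typed fields (a simplification).
using Record = std::vector<std::string>;

// Hypothetical stand-in for the project's parser interface; the real base
// class and its method names are not shown in this thread.
class RecordParser {
public:
    virtual ~RecordParser() = default;
    virtual bool hasNext() const = 0;
    virtual Record next() = 0;
};

// Hypothetical seam around the Parquet library. A real implementation would
// wrap parquet-cpp's reader here; none of its API is assumed or reproduced.
class ParquetBackend {
public:
    virtual ~ParquetBackend() = default;
    virtual std::size_t numRows() const = 0;
    virtual Record readRow(std::size_t index) = 0;
};

// The parser holds no format logic of its own: it tracks a cursor and
// forwards every read to the backend.
class ParquetRecordParser : public RecordParser {
public:
    explicit ParquetRecordParser(std::unique_ptr<ParquetBackend> backend)
        : backend_(std::move(backend)) {
        assert(backend_ != nullptr);
    }

    bool hasNext() const override { return row_ < backend_->numRows(); }

    Record next() override { return backend_->readRow(row_++); }

private:
    std::unique_ptr<ParquetBackend> backend_;
    std::size_t row_ = 0;
};

// In-memory backend used only to exercise the parser without a real file.
class InMemoryBackend : public ParquetBackend {
public:
    explicit InMemoryBackend(std::vector<Record> rows) : rows_(std::move(rows)) {}

    std::size_t numRows() const override { return rows_.size(); }

    Record readRow(std::size_t index) override { return rows_.at(index); }

private:
    std::vector<Record> rows_;
};

// Drains any RecordParser into a vector; callers never see the backend.
std::vector<Record> readAll(RecordParser& parser) {
    std::vector<Record> out;
    while (parser.hasNext()) {
        out.push_back(parser.next());
    }
    return out;
}
```

Keeping the library behind a narrow interface like this means the parser can be tested with an in-memory backend, and later changes to parquet-cpp only touch the one class that wraps it.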



dselivanov Dec 2, 2015

@jorgemarsal, thanks for the clarification. At the moment I also don't have time to port parquet-cpp, so I'll use SparkR in the short term. I'll try to come back and take a closer look at parquet-cpp integration.


dselivanov Jan 18, 2016

@jorgemarsal, FYI: SparkR Parquet reading is totally unusable for any real problem due to very inefficient serialization/deserialization. Collecting even a tiny data.frame of 100 MB takes more than 2 minutes...


jorgemarsal Jan 20, 2016

Contributor

Parquet support is at the top of my list. I'll get to it when I have some free time.



dselivanov Feb 19, 2016

FYI introducing-apache-arrow.
Also, it seems Cloudera developers have started actively refactoring parquet-cpp.


DheerajAgarwal Oct 10, 2017

Any updates on this one?