
Presto plugin for Hoodie #81

Closed
vinothchandar opened this issue Feb 18, 2017 · 10 comments

Comments

@vinothchandar
Member

No description provided.

@vinothchandar
Member Author

Here we hope to add support for faster point lookups (batch).

@leletan
Contributor

leletan commented Jul 4, 2018

Is this completed? It seems the Presto patch has been merged.

I was trying to follow the Hudi docs to build a Presto + minimal Hive metastore POC (https://stackoverflow.com/questions/43727964/does-presto-require-a-hive-metastore-to-read-parquet-files-from-s3) on top of S3 files written by Hudi. It seems Presto:

  • cannot match field names to field values (it shows the Hudi commit version as the value of one of the real data columns)
  • does not leverage the commits (for 2 records written in 2 commits with the same primary key, it shows both instead of only the later one)

Any guidance on what went wrong? Thanks in advance!

@vinothchandar
Member Author

Yes, the Presto patch is merged, and Presto support is via the Hive catalog.

cannot match field names to field values (it shows the Hudi commit version as the value of one of the real data columns)
Not following fully... but Hudi does add a few metadata fields to the table, including the commit version of the records.
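For illustration only (the table name "hudi_trips" and the trailing data columns are placeholders), a query against a Hudi table surfaces these metadata fields next to the data columns:

```sql
-- Illustrative sketch: "hudi_trips" and the data columns are placeholders.
-- Hudi stores its metadata fields as the leading columns of every record.
SELECT
    _hoodie_commit_time,     -- the commit "version" the record was last written in
    _hoodie_commit_seqno,
    _hoodie_record_key,
    _hoodie_partition_path,
    _hoodie_file_name,
    rider, driver, fare      -- the actual data columns
FROM hudi_trips
LIMIT 10;
```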

does not leverage the commits (for 2 records written in 2 commits with the same primary key, it shows both instead of only the later one)
This is odd. What Presto version are you running? Also, be sure to provide the hoodie-hadoop-mr jar to the Presto workers/coordinator.

For copy-on-write tables, Hoodie support works without any need for a plugin.

This ticket is more for the longer term.

@leletan
Contributor

leletan commented Jul 6, 2018

Thanks for the clarification, @vinothchandar. I guess it was due to my own local env setup.

So I fell back to the local env setup from the quickstart, with hadoop-2.6.0-cdh5.4.7, hive-1.1.0-cdh5.4.7, Presto 0.205, and a dataset generated from HoodieJavaApp with default options. Presto works fine for select count(*), but when I try select * I get the following error, no matter whether I add hoodie-hadoop-mr-*.jar to the plugin directory or not:

java.lang.UnsupportedOperationException: com.facebook.presto.spi.type.DoubleType
at com.facebook.presto.spi.type.AbstractType.writeSlice(AbstractType.java:135)
at com.facebook.presto.hive.parquet.reader.ParquetBinaryColumnReader.readValue(ParquetBinaryColumnReader.java:55)
at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.lambda$readValues$1(ParquetPrimitiveColumnReader.java:184)
at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.processValues(ParquetPrimitiveColumnReader.java:204)
at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.readValues(ParquetPrimitiveColumnReader.java:183)
at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.readPrimitive(ParquetPrimitiveColumnReader.java:171)
at com.facebook.presto.hive.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:208)
at com.facebook.presto.hive.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:258)
at com.facebook.presto.hive.parquet.reader.ParquetReader.readBlock(ParquetReader.java:241)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:243)
at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:221)
at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:262)
at com.facebook.presto.spi.Page.assureLoaded(Page.java:244)
at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:245)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:373)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:477)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@leletan
Contributor

leletan commented Jul 6, 2018

Dug a little bit more: it seems Presto is trying to decode a binary-serialized Parquet column written by Hudi as a double, which requires a DoubleType.writeSlice call. DoubleType does not implement writeSlice, so its parent class AbstractType just throws UnsupportedOperationException when it is called.

Not sure if the above holds true. If it does, wouldn't all open-source Presto users of Hudi hit the same issue? Wondering how this is resolved at Uber.

@vinothchandar
Member Author

vinothchandar commented Jul 12, 2018

So, this does not seem like a Hudi issue to me; the Parquet files generated by Hudi are standard Parquet files. Can you try copying the files themselves into another (non-Hudi) table and see if it works? Also, can you validate that the Parquet files are good via Spark SQL?
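For example (a sketch only; the path, table name, and columns below are placeholders), you could register copies of the same files as a plain Parquet external table in Hive and query it from Presto; if the same error shows up there, the problem is in the Presto/Hive Parquet path rather than in Hudi:

```sql
-- Sketch (Hive DDL): register copies of the same parquet files as a plain,
-- non-Hudi external table. Path and columns are placeholders.
CREATE EXTERNAL TABLE hudi_trips_plain (
  _hoodie_commit_time    string,
  _hoodie_commit_seqno   string,
  _hoodie_record_key     string,
  _hoodie_partition_path string,
  _hoodie_file_name      string,
  rider  string,
  driver string,
  fare   double
)
STORED AS PARQUET
LOCATION 's3://your-bucket/tmp/hudi_trips_copy/';

-- Then, from Presto:
SELECT * FROM hudi_trips_plain LIMIT 10;
```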

Also, please feel free to open a new issue for this, since this one is about the Presto plugin.

@leletan
Contributor

leletan commented Jul 12, 2018

Thanks for the info, @vinothchandar.
I figured it out: I had missed hive.parquet.use-column-names=true in the Presto Hive catalog config.
Now everything is working!
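For anyone else who hits this: the flag goes in the Hive connector's catalog properties file (e.g. etc/catalog/hive.properties) and tells Presto to resolve Parquet columns by name instead of by ordinal position, which would explain the shifted column values reported above. A rough sanity check after restarting Presto (the table name is a placeholder):

```sql
-- Metadata columns should now line up with their own values:
SELECT _hoodie_commit_time, _hoodie_record_key FROM hudi_table LIMIT 5;

-- And for a copy-on-write table read through HoodieParquetInputFormat, each record
-- key should surface only once (the latest commit wins), so this returns no rows:
SELECT _hoodie_record_key, count(*) AS versions_visible
FROM hudi_table
GROUP BY _hoodie_record_key
HAVING count(*) > 1;
```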

@vinothchandar
Member Author

Nice to hear. Let's put together a Hoodie Docker container for the future. Given so many dependencies, it can be overwhelming at times :)

@leletan
Contributor

leletan commented Jul 12, 2018

Totally agree. Any thoughts on which folder this should go into?

@vinothchandar
Member Author

We can create a hoodie-docker folder and host all the scripts and Dockerfiles there.
