Implement HoodieRealTimeInputFormat #42
Comments
Agree on the approach in https://gist.github.com/prazanna/698459049447d8898a9de11e3863e99d ; wrapping is the natural way to go. But I am planning to approach the reading a little differently.
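The wrapping idea above can be sketched in plain Java. The interface below is a simplified stand-in for illustration only (the real contract is org.apache.hadoop.mapred.RecordReader); the point is just that the realtime reader owns the plain parquet reader and post-processes each record before handing it back:

```java
// Illustrative stand-in for Hadoop's RecordReader; the real interface is
// org.apache.hadoop.mapred.RecordReader<K, V> and has a different signature.
interface Reader<T> {
    T next(); // next record, or null at end of input
}

// The wrapping idea: the realtime reader delegates to the base parquet
// reader and overlays log updates onto each record before returning it.
class WrappingReader<T> implements Reader<T> {
    private final Reader<T> parquetReader;

    WrappingReader(Reader<T> parquetReader) {
        this.parquetReader = parquetReader;
    }

    @Override
    public T next() {
        T rec = parquetReader.next();
        if (rec == null) {
            return null;            // pass EOF through untouched
        }
        return overlayUpdates(rec); // hook for merging in log records
    }

    // Placeholder: a real implementation would consult the merged log records.
    protected T overlayUpdates(T rec) {
        return rec;
    }
}
```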
Pasting a scrap of thoughts and implementation notes for the record reader: https://gist.github.com/prazanna/698459049447d8898a9de11e3863e99d
Just a note: I guess we are okay to assume that we can merge all the updates in memory with HoodieAvroReader. #134 tracks spilling to disk if needed.
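The in-memory merge assumed here can be sketched with plain Java collections: index the log (delta) records into a map keyed by record key, then overlay that map onto the base-file records as they stream by. `Rec` is a hypothetical stand-in for the real Avro/ArrayWritable payload:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of merging log updates in memory; not the actual
// HoodieAvroReader implementation.
class InMemoryMerge {
    record Rec(String key, String value) {}

    // Build the update index; later log entries for the same key win.
    static Map<String, Rec> indexLogRecords(List<Rec> logRecords) {
        Map<String, Rec> updates = new HashMap<>();
        for (Rec r : logRecords) {
            updates.put(r.key(), r);
        }
        return updates;
    }

    // Replace each base record with its update, if one exists.
    static List<Rec> merge(List<Rec> baseRecords, Map<String, Rec> updates) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : baseRecords) {
            out.add(updates.getOrDefault(r.key(), r));
        }
        return out;
    }
}
```

Holding the whole update map on the heap is exactly the assumption the note makes; spilling to disk when it does not fit is what #134 tracks.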
Upon more digging, it all boils down to the following to get the merging right:
The data is read in using a sub-schema, built off http://grepcode.com/file/repo1.maven.org/maven2/org.apache.avro/avro/1.7.7/org/apache/avro/generic/GenericDatumReader.java#GenericDatumReader.read%28java.lang.Object%2Corg.apache.avro.io.Decoder%29
We need to get the types here and turn them into ArrayWritable
There should be zero effect on the change above
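The sub-schema read mentioned above leans on Avro's schema resolution: GenericDatumReader accepts both the writer schema (what is on disk) and a reader schema (the projection the query needs) and resolves between them during read(). A minimal sketch, with toy schemas invented for illustration:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SubSchemaRead {
    // Full (writer) schema: what was serialized.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
        + "{\"name\":\"key\",\"type\":\"string\"},"
        + "{\"name\":\"val\",\"type\":\"int\"}]}");

    // Sub (reader) schema: the projection, asking only for `key`.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
        + "{\"name\":\"key\",\"type\":\"string\"}]}");

    static byte[] write(String key, int val) throws Exception {
        GenericRecord rec = new GenericData.Record(WRITER);
        rec.put("key", key);
        rec.put("val", val);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(rec, enc);
        enc.flush();
        return out.toByteArray();
    }

    static GenericRecord readProjected(byte[] bytes) throws Exception {
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(bytes, null);
        // Writer schema + reader (sub) schema: fields absent from the reader
        // schema are skipped during decode.
        return new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, dec);
    }
}
```

The record that comes back carries the reader schema's types, which is the hand-off point for converting each field into the corresponding Writable inside an ArrayWritable.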
As for the design of the split, see how Hive uses FileSplit (https://github.com/apache/hive/search?p=3&q=FileSplit&type=&utf8=✓) and the ORC split: https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcNewSplit.java. Of the two places where I saw hardcoded checks, both check against FileSplit, e.g. https://github.com/apache/hive/blob/a1cbccb8dad1824f978205a1e93ec01e87ed8ed5/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L82
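Because of those hardcoded checks, the realtime split has to extend FileSplit rather than wrap it, so that existing `instanceof FileSplit` tests keep passing. A stdlib-only sketch, using a stand-in FileSplit class (the real one is org.apache.hadoop.mapred.FileSplit) and a hypothetical split name:

```java
import java.util.List;

// Stand-in for org.apache.hadoop.mapred.FileSplit, just enough to show the shape.
class FileSplit {
    private final String path;
    private final long start;
    private final long length;

    FileSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }
}

// Extending (not wrapping) FileSplit keeps Hive's hardcoded
// `split instanceof FileSplit` checks passing, while carrying the extra
// state the realtime reader needs: the delta log files to merge in.
class RealtimeFileSplit extends FileSplit {
    private final List<String> deltaLogPaths;

    RealtimeFileSplit(String basePath, long start, long length,
                      List<String> deltaLogPaths) {
        super(basePath, start, length);
        this.deltaLogPaths = deltaLogPaths;
    }

    List<String> getDeltaLogPaths() { return deltaLogPaths; }
}
```

This mirrors the OrcNewSplit approach linked above: subclass the split type the engine already special-cases, and smuggle the extra fields along on the subclass.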
This houses the merge-on-read record reader