Improve ORC reader performance #555

Merged
merged 16 commits into from Apr 11, 2019

dain (Member) commented Mar 28, 2019

For TPC-DS, this saves about 9.5% of total CPU when running over gzip-compressed data.

Benchmarks are contained in BenchmarkStreamReaders. For this test, I used 3 forks, each with
30 warmup and 20 test iterations. The results are average nanoseconds per value read.

| Benchmark method | Old (ns/value) | New (ns/value) | Speedup |
|---|---|---|---|
| readBooleanNoNull | 2.48 ± 0.04 | 0.67 ± 0.01 | 3.73 |
| readBooleanWithNull | 9.49 ± 0.27 | 2.01 ± 0.02 | 4.72 |
| readByteNoNull | 2.83 ± 0.05 | 0.45 ± 0.02 | 6.23 |
| readByteWithNull | 8.60 ± 0.30 | 1.95 ± 0.02 | 4.41 |
| readDoubleNoNull | 15.61 ± 0.17 | 1.31 ± 0.07 | 11.89 |
| readDoubleWithNull | 16.14 ± 0.25 | 2.83 ± 0.04 | 5.70 |
| readFloatNoNull | 14.64 ± 0.22 | 0.81 ± 0.04 | 18.11 |
| readFloatWithNull | 15.77 ± 0.22 | 2.08 ± 0.04 | 7.60 |
| readIntNoNull | 7.83 ± 0.08 | 2.56 ± 0.03 | 3.06 |
| readIntWithNull | 13.62 ± 0.18 | 2.99 ± 0.05 | 4.55 |
| readLongDecimalNoNull | 82.88 ± 0.73 | 16.85 ± 0.27 | 4.92 |
| readLongDecimalWithNull | 55.30 ± 0.81 | 11.77 ± 0.19 | 4.70 |
| readLongNoNull | 9.43 ± 0.19 | 3.33 ± 0.06 | 2.83 |
| readLongWithNull | 14.13 ± 0.17 | 3.26 ± 0.06 | 4.34 |
| readShortDecimalNoNull | 27.35 ± 0.51 | 9.62 ± 0.15 | 2.84 |
| readShortDecimalWithNull | 20.81 ± 0.30 | 6.96 ± 0.11 | 2.99 |
| readShortNoNull | 6.79 ± 0.11 | 2.55 ± 0.03 | 2.66 |
| readShortWithNull | 11.19 ± 0.12 | 2.95 ± 0.04 | 3.79 |
| readSliceDictionaryNoNull | 3.31 ± 0.04 | 2.22 ± 0.02 | 1.49 |
| readSliceDictionaryWithNull | 8.65 ± 0.10 | 3.67 ± 0.05 | 2.36 |
| readSliceDirectNoNull | 34.46 ± 0.47 | 3.94 ± 0.08 | 8.76 |
| readSliceDirectWithNull | 33.57 ± 0.56 | 6.99 ± 0.11 | 4.80 |
| readTimestampNoNull | 17.91 ± 0.30 | 13.55 ± 0.20 | 1.32 |
| readTimestampWithNull | 16.07 ± 0.24 | 11.77 ± 0.12 | 1.36 |
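
For readers checking the numbers, here is a minimal sketch of the arithmetic behind them. It assumes that the reported metric is total elapsed time divided by values read, and that the speedup column is the old cost divided by the new cost. `ReadCost` and its methods are hypothetical names for illustration and are not part of BenchmarkStreamReaders.

```java
// Hypothetical helper for illustration only; not part of BenchmarkStreamReaders.
final class ReadCost {
    private ReadCost() {}

    // Average cost of reading one value: total elapsed time divided by values read.
    static double nanosPerValue(long elapsedNanos, long valuesRead) {
        if (valuesRead <= 0) {
            throw new IllegalArgumentException("valuesRead must be positive");
        }
        return (double) elapsedNanos / valuesRead;
    }

    // Speedup (assumed definition): old cost divided by new cost, both in ns per value.
    static double speedup(double oldNanosPerValue, double newNanosPerValue) {
        if (newNanosPerValue <= 0) {
            throw new IllegalArgumentException("newNanosPerValue must be positive");
        }
        return oldNanosPerValue / newNanosPerValue;
    }
}
```

Recomputing from the rounded figures can differ slightly from the listed speedups. For example, 14.64 / 0.81 is about 18.07 against the listed 18.11, presumably because the listed values come from unrounded measurements.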

@cla-bot cla-bot bot added the cla-signed label Mar 28, 2019
sopel39 (Member) commented Mar 28, 2019

Here are the actual benchmark results (both wall time and CPU): https://s3.us-east-2.amazonaws.com/starburstdata/karol/Benchmarks+comparison-orc_improvements.pdf

@dain dain force-pushed the orc-types branch 2 times, most recently from 5da422f to 144013e Compare April 4, 2019 17:29
dain and others added 9 commits April 10, 2019 16:50
- Pass SQL type to ORC stream reader constructor and use that instead of passing to each readBlock call.
- Change reporting to avg nanoseconds per row
- Close output streams
- Add long reader benchmarks for all types and pollute profile
- Add direct slice benchmark
- Improve dictionary slice benchmark
- Add long decimal benchmark
- Load data into memory before benchmarking
- Use Presto writer
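
The first commit in the list above moves the SQL type from a per-call argument to a constructor argument. The sketch below shows the general shape of that refactoring. `SqlType` and `LongStreamReaderSketch` are hypothetical stand-ins, not the actual classes in the ORC reader, and the range check is only a placeholder for whatever type-dependent work a real reader does.

```java
import java.util.Objects;

// Hypothetical stand-in for a SQL type; not the real Type interface.
enum SqlType { INTEGER, BIGINT }

// Hypothetical reader: the type is fixed once, at construction time.
final class LongStreamReaderSketch {
    private final SqlType type;

    LongStreamReaderSketch(SqlType type) {
        this.type = Objects.requireNonNull(type, "type is null");
    }

    // Callers no longer pass the type on every readBlock call.
    long[] readBlock(long[] rawValues) {
        long[] block = new long[rawValues.length];
        for (int i = 0; i < rawValues.length; i++) {
            long value = rawValues[i];
            // Placeholder for type-dependent handling: INTEGER values must fit in 32 bits.
            if (type == SqlType.INTEGER
                    && (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE)) {
                throw new ArithmeticException("value out of range for INTEGER: " + value);
            }
            block[i] = value;
        }
        return block;
    }
}
```

Because the type cannot change between calls, a reader built this way can validate it or pick a type-specific code path once instead of on each batch.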
@dain dain closed this Apr 11, 2019
@dain dain mentioned this pull request Apr 11, 2019
5 tasks
@dain dain added this to the 308 milestone Apr 11, 2019
@dain dain reopened this Apr 11, 2019
@dain dain merged commit b6961f8 into trinodb:master Apr 11, 2019
sopel39 (Member) commented Apr 11, 2019

Well done!

yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Pass SQL type to ORC stream reader constructor and use that instead
of passing to each readBlock call.

Cherry-pick of trinodb/trino#555

The differences from the original commit include:
1) Removed systemMemoryContext because the BatchStreamReaders don't
have local arrays;
2) Fixed Raptor tests by converting all SPI types to storage types;
3) Other nit changes

Co-authored-by: Dain Sundstrom <dain@iq80.com>

Convert column type to storage type for Raptor

Raptor stores TIME and TIMESTAMP data as longs. When creating the batch
RecordReader, these types need to be converted to the storage types.
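
A minimal sketch of the conversion described in this commit message, assuming a simple mapping in which TIME and TIMESTAMP are read as BIGINT and every other type is left unchanged. `ColumnType` and `StorageTypes.toStorageType` are hypothetical names for illustration, not the actual Raptor or SPI API.

```java
// Hypothetical column types for illustration; not the real SPI types.
enum ColumnType { BIGINT, VARCHAR, TIME, TIMESTAMP }

final class StorageTypes {
    private StorageTypes() {}

    // Map a column type to the type used to read its stored representation.
    static ColumnType toStorageType(ColumnType type) {
        switch (type) {
            case TIME:
            case TIMESTAMP:
                // Stored as longs, so the batch reader is created with BIGINT.
                return ColumnType.BIGINT;
            default:
                // Assumption: all other types are stored as themselves.
                return type;
        }
    }
}
```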
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Martin Traverso <mtraverso@gmail.com>
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
yingsu00 pushed a commit to yingsu00/presto that referenced this pull request Nov 12, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Pass SQL type to ORC stream reader constructor and use that instead
of passing to each readBlock call.

Cherry-pick of trinodb/trino#555

The differences from the original commit include:
1) Removed systemMemoryContext because the BatchStreamReaders don't
have local arrays;
2) Fixed Raptor tests by converting all SPI types to storage types;
3) Other nit changes

Co-authored-by: Dain Sundstrom <dain@iq80.com>

Convert column type to storage type for Raptor

Raptor stores TIME and TIMESTAMP data as longs. When creating the batch
RecordReader, these types need to be converted to the storage types.
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Martin Traverso <mtraverso@gmail.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
rongrong pushed a commit to prestodb/presto that referenced this pull request Nov 13, 2019
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Pass SQL type to ORC stream reader constructor and use that instead
of passing to each readBlock call.

Cherry-pick of trinodb/trino#555

The differences from the original commit include:
1) Removed systemMemoryContext because the BatchStreamReaders don't
have local arrays;
2) Fixed Raptor tests by converting all SPI types to storage types;
3) Other nit changes

Co-authored-by: Dain Sundstrom <dain@iq80.com>

Convert column type to storage type for Raptor

Raptor stores TIME and TIMESTAMP data as longs. When creating the batch
RecordReader, these types need to be converted to the storage types.
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Martin Traverso <mtraverso@gmail.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-pick of trinodb/trino#555

Co-authored-by: Dain Sundstrom <dain@iq80.com>

3 participants