Improve MarkDistinctHash #8967

pettyjamesm · 2021-08-25T16:34:11Z

Improves performance of MarkDistinctHash (and therefore, MarkDistinctOperator) by:

- Detecting when no positions or all positions of the input are distinct and emitting a RunLengthEncoded block for the output mask (so long as the input position count is > 1).
- Building BooleanType mask blocks directly from byte[] instead of via a BlockBuilder, reducing allocation overhead per built mask by 1/2 since we know in advance no nulls will be present in the created block.
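
For readers following along, here is a minimal sketch of the two ideas in the description. It is not the PR's code: plain arrays stand in for Trino's GroupByIdBlock and Block types, and the names `MarkDistinctSketch`, `Mask`, `process`, and the `groupCount` parameter are hypothetical. It assumes group ids are handed out sequentially, so a position is distinct exactly when its id equals the next unseen id.

```java
// Hypothetical stand-in for the logic described above; not Trino's actual classes.
final class MarkDistinctSketch
{
    /** Either one repeated value (run-length encoded) or one byte per position. */
    static final class Mask
    {
        final boolean runLengthEncoded;
        final byte[] values; // length 1 when runLengthEncoded, else positionCount
        final int positionCount;

        Mask(boolean runLengthEncoded, byte[] values, int positionCount)
        {
            this.runLengthEncoded = runLengthEncoded;
            this.values = values;
            this.positionCount = positionCount;
        }
    }

    // Next group id that has not been marked distinct yet
    private long nextDistinctId;

    // groupCount (assumed input): total number of groups known after hashing this page
    Mask process(long[] groupIds, long groupCount)
    {
        int positions = groupIds.length;
        if (positions > 1) {
            if (nextDistinctId == groupCount) {
                // no new groups appeared: every position is a repeat
                return new Mask(true, new byte[] {0}, positions);
            }
            if (nextDistinctId + positions == groupCount) {
                // every position created a new group: all are distinct
                nextDistinctId = groupCount;
                return new Mask(true, new byte[] {1}, positions);
            }
        }
        // Mixed case: one byte per position, no null tracking needed
        byte[] distinctMask = new byte[positions];
        for (int position = 0; position < positions; position++) {
            if (groupIds[position] == nextDistinctId) {
                distinctMask[position] = 1;
                nextDistinctId++;
            }
        }
        return new Mask(false, distinctMask, positions);
    }
}
```

The run-length encoded cases allocate a single byte regardless of page size, which is why they only pay off when the page has more than one position.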

findepi · 2021-08-26T07:09:37Z

> Building BooleanType mask blocks directly from byte[] instead of via a BlockBuilder, reducing allocation overhead per built mask by 1/2 since we know in advance no nulls will be present in the created block

could the block builder take care of this?

pettyjamesm · 2021-08-26T13:56:11Z

> could the block builder take care of this?

@findepi - I thought about it, and I'm not sure that complicating the general-purpose usage of the block builder with a "no nulls" version is a win for general use cases. I looked around for other instances of "known to have no nulls" boolean blocks, and the only ones I found were in the ORC reader, which already builds ByteArrayBlock instances directly. This seemed safer, although we could consider dropping the static helper methods and using ByteArrayBlock directly here too. Adding the explicit contract seemed better for longer-term maintainability, but either is fine.

findepi · 2021-08-26T14:29:48Z

> complicating the general purpose usage of the block builder with a "no nulls" version handling is a win in general use cases.

i thought some block builders already do this, but maybe i misremember.

pettyjamesm · 2021-08-26T14:49:41Z

> i thought some block builders already do this, but maybe i misremember.

Most block builders (although not quite all, last time I checked) special-case the null handling: they set valueIsNull = null on the output block when no nulls were present in the builder. But none I've seen avoid initializing a boolean[] valueIsNull array within the builder, which is what this change avoids by bypassing the builder entirely.
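
To make the allocation argument concrete, here is a rough sketch under stated assumptions. `ToyByteBuilder` is a hypothetical toy, not Trino's ByteArrayBlockBuilder, and the sketch counts array elements rather than measuring real heap usage. It shows a builder that drops the null array from its output when unused, yet still allocates it up front.

```java
// Hypothetical illustration only; names and structure are not taken from Trino.
final class NoNullsAllocationSketch
{
    /** Toy builder that tracks nulls in a parallel array, as described above. */
    static final class ToyByteBuilder
    {
        private final byte[] values;
        private final boolean[] valueIsNull; // allocated even if no null is ever written
        private boolean hasNull;
        private int size;

        ToyByteBuilder(int expectedEntries)
        {
            values = new byte[expectedEntries];
            valueIsNull = new boolean[expectedEntries];
        }

        void writeByte(byte value)
        {
            values[size++] = value;
        }

        void appendNull()
        {
            valueIsNull[size++] = true;
            hasNull = true;
        }

        // Number of array elements the builder allocated, nulls included
        long allocatedElements()
        {
            return (long) values.length + valueIsNull.length;
        }

        // The special case from the comment: no null array on the output when unused
        boolean[] outputNulls()
        {
            return hasNull ? valueIsNull : null;
        }
    }

    // Wrapping a byte[] directly allocates only the values themselves
    static long directAllocatedElements(byte[] mask)
    {
        return mask.length;
    }
}
```

Since a boolean[] element typically occupies one byte on HotSpot, the builder path allocates roughly twice as much per row as the direct byte[] path, even though the output block ends up without a null array.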

sopel39 · 2021-09-01T09:40:16Z

@pettyjamesm do you have some perf numbers?

skrzypo987 · 2021-09-01T10:57:06Z

core/trino-main/src/test/java/io/trino/type/TestBooleanType.java

+        assertTrue(builderBlock instanceof ByteArrayBlock);
+        assertBlockEquals(BOOLEAN, wrappedBlock, builderBlock);
+        // the wrapping instance does not copy the byte array defensively
+        assertTrue(BOOLEAN.getBoolean(wrappedBlock, 0));


AssertJ assertions would look much better here as you are actually comparing 0/1 byte values, not logical ones.

skrzypo987 · 2021-09-01T11:00:50Z

core/trino-main/src/main/java/io/trino/operator/MarkDistinctHash.java

    }

    @VisibleForTesting
    public int getCapacity()
    {
        return groupByHash.getCapacity();
    }
+
+    private Block processNextGroupIds(GroupByIdBlock ids)


The extraction itself could be a separate commit. That way it is easier to compare the actual changes.

lukasz-stec · 2021-09-01T11:39:47Z

core/trino-main/src/main/java/io/trino/operator/MarkDistinctHash.java

+        }
+        byte[] distinctMask = new byte[positions];
+        for (int position = 0; position < distinctMask.length; position++) {
+            if (ids.getGroupId(position) == nextDistinctId) {


I wonder if a branchless version could perform better here (consider benchmarking it).
Something like:

    int distinctMaskValue = ids.getGroupId(position) == nextDistinctId ? 1 : 0;
    distinctMask[position] = (byte) distinctMaskValue;
    nextDistinctId += distinctMaskValue;

pettyjamesm · 2021-09-01

I thought about doing that here, but I chose not to for a couple of reasons:

1. There are no existing benchmarks to test MarkDistinctHash performance (which is itself hard to isolate from the work of the GroupByHash). This also answers @sopel39's question: I don't have good benchmark numbers for this change.
2. I don't think that the branchless (ie: cmov version of this code) is actually likely to be better than the branching version in practice. This is hard to prove, but I suspect that the control flow here is largely predictable in most query workloads, where either most inputs are distinct or most are not (i.e. most of the time, > 75% of branches will go in the same direction). We could contrive a benchmark that shows whatever we want in that regard, but I'm not sure it would hold for real workloads, so I chose to leave the general code shape as-is in this PR.
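
The two loop shapes under discussion can be put side by side. This is a minimal sketch with hypothetical names (`BranchlessMaskSketch`, `branching`, `branchless`) and a plain long[] in place of GroupByIdBlock. It only shows that both produce the same mask. It makes no performance claim, and whether the JIT actually compiles the ternary to a conditional move is not guaranteed.

```java
// Hypothetical comparison of the two loop shapes; not the PR's code.
final class BranchlessMaskSketch
{
    // Branching shape: the store of 1 and the increment happen only on a match
    static byte[] branching(long[] groupIds, long nextDistinctId)
    {
        byte[] mask = new byte[groupIds.length];
        for (int position = 0; position < mask.length; position++) {
            if (groupIds[position] == nextDistinctId) {
                mask[position] = 1;
                nextDistinctId++;
            }
            else {
                mask[position] = 0;
            }
        }
        return mask;
    }

    // Branchless shape: the comparison result is used as data (0 or 1)
    static byte[] branchless(long[] groupIds, long nextDistinctId)
    {
        byte[] mask = new byte[groupIds.length];
        for (int position = 0; position < mask.length; position++) {
            int isDistinct = groupIds[position] == nextDistinctId ? 1 : 0;
            mask[position] = (byte) isDistinct; // explicit narrowing cast: int to byte
            nextDistinctId += isDistinct;
        }
        return mask;
    }
}
```

A benchmark comparing the two would need to vary the distinct ratio, since the branching version's cost depends on how predictable the comparison is.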

Adds a contract method to BooleanType that indicates that callers can legally choose to construct BooleanType blocks from byte[] values directly. Adding this method for bypassing BlockBuilders can be significant for BooleanType in particular, because the per-row overhead of ByteArrayBlockBuilder#valueIsNull is the same as for the values themselves, meaning each builder row is 2x larger when no values can be null. Also adds a similar utility to create single boolean value blocks directly from BooleanType.

Improves MarkDistinctHash by:

- Detecting when no positions or all positions of the input are distinct and emitting a RunLengthEncoded block for the output mask
- Building BooleanType mask blocks directly from byte[] instead of via the BlockBuilder, reducing allocation overhead per built mask by 1/2 since we know in advance no nulls will be present in the created block

pettyjamesm · 2021-09-02T20:10:12Z

Anything else I need to address on this one before it can be merged?

sopel39 · 2021-09-06T10:24:05Z

core/trino-main/src/main/java/io/trino/operator/MarkDistinctHash.java

+        }
+        byte[] distinctMask = new byte[positions];
+        for (int position = 0; position < distinctMask.length; position++) {
+            if (ids.getGroupId(position) == nextDistinctId) {


> I don't think that the branchless (ie: cmov version of this code) is actually likely to be better than the branching version in practice.

We've seen branch reduction have a big impact, actually. You might want to investigate it in follow-up work (with different distinct ratios).

sopel39 · 2021-09-06T10:25:00Z

core/trino-spi/src/main/java/io/trino/spi/type/BooleanType.java

+     * encoding changes such that {@link ByteArrayBlock} is not always a valid or efficient representation, then this method must be
+     * removed and any usages changed
+     */
+    public static Block wrapByteArrayAsBooleanBlockWithoutNulls(byte[] booleansAsBytes)


nit: we probably should have similar methods for other types and have readers use them. cc @skrzypo987

cla-bot bot added the cla-signed label Aug 25, 2021

pettyjamesm mentioned this pull request Aug 25, 2021

Improve MarkDistinctHash prestodb/presto#16648

Merged

pettyjamesm requested a review from dain August 25, 2021 17:11

pettyjamesm requested review from electrum and findepi August 26, 2021 20:48

findepi requested review from sopel39, skrzypo987 and lukasz-stec and removed request for findepi September 1, 2021 09:22

skrzypo987 reviewed Sep 1, 2021

skrzypo987 approved these changes Sep 1, 2021

lukasz-stec approved these changes Sep 1, 2021

pettyjamesm force-pushed the improve-mark-distinct-hash branch 2 times, most recently from ccee742 to 2a9911a Compare September 1, 2021 14:14

pettyjamesm added 2 commits September 1, 2021 16:02

pettyjamesm force-pushed the improve-mark-distinct-hash branch from 2a9911a to 3875070 Compare September 1, 2021 20:02

pettyjamesm mentioned this pull request Sep 1, 2021

Improve AccumulatorCompiler code generation #9084

Closed

sopel39 approved these changes Sep 6, 2021

sopel39 reviewed Sep 6, 2021

sopel39 merged commit 15c9c53 into trinodb:master Sep 6, 2021

sopel39 mentioned this pull request Sep 6, 2021

Release notes for 362 #9015

Closed

pettyjamesm deleted the improve-mark-distinct-hash branch September 6, 2021 12:56

takezoe mentioned this pull request Jul 22, 2022

Improve MarkDistinctHash treasure-data/presto#52

Closed
