Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify BlockBuilder #17342

Merged
merged 20 commits into from
Jun 25, 2023
Merged

Simplify BlockBuilder #17342

merged 20 commits into from
Jun 25, 2023

Conversation

dain
Copy link
Member

@dain dain commented May 4, 2023

Description

This PR simplifies BlockBuilder by removing all write methods from the
interface. Each BlockBuilder implementation still has a write method, and
Type implementations simply downcast the expected implementation type.

This change is only possible with the removal of the wrapper BlockBuilder
types SingleArrayBlockWriter, SingleMapBlockWriter, and SingleRowBlockWriter.
These classes have been replaced with a single lambda writeEntry method
on ArrayBlockBuilder, MapBlockBuilder, and RowBlockBuilder, that calls
back with the nested field builders. Since there is now a single writeEntry
method, there is no need for the beginBlockEntry and closeEntry methods.

To simplify the transition to the new writeEntry method, this PR includes
a new helper class for building array, map, and row values in functions. The
ArrayValueBuilder, MapValueBuilder, and RowValueBuilder interfaces have
a simple build value method to build a single value of related kind, and each
of these has a related buffered class that can be used to share allocation
buffers across multiple value build calls. This replaces the pattern of
using a PageBuilder as a buffer in function implementations.

Additionally, MapBlockBuilder has been extended with a strictNotDistinctFrom
that builds the map hash table using NOT DISTINCT FROM semantics. This is
removes the need to use TypedSet to determine duplicates, and is much more
efficient. This behavior is also supported in BufferedMapValueBuilder with
the createBufferedDistinctStrict static factory method.

Added

  • Int128BlockBuilder.writeInt128
  • Fixed12BlockBuilder.writeFixed12
  • Block.getSlice with SliceOutput argument (to replace Block.writeBytesTo)
  • ArrayValueBuilder.buildArrayValue and BufferedArrayValueBuilder
  • MapValueBuilder.buildMapValue and BufferedMapValueBuilder
  • RowValueBuilder.buildRowValue and BufferedRowValueBuilder
  • NOT DISTINCT FROM support in MapBlockBuilder and BufferedMapValueBuilder
  • ArrayBlockBuilder.writeEntry
  • MapBlockBuilder.writeEntry
  • RowBlockBuilder.writeEntry
  • LambdaMetafactoryGenerator to simplify bytecode generation of lambda method calls

Removed

  • BlockBuilder.writeBytes
  • BlockBuilder.writeShort
  • BlockBuilder.writeInt
  • BlockBuilder.writeLong
  • BlockBuilder.beginBlockEntry
  • BlockBuilder.closeEntry
  • SingleArrayBlockWriter
  • SingleMapBlockWriter
  • SingleRowBlockWriter

Release notes

(X) Release notes are required, with the following suggested text:

# SPI
* BlockBuilder has been redesigned to remove write methods from the interface. ({issue}`issuenumber`)
* Array, map, and row build has been changed to use a single `writeEntry`. ({issue}`issuenumber`)

@dain dain requested review from electrum and martint May 4, 2023 04:24
@cla-bot cla-bot bot added the cla-signed label May 4, 2023
@github-actions github-actions bot added bigquery BigQuery connector delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector mongodb MongoDB connector tests:hive labels May 4, 2023
@dain dain force-pushed the simplify-block-builder branch 2 times, most recently from c78127f to 9576f50 Compare May 5, 2023 02:15
@martint martint mentioned this pull request May 15, 2023
19 tasks
@dain dain force-pushed the simplify-block-builder branch 2 times, most recently from 52abc41 to 3b88983 Compare June 1, 2023 23:05
int startSize = keyBlockBuilder.getPositionCount();

// build the map
builder.build(keyBlockBuilder, valueBlockBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to catch E | RuntimeException here and even out the blocks using the code below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out you can't catch a type variable, but for this case we are catching and rethrowing so we can just use Exception

@@ -430,4 +456,9 @@ private static boolean indeterminate(MethodHandle valueIndeterminateFunction, Bl
}
return false;
}

private static boolean not(boolean value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method already exists in CompilerOperations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That class is not available in the SPI

core/trino-spi/src/main/java/io/trino/spi/block/Block.java Outdated Show resolved Hide resolved
core/trino-spi/pom.xml Outdated Show resolved Hide resolved
core/trino-spi/pom.xml Outdated Show resolved Hide resolved
dain added 11 commits June 24, 2023 14:33
The Slice methods on ArrayType date back to the first implementation of
array and have been replaced by methods operating on Block
Support using NOT DISTINCT FROM to build the map hash table. This
rejects less key than the current strict mode, which uses EQUALS.
The beginEntry paradigm is confusing and inefficient. It requires array,
map, and row block builders to use a synthetic wrapper block during the
build, and this wrapper is inefficient due to call site pollution, but
also make it difficult to extend blocks with new behavior, as the
wrappers would have to pass through the extended behaviors.
@dain dain merged commit 2a9db69 into trinodb:master Jun 25, 2023
111 checks passed
@dain dain deleted the simplify-block-builder branch June 25, 2023 02:05
@github-actions github-actions bot added this to the 421 milestone Jun 25, 2023
@mosabua
Copy link
Member

mosabua commented Jul 18, 2023

fyi @osscm .. this one made it into 421

@nineinchnick
Copy link
Member

The developer guide still references beginBlockEntry and needs to be updated https://github.com/trinodb/trino/blob/master/docs/src/main/sphinx/develop/connectors.md#connectorrecordsetprovider

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

None yet

5 participants