New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow blocking page source until dynamic filters are ready #3414
Conversation
Using this feature, we can add (more) deterministic integration tests for dynamic filtering, e.g.: |
cb6ca89
to
b2b84a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilterSupplier.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilterSupplier.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/operator/ScanFilterAndProjectOperator.java
Outdated
Show resolved
Hide resolved
b2b84a7
to
9496899
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. However, I'm thinking that it should be the connector that decides whether to wait for dynamic filter and for how long. For example some connectors might be good producing data immediately since they have many splits to process (if few splits gets processed without dynamic filters it shouldn't be a big issue). Other connectors might have fixed and small amount of splits (e.g JDBC connectors) so it's preferred to wait for dynamic filter even much longer than 1 second global default. It could be huge difference if JDBC connector issues query with or without dynamic filter.
WDYT: @martint
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilter.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilter.java
Outdated
Show resolved
Hide resolved
To be landed after: #1072 |
SGTM - I wasn't sure about adding a new connector API, so I've tried to add the minimal changes to the engine required by this feature. |
This requires making |
9496899
to
17b1dd4
Compare
Rebased over latest |
17b1dd4
to
16cd01a
Compare
Rebased over latest |
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilter.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilter.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/DynamicFilter.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/io/prestosql/plugin/hive/TestOrcPageSourceMemoryTracking.java
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
I was not following, so this could have been answered already, please forgive me question in that case. JDBC connectors to not consume the dynamic filter in any way. They just provide RecordCursor and let the engine handle the heavy lifting. Second, are we positive this will not impact low latency queries? Ultimately, sleeping 1s seems to be heuristic and there seems to be a potential of a regression? |
0e5e831
to
0e587ab
Compare
Updated the PR to use |
presto-spi/src/main/java/io/prestosql/spi/connector/DynamicFilter.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
Session session = Session.builder(getSession()) | ||
.setSystemProperty(ENABLE_DYNAMIC_FILTERING, "true") | ||
.setSystemProperty(JOIN_DISTRIBUTION_TYPE, FeaturesConfig.JoinDistributionType.BROADCAST.name()) | ||
.setSystemProperty(PUSH_PARTIAL_AGGREGATION_THROUGH_JOIN, "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten the test to use multi-join in order to verify the blocking behaviour:
https://github.com/prestosql/presto/pull/3414/files#diff-6102dfcf993241d1413b4fc63e4ca5a9R128
WDYT?
.setSystemProperty(PUSH_PARTIAL_AGGREGATION_THROUGH_JOIN, "true") | ||
.build(); | ||
DistributedQueryRunner runner = (DistributedQueryRunner) getQueryRunner(); | ||
ResultWithQueryId<MaterializedResult> result = runner.executeWithQueryId(session, "SELECT MAX(quantity) FROM lineitem JOIN orders " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add aggregation explicitly below join within query, e.g:
SELECT * FROM (SELECT MAX(quantity), orderkey FROM lineitem GROUP BY orderkey) JOIN orders on ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the suggestion - if I use the following query:
SELECT *
FROM (SELECT MAX(quantity), orderkey FROM lineitem GROUP BY orderkey) l
JOIN orders
ON l.orderkey = orders.orderkey;
It results in a distributed join (due to a repartition operator between the TableScan of lineitem
and MAX
aggregation:
Fragment 1 [HASH]
Output layout: [max, orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment_10]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
InnerJoin[("orderkey" = "orderkey_9")][$hashvalue_59, $hashvalue_60]
│ Layout: [orderkey:bigint, max:double, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment_10:var
│ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│ Distribution: PARTITIONED
│ dynamicFilterAssignments = {orderkey_9 -> df_283}
├─ Project[]
│ │ Layout: [orderkey:bigint, max:double, $hashvalue_59:bigint]
│ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│ │ $hashvalue_59 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("orderkey"), 0))
│ └─ Aggregate[orderkey]
│ │ Layout: [orderkey:bigint, max:double]
│ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│ │ max := max("quantity")
│ └─ LocalExchange[HASH][$hashvalue] ("orderkey")
│ │ Layout: [orderkey:bigint, quantity:double, $hashvalue:bigint]
│ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
│ └─ RemoteSource[2]
│ Layout: [orderkey:bigint, quantity:double, $hashvalue_57:bigint]
└─ LocalExchange[HASH][$hashvalue_60] ("orderkey_9")
│ Layout: [orderkey_9:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment_10:varchar(79
│ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
└─ RemoteSource[3]
Layout: [orderkey_9:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment_10:varchar
But then, local dynamic filtering doesn't take place - because it needs broadcast join...
.map(page -> applyFilter(page, domains)) | ||
.collect(toList())); | ||
|
||
return new FixedPageSource(pages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, add a wrapping page source, e.g:
DynamicFilteringPageSource
that would have FixedPageSource
as a delegate field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope I did it correctly at https://github.com/prestosql/presto/pull/3414/files#diff-8b4a37d650086d97552712b506dfb6a8...
presto-main/src/main/java/io/prestosql/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
/** | ||
* May contains domains for dynamic filters for different table scans | ||
* (e.g. in case of co-located joins). | ||
*/ | ||
@GuardedBy("this") | ||
private Map<Symbol, Domain> dynamicFilterDomainsResult = new HashMap<>(); | ||
|
||
// Each future blocks until its dynamic filter is collected. | ||
@GuardedBy("this") | ||
private Map<DynamicFilterId, SettableFuture<Void>> futures = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This becomes very similar to DynamicFilterService
from #4224.
Can we untangle these classes a bit?
-
LocalDynamicFiltersCollector
should holdMap<DynamicFilterId, SettableFuture<Domain>> dynamicFilters
-
There should be method
DynamicFilter LocalDynamicFiltersCollector#getDynamicFilter(List<DynamicFilters.Descriptor> dynamicFilters, Map<Symbol, ColumnHandle> columnHandles)
, similar as inDynamicFilterService
-
LocalDynamicFiltersCollector#addDynamicFilter(Map<Symbol, Domain> dynamicFilterDomains)
would becomeaddDynamicFilter(Map<DynamicFilterId, Domain> dynamicFilterDomains)
This would simplify LocalDynamicFilterConsumer
as it wouldn't need to perform dynamic filter id -> symbol
translation in io.prestosql.sql.planner.LocalDynamicFilterConsumer#convertTupleDomainForLocalFilters
and some code could be reused between DynamicFilterService
and LocalDynamicFiltersCollector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit different refactoring - allowing a bit more efficient implementation of DynamicFilter
, by caching the resulting predicate and the blocked future:
https://github.com/prestosql/presto/pull/3414/files#diff-fc014adc6614290fe6f8a48d7b0f6739
WDYT?
Many thanks for the review, will fix the issues and update the PR. |
c9e984b
to
ae4a9e6
Compare
Updated the PR to ae4a9e6 - please re-review :) |
Ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments until LocalDynamicFiltersCollector.java
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFilterConsumer.java
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
Many thanks for the review, will fix soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments. Looks good overall.
presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/sql/planner/TestLocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/sql/planner/TestLocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/sql/planner/TestLocalDynamicFiltersCollector.java
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/sql/planner/TestLocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
ae4a9e6
to
ba93c07
Compare
I have addressed the comments above, and rebased the PR over the latest |
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, minor comments + tests for large build side
ba93c07
to
6b8b4fc
Compare
presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Show resolved
Hide resolved
presto-memory/src/test/java/io/prestosql/plugin/memory/TestMemorySmoke.java
Outdated
Show resolved
Hide resolved
6b8b4fc
to
9547911
Compare
Cherry-picked the commits from #4946, fixed the comments above and rebased. |
9547911
to
d9c4edb
Compare
there is a conflict. Please rebase. I will merge tomorrow then |
It is enabled by default.
d9c4edb
to
e23dae7
Compare
Rebased over latest |
// Iterate over dynamic filters that are collected (correspond to one of the futures), and required for filtering (correspond to one of the descriptors). | ||
// It is possible that some dynamic filters are collected in a different stage - and will not available here. | ||
// It is also possible that not all local dynamic filters are needed for this specific table scan. | ||
Set<DynamicFilterId> filterIds = Sets.intersection(symbolsMap.keySet(), futures.keySet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's convert that into stream:
symbolsMap.keySet().stream()
.filter(filterId -> futures.keySet()::contains)
.map(...)
@@ -1998,6 +1999,13 @@ private PhysicalOperation createLookupJoin( | |||
Optional<Symbol> buildHashSymbol, | |||
LocalExecutionPlanContext context) | |||
{ | |||
// Register dynamic filters, allowing the scan operators to wait for the collection completion. | |||
// Skip dynamic filters that are not used locally (e.g. in case of distributed joins). | |||
Set<DynamicFilterId> localDynamicFilters = Sets.intersection( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's convert that into stream:
Set<DynamicFilterId> node.getDynamicFilters().keySet().stream()
.filter(getConsumedDynamicFilterIds(probeNode)::contains)
.collect(toImmutableSet())
@GuardedBy("this") | ||
private int futuresLeft; | ||
|
||
public TableSpecificDynamicFilter(List<ListenableFuture<TupleDomain<ColumnHandle>>> predicateFutures) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the constructor private
return null; | ||
} | ||
Page page = delegate.getNextPage(); | ||
if (page != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip filtering when for predicate.isAll()
We allow the connector to block according to dynamic filter collection state. Some connectors would prefer to wait a bit, to benefit from more selective scanning (using the collected dynamic filters).
e23dae7
to
d5bb8c8
Compare
Thanks! |
merged, thanks! |
Much appreciated :) |
By default, we wait up to 1s for the build-side filters to be ready, to allow more efficient probe-side scan.
If the filters are not ready, we fallback to the previous behaviour (use current filters without blocking probe-side).
This feature can be configured using:
To disable, use: