Presto for low latency: how to do plan caching or parameter substitution in Presto? #13384
Replies: 8 comments
-
While developing our internal Oracle connector, we found that the response times of a handful of connector metadata APIs greatly affect the performance of analysis and planning. You may want to take a look.
-
Yeah, we found that out too, and we have the metadata fully cached. Planning is completely CPU-dominated for us.
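For concreteness, here is a rough sketch of the kind of metadata caching we mean: a Guava cache in front of a connector's getTableHandle call. The wrapper class, cache size, and TTL below are our own illustration, not code from Presto:

```java
import com.facebook.presto.spi.ConnectorSession;
import com.facebook.presto.spi.ConnectorTableHandle;
import com.facebook.presto.spi.SchemaTableName;
import com.facebook.presto.spi.connector.ConnectorMetadata;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.util.concurrent.UncheckedExecutionException;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper: memoize an expensive metadata lookup so repeated
// planning of similar queries doesn't pay the connector round trip each time.
public class CachingMetadataWrapper
{
    private final ConnectorMetadata delegate;
    private final Cache<SchemaTableName, ConnectorTableHandle> tableHandles = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES) // tolerate slightly stale metadata
            .build();

    public CachingMetadataWrapper(ConnectorMetadata delegate)
    {
        this.delegate = delegate;
    }

    public ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName table)
    {
        try {
            return tableHandles.get(table, () -> delegate.getTableHandle(session, table));
        }
        catch (ExecutionException e) {
            throw new UncheckedExecutionException(e.getCause());
        }
    }
}
```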
-
@agrawaldevesh Could you break down which part of planning takes the most time?
There is a plan to improve the performance of prepared statements by caching initial plans, so that the first two stages (…
-
@agrawaldevesh, have you tried capturing a performance profile of your workload (with YourKit, JFR, perf, etc.)? I'm sure there are a lot of opportunities to optimize the planning process. Concrete data on where the time is being spent would help guide the effort.
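For example (a minimal sketch, assuming JDK 11+), you could wrap the planning-heavy code path in a programmatic JFR recording and open the resulting file in JDK Mission Control; attaching to a running coordinator with `jcmd <pid> JFR.start` also works. runPlanningWorkload() below is a placeholder for whatever drives the queries under test:

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

import java.nio.file.Path;

public class PlanningProfiler
{
    public static void main(String[] args) throws Exception
    {
        // "profile" is the higher-detail configuration that ships with the JDK
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.start();
            runPlanningWorkload(); // placeholder: issue the queries being measured
            recording.stop();
            recording.dump(Path.of("planning.jfr")); // open in JDK Mission Control
        }
    }

    private static void runPlanningWorkload()
    {
        // hypothetical driver for the planning code path under test
    }
}
```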
-
Thanks for the comments and for looking into this. Almost the entire time is in optimization. For example, in a representative sample run, 17 ms were spent in the loop running through the optimizers, 1.5 ms were spent in plan distribution and sub-plan creation, and about 1 ms was spent on the rest (initial plan creation, validation, final validation). These numbers were obtained with completely cached metadata, so I believe no RPC calls were made. Of the 69 optimizers, the ones that stood out were ExpressionRewriters, SimplifyExpressions, PickTableLayout, the giant iterative optimizer with predicatePushDownRules, MergeLimits, etc.

I haven't run microbenchmarks or tried to profile just the planning piece; that would be a great next step. I did, however, notice a lot of redundancy and repeated work, particularly around expressions, so just caching that alone would be a big win. But let's defer that until we have more data. I am not sure how the parameter work is scoped, but it has the potential to fix this issue too. I commented more on this in issue #1143.
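To illustrate the expression-caching idea (a rough sketch only, with our own names): memoize a rewrite pass with a Guava cache, relying on Presto AST nodes implementing equals/hashCode. This assumes the rewrite is context-free; a real version would have to fold the session or other context into the key:

```java
import com.facebook.presto.sql.tree.Expression;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.ExecutionException;
import java.util.function.Function;

// Illustrative memoization: identical sub-expressions hit the cache instead
// of being rewritten again by every optimizer pass that touches them.
public class MemoizingExpressionRewriter
{
    private final Cache<Expression, Expression> cache = CacheBuilder.newBuilder()
            .maximumSize(100_000)
            .build();
    private final Function<Expression, Expression> rewrite; // the underlying (context-free) rewriter

    public MemoizingExpressionRewriter(Function<Expression, Expression> rewrite)
    {
        this.rewrite = rewrite;
    }

    public Expression apply(Expression expression)
    {
        try {
            return cache.get(expression, () -> rewrite.apply(expression));
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```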
-
Please do. Such benchmarks would be really beneficial. Given that most of the time is spent in the optimization stage, I'm not sure #1143 would help much in this case. Maybe it would be possible to cache a partially optimized plan, though, but I would start by optimizing the existing optimizers.
-
Does your query contain LIMIT or ORDER BY? (Are you focusing on …
-
We have recently been working on plan caching in Presto as well.
-
Context: At Uber we are trying to query low-latency analytical systems like Pinot via Presto. We want the total Presto overhead of doing so to be under 10 milliseconds. However, we have found that total planning time alone ranges from 20 to 50 milliseconds.
Unfortunately, no single plan optimizer is responsible for this latency. It is spread fairly evenly across the 70 or so plan optimizers, with some more expensive than others.
To work around this planning latency, we tried caching the plan in a simplistic way, then realized that this approach won't really work, so we are asking the community for better suggestions.
Our queries are all very simple filters/aggregations, with at most one join.
How we did the simplistic caching: the Statement is hashable, while the PlanNode is not. So our cache key is the rewritten Statement obtained from the Analysis object in doAnalyzeQuery, and we cache the final plan obtained after optimization but before fragmentation.
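Concretely, the cache looked roughly like the sketch below (class and method names are ours; Statement is the rewritten AST from the Analysis, and Plan is the optimized but not-yet-fragmented plan):

```java
import com.facebook.presto.sql.planner.Plan;
import com.facebook.presto.sql.tree.Statement;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Simplistic plan cache: Statement implements equals/hashCode, so the
// rewritten AST can serve directly as the cache key.
public class NaivePlanCache
{
    private final Cache<Statement, Plan> plans = CacheBuilder.newBuilder()
            .maximumSize(1_000)
            .build();

    public Plan getIfPresent(Statement rewrittenStatement)
    {
        return plans.getIfPresent(rewrittenStatement);
    }

    public void put(Statement rewrittenStatement, Plan optimizedPlan)
    {
        plans.put(rewrittenStatement, optimizedPlan);
    }
}
```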
This saved us about 40 milliseconds of query latency, but it didn't work for our most typical queries.
The problem: Consider a query like this:
select count(1) from pinot.schema.table where seconds_since_epoch >= cast(to_unixtime(now() - interval '1' hour) as bigint)
Presto's planning includes expression interpretation, which folds away all the constants and, in particular, evaluates now() to the session start time. So at the beginning of planning the query would be simplified to:
select count(1) from pinot.schema.table where seconds_since_epoch >= 1563375727
However, the Statement we use as the cache key still refers to the original query. The reason Presto does this expression interpretation seems to be to extract tuple domains/bounds for later use in partition pruning and/or predicate pushdown.
What this means is that the plan tree is now contaminated with session-specific values and thus cannot be reused for another query with the same Statement. This defeats our caching, so we actually have to detect statements containing time functions like now() and avoid caching them; a sketch of that filter follows.
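A minimal sketch of that filter, using Presto's AST visitor (the function list is illustrative, not exhaustive):

```java
import com.facebook.presto.sql.tree.CurrentTime;
import com.facebook.presto.sql.tree.DefaultTraversalVisitor;
import com.facebook.presto.sql.tree.FunctionCall;
import com.facebook.presto.sql.tree.Statement;
import com.google.common.collect.ImmutableSet;

import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Walk the statement AST and refuse to cache anything that calls a
// session-dependent time function.
public class SessionDependentFunctionDetector
        extends DefaultTraversalVisitor<Void, AtomicBoolean>
{
    private static final Set<String> TIME_FUNCTIONS = ImmutableSet.of(
            "now", "current_timestamp", "current_date", "current_time", "localtimestamp");

    public static boolean isCacheable(Statement statement)
    {
        AtomicBoolean found = new AtomicBoolean();
        new SessionDependentFunctionDetector().process(statement, found);
        return !found.get();
    }

    @Override
    protected Void visitFunctionCall(FunctionCall node, AtomicBoolean found)
    {
        if (TIME_FUNCTIONS.contains(node.getName().toString().toLowerCase(Locale.ENGLISH))) {
            found.set(true);
        }
        return super.visitFunctionCall(node, found);
    }

    @Override
    protected Void visitCurrentTime(CurrentTime node, AtomicBoolean found)
    {
        // CURRENT_TIMESTAMP and friends parse to a dedicated AST node, not a FunctionCall
        found.set(true);
        return null;
    }
}
```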
This is a big bummer because almost all of our queries are time-based and issued by automated tools, so we are back to the high planning latency :).
We did some thinking on this and couldn't find a simple way to fix it without major changes.
Need help with a better way to truly fix this
We brainstormed a few approaches, but they all seem pretty heavyweight, so we would like guidance and suggestions on how to proceed.
Is there any work in flight that will help with this, or any ongoing work to further reduce Presto planning time?
cc: @vkorukanti