Presto for low latency: how to do plan caching or parameter substitution in Presto? #13384
Replies: 8 comments
-
While developing our internal Oracle connector, we found that the response times of a handful of connector metadata APIs greatly affect the performance of analysis and planning. You may want to take a look.
-
Yeah, we found that out too, and we have the metadata fully cached. Planning is completely CPU-dominated for us.
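For concreteness, here is a rough sketch of the kind of metadata caching we mean: a Guava cache in front of a connector's getTableHandle call. The wrapper class, cache size, and TTL below are our own illustration, not code from Presto:

```java
import com.facebook.presto.spi.ConnectorSession;
import com.facebook.presto.spi.ConnectorTableHandle;
import com.facebook.presto.spi.SchemaTableName;
import com.facebook.presto.spi.connector.ConnectorMetadata;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.util.concurrent.UncheckedExecutionException;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper: memoize an expensive metadata lookup so repeated
// planning of similar queries doesn't pay the connector round trip each time.
public class CachingMetadataWrapper
{
    private final ConnectorMetadata delegate;
    private final Cache<SchemaTableName, ConnectorTableHandle> tableHandles = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES) // tolerate slightly stale metadata
            .build();

    public CachingMetadataWrapper(ConnectorMetadata delegate)
    {
        this.delegate = delegate;
    }

    public ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName table)
    {
        try {
            return tableHandles.get(table, () -> delegate.getTableHandle(session, table));
        }
        catch (ExecutionException e) {
            throw new UncheckedExecutionException(e.getCause());
        }
    }
}
```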
-
@agrawaldevesh Could you break down which part of planning takes the most time?
There is a plan to improve the performance of prepared statements by caching initial plans, so that the first two stages (…
-
@agrawaldevesh, have you tried capturing a performance profile of your workload (with YourKit, JFR, perf, etc.)? I'm sure there are a lot of opportunities to optimize the planning process. Concrete data on where the time is being spent would help guide the effort.
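For example (a minimal sketch, assuming JDK 11+), you could wrap the planning-heavy code path in a programmatic JFR recording and open the resulting file in JDK Mission Control; attaching to a running coordinator with `jcmd <pid> JFR.start` also works. runPlanningWorkload() below is a placeholder for whatever drives the queries under test:

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

import java.nio.file.Path;

public class PlanningProfiler
{
    public static void main(String[] args) throws Exception
    {
        // "profile" is the higher-detail configuration that ships with the JDK
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.start();
            runPlanningWorkload(); // placeholder: issue the queries being measured
            recording.stop();
            recording.dump(Path.of("planning.jfr")); // open in JDK Mission Control
        }
    }

    private static void runPlanningWorkload()
    {
        // hypothetical driver for the planning code path under test
    }
}
```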
-
Thanks for the comments and for looking into this. Almost the entire time is in optimization. For example, in a representative sample run, 17 ms were spent in the loop running through the optimizers, 1.5 ms were spent in plan distribution and sub-plan creation, and about 1 ms was spent on the rest (initial plan creation, validation, final validation). These numbers were obtained with completely cached metadata, so I believe no RPC calls were made. Of the 69 optimizers, the ones that stood out were ExpressionRewriters, SimplifyExpressions, PickTableLayout, the giant iterative optimizer with predicatePushDownRules, MergeLimits, etc.

I haven't run microbenchmarks or tried to profile just the planning piece; that would be a great next step. I did, however, notice a lot of redundancy and repeated work, particularly around expressions, so just caching that alone would be a big win. But let's defer that until we have more data. I am not sure how the parameter work is scoped, but it has the potential to fix this issue too. I commented more on this in issue #1143.
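To illustrate the expression-caching idea (a rough sketch only, with our own names): memoize a rewrite pass with a Guava cache, relying on Presto AST nodes implementing equals/hashCode. This assumes the rewrite is context-free; a real version would have to fold the session or other context into the key:

```java
import com.facebook.presto.sql.tree.Expression;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.ExecutionException;
import java.util.function.Function;

// Illustrative memoization: identical sub-expressions hit the cache instead
// of being rewritten again by every optimizer pass that touches them.
public class MemoizingExpressionRewriter
{
    private final Cache<Expression, Expression> cache = CacheBuilder.newBuilder()
            .maximumSize(100_000)
            .build();
    private final Function<Expression, Expression> rewrite; // the underlying (context-free) rewriter

    public MemoizingExpressionRewriter(Function<Expression, Expression> rewrite)
    {
        this.rewrite = rewrite;
    }

    public Expression apply(Expression expression)
    {
        try {
            return cache.get(expression, () -> rewrite.apply(expression));
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```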
-
Please do. Such benchmarks would be really beneficial. Given that most of the time is spent in the optimization stage, I'm not sure #1143 would help much in this case. Maybe it would be possible to cache a partially optimized plan, though, but I would start by optimizing the existing optimizers.
-
Does your query contain LIMIT or ORDER BY? (Are you focusing on …
-
We have recently been working on plan caching in Presto as well.
-
Context: At Uber we are trying to query low-latency analytical systems like Pinot via Presto. We want the total Presto overhead of doing so to be under 10 milliseconds. However, we have found that total planning time alone ranges from 20 to 50 milliseconds.
Unfortunately, no single plan optimizer is responsible for this latency. It is spread fairly evenly across the 70 or so plan optimizers, with some more expensive than others.
To work around this planning latency, we tried caching the plan in a simplistic way, then realized that this approach won't really work, so we are asking the community for better suggestions.
Our queries are all very simple filters/aggregations, with at most one join.
How we did the simplistic caching: the Statement is hashable, while the PlanNode is not. So our cache key is the rewritten Statement obtained from the Analysis object in doAnalyzeQuery, and we cache the final plan obtained after optimization but before fragmentation.
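Concretely, the cache looked roughly like the sketch below (class and method names are ours; Statement is the rewritten AST from the Analysis, and Plan is the optimized but not-yet-fragmented plan):

```java
import com.facebook.presto.sql.planner.Plan;
import com.facebook.presto.sql.tree.Statement;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Simplistic plan cache: Statement implements equals/hashCode, so the
// rewritten AST can serve directly as the cache key.
public class NaivePlanCache
{
    private final Cache<Statement, Plan> plans = CacheBuilder.newBuilder()
            .maximumSize(1_000)
            .build();

    public Plan getIfPresent(Statement rewrittenStatement)
    {
        return plans.getIfPresent(rewrittenStatement);
    }

    public void put(Statement rewrittenStatement, Plan optimizedPlan)
    {
        plans.put(rewrittenStatement, optimizedPlan);
    }
}
```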
This saved us about 40 milliseconds of query latency, but it didn't work for our most typical queries.
The problem: Consider a query like this:
select count(1) from pinot.schema.table where seconds_since_epoch >= cast(to_unixtime(now() - interval '1' hour) as bigint)
Presto's planning includes expression interpretation, which folds away all the constants and, in particular, evaluates now() to the session start time. So at the beginning of planning the query would be simplified to:
select count(1) from pinot.schema.table where seconds_since_epoch >= 1563375727
However, the Statement we use as the cache key still refers to the original query. The reason Presto does this expression interpretation seems to be to extract tuple domains/bounds for later use in partition pruning and/or predicate pushdown.
What this means is that the plan tree is now contaminated with session-specific values and thus cannot be reused for another query with the same Statement. This defeats our caching, so we actually have to detect statements containing time functions like now() and avoid caching them; a sketch of that filter follows.
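A minimal sketch of that filter, using Presto's AST visitor (the function list is illustrative, not exhaustive):

```java
import com.facebook.presto.sql.tree.CurrentTime;
import com.facebook.presto.sql.tree.DefaultTraversalVisitor;
import com.facebook.presto.sql.tree.FunctionCall;
import com.facebook.presto.sql.tree.Statement;
import com.google.common.collect.ImmutableSet;

import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Walk the statement AST and refuse to cache anything that calls a
// session-dependent time function.
public class SessionDependentFunctionDetector
        extends DefaultTraversalVisitor<Void, AtomicBoolean>
{
    private static final Set<String> TIME_FUNCTIONS = ImmutableSet.of(
            "now", "current_timestamp", "current_date", "current_time", "localtimestamp");

    public static boolean isCacheable(Statement statement)
    {
        AtomicBoolean found = new AtomicBoolean();
        new SessionDependentFunctionDetector().process(statement, found);
        return !found.get();
    }

    @Override
    protected Void visitFunctionCall(FunctionCall node, AtomicBoolean found)
    {
        if (TIME_FUNCTIONS.contains(node.getName().toString().toLowerCase(Locale.ENGLISH))) {
            found.set(true);
        }
        return super.visitFunctionCall(node, found);
    }

    @Override
    protected Void visitCurrentTime(CurrentTime node, AtomicBoolean found)
    {
        // CURRENT_TIMESTAMP and friends parse to a dedicated AST node, not a FunctionCall
        found.set(true);
        return null;
    }
}
```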
This is a big bummer because almost all of our queries are time-based and issued by automated tools, so we are back to the high planning latency :).
We did some thinking on this and couldn't find a simple way to fix it without major changes.
Need help with a better way to truly fix this
We brainstormed a few approaches, but they all seem pretty heavyweight, so we would like guidance and suggestions on how to proceed.
Is there any work in flight that will help with this, or any ongoing work to further reduce Presto planning time?
cc: @vkorukanti