Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: cache reserved bind variables in queries #7698

Merged
merged 5 commits into from Mar 18, 2021

Conversation

vmg
Copy link
Collaborator

@vmg vmg commented Mar 16, 2021

Description

This week we're starting our performance mission earlier with a long-standing TODO from @sougou in the SQL parser:

// GetBindvars returns a map of the bind vars referenced in the statement.
// TODO(sougou); This function gets called again from vtgate/planbuilder.
// Ideally, this should be done only once.
 func GetBindvars(stmt Statement) map[string]struct{} {

GetBindvars is not as cheap as it could be. It is, in fact, quite expensive! Walking the AST is hard! If we look at a profile for a Normalize benchmark, we find a surprising result:

image

GetBindvars accounts for almost 50% the time spent when normalizing a query! And we're doing it twice in the normal flow of a request: once when normalizing the incoming query, and again in the planbuilder when rewriting the query into its final plan. @sougou is quite right that "ideally, this should be done only once"... But we can do even better: ideally this shouldn't be done at all!

This PR removes all the usages of GetBindvars from the codebase -- instead, it updates our SQL grammar so it keeps track of the bind variables as it finds them while parsing. Then, we propagate these reserved variable names everywhere they are needed. A bit verbose, but it makes normalization twice as fast (as one would expect from looking at the flame graph):

name                                   old time/op    new time/op    delta
Normalize-16                             7.35µs ± 2%    4.59µs ± 4%  -37.49%  (p=0.008 n=5+5)
NormalizeTraces/django_queries.txt-16     935µs ± 4%     443µs ± 2%  -52.58%  (p=0.008 n=5+5)
NormalizeTraces/lobsters.sql.gz-16       37.9ms ± 1%    18.9ms ± 1%  -50.22%  (p=0.008 n=5+5)

name                                   old alloc/op   new alloc/op   delta
NormalizeTraces/django_queries.txt-16     277kB ± 0%     145kB ± 0%  -47.60%  (p=0.008 n=5+5)
NormalizeTraces/lobsters.sql.gz-16       11.4MB ± 0%     6.1MB ± 0%  -46.25%  (p=0.029 n=4+4)

name                                   old allocs/op  new allocs/op  delta
NormalizeTraces/django_queries.txt-16     8.66k ± 0%     4.22k ± 0%  -51.30%  (p=0.008 n=5+5)
NormalizeTraces/lobsters.sql.gz-16         349k ± 0%      183k ± 0%  -47.71%  (p=0.029 n=4+4)

Obviously, this is in a synthetic benchmark -- in a production environment, the speedup will be even more significant because we're no longer calling GetBindvars redundantly from the query planner.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

vmg added 3 commits March 16, 2021 18:19
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Copy link
Member

@harshit-gangal harshit-gangal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

I have one question.
Can the reserved Bindvars be exposed as a method on Statement Interface than passing it through all the methods?

@vmg
Copy link
Collaborator Author

vmg commented Mar 17, 2021

@harshit-gangal: that was the first design I attempted:

type BindableStatement struct {
    Statement
    KnownBinds BindVars
}

I tried wrapping all the statements that had bindable variables with a Statement type that caches them, but this is extremely complicated to wire up because the codebase is littered with hundreds of (type) checks on statement types that simply "break" its logic when we introduce this new wrapping statement but do not fail to compile. I tried to fix all the breakage for a couple hours, but it was pointless. After that, I switched to explicitly wiring up the bind variables, because changing function signatures makes the compiler guide you to the places that need changing.

Copy link
Member

@harshit-gangal harshit-gangal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more conflicts to resolve

@systay systay merged commit 2a1c7b4 into vitessio:master Mar 18, 2021
@askdba askdba added this to the v10.0 milestone Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants