perf: optimize bind var generation #7828

Merged: 6 commits merged on Apr 15, 2021

@vmg vmg commented Apr 12, 2021

Description

This PR bundles three optimizations:

  1. sqlparser: do not split statements that don't contain a semicolon. The MySQL server implementation tries to split every incoming query packet into its individual statements, but the vast majority of queries do not contain more than one statement. If a query doesn't contain a ;, we skip the full tokenization pass that splitting would otherwise require.

  2. Optimize reserved variable generation: the normalization/rewriting code uses a map to keep track of the bind variable names that are already in use, to ensure we don't insert duplicate variables. This becomes very slow in queries with many binds, such as bulk inserts. This optimization implements a new ReservedVars type that detects duplicate variables and generates unique variable names more efficiently.

    • The var generator has a "fast" mode: when creating a new ReservedVars instance, we check whether any of the existing variable names has the same prefix as the Vitess-specific variables we're going to generate. If that's not the case, we can generate unique variables incrementally, ensuring that they'll never collide. We don't need to check the existing variables map when generating.
    • Furthermore, the most common use case for query normalization uses the "vtg" prefix for all variable names (this is the normalization that is performed for all incoming queries in VTGate), so we don't need to allocate new variable names dynamically. This PR introduces a static array of variable names that can be reused between all the incoming queries to a VTGate, drastically reducing memory allocations, particularly in bulk insert queries.
  3. Remove memory allocations in Argument and ListArg: this is the largest refactoring in this PR. These two SQL AST nodes hold an invariant: the name of the argument or list argument must always begin with : or ::, respectively. However, in the vast majority of places in Vitess where we use bind variables as arguments, the names are not prefixed with a colon. This leads to a pattern that repeats dozens of times in the codebase, where we create Argument instances with superfluous string allocations, e.g. NewArgument(":" + bindVar). If we simply store the name of the argument without the colon, we never have to prepend colons, an operation that happens hundreds of times during a normal vtgate request. This refactoring reduces individual memory allocations by 5%, and this is a global reduction, which is very significant.
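
The fast path from item 1 can be sketched roughly as follows. This is a minimal illustration, not the sqlparser code: `SplitQueries` and `slowSplit` are hypothetical names, and `slowSplit` is a naive stand-in for the real tokenizer-based splitter (it would mishandle semicolons inside string literals or comments).

```go
package main

import (
	"fmt"
	"strings"
)

// slowSplit is a hypothetical stand-in for the tokenizer-based splitter.
// It splits naively on ';' purely so that this sketch is runnable.
func slowSplit(sql string) []string {
	var out []string
	for _, part := range strings.Split(sql, ";") {
		if s := strings.TrimSpace(part); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// SplitQueries (hypothetical name) returns the statements in a query packet.
// A packet without any ';' byte cannot hold more than one statement, so the
// expensive splitting pass is skipped entirely in that case.
func SplitQueries(sql string) []string {
	if strings.IndexByte(sql, ';') == -1 {
		return []string{sql}
	}
	return slowSplit(sql)
}

func main() {
	fmt.Println(len(SplitQueries("select 1 from t")))    // single statement, fast path
	fmt.Println(len(SplitQueries("select 1; select 2"))) // falls back to the splitter
}
```

The byte scan is linear in the packet length and allocates nothing beyond the one-element result slice, which is why it is cheap relative to tokenizing.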
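
The "fast" mode and the static name table from item 2 might look something like the sketch below. The type, field, and function names (`reservedVars`, `newReservedVars`, `nextName`, `staticNames`) and the table size are assumptions made for illustration; the actual ReservedVars type in this PR may be organized differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// staticNames holds pre-built "vtg" names shared by all queries, so the
// common case hands out existing strings instead of allocating new ones.
// The size (8) is arbitrary for this sketch.
var staticNames = func() []string {
	names := make([]string, 8)
	for i := range names {
		names[i] = "vtg" + strconv.Itoa(i+1)
	}
	return names
}()

// reservedVars is a hypothetical, simplified stand-in for ReservedVars.
type reservedVars struct {
	prefix   string
	reserved map[string]struct{} // names already in use by the query
	next     int
	fast     bool // true when no existing name starts with prefix
}

// newReservedVars scans the known names once to decide whether generated
// names could ever collide with them.
func newReservedVars(prefix string, known map[string]struct{}) *reservedVars {
	r := &reservedVars{prefix: prefix, reserved: known, fast: true}
	for name := range known {
		if strings.HasPrefix(name, prefix) {
			r.fast = false
			break
		}
	}
	return r
}

// nextName returns a name that is not yet in use. In fast mode the counter
// alone guarantees uniqueness, so the map is never consulted. In slow mode
// the generated name is recorded in the caller's map.
func (r *reservedVars) nextName() string {
	for {
		r.next++
		var name string
		if r.prefix == "vtg" && r.next <= len(staticNames) {
			name = staticNames[r.next-1]
		} else {
			name = r.prefix + strconv.Itoa(r.next)
		}
		if r.fast {
			return name
		}
		if _, taken := r.reserved[name]; !taken {
			r.reserved[name] = struct{}{}
			return name
		}
	}
}

func main() {
	fast := newReservedVars("vtg", map[string]struct{}{"id": {}})
	fmt.Println(fast.nextName(), fast.nextName()) // vtg1 vtg2

	slow := newReservedVars("vtg", map[string]struct{}{"vtg1": {}})
	fmt.Println(slow.nextName()) // vtg2, because vtg1 is already taken
}
```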

A synthetic benchmark also shows very impressive results. Note that this benchmark includes parsing (which is required before normalization), and roughly 50% of its runtime is spent on parsing, whereas this PR improves normalization. If the normalization step could be measured in isolation, the improvement would be closer to 40%.

name                    old time/op    new time/op    delta
NormalizeTPCCInsert-16     255ms ± 1%     203ms ± 1%  -20.29%  (p=0.000 n=9+10)

name                    old alloc/op   new alloc/op   delta
NormalizeTPCCInsert-16    68.5MB ± 0%    61.7MB ± 0%   -9.91%  (p=0.000 n=10+10)

name                    old allocs/op  new allocs/op  delta
NormalizeTPCCInsert-16     1.31M ± 0%     1.06M ± 0%  -19.21%  (p=0.000 n=10+10)
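
As a rough sanity check on the isolated-improvement estimate, the timings above can be combined under two simplifying assumptions that are not measured here: parsing accounts for exactly half of the old runtime, and the whole saving comes out of the normalization half.

$$
T_{\text{norm}}^{\text{old}} \approx 0.5 \times 255\,\text{ms} = 127.5\,\text{ms}, \qquad
\Delta T = 255\,\text{ms} - 203\,\text{ms} = 52\,\text{ms}, \qquad
\frac{\Delta T}{T_{\text{norm}}^{\text{old}}} = \frac{52}{127.5} \approx 0.41
$$

Under those assumptions the normalization step alone gets roughly 40% faster, about twice the 20% end-to-end delta.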

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

vmg added 6 commits April 15, 2021 10:49
Signed-off-by: Vicent Marti <vmg@strn.cat>