perf: optimize bind var generation #7828

vmg · 2021-04-12T14:39:40Z

Description

This PR bundles three optimizations:

- sqlparser: do not split statements that don't contain a semicolon. The insight is simple: the MySQL server implementation tries to split every incoming query packet into its individual statements, but the vast majority of queries contain only one statement. If a query doesn't contain a ;, we skip the full tokenization phase that splitting would require.
- Optimize reserved variable generation. The normalization/rewriting code uses a map to keep track of the bind variable names that are already in use, to ensure we don't insert duplicate variables. This becomes very slow in queries with many binds, such as bulk inserts. This optimization implements a new ReservedVars type that can detect duplicate variables and generate unique variable names more efficiently.
  - The var generator has a "fast" mode: when creating a new ReservedVars instance, we check whether any of the existing variable names has the same prefix as the Vitess-specific variables we're going to generate. If none does, we can generate unique variables incrementally, knowing they will never collide, and we don't need to check the existing variables map when generating.
  - Furthermore, the most common use case for query normalization uses the "vtg" prefix for all variable names (this is the normalization that is performed for all incoming queries in VTGate), so we don't need to allocate new variable names dynamically. This PR introduces a static array of variable names that can be reused between all the incoming queries to a VTGate, drastically reducing memory allocations, particularly in bulk insert queries.
- Remove memory allocations in Argument and ListArg. This is the larger refactoring in this PR. These two SQL AST nodes hold an invariant: the name of the argument or list argument must always begin with : or ::, respectively. However, in the vast majority of places in Vitess where we use bind variables as arguments, the names are not prefixed with a colon. This leads to a pattern that repeats dozens of times in the codebase, where we create Argument instances with a superfluous string allocation, e.g. NewArgument(":" + bindVar). If we simply store the name of the argument without the colon, we never have to prepend colons, an operation that happens hundreds of times during a normal vtgate request. This refactoring reduces the number of individual memory allocations by 5%, and that is a global reduction, which is very significant.

A synthetic benchmark also shows very impressive results. Note that this benchmark includes parsing (which is required before normalization), and roughly 50% of its runtime is spent on parsing, whereas this PR only improves normalization. The improvement to normalization alone, if it could be measured in isolation, would be closer to 40%.

name                    old time/op    new time/op    delta
NormalizeTPCCInsert-16     255ms ± 1%     203ms ± 1%  -20.29%  (p=0.000 n=9+10)

name                    old alloc/op   new alloc/op   delta
NormalizeTPCCInsert-16    68.5MB ± 0%    61.7MB ± 0%   -9.91%  (p=0.000 n=10+10)

name                    old allocs/op  new allocs/op  delta
NormalizeTPCCInsert-16     1.31M ± 0%     1.06M ± 0%  -19.21%  (p=0.000 n=10+10)
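As a rough check of the "closer to 40%" estimate, assume parsing accounts for half of the old runtime and is unchanged by this PR (both are assumptions taken from the description above, not separate measurements):

```latex
\begin{align*}
t_{\text{parse}} &\approx 0.5 \times 255\,\text{ms} = 127.5\,\text{ms} \\
t_{\text{norm}}^{\text{old}} &\approx 255\,\text{ms} - 127.5\,\text{ms} = 127.5\,\text{ms} \\
t_{\text{norm}}^{\text{new}} &\approx 203\,\text{ms} - 127.5\,\text{ms} = 75.5\,\text{ms} \\
\text{reduction} &\approx \frac{127.5 - 75.5}{127.5} = \frac{52}{127.5} \approx 40.8\%
\end{align*}
```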

Related Issue(s)

Checklist

Should this PR be backported?
Tests were added or are not required
Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

go/vt/sqlparser/ast_rewriting.go

Signed-off-by: Vicent Marti <vmg@strn.cat>

vmg requested review from GuptaManan100, harshit-gangal, rohit-nayak-ps, shlomi-noach and systay as code owners April 12, 2021 14:39

vmg force-pushed the vmg/more-mysql-perf branch 2 times, most recently from 390c3e5 to a572b6f Compare April 12, 2021 16:18

systay reviewed Apr 13, 2021

go/vt/sqlparser/ast_rewriting.go

harshit-gangal approved these changes Apr 15, 2021

vmg added 6 commits April 15, 2021 10:49

sqlparser: do not split statements that don't contain a semicolon

6369299

Signed-off-by: Vicent Marti <vmg@strn.cat>

sqlparser: more complex normalization benchmarks

373dcc1

Signed-off-by: Vicent Marti <vmg@strn.cat>

sqlparser: optimize bind var generation

4956de8

Signed-off-by: Vicent Marti <vmg@strn.cat>

sqlparser: do not keep colons in arguments

a23c0dd

Signed-off-by: Vicent Marti <vmg@strn.cat>

sqlparser: document ReservedVars

7f28cdf

Signed-off-by: Vicent Marti <vmg@strn.cat>

sqlparser: simplify tracked buffer args

5b9b2c2

Signed-off-by: Vicent Marti <vmg@strn.cat>

vmg force-pushed the vmg/more-mysql-perf branch from e57618b to 5b9b2c2 Compare April 15, 2021 08:52

harshit-gangal merged commit 8243c57 into vitessio:master Apr 15, 2021

systay added Type: Performance Component: Query Serving labels Apr 28, 2021
