perf: optimize bind var generation #7828
Merged
Description
This PR combines three optimizations:

- sqlparser: do not split statements that don't contain a semicolon. The insight is simple: the MySQL server implementation tries to split every incoming query packet into its individual statements, but the vast majority of queries contain only one statement. If a query doesn't contain a `;`, we skip the full tokenization phase that would otherwise be needed to split it.
- Optimize reserved variable generation. The normalization/rewriting code uses a map to keep track of the bind variable names that are already in use, to ensure we don't insert duplicate variables. This becomes very slow in queries with many binds, such as bulk inserts. This optimization implements a new `ReservedVars` type that can detect duplicate variables and generate unique variable names more efficiently:
  - For each `ReservedVars` instance, we check whether any of the existing variable names has the same prefix as the Vitess-specific variables we're going to generate. If none does, we can generate unique variables incrementally, knowing that they'll never collide, and we don't need to check the existing variables map when generating.
  - The `"vtg"` prefix is used for all variable names (this is the normalization that is performed for all incoming queries in VTGate), so we don't need to allocate new variable names dynamically. This PR introduces a static array of variable names that can be reused between all the incoming queries to a VTGate, drastically reducing memory allocations, particularly in bulk insert queries.
- Remove memory allocations in `Argument` and `ListArg`. This is the larger refactoring in this PR. These two SQL AST nodes hold an invariant: the name of the argument or list argument must always begin with `:` or `::`, respectively. However, in the vast majority of places in Vitess where we use bind variables as arguments, the names are not prefixed with a colon. This leads to a pattern that repeats dozens of times in the codebase, where we allocate `Argument` instances by creating superfluous string allocations, e.g. `NewArgument(":" + bindVar)`. If we simply store the name of the argument without the colon, we never have to prepend colons, an operation that happens hundreds of times during a normal `vtgate` request. This refactoring reduces the individual memory allocations by 5%, and because the reduction is global, that is very significant.

A synthetic benchmark also shows very impressive results. Note that this synthetic benchmark includes parsing (which is required before normalization), and roughly 50% of the runtime of the benchmark is spent on parsing, while this PR is a performance improvement for normalization only. The point improvement, if it could be measured in isolation, would be closer to 40%.
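
To make the reserved-variable optimization more concrete, here is a minimal sketch of the idea in Go. It is not the code from this PR: the `reservedVars` type, its fields, the `newReservedVars` and `next` functions, and the table size of 64 are all hypothetical names and values chosen for illustration. Only the `"vtg"` prefix, the prefix check, and the shared static table of names come from the description above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// prefix is the "vtg" prefix mentioned in the description.
const prefix = "vtg"

// staticNames holds precomputed names ("vtg1", "vtg2", ...) shared by all
// queries, so the common case allocates no new strings. The table size is an
// arbitrary choice for this sketch.
var staticNames = func() []string {
	names := make([]string, 64)
	for i := range names {
		names[i] = prefix + strconv.Itoa(i+1)
	}
	return names
}()

// reservedVars is a hypothetical, simplified stand-in for ReservedVars.
type reservedVars struct {
	used    map[string]struct{} // names already present in the query
	counter int                 // last numeric suffix handed out
	fast    bool                // true when no existing name starts with prefix
}

// newReservedVars records the existing names and checks, once, whether any of
// them shares the prefix of the names we are going to generate.
func newReservedVars(existing []string) *reservedVars {
	r := &reservedVars{used: make(map[string]struct{}, len(existing)), fast: true}
	for _, name := range existing {
		r.used[name] = struct{}{}
		if strings.HasPrefix(name, prefix) {
			r.fast = false
		}
	}
	return r
}

// next returns a name that does not collide with any existing variable.
func (r *reservedVars) next() string {
	for {
		r.counter++
		var name string
		if r.counter <= len(staticNames) {
			name = staticNames[r.counter-1] // reuse the shared string
		} else {
			name = prefix + strconv.Itoa(r.counter) // table exhausted: allocate
		}
		if r.fast {
			return name // no existing name shares the prefix, so no lookup is needed
		}
		if _, taken := r.used[name]; !taken {
			r.used[name] = struct{}{}
			return name
		}
	}
}

func main() {
	fastPath := newReservedVars([]string{"id", "name"})
	fmt.Println(fastPath.next(), fastPath.next()) // vtg1 vtg2

	slowPath := newReservedVars([]string{"vtg1", "id"})
	fmt.Println(slowPath.next(), slowPath.next()) // vtg2 vtg3
}
```

In this sketch the map lookup only happens when a user-supplied name already starts with `vtg`. Otherwise the monotonically increasing counter alone guarantees uniqueness, which is the property the description relies on for bulk inserts with many binds.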
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: