
perf: keyword lookups in the tokenizer #7606

Merged
3 commits merged Mar 5, 2021
Conversation

@vmg (Collaborator) commented Mar 4, 2021

Description

Happy Thursday everyone, this week we're bringing sqlparser performance improvements. I had a chance to sit with @frouioui and look at some of the profiles that we're now acquiring from his Are We Fast Yet (TM) work. There was nothing glaringly obvious that would provide massive optimization gains (as one would expect at this point, Vitess is quite optimized already), but for the normal request lifecycle, all of the sqlparser operations on the AST are always quite hot, and most importantly for our goals, CPU-bound.

Let's start squeezing some blood out of this stone: this particular PR comes from allocation benchmarks. The code in our SQL Tokenizer that processes SQL keywords allocates so much memory that it shows up as a hotspot in a CPU profiler, and very clearly as an allocation hotspot in the memory profiler.

Why does it do all these allocations? Well, right now it copies the current token being processed into a temporary buffer (this is the buffer that gets returned to the caller), and then it makes yet another copy of that buffer to lowercase it, so it can be looked up in our keywords table (people following along at home will surely remember that SQL keywords are case-insensitive).

Let's improve this with some very classical Compiler Theory approaches. Instead of using an ordinary hash table to look up keywords, use a perfect hash table (a minimal hash table where lookups cannot collide; it is measurably faster than a normal hash table, even the one built into the Go runtime). And since we now have a perfect hash table, we control the hashing algorithm used for lookups... so we can switch the algorithm to hash case-insensitively. This means we no longer have to create lowercase copies of all keywords!

name                     old time/op  new time/op  delta
Normalize-16             7.50µs ± 2%  7.38µs ± 1%    ~     (p=0.222 n=5+5)
ParseDjango-16           11.4µs ± 2%  11.0µs ± 2%  -3.69%  (p=0.016 n=5+5)
Parse1-16                14.5µs ± 2%  14.2µs ± 2%  -2.08%  (p=0.032 n=5+5)
Parse2-16                46.9µs ± 3%  44.5µs ± 3%  -5.07%  (p=0.008 n=5+5)
Parse2Parallel-16        9.29µs ± 4%  9.13µs ± 3%    ~     (p=0.548 n=5+5)
Parse3-16                5.82ms ± 1%  5.85ms ± 1%    ~     (p=0.222 n=5+5)

Results are :gucci: in the most realistic parse benchmarks. The pathological benchmarks do not regress.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

vmg added 2 commits March 4, 2021 17:12
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
@shlomi-noach (Contributor) commented:
I'm loving this.

Comment on lines 8 to 13
if !ok {
t.Fatalf("keyword %q failed to match", kw.name)
}
if lookup != kw.id {
t.Fatalf("keyword %q matched to %d (expected %d)", kw.name, lookup, kw.id)
}

nit: use require over t.Fatal

Comment on lines 3097 to 3099
if err != nil {
t.Errorf(" Error: %v", err)
t.Fatal(err)
}

nit: use require.NoError(t, err)

Comment on lines 3112 to 3114
if err != nil {
t.Error(scanner.Text())
t.Errorf(" Error: %v", err)
t.Errorf("failed to parse %q: %v", query, err)
}

nit: use require.NoError(t, err)

Comment on lines +3156 to +3158
if err != nil {
b.Fatal(err)
}

nit: use require.NoError(t, err)

Comment on lines 3166 to 3168
if err != nil {
b.Fatal(err)
}

same as above.

Comment on lines 3176 to 3178
if err != nil {
b.Fatal(err)
}

same as above.

@derekperkins (Member) commented:
I love how entertaining AND descriptive your PRs are @vmg :)

@GuptaManan100 (Member) left a comment:

💯

Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg (Collaborator, Author) commented Mar 5, 2021

@harshit-gangal: I've added testify to all the new test cases, but I haven't updated the benchmarks to use testify, because the require.NoError check actually adds measurable overhead and skews the measurements (:sweat:).

Ready to merge. 👌

@vmg mentioned this pull request Mar 5, 2021
@deepthi (Member) commented Mar 5, 2021

> @harshit-gangal: I've added testify to all the new test cases, but I haven't updated the benchmarks to use testify because the require.NoError check actually adds measurable overhead and screws with the measures (:sweat:).

What about quicktest? Does it add overhead?

@deepthi deepthi merged commit 1672432 into vitessio:master Mar 5, 2021
@vmg (Collaborator, Author) commented Mar 8, 2021

@deepthi I haven't tested it; I just noticed the testify issue when re-running the benchmarks. I don't think it's particularly worrisome: testify still works great for normal testing, so we can fall back to a simple if err != nil in the very few places where we have tight loops in benchmarks.

@vmg vmg mentioned this pull request Mar 12, 2021
8 tasks
@askdba askdba added this to the v10.0 milestone Apr 6, 2021