
perf: keyword lookups in the tokenizer #7606

Merged
3 commits merged Mar 5, 2021
Conversation

@vmg (Collaborator) commented Mar 4, 2021

Description

Happy Thursday everyone, this week we're bringing sqlparser performance improvements. I had a chance to sit with @frouioui and look at some of the profiles that we're now acquiring from his Are We Fast Yet (TM) work. There was nothing glaringly obvious that would provide massive optimization gains (as one would expect at this point, Vitess is quite optimized already), but for the normal request lifecycle, all of the sqlparser operations on the AST are always quite hot, and most importantly for our goals, CPU-bound.

Let's start squeezing some blood out of this stone: this particular PR comes from allocation benchmarks. The code in our SQL Tokenizer that processes SQL keywords allocates so much memory that it shows up as a hotspot in a CPU profiler, and very clearly as an allocation hotspot in the memory profiler.

Why does it do all these allocations? Well, right now it copies the current token being processed into a temporary buffer (this is the buffer that gets returned to the caller), and then it makes yet another copy of that buffer to lowercase it, so it can be looked up in our keywords table (people following along at home will surely remember that SQL keywords are case-insensitive).

Let's improve this with some very classical Compiler Theory approaches. Instead of using an ordinary hash table to look up keywords, use a perfect hash table (a minimal hash table where lookups cannot collide; it is measurably faster than a normal hash table, even the one built into the Go runtime). And since we now have a perfect hash table, we control the hashing algorithm used for lookups... so we can switch the algorithm to hash case-insensitively. This means we no longer have to create lowercase copies of all keywords!

name                     old time/op  new time/op  delta
Normalize-16             7.50µs ± 2%  7.38µs ± 1%    ~     (p=0.222 n=5+5)
ParseDjango-16           11.4µs ± 2%  11.0µs ± 2%  -3.69%  (p=0.016 n=5+5)
Parse1-16                14.5µs ± 2%  14.2µs ± 2%  -2.08%  (p=0.032 n=5+5)
Parse2-16                46.9µs ± 3%  44.5µs ± 3%  -5.07%  (p=0.008 n=5+5)
Parse2Parallel-16        9.29µs ± 4%  9.13µs ± 3%    ~     (p=0.548 n=5+5)
Parse3-16                5.82ms ± 1%  5.85ms ± 1%    ~     (p=0.222 n=5+5)

Results are :gucci: in the most realistic parse benchmarks. The pathological benchmarks do not regress.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

vmg added 2 commits March 4, 2021 17:12
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
@shlomi-noach (Contributor) commented:
I'm loving this.

Comment on lines 8 to 13
if !ok {
t.Fatalf("keyword %q failed to match", kw.name)
}
if lookup != kw.id {
t.Fatalf("keyword %q matched to %d (expected %d)", kw.name, lookup, kw.id)
}

nit: use require over t.Fatal

Comment on lines 3097 to 3099
if err != nil {
t.Errorf(" Error: %v", err)
t.Fatal(err)
}

nit: use require.NoError(t, err)

Comment on lines 3112 to 3114
if err != nil {
t.Error(scanner.Text())
t.Errorf(" Error: %v", err)
t.Errorf("failed to parse %q: %v", query, err)
}

nit: use require.NoError(t, err)

Comment on lines +3156 to +3158
if err != nil {
b.Fatal(err)
}

nit: use require.NoError(t, err)

Comment on lines 3166 to 3168
if err != nil {
b.Fatal(err)
}

same as above.

Comment on lines 3176 to 3178
if err != nil {
b.Fatal(err)
}

same as above.

@derekperkins (Member) commented:
I love how entertaining AND descriptive your PRs are @vmg :)

@GuptaManan100 (Member) left a comment:

💯

Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg (Collaborator, Author) commented Mar 5, 2021

@harshit-gangal: I've added testify to all the new test cases, but I haven't updated the benchmarks to use testify, because the require.NoError check actually adds measurable overhead and skews the measurements (:sweat:).

Ready to merge. 👌

@vmg mentioned this pull request Mar 5, 2021
@deepthi (Member) commented Mar 5, 2021

> @harshit-gangal: I've added testify to all the new test cases, but I haven't updated the benchmarks to use testify because the require.NoError check actually adds measurable overhead and screws with the measures (:sweat:).

What about quicktest? Does it add overhead?

@deepthi deepthi merged commit 1672432 into vitessio:master Mar 5, 2021
@vmg (Collaborator, Author) commented Mar 8, 2021

@deepthi I haven't tested it; I just noticed the testify issue when re-running the benchmarks. I don't think it's particularly worrisome: testify still works great for normal testing, so we can fall back to a simple if err != nil in the very few places where we have tight loops in benchmarks.

@vmg vmg mentioned this pull request Mar 12, 2021
8 tasks
@askdba askdba added this to the v10.0 milestone Apr 6, 2021