Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] use perfect hashes for switches with small strings #506

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

quasilyte
Copy link
Contributor

Previously, we compiled a string switch as follow:

1. compute a tag (condition) string hash
2. switch over precomputed hashes of cases
3. inside a switch case, do `equals()` for a string

So, we compile a switch as a 2-level comparison: hash matching
plus equals() call on a string.

For shorter strings we can avoid the third step by using a
hash that will not result in collisions and identifies a string completely.

Since we're using uint64_t for switch hashes, we have 8 bits for information.
We can store up to 8 chars inside that without any losses.
Our hash function will be more or less memcpy(&h, s, s_len).

Hash of a single char becomes a value in 0 <= x <= 255 range, which
is great for C++ switch density: it's likely to get that switch compiled
as a jump table (useful for lexers, etc).

Another common case is a shared prefix of all case values.

case 'OPERATION_DELETE'
case 'OPERATION_INSERT'
case 'OPERATION_UPDATE'

All cases above have a length of 16 that is too much for our new hashing.
But if we remove the common OPERATION_ prefix, strings fit into 6 bytes.

When we have such switch with common prefixes and varying suffixes that
fit into 8 bytes, we compare the prefix before hashing, if it does match,
variable cases tail is hashed (DELETE, INSERT, UPDATE from above).

Since optimized form doesn't require extra if statement inside every case,
the binary size gets smaller as well (but not dramatically).

In average, we compile 60% of string switches using the optimized scheme.
(~50% for short strings plus around ~10% for prefix form)

Benchmark results:

name                                        old time/op  new time/op  delta
SwitchOnlyDefault                           70.0ns ± 0%  29.0ns ± 0%  -58.57%
Lexer                                        119ns ± 1%    78ns ± 2%  -35.04%
LexerMiss                                   87.0ns ± 0%  77.0ns ± 0%  -11.49%
LexerComplex                                 115ns ± 1%    93ns ± 2%  -19.43%
LexerComplexMiss                            87.0ns ± 0%  74.6ns ± 1%  -14.25%
Switch8oneChar                               158ns ± 0%    95ns ± 1%  -39.81%
Switch8oneCharMiss                           108ns ± 1%   102ns ± 1%   -5.48%
Switch8oneCharNoDefault                      164ns ± 1%    87ns ± 1%  -46.67%
Switch8oneCharNoDefaultMiss                  117ns ± 0%    97ns ± 1%  -17.35%
Switch6perfectHash                           258ns ± 1%   204ns ± 1%  -21.19%
Switch6perfectHashMiss                       130ns ± 0%   119ns ± 1%   -8.21%
Switch6perfectHashNoDefault                  271ns ± 1%   193ns ± 1%  -28.72%
Switch6perfectHashNoDefaultMiss              138ns ± 1%   115ns ± 1%  -16.58%
Switch12perfectHash                          278ns ± 0%   194ns ± 4%  -30.19%
Switch12perfectHashMiss                      139ns ± 1%   109ns ± 1%  -21.62%
Switch12perfectHashNoDefault                 275ns ± 1%   190ns ± 0%  -30.74%
Switch12perfectHashNoDefaultMiss             142ns ± 1%   108ns ± 0%  -24.05%
Switch6perfectHashWithPrefix                 286ns ± 0%   219ns ± 3%  -23.40%
Switch6perfectHashWithPrefixMiss             190ns ± 1%   143ns ± 2%  -24.67%
Switch6perfectHashWithPrefixNoDefault        293ns ± 0%   230ns ± 1%  -21.58%
Switch6perfectHashWithPrefixNoDefaultMiss    194ns ± 1%   153ns ± 2%  -20.82%
Switch12perfectHashWithPrefix                303ns ± 1%   225ns ± 2%  -25.69%
Switch12perfectHashWithPrefixMiss            159ns ± 0%   113ns ± 0%  -28.93%
Switch12perfectHashWithPrefixNoDefault       308ns ± 1%   224ns ± 1%  -27.18%
Switch12perfectHashWithPrefixNoDefaultMiss   141ns ± 1%   117ns ± 2%  -17.14%
[Geo mean]                                   171ns        130ns       -23.93%

Another change is that we now compile a default-only switch a little bit differently.
We don't compute a hash and discard auto-generated variables to avoid C++ compiler warnings.

Refs #418
Fixes #503

compiler/code-gen/vertex-compiler.cpp Outdated Show resolved Hide resolved
compiler/code-gen/vertex-compiler.cpp Outdated Show resolved Hide resolved
Previously, we compiled a string switch as follow:

	1. compute a tag (condition) string hash
	2. switch over precomputed hashes of cases
	3. inside a switch case, do `equals()` for a string

So, we compile a switch as a 2-level comparison: hash matching
plus `equals()` call on a string.

For shorter strings we can avoid the third step by using a
hash that will not result in collisions and identifies a string completely.

Since we're using uint64_t for switch hashes, we have 8 bits for information.
We can store up to 8 chars inside that without any losses.
Our hash function will be more or less `memcpy(&h, s, s_len)`.

Hash of a single char becomes a value in `0 <= x <= 255` range, which
is great for C++ switch density: it's likely to get that switch compiled
as a jump table (useful for lexers, etc).

Another common case is a shared prefix of all case values.

	case 'OPERATION_DELETE'
	case 'OPERATION_INSERT'
	case 'OPERATION_UPDATE'

All cases above have a length of 16 that is too much for our new hashing.
But if we remove the common `OPERATION_` prefix, strings fit into 6 bytes.

When we have such switch with common prefixes and varying suffixes that
fit into 8 bytes, we compare the prefix before hashing, if it does match,
variable cases tail is hashed (`DELETE`, `INSERT`, `UPDATE` from above).

Since optimized form doesn't require extra if statement inside every case,
the binary size gets smaller as well (but not dramatically).

In average, we compile 60% of string switches using the optimized scheme.
(~50% for short strings plus around ~10% for prefix form)

Benchmark results:

    name                                        old time/op  new time/op  delta
    SwitchOnlyDefault                           70.0ns ± 0%  29.0ns ± 0%  -58.57%
    Lexer                                        119ns ± 1%    78ns ± 2%  -35.04%
    LexerMiss                                   87.0ns ± 0%  77.0ns ± 0%  -11.49%
    LexerComplex                                 115ns ± 1%    93ns ± 2%  -19.43%
    LexerComplexMiss                            87.0ns ± 0%  74.6ns ± 1%  -14.25%
    Switch8oneChar                               158ns ± 0%    95ns ± 1%  -39.81%
    Switch8oneCharMiss                           108ns ± 1%   102ns ± 1%   -5.48%
    Switch8oneCharNoDefault                      164ns ± 1%    87ns ± 1%  -46.67%
    Switch8oneCharNoDefaultMiss                  117ns ± 0%    97ns ± 1%  -17.35%
    Switch6perfectHash                           258ns ± 1%   204ns ± 1%  -21.19%
    Switch6perfectHashMiss                       130ns ± 0%   119ns ± 1%   -8.21%
    Switch6perfectHashNoDefault                  271ns ± 1%   193ns ± 1%  -28.72%
    Switch6perfectHashNoDefaultMiss              138ns ± 1%   115ns ± 1%  -16.58%
    Switch12perfectHash                          278ns ± 0%   194ns ± 4%  -30.19%
    Switch12perfectHashMiss                      139ns ± 1%   109ns ± 1%  -21.62%
    Switch12perfectHashNoDefault                 275ns ± 1%   190ns ± 0%  -30.74%
    Switch12perfectHashNoDefaultMiss             142ns ± 1%   108ns ± 0%  -24.05%
    Switch6perfectHashWithPrefix                 286ns ± 0%   219ns ± 3%  -23.40%
    Switch6perfectHashWithPrefixMiss             190ns ± 1%   143ns ± 2%  -24.67%
    Switch6perfectHashWithPrefixNoDefault        293ns ± 0%   230ns ± 1%  -21.58%
    Switch6perfectHashWithPrefixNoDefaultMiss    194ns ± 1%   153ns ± 2%  -20.82%
    Switch12perfectHashWithPrefix                303ns ± 1%   225ns ± 2%  -25.69%
    Switch12perfectHashWithPrefixMiss            159ns ± 0%   113ns ± 0%  -28.93%
    Switch12perfectHashWithPrefixNoDefault       308ns ± 1%   224ns ± 1%  -27.18%
    Switch12perfectHashWithPrefixNoDefaultMiss   141ns ± 1%   117ns ± 2%  -17.14%
    [Geo mean]                                   171ns        130ns       -23.93%

Another change is that we now compile a default-only switch a little bit differently.
We don't compute a hash and discard auto-generated variables to avoid C++ compiler warnings.

Refs #418
Fixes #503
@unserialize unserialize added the optimization Memory comsumption / CPU speedup label Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Memory comsumption / CPU speedup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unused variable warning in empty switch()
3 participants