feat: O(1) selector tables #3496

charles-cooper · 2023-07-13T19:06:14Z

What I did

implement O(1) jumptables

How I did it

two methods, hash table with probing and perfect hashing using a two-level technique.

the first method divides the selectors into buckets, uses method_id % n_buckets as a "guess" to where to enter the selector table and then jumps there and performs the familiar linear search for the selector ("probing"). to avoid too large buckets, the jumptable generator searches a range of n_buckets; the average worst case for 80-100 methods is 3 items per bucket and the worst worst case is 4 items per bucket (presumably if you get really unlucky), see _bench_sparse() in vyper/codegen/jumptable_utils.py. the average bucket size is 1.6 methods.
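A minimal sketch of the sparse approach described above, not the code in `vyper/codegen/jumptable_utils.py`. The helper names, the randomly generated method ids, and the choice to center the search range on the number of methods are all assumptions made for illustration.

```python
import random


def bucketize(method_ids, n_buckets):
    """Group method ids by `method_id % n_buckets`."""
    buckets = {}
    for method_id in method_ids:
        buckets.setdefault(method_id % n_buckets, []).append(method_id)
    return buckets


def worst_case_depth(method_ids, n_buckets):
    """Size of the largest bucket, i.e. the deepest linear probe."""
    return max(len(ids) for ids in bucketize(method_ids, n_buckets).values())


def choose_n_buckets(method_ids):
    """Try bucket counts near len(method_ids) (assumed center of the range)
    and keep the one with the shallowest worst bucket."""
    n = len(method_ids)
    lo = max(1, int(n * 0.85))
    hi = max(lo, int(n * 1.15))
    return min(range(lo, hi + 1), key=lambda k: (worst_case_depth(method_ids, k), k))


def lookup(buckets, n_buckets, method_id):
    """Jump to the guessed bucket, then probe it linearly."""
    for candidate in buckets.get(method_id % n_buckets, []):
        if candidate == method_id:
            return True
    return False


# hypothetical selectors: random 4-byte ids stand in for real method ids
rng = random.Random(0)
method_ids = rng.sample(range(2**32), 90)
n_buckets = choose_n_buckets(method_ids)
buckets = bucketize(method_ids, n_buckets)
print(n_buckets, worst_case_depth(method_ids, n_buckets), len(method_ids) / len(buckets))
```

The final `print` reports the chosen bucket count, the worst-case probe depth, and the average size of the non-empty buckets, which are the quantities the benchmark numbers above refer to.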

the second method uses a perfect hashing technique. finding a single magic which produces a perfect hash is infeasible for large N (exponential, and seems to run off a cliff around 10 methods). to "get around" this, the methods are divided into buckets of roughly size 10, and a magic is computed per bucket. several n_buckets are tried, trying to minimize n_buckets. the code size overhead is roughly 5 bytes per bucket, which works out to ~20% per method, see _bench_dense() in vyper/codegen/jumptable_utils.py. then, the function selector is looked up in two steps - it loads the magic for the bucket given by method_id % n_buckets, and then uses the magic to compute the location of the function selector (and associated metadata) in the data section. from there it loads the function metadata, performs the calldatasize, callvalue and method id checks and jumps into the function.
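The two-level lookup can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the hash `((method_id * magic) >> SHIFT) % bucket_size`, the `SHIFT` constant, the magic search bound, and all helper names are hypothetical, and random ids stand in for real selectors.

```python
import random

SHIFT = 24  # hypothetical choice of which product bits to keep


def image_of(ids, magic):
    """Slot assigned to each id in a bucket for a candidate magic."""
    return [((x * magic) >> SHIFT) % len(ids) for x in ids]


def find_magic(ids, max_magic=2**16):
    """Brute-force a magic that maps the bucket's ids to distinct slots."""
    for magic in range(max_magic):
        if len(set(image_of(ids, magic))) == len(ids):
            return magic
    return None


def try_build(method_ids, n_buckets):
    """Level 1: bucket by `method_id % n_buckets`. Level 2: one magic per bucket."""
    buckets = {}
    for method_id in method_ids:
        buckets.setdefault(method_id % n_buckets, []).append(method_id)
    table = {}
    for bucket_id, ids in buckets.items():
        magic = find_magic(ids)
        if magic is None:
            return None  # this n_buckets does not work, caller tries the next
        slots = [None] * len(ids)
        for method_id, slot in zip(ids, image_of(ids, magic)):
            slots[slot] = method_id
        table[bucket_id] = (magic, slots)
    return table


def build_dense_table(method_ids, target_bucket_size=10):
    """Try increasing bucket counts, preferring the smallest that works."""
    start = max(1, len(method_ids) // target_bucket_size)
    for n_buckets in range(start, len(method_ids) + 1):
        table = try_build(method_ids, n_buckets)
        if table is not None:
            return n_buckets, table
    raise ValueError("no perfect hash found")


def lookup(table, n_buckets, method_id):
    """Load the bucket's magic, compute the slot, then check the method id."""
    entry = table.get(method_id % n_buckets)
    if entry is None:
        return False
    magic, slots = entry
    slot = ((method_id * magic) >> SHIFT) % len(slots)
    return slots[slot] == method_id


rng = random.Random(1)
method_ids = rng.sample(range(2**32), 50)  # hypothetical selectors
n_buckets, table = build_dense_table(method_ids)
```

Because each bucket's hash is a bijection onto its slots, a lookup never probes: it reads one magic, computes one slot, and compares one method id, which is why the cost is flat regardless of which method is called.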

there is a gas vs code size tradeoff, as can be seen in the following table:

notably, the sparse (gas optimizing) version clocks in at 69 gas in the best case (~109 gas in the "average" case), while the dense version clocks in at ~8 bytes per method.
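To make the tradeoff concrete, here is a small back-of-the-envelope calculation using only the approximate per-method figures quoted in this PR (the 12-22 bytes per method for sparse and ~212 gas for dense come from the commit message). The function and constant names are made up for this sketch.

```python
# approximate figures quoted in this PR
SPARSE = {"gas_best": 69, "gas_avg": 109, "bytes_per_method": (12, 22)}
DENSE = {"gas": 212, "bytes_per_method": 8}


def selector_table_bytes(n_methods):
    """Rough selector table size for each mode, in bytes."""
    lo, hi = SPARSE["bytes_per_method"]
    return {
        "sparse": (lo * n_methods, hi * n_methods),
        "dense": DENSE["bytes_per_method"] * n_methods,
    }


def extra_gas_for_dense():
    """Extra dispatch gas per call for dense vs. the sparse 'average' case."""
    return DENSE["gas"] - SPARSE["gas_avg"]


sizes = selector_table_bytes(100)
print(sizes, extra_gas_for_dense())
```

For a 100-method contract these figures work out to roughly 1200-2200 bytes of selector table in sparse mode against roughly 800 bytes in dense mode, with dense paying about 103 more gas per call than the sparse "average" case.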

some additions needed to be made to the assembler to handle pure data blocks.

How to verify it

all existing tests pass, also see new selector table tests in tests/parser/test_selector_table.py which should test different configurations of selector tables somewhat exhaustively

Commit message

this commit replaces the existing linear entry point search with an O(1)
implementation. there are two methods depending on whether optimizing
for code size or gas, hash table with probing and perfect hashing using
a two-level technique.

the first method divides the selectors into buckets, uses
`method_id % n_buckets` as a "guess" to where to enter the selector
table and then jumps there and performs the familiar linear search for
the selector ("probing"). to avoid too large buckets, the jumptable
generator searches a range from ~`n_buckets * 0.85` to
`n_buckets * 1.15` to minimize worst-case probe depth; the average worst
case for 80-100 methods is 3 items per bucket and the worst worst case
is 4 items per bucket (presumably if you get really unlucky), see
`_bench_sparse()` in `vyper/codegen/jumptable_utils.py`. the average
bucket size is 1.6 methods.

the second method uses a perfect hashing technique. finding a single
magic which produces a perfect hash is infeasible for large `N`
(exponential, and in practice seems to run off a cliff around 10
methods). to "get around" this, the methods are divided into buckets of
roughly size 10, and a magic is computed per bucket. several `n_buckets`
are tried, trying to minimize `n_buckets`. the code size overhead is
roughly 5 bytes per bucket, which works out to ~20% per
method, see `_bench_dense()` in `vyper/codegen/jumptable_utils.py`.
then, the function selector is looked up in two steps - it loads the
magic for the bucket given by `method_id % n_buckets`, and then uses the
magic to compute the location of the function selector (and associated
metadata) in the data section. from there it loads the function
metadata, performs the calldatasize, callvalue and method id checks and
jumps into the function.

there is a gas vs code size tradeoff between the two methods - roughly
speaking, the sparse method requires ~69 gas in the best case (~109 gas
in the "average" case) and 12-22 bytes of code per method, while the
dense method requires ~212 gas across the board, and ~8 bytes of code
per method.

to accomplish this implementation-wise, the jumptable info is generated
in a new helper module, `vyper/codegen/jumptable_utils.py`. some
refactoring had to be additionally done to pull the calldatasize,
callvalue and method id checks from external function generation out
into a new selector section construction step in
`vyper/codegen/module.py`.

additionally, a new IR "data" directive was added, and an associated
assembly directive. the data segments in assembly are moved to the end
of the bytecode to ensure that data bytes which happen to look like
`PUSH` instructions do not mangle valid bytecode which comes after the
data section.
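The reason for relocating data can be illustrated with a simplified jump-destination scan. The EVM treats the bytes following a `PUSHn` opcode as immediates, so a data byte that equals a `PUSH` opcode swallows whatever follows it. The function below is a sketch written for this example, not code from the PR.

```python
PUSH1, PUSH32, JUMPDEST, STOP = 0x60, 0x7F, 0x5B, 0x00


def valid_jumpdests(bytecode: bytes) -> set:
    """Linear scan that skips PUSH immediates, as jump-destination analysis does."""
    dests = set()
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == JUMPDEST:
            dests.add(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the immediate bytes
        pc += 1
    return dests


code = bytes([JUMPDEST, STOP])
data = bytes([0x61])  # a data byte that happens to equal PUSH2

data_first = data + code  # data before code: the JUMPDEST is read as an immediate
code_first = code + data  # data at the end: the code is unaffected

print(valid_jumpdests(data_first), valid_jumpdests(code_first))
```

With the data byte in front, the scan finds no valid jump destinations at all; with the data moved to the end, the `JUMPDEST` at offset 0 is recognized as usual.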

Description for the changelog

Cute Animal Picture

this commit adds the `--optimize` flag to the vyper cli, and as an option in vyper json. it is to be used separately from the `--no-optimize` flag. this commit does not actually change codegen, just adds the flag and threads it through the codebase so it is available once we want to start differentiating between the two modes, and sets up the test harness to test both modes. it also makes the `optimize` and `evm-version` available as source code pragmas, and adds an additional syntax for specifying the compiler version (`#pragma version X.Y.Z`). if the CLI / JSON options conflict with the source code pragmas, an exception is raised.

this commit also:
* bumps mypy - it was needed to bump to 0.940 to handle match/case, and discovered we could bump all the way to 0.98* without breaking anything
* removes evm_version from bitwise op tests - it was probably important when we supported pre-constantinople targets, which we don't anymore

pcaversaccio · 2023-07-21T15:00:15Z

that's right, we have to do linear search after jumping to the guessed position. but unlike hashmaps constructed at runtime, we have some control over worst-case probe depth because we can search for a good hash function to minimize probe depth - benchmarking shows that typical worst case is 3 items per bucket (and average 1.6 per bucket), so we have a (statistical) bound on the probing depth. so it's something like O(1-3), which is still O(1).

(probably if you were to benchmark with larger and larger selector tables, i'd guess the worst case bucket grows at some rate like log_10(N), but it's pretty much indistinguishable from O(1) since we are only typically dealing with selector tables up to ~100 items in practice.)

oh yeah, right, thanks for the clarifications!

pcaversaccio · 2023-07-21T15:18:43Z

As per offline discussion, it might make sense to test for non-deterministic behaviour of the selector table. See a similar (not equivalent iiuc) bug in Solidity version 0.8.21:

Vyper does already check the call graph stability (see here: #3370)

charles-cooper · 2023-07-21T16:24:55Z

> As per offline discussion, it might make sense to test for non-deterministic behaviour of the selector table.

yea this makes sense, although it's not super clear to me the best way to test this, maybe it would be good for this to be addressed in another PR as this PR is already getting quite large

pcaversaccio · 2023-07-21T16:51:37Z

> yea this makes sense, although it's not super clear to me the best way to test this, maybe it would be good for this to be addressed in another PR as this PR is already getting quite large

I agree, but this should be addressed before the 0.3.10 release. I opened an issue #3530 to track it.

pcaversaccio · 2023-07-21T17:01:42Z

Can't approve via GitHub, so I approve via comment :) LGTM

antazoey · 2023-09-08T19:47:53Z

vyper/compiler/output.py

+            # we can have push_len > len(bytecode_sequence) when there is data
+            # (instead of code) at end of contract
+            # CMC 2023-07-13 maybe just strip known data segments?
+            push_len = min(push_len, len(bytecode_sequence))


@charles-cooper Is it possible for len(bytecode_sequence) to be < push_len here, and if that is the case, would that cause issues?

yes, but it shouldn't cause issues, this is just a way of handling disassembly of the data section
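A sketch of why the clamp is harmless for disassembly. This is not the code in `vyper/compiler/output.py`; the function name and output format are assumptions. A trailing data byte that looks like a `PUSHn` may claim more immediates than remain, so the walker reads only what is left instead of running past the end.

```python
from collections import deque


def disassemble(bytecode: bytes) -> list:
    """Walk bytecode, printing PUSH immediates; tolerate truncated trailing data."""
    seq = deque(bytecode)
    out = []
    while seq:
        op = seq.popleft()
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
            push_len = op - 0x5F
            # trailing data may look like a PUSH with too few bytes after it
            push_len = min(push_len, len(seq))
            immediates = bytes(seq.popleft() for _ in range(push_len))
            out.append(f"PUSH{op - 0x5F} 0x{immediates.hex()}")
        else:
            out.append(f"0x{op:02x}")
    return out


# 0x62 is PUSH3, but only one byte follows it
print(disassemble(bytes([0x00, 0x62, 0xAA])))
```

Without the `min`, the `popleft` calls would raise `IndexError` once the deque is empty; with it, the truncated pseudo-instruction is simply rendered with the bytes that exist.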

charles-cooper added 29 commits July 4, 2023 13:42

wip jumptables

ef474ba

small refactor, add bucket datastructure

20b5b1d

use smaller buckets

391f3b5

don't need primes

eb3d54f

clean up primes generation

9f53fff

small cleanup of jumptables

ac651e7

refactor selector table generation

9ca04d4

move the function selection table into module.py so that it's easier to switch between the two selector table implementations

lint jumptable

03defa6

export jumptable info better

471643b

add bench for jumptable

8bb5734

jumptable: add fall back to exhaustive search if no magic found

8113cb7

put in min calldatasize info

49c2491

add bench for imperfect jumptable

c0b200e

wip sparse jumptable

3341193

switch based on optimization mode

1c8bd81

add notes on amortization

aae4679

wip sparse jumptable

422273b

add jumptable data for sparse mode

dead92d

add a note

c9c5181

sparse jumptable: remove global calldatasize check

5322831

can just insert calldatasize check in special case where there are trailing 0s

get pipeline through bytecode working

7d78e6b

fix type of bytecode output

a869935

add a note

aa2a04e

elide a goto

e4377ec

improve runtime header a bit

f99cd2d

get dense jumptable working (at least compiles)

134107c

wip - fix some encodings

92ad539

relocate data sections to end

4be2f3d

this is important because in EVM, data immediately before regular (valid) code can mangle the valid code.

charles-cooper requested a review from fubuloubu July 13, 2023 19:06

fix lint

de7a450

test default settings

ffeb8d6

remove an unused __init__.py file

add back in __init__.py file

283f168

pcaversaccio mentioned this pull request Jul 21, 2023

Tests for selector table stability #3530

Closed

charles-cooper added 4 commits July 21, 2023 10:39

add different default function configurations to fuzzer

35b2e80

not there, payable, different levels of nonpayable.

fix an f-string

1c9f96c

fix: logging from non-mutating default function

16b91ba

add some jumptable stats tests

af0f60b

fubuloubu approved these changes Jul 21, 2023

vyper/codegen/jumptable_utils.py

charles-cooper added 2 commits July 23, 2023 12:52

update a comment

d3d8b4d

revert API change to assembly_to_evm

9cfb797

charles-cooper mentioned this pull request Jul 24, 2023

fix: pc maps for function selectors #3487

Closed

fubuloubu reviewed Jul 24, 2023

vyper/codegen/module.py

charles-cooper added 4 commits July 24, 2023 17:18

clean up a comment

8095ee8

update some comments

1a46120

Merge branch 'master' into jumptables

c7c8f73

simplify linear selector table

ab1638f

charles-cooper enabled auto-merge (squash) July 25, 2023 01:38

charles-cooper merged commit 408929f into vyperlang:master Jul 25, 2023
77 of 78 checks passed

charles-cooper mentioned this pull request Sep 5, 2023

feat: add runtime code layout to initcode #3584

Merged

ZumZoom mentioned this pull request Sep 8, 2023

Constant-complexity dispatcher idea ethereum/solidity#12650

Open

antazoey reviewed Sep 8, 2023

charles-cooper mentioned this pull request Sep 8, 2023

VIP: Jump table optimization #2386

Closed

charles-cooper deleted the jumptables branch September 8, 2023 20:45

charles-cooper mentioned this pull request Feb 27, 2024

contract with only internal functions is executable #3446

Closed
