feat: O(1) selector tables #3496

Merged
merged 96 commits into from Jul 25, 2023

charles-cooper
Member

@charles-cooper charles-cooper commented Jul 13, 2023

What I did

implement O(1) jumptables

How I did it

two methods, hash table with probing and perfect hashing using a two-level technique.

the first method divides the selectors into buckets, uses method_id % n_buckets as a "guess" to where to enter the selector table and then jumps there and performs the familiar linear search for the selector ("probing"). to avoid too large buckets, the jumptable generator searches a range of n_buckets; the average worst case for 80-100 methods is 3 items per bucket and the worst worst case is 4 items per bucket (presumably if you get really unlucky), see _bench_sparse() in vyper/codegen/jumptable_utils.py. the average bucket size is 1.6 methods.
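
a minimal sketch of the bucket search described above, not the code in `vyper/codegen/jumptable_utils.py`. the function names, the ERC-20 sample selectors and the "first best wins" tie-break are assumptions made for illustration; the ~0.85 to ~1.15 search range is the one quoted in the commit message.

```python
import math
from collections import defaultdict

# Well-known ERC-20 selectors, used here only as sample input.
ERC20_IDS = [0xA9059CBB, 0x70A08231, 0x095EA7B3, 0x23B872DD, 0x18160DDD, 0xDD62ED3E]


def bucketize(method_ids, n_buckets):
    """Group method ids by `method_id % n_buckets`."""
    buckets = defaultdict(list)
    for method_id in method_ids:
        buckets[method_id % n_buckets].append(method_id)
    return dict(buckets)


def generate_sparse_table(method_ids, lo=0.85, hi=1.15):
    """Try bucket counts near len(method_ids); keep the shallowest worst bucket."""
    n = len(method_ids)
    start = max(1, math.floor(n * lo))
    stop = max(start, math.ceil(n * hi))
    best = None
    for n_buckets in range(start, stop + 1):
        buckets = bucketize(method_ids, n_buckets)
        worst = max((len(b) for b in buckets.values()), default=0)
        if best is None or worst < best[0]:
            best = (worst, n_buckets, buckets)
    return best


def lookup(method_id, n_buckets, buckets):
    """Enter the table at the guessed bucket, then probe linearly."""
    for candidate in buckets.get(method_id % n_buckets, []):
        if candidate == method_id:
            return candidate
    return None  # no match in the guessed bucket
```

`lookup()` stands in for the runtime "jump to the guess, then probe" step; the probe never looks outside the guessed bucket, so its cost is bounded by the worst bucket size found by the search.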

the second method uses a perfect hashing technique. finding a single magic which produces a perfect hash is infeasible for large N (exponential, and seems to run off a cliff around 10 methods). to "get around" this, the methods are divided into buckets of roughly size 10, and a magic is computed per bucket. several n_buckets are tried, trying to minimize n_buckets. the code size overhead of each bucket is roughly 5 bytes per bucket, which works out to ~20% per method, see _bench_dense() in vyper/codegen/jumptable_utils.py. then, the function selector is looked up in two steps - it loads the magic for the bucket given by method_id % n_buckets, and then uses the magic to compute the location of the function selector (and associated metadata) in the data section. from there it loads the function metadata, performs the calldatasize, callvalue and method id checks and jumps into the function.
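
a rough sketch of the two-level lookup, under stated assumptions: the hash form `((method_id * magic) >> 24) % bucket_size`, the `MAX_MAGIC` search bound and all function names are illustrative choices, not taken from the PR. metadata is reduced to the method id itself.

```python
from collections import defaultdict

SAMPLE_IDS = [0xA9059CBB, 0x70A08231, 0x095EA7B3, 0x23B872DD, 0x18160DDD, 0xDD62ED3E]
MAX_MAGIC = 2**16  # assumed search bound for this sketch
SHIFT = 24         # assumed: keep the upper bits of the product


def slot_of(method_id, magic, bucket_size):
    return ((method_id * magic) >> SHIFT) % bucket_size


def find_magic(method_ids):
    """First magic that maps the bucket's ids onto distinct slots, else None."""
    size = len(method_ids)
    for magic in range(MAX_MAGIC):
        if len({slot_of(m, magic, size) for m in method_ids}) == size:
            return magic
    return None


def build(method_ids, n_buckets):
    """Level 1: bucket by modulus. Level 2: one magic per bucket."""
    buckets = defaultdict(list)
    for m in method_ids:
        buckets[m % n_buckets].append(m)
    table = {}
    for bucket_id, ids in buckets.items():
        magic = find_magic(ids)
        if magic is None:
            return None  # caller retries with another n_buckets
        slots = [None] * len(ids)
        for m in ids:
            slots[slot_of(m, magic, len(ids))] = m
        table[bucket_id] = (magic, slots)
    return table


def generate_dense_table(method_ids, target_bucket_size=10):
    """Try small n_buckets first, since fewer buckets means less overhead."""
    n = len(method_ids)
    for n_buckets in range(max(1, n // target_bucket_size), n + 1):
        table = build(method_ids, n_buckets)
        if table is not None:
            return n_buckets, table
    return None


def lookup(method_id, n_buckets, table):
    entry = table.get(method_id % n_buckets)
    if entry is None:
        return None
    magic, slots = entry                        # step 1: load the bucket's magic
    found = slots[slot_of(method_id, magic, len(slots))]  # step 2: compute the slot
    return found if found == method_id else None          # method id check
```

the final comparison in `lookup()` matters because a perfect hash only guarantees distinct slots for known selectors; an unknown selector still lands on some slot and has to be rejected by the method id check.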

there is a gas vs code size tradeoff, as can be seen in the following table:
[image: Screenshot from 2023-07-13 15-05-41, gas vs. code size table]

notably, the sparse (gas optimizing) version clocks in at 69 gas in the best case (~109 gas in the "average" case), while the dense version clocks in at ~8 bytes per method.

some additions needed to be made to the assembler to handle pure data blocks.

How to verify it

all existing tests pass. also see the new selector table tests in tests/parser/test_selector_table.py, which should test different configurations of selector tables somewhat exhaustively.

Commit message

this commit replaces the existing linear entry point search with an O(1)
implementation. there are two methods depending on whether optimizing
for code size or gas, hash table with probing and perfect hashing using
a two-level technique.

the first method divides the selectors into buckets, uses
`method_id % n_buckets` as a "guess" to where to enter the selector
table and then jumps there and performs the familiar linear search for
the selector ("probing"). to avoid too large buckets, the jumptable
generator searches a range from ~`n_buckets * 0.85` to
`n_buckets * 1.15` to minimize worst-case probe depth; the average worst
case for 80-100 methods is 3 items per bucket and the worst worst case
is 4 items per bucket (presumably if you get really unlucky), see
`_bench_sparse()` in `vyper/codegen/jumptable_utils.py`. the average
bucket size is 1.6 methods.

the second method uses a perfect hashing technique. finding a single
magic which produces a perfect hash is infeasible for large `N`
(exponential, and in practice seems to run off a cliff around 10
methods). to "get around" this, the methods are divided into buckets of
roughly size 10, and a magic is computed per bucket. several `n_buckets`
are tried, trying to minimize `n_buckets`. the code size overhead of
each bucket is roughly 5 bytes per bucket, which works out to ~20% per
method, see `_bench_dense()` in `vyper/codegen/jumptable_utils.py`.
then, the function selector is looked up in two steps - it loads the
magic for the bucket given by `method_id % n_buckets`, and then uses the
magic to compute the location of the function selector (and associated
metadata) in the data section. from there it loads the function
metadata, performs the calldatasize, callvalue and method id checks and
jumps into the function.

there is a gas vs code size tradeoff between the two methods - roughly
speaking, the sparse method requires ~69 gas in the best case (~109 gas
in the "average" case) and 12-22 bytes of code per method, while the
dense method requires ~212 gas across the board, and ~8 bytes of code
per method.
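
to make the tradeoff concrete, here is a small back-of-the-envelope
calculation using only the figures quoted above. the function name and
the dictionary keys are made up for illustration.

```python
# Figures quoted in the commit message.
SPARSE_BYTES_PER_METHOD = (12, 22)  # low and high estimate
DENSE_BYTES_PER_METHOD = 8
SPARSE_GAS = {"best": 69, "average": 109}
DENSE_GAS = 212


def estimate(n_methods):
    """Selector table code size for both methods, plus the per-call gas gap."""
    lo, hi = SPARSE_BYTES_PER_METHOD
    return {
        "sparse_bytes": (lo * n_methods, hi * n_methods),
        "dense_bytes": DENSE_BYTES_PER_METHOD * n_methods,
        "extra_gas_dense_vs_sparse_average": DENSE_GAS - SPARSE_GAS["average"],
    }
```

for a contract with 100 methods, `estimate(100)` gives 1200-2200 bytes
for the sparse method versus 800 bytes for the dense method, at 103
extra gas per call relative to the sparse average case.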

to accomplish this implementation-wise, the jumptable info is generated
in a new helper module, `vyper/codegen/jumptable_utils.py`. some
refactoring had to be additionally done to pull the calldatasize,
callvalue and method id checks from external function generation out
into a new selector section construction step in
`vyper/codegen/module.py`.

additionally, a new IR "data" directive was added, and an associated
assembly directive. the data segments in assembly are moved to the end
of the bytecode to ensure that data bytes which happen to look like
`PUSH` instructions do not mangle valid bytecode which comes after the
data section.
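
the mangling problem can be illustrated with a toy jump destination
scan. this is a sketch of the general EVM rule that bytes following a
`PUSH` opcode are treated as its immediate; `valid_jumpdests` is a
hypothetical helper, not part of the vyper assembler.

```python
JUMPDEST = 0x5B


def valid_jumpdests(bytecode):
    """Positions of JUMPDEST opcodes that are not inside PUSH immediates."""
    dests, pc = set(), 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == JUMPDEST:
            dests.add(pc)
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry (op - 0x5f) immediate bytes
            pc += op - 0x5F
        pc += 1
    return dests


code = bytes([0x5B, 0x00])  # JUMPDEST, STOP
data = bytes([0x61])        # a data byte that happens to equal PUSH2

# Data first: the scan reads 0x61 as PUSH2 and swallows the JUMPDEST.
data_then_code = valid_jumpdests(data + code)
# Code first: the JUMPDEST at position 0 stays valid.
code_then_data = valid_jumpdests(code + data)
```

with the data byte in front, `data_then_code` is empty because the
`JUMPDEST` is consumed as a push immediate; with the data at the end,
`code_then_data` is `{0}`.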


move the function selection table into module.py so that it's easier to
switch between the two selector table implementations

this commit adds the `--optimize` flag to the vyper cli, and as an
option in vyper json. it is to be used separately from the
`--no-optimize` flag. this commit does not actually change codegen,
just adds the flag and threads it through the codebase so it is
available once we want to start differentiating between the two modes,
and sets up the test harness to test both modes.

it also makes the `optimize` and `evm-version` available as source code
pragmas, and adds an additional syntax for specifying the compiler
version (`#pragma version X.Y.Z`). if the CLI / JSON options conflict
with the source code pragmas, an exception is raised.

this commit also:
* bumps mypy - it was needed to bump to 0.940 to handle match/case, and
  discovered we could bump all the way to 0.98* without breaking
  anything
* removes evm_version from bitwise op tests - it was probably important
  when we supported pre-constantinople targets, which we don't anymore

can just insert calldatasize check in special case where there are
trailing 0s

this is important because in EVM, data immediately before regular
(valid) code can mangle the valid code.
@pcaversaccio
Collaborator

that's right, we have to do linear search after jumping to the guessed position. but unlike hashmaps constructed at runtime, we have some control over worst-case probe depth because we can search for a good hash function to minimize probe depth - benchmarking shows that typical worst case is 3 items per bucket (and average 1.6 per bucket), so we have a (statistical) bound on the probing depth. so it's something like O(1-3), which is still O(1).

(probably if you were to benchmark with larger and larger selector tables, i'd guess the worst case bucket grows at some rate like log_10(N), but it's pretty much indistinguishable from O(1) since we are only typically dealing with selector tables up to ~100 items in practice.)
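
a small benchmark sketch for checking that guess empirically. it is not `_bench_sparse()`; it assumes uniformly random 32-bit method ids and reuses the ~0.85 to ~1.15 `n_buckets` range from the commit message. all names are hypothetical.

```python
import random
from collections import Counter


def worst_bucket(method_ids, n_buckets):
    """Size of the fullest bucket when bucketing by method_id % n_buckets."""
    return max(Counter(m % n_buckets for m in method_ids).values())


def best_worst_case(method_ids):
    """Smallest worst-case bucket over the searched range of n_buckets."""
    n = len(method_ids)
    lo = max(1, int(n * 0.85))
    hi = max(lo, int(n * 1.15) + 1)
    return min(worst_bucket(method_ids, nb) for nb in range(lo, hi + 1))


def bench(n_methods, trials=20, seed=0):
    """Return (average, maximum) of the best worst-case bucket size."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    results = []
    for _ in range(trials):
        ids = rng.sample(range(2**32), n_methods)  # distinct random ids
        results.append(best_worst_case(ids))
    return sum(results) / trials, max(results)
```

calling `bench(n)` for increasing `n` (say 10, 100, 1000) shows how the worst bucket grows with table size under these assumptions.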

oh yeah, right, thanks for the clarifications!

remove an unused __init__.py file
@pcaversaccio
Collaborator

pcaversaccio commented Jul 21, 2023

As per offline discussion, it might make sense to test for non-deterministic behaviour of the selector table. See a similar (not equivalent iiuc) bug in Solidity version 0.8.21:

image

Vyper does already check the call graph stability (see here: #3370)

@charles-cooper
Member Author

> As per offline discussion, it might make sense to test for non-deterministic behaviour of the selector table.

yea this makes sense, although it's not super clear to me the best way to test this, maybe it would be good for this to be addressed in another PR as this PR is already getting quite large

@pcaversaccio
Collaborator

> yea this makes sense, although it's not super clear to me the best way to test this, maybe it would be good for this to be addressed in another PR as this PR is already getting quite large

I agree, but this should be addressed before the 0.3.10 release. I opened an issue #3530 to track it.
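
One possible shape for such a test, offered only as a sketch: build the table from several shuffled orderings of the same selectors and require identical output. `build_table` here is a hypothetical stand-in for the real generator, and this is not necessarily what #3530 ended up doing.

```python
import random


def build_table(method_ids, n_buckets):
    """Toy generator: sorts its input so the result cannot depend on input order."""
    buckets = {}
    for m in sorted(method_ids):
        buckets.setdefault(m % n_buckets, []).append(m)
    return sorted(buckets.items())


def is_order_independent(build, method_ids, n_buckets, trials=10, seed=0):
    """Check that shuffling the input never changes the generated table."""
    rng = random.Random(seed)
    reference = build(method_ids, n_buckets)
    for _ in range(trials):
        shuffled = list(method_ids)
        rng.shuffle(shuffled)
        if build(shuffled, n_buckets) != reference:
            return False
    return True
```

Passing the generator in as `build` keeps the check reusable: the same helper could be pointed at any table generator that takes a list of method ids.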

@pcaversaccio
Collaborator

Can't approve via GitHub, so I approve via comment :) LGTM

@charles-cooper charles-cooper enabled auto-merge (squash) July 25, 2023 01:38
@charles-cooper charles-cooper merged commit 408929f into vyperlang:master Jul 25, 2023
77 of 78 checks passed
# we can have push_len > len(bytecode_sequence) when there is data
# (instead of code) at end of contract
# CMC 2023-07-13 maybe just strip known data segments?
push_len = min(push_len, len(bytecode_sequence))
Contributor

@charles-cooper Is it possible for len(bytecode_sequence) to be < push_len here and if that is the case, would that cause issues?

Member Author

yes, but it shouldn't cause issues, this is just a way of handling disassembly of the data section
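
a minimal sketch of the situation, assuming a deque-based disassembly loop. `disassemble` and its output format are hypothetical, not the actual vyper function; only the `min(...)` clamp mirrors the snippet under review.

```python
from collections import deque


def disassemble(bytecode):
    """Toy disassembler: names PUSH ops, prints everything else as raw hex."""
    bytecode_sequence = deque(bytecode)
    output = []
    while bytecode_sequence:
        op = bytecode_sequence.popleft()
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
            push_len = op - 0x5F
            # trailing data can look like a PUSH with too few bytes after it,
            # so clamp to what is left instead of popping from an empty deque
            push_len = min(push_len, len(bytecode_sequence))
            immediate = bytes(bytecode_sequence.popleft() for _ in range(push_len))
            output.append(f"PUSH{op - 0x5F} 0x{immediate.hex()}")
        else:
            output.append(f"0x{op:02x}")
    return output
```

here a trailing `0x7f` data byte followed by only two more bytes is rendered as a truncated `PUSH32` rather than raising `IndexError`, which only affects how the data section is displayed.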
