This archive contains both the source and compiled binaries used in experiments for the paper entitled "A fast in-place interpreter for WebAssembly", paper #273 at OOPSLA 2022. It contains source checkouts of the benchmarks (PolybenchC) as well as the 6 engines tested (JavaScriptCore, SpiderMonkey, V8, wizard, wasm3, and Wasm Micro Runtime).
The only supported platform for these experiments is Linux running on an x86-64 processor, mainly due to the limitations of Wizard, the experimental engine evaluated in the paper.
You may need to install some libraries to run (and definitely to build) some of the Web engines.
% sudo apt install libicu-dev python ruby bison flex cmake build-essential ninja-build git gperf
./benchmarks
 - contains the source and compiled (.wasm) of all benchmarks
./src
 - contains the source and compiled artifacts of all engines
./engines
 - contains the compiled artifacts of some engines
./data
 - contains output data generated by the run-*.bash scripts
./data-linux-4.15-i7-8700K
 - contains data gathered to generate graphs in the paper
run-*.bash
 - the scripts used to run engines to generate raw data
summarize-*.bash
 - the scripts used to summarize raw data for presentation
The code and data in this archive were used to generate the data and graphs that support the following claims:
Section 3.4
- Final paragraph: "the difference between the best (tuned) and worse (untuned) interpreter
performance is 20% to 60% across the benchmark suite"
- supported by additional runs, comparing with ENGINES="wizeng wizeng-slow"
Section 4.2
- From Figure 8
- translation time for optimizing compilers is over 1000ns/byte
- translation time for baseline compilers ranges from ~200-800ns/byte
- translation time for rewriting interpreters ranges from ~20-200ns/byte
- translation time for wizard ranges from 3-4ns/byte
Section 4.3
- From Figure 9
- translation space ratio for wamr is about 3.7
- translation space ratio for wasm3 is about 2.0
- translation space ratio for jsc-int is about 1.0
- translation space ratio for v8-liftoff is about 2.5-2.6
- translation space ratio for v8-turbofan is about 2.4-2.7
- translation space ratio for wizard is about 0.3-0.4
Section 4.4
- From Figure 10
- absolute execution time of benchmarks on v8-turbofan and wasm3
- From Figure 11
- normalized execution time of wasm3 (relative to turbofan)
- below 1x for 4 shortest benchmarks, between 2x and 5x for middle 10, trending to 10x for remaining
- normalized execution time of baseline compilers
- below 1x for 4 shortest benchmarks, around 1x for middle 9, trending to 2.5x-3x for remaining
- normalized execution time of optimizing compilers
- 1x to 2x for nearly all benchmarks
- From Figure 12
- normalized execution time of all interpreters (relative to wasm3) is within 1.5x to 3.5x
- wizard performs roughly on par with wamr-classic (outliers are +/- 10%)
- wizard performs on par with wamr-fast for 4 shortest benchmarks
- wamr-fast is around 1.5x slower than wasm3 on nearly all benchmarks
- wizard is on par (+/- 5%) with jsc-int for nearly all benchmarks
- The jump table for wamr improves performance by roughly 2x
- Supported by additional runs, comparing ENGINES="wamr-classic wamr-slow"
First, apologies for the size! The first 3 engines are, to put it mildly, enormous pieces of software. The checkouts here for JavaScriptCore and SpiderMonkey include the source of the entire browser in which they are embedded. V8 contains the entire JavaScript engine and its tests, which is considerable.
Building the browser engines from source is a major exercise and could take hours of machine time. So, avoid building if you can! The JavaScript shells should run directly from the checkout, as they contain the results of building (i.e. binary JS shells). The remaining engines are simpler and easier to build, but also should not require building.
Sample data that was used to make the figures in the paper is included (in data-linux-4.15-i7-8700K).
The spreadsheet used to make the figures in the paper is included (figures.ods). Cut and paste the output of the summarize-*.bash scripts into the appropriate places in the spreadsheet to regenerate them.
The scripts described in "Step-by-step instructions" below can generate all the data used to make figures in the paper.
You shouldn't need to build anything to begin generating data.
Execution time and translation time data for the 6 engines is generated by two scripts. Each has a number of configuration options that can be specified with environment variables. The default settings run experiments that can last hours, mostly due to repeating each benchmark 100 times (to get 95% confidence intervals). See "Shorter runs" below to see how to reduce the running time.
% [DATA=<dir>] [RUNS=<N>] [ENGINES=<list>] ./run-execution-experiments.bash [<benchmark>*]
% [DATA=<dir>] [RUNS=<N>] [ENGINES=<list>] ./run-translation-experiments.bash [<benchmark>*]
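For reference, a 95% confidence interval over repeated timings can be computed as below. This is a generic normal-approximation sketch, not necessarily the exact method the summarize scripts use:

```python
# Sketch: a normal-approximation 95% confidence interval from repeated
# wall-clock timings. Generic statistics only; the archive's scripts may
# compute their intervals differently.
import math

def confidence_interval_95(samples):
    """Return (mean, half-width) of a 95% CI over the samples."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width

# Five hypothetical timings in seconds:
times = [0.0150, 0.0152, 0.0149, 0.0151, 0.0150]
mean, hw = confidence_interval_95(times)
print(f"{mean:.4f} +/- {hw:.4f} s")
```

With RUNS=100 the half-width shrinks roughly as 1/sqrt(n), which is why the default run count is so high.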
The raw data generated into the data directory consists mostly of numbers in plain text files. Two main scripts summarize the results for viewing or for pasting into the spreadsheet.
% [DATA=<dir>] [ENGINES=<list>] [ERROR=1] ./summarize-execution.bash [<benchmark>*]
% [DATA=<dir>] [ENGINES=<list>] ./summarize-translation.bash [<benchmark>*]
Note that the raw data gathered on the test machine is included in this archive, so it is possible to create a summary without running any experiments.
% DATA=./data-linux-4.15-i7-8700K [ENGINES=<list>] ./summarize-execution.bash [<benchmark>*]
To create figures similar to the ones in the paper, use the figures.ods spreadsheet.
The scripts below generate tab-separated output. The tabs are important! Don't cut and paste from a terminal window, which may convert tabs to spaces.
- Output of the ./summarize-translation.bash script can be pasted into the Translation sheet and the spreadsheet should update, making Figures 8 and 9.
- Output of the ERRORS=1 ./summarize-execution.bash script can be pasted into the Execution sheet and the spreadsheet should update, making Figures 10 and 11.
- Output of the ./summarize-scatter.bash script can be pasted into the Scatter sheet and the spreadsheet should update, making the scatter plots in Figures 1 and 2.
In each of these sheets, the exact cell to paste the output data is indicated in red.
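One reliable way to keep the tabs intact is to redirect the script output straight to a file instead of copying from a terminal. A sketch, using printf to stand in for the real script output:

```shell
# Redirect output to a file so tab characters survive; copying from a
# terminal window often converts tabs to spaces. The printf here merely
# simulates one tab-separated line of summarize output.
printf 'bicg\t0.014956\t0.015135\n' > execution.tsv
# Confirm a literal tab is present in the file:
grep -q "$(printf '\t')" execution.tsv && echo "tabs preserved"
```

The .tsv file can then be opened or imported directly by the spreadsheet application.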
A typical run of the execution time experiments will produce output like so:
% RUNS=5 ./run-execution-experiments.bash
---- bicg -----------
sm-base 0.015246 0.015196 0.014956 0.015280 0.014996 min=0.014956 avg=0.015135 stddev=0.000000
sm-opt 0.017844 0.019413 0.018549 0.018081 0.017999 min=0.017844 avg=0.018377 stddev=0.000000
v8-liftoff 0.009751 0.010067 0.011369 0.010390 0.009777 min=0.009751 avg=0.010271 stddev=0.000000
v8-turbofan 0.014428 0.015298 0.015406 0.015704 0.015669 min=0.014428 avg=0.015301 stddev=0.000000
jsc-int 0.015033 0.014778 0.015047 0.014803 0.014677 min=0.014677 avg=0.014868 stddev=0.000000
jsc-bbq 0.011690 0.011802 0.011529 0.011716 0.011849 min=0.011529 avg=0.011717 stddev=0.000000
jsc-omg 0.028569 0.026394 0.026610 0.026770 0.027004 min=0.026394 avg=0.027069 stddev=0.000000
wizard 0.012073 0.011955 0.011904 0.011948 0.011822 min=0.011822 avg=0.011940 stddev=0.000000
wasm3 0.007645 0.007467 0.007426 0.007386 0.007413 min=0.007386 avg=0.007467 stddev=0.000000
wamr-slow 0.022461 0.022326 0.022461 0.022303 0.022358 min=0.022303 avg=0.022382 stddev=0.000000
wamr-classic 0.017074 0.018284 0.016912 0.016746 0.017098 min=0.016746 avg=0.017223 stddev=0.000000
wamr-fast 0.012073 0.012204 0.012012 0.012200 0.012087 min=0.012012 avg=0.012115 stddev=0.000000
---- mvt -----------
sm-base 0.015417 0.015232 0.014920 0.015104 0.015092 min=0.014920 avg=0.015153 stddev=0.000000
sm-opt 0.017868 0.017908 ...
It will produce files in the data/ directory like so:
% ls data/execution.bicg.*
data/execution.bicg.jsc-bbq data/execution.bicg.js-int data/execution.bicg.v8-liftoff data/execution.bicg.wamr-fast data/execution.bicg.wizard
data/execution.bicg.jsc-int data/execution.bicg.sm-base data/execution.bicg.v8-turbofan data/execution.bicg.wamr-slow
data/execution.bicg.jsc-omg data/execution.bicg.sm-opt data/execution.bicg.wamr-classic data/execution.bicg.wasm3
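Since each raw data file is just plain numbers, a few lines of scripting can recompute the min/avg figures that the run script prints. A sketch, assuming (from the sample output above, not verified against the scripts) that a file holds one whitespace-separated timing in seconds per run:

```python
# Sketch: recomputing min/avg from a raw timing file such as
# data/execution.bicg.wizard. The layout (whitespace-separated
# seconds, one token per run) is assumed from the sample output.
def summarize(text):
    times = [float(tok) for tok in text.split()]
    return min(times), sum(times) / len(times)

# The five wizard timings from the sample run above:
sample = "0.012073 0.011955 0.011904 0.011948 0.011822"
lo, avg = summarize(sample)
print(f"min={lo:.6f} avg={avg:.6f}")  # min=0.011822 avg=0.011940
```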
A typical run of the translation time experiments will produce output like so:
% RUNS=5 ./run-translation-experiments.bash
---- bicg -----------
sm-base
us=.041685 bytes=2.345500 count=38
us=.036911 bytes=2.345500 count=38
us=.039786 bytes=2.345500 count=38
us=.039786 bytes=2.345500 count=38
us=.039786 bytes=2.345500 count=38
sm-opt
us=.989881 bytes=1.565742 count=38
us=1.122287 bytes=1.832802 count=38
us=1.410752 bytes=2.301797 count=38
us=.885304 bytes=1.636163 count=38
us=.724369 bytes=1.565018 count=38
v8-liftoff
us=.154576 bytes=2.957235 count=76
us=.125776 bytes=2.957235 count=76
us=.264695 bytes=2.957235 count=76
us=.140202 bytes=2.957235 count=76
us=.197135 bytes=2.957235 count=76
v8-turbofan
us=1.768827 bytes=2.582678 count=75
us=1.864880 bytes=2.582678 count=75
us=1.791467 bytes=2.582678 count=76
us=1.632527 bytes=2.582678 count=76
us=1.801221 bytes=2.582678 count=76
jsc-int
us=.105498 bytes=1.040967 count=38
us=.138421 bytes=1.192343 count=38
us=.154968 bytes=1.064465 count=38
us=.149442 bytes=1.040967 count=38
us=.151342 bytes=1.040967 count=38
jsc-bbq
us=1.012372 bytes=0 count=38
us=1.085764 bytes=0 count=38
us=1.061444 bytes=0 count=38
...
It will produce files in the data/ directory like so:
% ls data/translation.bicg.*
data/translation.bicg.jsc-bbq.bytes data/translation.bicg.sm-base.bytes data/translation.bicg.v8-turbofan.bytes data/translation.bicg.wasm3.bytes
data/translation.bicg.jsc-bbq.us data/translation.bicg.sm-base.us data/translation.bicg.v8-turbofan.us data/translation.bicg.wasm3.us
data/translation.bicg.jsc-int.bytes data/translation.bicg.sm-opt.bytes data/translation.bicg.wamr.bytes data/translation.bicg.wizard.bytes
data/translation.bicg.jsc-int.us data/translation.bicg.sm-opt.us data/translation.bicg.wamr-fast.bytes data/translation.bicg.wizard.us
data/translation.bicg.jsc-omg.bytes data/translation.bicg.v8-liftoff.bytes data/translation.bicg.wamr-fast.us
data/translation.bicg.jsc-omg.us data/translation.bicg.v8-liftoff.us data/translation.bicg.wamr.us
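The per-run translation records are equally easy to post-process. A sketch that averages the us= field across runs, parsing the line format shown above (field meanings are taken at face value from the printed format; the values are the v8-turbofan lines from the sample run):

```python
import re

# Sketch: averaging the "us=" field across the runs of one engine.
# The "us=... bytes=... count=..." layout is assumed from the sample
# translation output above.
LINE = re.compile(r"us=([\d.]+) bytes=([\d.]+) count=(\d+)")

def mean_us(lines):
    values = [float(m.group(1)) for m in map(LINE.search, lines) if m]
    return sum(values) / len(values)

runs = [
    "us=1.768827 bytes=2.582678 count=75",
    "us=1.864880 bytes=2.582678 count=75",
    "us=1.791467 bytes=2.582678 count=76",
    "us=1.632527 bytes=2.582678 count=76",
    "us=1.801221 bytes=2.582678 count=76",
]
print(f"avg us = {mean_us(runs):.6f}")
```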
It takes approximately 2-3 hours to generate the data for the execution and translation time experiments with RUNS=100. Reducing the number of runs reduces the running time proportionally, so with RUNS=5, total running time should be less than 20 minutes.
To reduce the workload, reduce the number of runs, or select a subset of the benchmarks or engines. Note that the scripts will generally overwrite data from previous runs of the same benchmark, so using a separate partial data directory is recommended.
% mkdir -p partial
% DATA=partial RUNS=5 ./run-execution-experiments.bash
% DATA=partial RUNS=5 ./run-execution-experiments.bash bicg
% DATA=partial RUNS=5 ENGINES="wizard wamr-fast" ./run-execution-experiments.bash bicg