Various benchmarks mostly for streams of interleaved stores, in support of the blog post What has your microcode done for you lately?.
Currently it only works on Linux, but I am interested in porting it to Windows. It should be enough to run make:
make
Run with no arguments for usage info, as follows:
Must provide 1 or 3 arguments
Usage:
bench TEST_NAME
TEST_NAME is one of:
interleaved
basic interleaved stores (1 fixed 1 variable)
interleaved-pf-fixed
interleaved with fixed region prefetch
interleaved-pf-var
interleaved with variable region prefetch
interleaved-pf-both
interleaved with both region prefetch
interleaved-u2
interleaved unrolled by 2x
interleaved-u4
interleaved unrolled by 4x
interleaved-sfenceA
interleaved with 1 sfence
interleaved-sfenceB
interleaved with 1 sfence
interleaved-sfenceC
interleaved with 2 sfences
wrandom1
single region random stores
wrandom1-unroll
wrandom1 but unrolled and fast/cheaty RNG
wlinear1
linear 64B stide writes over one stream
wlinearHL
linear with lfence
wlinearHS
linear with sfence
wlinear1-sfence
linear with sfence
rlinear1
linear 64B stride reads over one region
lcg
raw LCG test
pcg
raw PCG test
At a minimum you need to provide the test name from the list above.
If you provide only the test name, default starting and stopping sizes for the region are used (4 KiB to 512 KiB):
./bench interleaved
Otherwise, you can provide your starting and stopping points as the 2nd and 3rd arguments, in KiB. The plots in the blog post all use 1 to 100,000 KiB as follows:
./bench interleaved 1 100000
Values are rounded up to the next power of two, so 100,000 becomes 131,072.
The /scripts
directory contains a bunch of .sh
scripts that I use to generate the various plots. In particular, to generate all plots it should be as simple as running the all.sh
script:
SUFFIX=new scripts/all.sh
The SUFFIX
here is appended to each plot name and indicates whether an old or new microcode version was used (the exact microcode revion is also automatically added to the plot title). The output appears in the /assets
directory.
You can also run any of the individual plots that all.sh
creates directly, e.g., scripts/rwrite-1-vs-2.sh
will generate the first plot from the post. My default these individual scripts will pop up an interactive window with the plot, but you can write to a file by setting the OUTFILE
environment variable. There are a variety of other variables you can set too, for example STOP=1000 scripts/rwrite-1-vs-2.sh
will change the stopping point to 1000 KiB which results in much faster plot generation. You can take a peek at all.sh
for some examples or other variables.
We support recording various Intel performance counters using pmu-tools' jevents library. To record events, set them in the CPU_COUNTERS
environment variable, using the short name for as shown in the supported events table:
Full Name | Short Name |
---|---|
cpu_clk_unhalted.thread_p | CYCLES |
hw_interrupts.received | INTERRUPTS |
l2_rqsts.references | L2_RQSTS.ALL |
l2_rqsts.all_rfo | L2.RFO_ALL |
l2_rqsts.rfo_miss | L2.RFO_MISS |
l2_rqsts.miss | L2.ALL_MISS |
l2_rqsts.all_pf | L2.ALL_PF |
llc.ref | LLC.REFS |
llc.miss | LLC.MISS |
mem_inst_retired.all_stores | ALL_STORES |