Update Dec 2022: sir-mirjit-base and sir-mirjit branches were merged with Dec 2022 ruby trunk.
What's the branch about
- The branch `sir-mirjit` is used for development of specialized VM insns (specialized IR or SIR in brief), faster CRuby interpreter and MIR-based JIT (MIRJIT in brief) based on SIR
- The last branch merge point with the trunk is always the head of the branch `sir-mirjit-base`
  - The branch `sir-mirjit` will be merged with the trunk from time to time and correspondingly the head of the branch `sir-mirjit-base` will be the last merge point with the trunk
Specialized VM insns
- Specialized VM insns are generated dynamically in a lazy way and on VM insns BB level only. They consist of
  - hybrid stack-RTL insns: `sir_{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}`, where suffix `s` means value on stack, `v` means local variable value, `i` means immediate value. Some suffix combinations (like `vii`, `sii`, `sss`) are not permitted.
  - type-specialized insns: `sir_i{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}{s,v}{s,v,i}{s,v,i}`, `sir_f{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}`, `sir_ib{eq,neq,lt,gt,le,ge}{s,v,i}{s,v,i}`, where `i` after prefix `sir_` means fixnum insns, and `f` means flonum insns
  - speculatively type-specialized insns (with guards): `sir_si{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}`, `sir_sf{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}`, where `i` after prefix `sir_s` means fixnum insns, and `f` means flonum insns
  - insns for profiling (e.g. `sir_inspect_type` or `sir_inspect_fixnum`)
  - different specialized insns for calls (e.g. `sir_iseq_send_without_block` or `sir_cfunc_send`)
  - attribute manipulation (e.g. `sir_send_ivget` or `sir_send_ivset`)
  - iterators (`sir_iter_{start,body,cont}`)
  - different rarely executed insns to start generation of the specialized IR (e.g. `sir_make_bbv`) or jitting (e.g. `sir_bbv_jit_call`)
- More details about specialized insns can be found at the end of file `insns.def`
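
The brace notation used for insn names above can be read as a compact way of listing many names at once. The following is a minimal sketch of that expansion; the helper name `expand_insn_pattern` is hypothetical and not part of the project. It over-generates, since it does not know which suffix combinations are not permitted.

```ruby
# Expand a brace pattern such as "sir_{plus,minus}s{s,v,i}{s,v,i}" into
# the list of names it denotes. Hypothetical helper, for illustration only.
def expand_insn_pattern(pattern)
  # Split the pattern into literal chunks and {a,b,c} groups.
  parts = pattern.scan(/\{[^{}]*\}|[^{}]+/)
  alternatives = parts.map do |part|
    part.start_with?("{") ? part[1..-2].split(",") : [part]
  end
  # Cartesian product of all alternatives, joined back into names.
  first, *rest = alternatives
  first.product(*rest).map(&:join)
end

# A shortened version of the hybrid stack-RTL pattern (two operations only).
names = expand_insn_pattern("sir_{plus,minus}s{s,v,i}{s,v,i}")
puts names.length
puts names.first(3)
```

The result is an upper bound on the insn set: names whose suffix combination is not permitted would still have to be filtered out by hand.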
Project related files
- SIR generation related code can be found in file `sir_gen.c`
- SIR execution related code can be found in files `sir_exec.c` and `insns.def`
- MIRJIT code generation is in file `mirjit.c`
  - to build MIRJIT you need to build and install MIR-library from branch `bbv` of MIR project (https://github.com/vnmakarov/mir)
SIR flow
- Normal IR execution flow:
  - Start execution of BB with a stub
  - Stub execution generates hybrid stack-based RTL insns and type-specialized insns with profiling insns
    - Type-specialized insns here are generated by basic block versioning (see article by Maxime Chevalier-Boisvert)
  - Several executions of type-specialized insns result in type- and profile-specialized insns
  - Type- and profile-specialized insns are a source of MIRJIT
- Exception IR execution flow:
  - Switching to non-type specialized stack-based RTL
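
The flow above can be pictured as a small state machine attached to each basic block. The sketch below is a toy model only: the class `ToyBlock`, its states, and the `PROFILE_RUNS` threshold are all hypothetical and do not correspond to code in `sir_gen.c` or `sir_exec.c`. In particular, it assumes that a failed type guard is what triggers the switch to non-specialized code; the text above does not spell out the trigger.

```ruby
# Toy model of one basic block executing "a + b".
# All names and the threshold below are assumptions made for illustration.
class ToyBlock
  PROFILE_RUNS = 2 # hypothetical number of profiled executions

  attr_reader :state

  def initialize
    @state = :stub          # nothing has been generated for this block yet
    @runs = 0
    @only_integers = true   # profile: have all operands been Integers so far?
  end

  def add(a, b)
    case @state
    when :stub
      # First execution: lazily "generate" insns with profiling, then run them.
      @state = :profiling
      add(a, b)
    when :profiling
      # Record operand types; after enough runs, pick a specialization.
      @only_integers &&= a.is_a?(Integer) && b.is_a?(Integer)
      @runs += 1
      @state = (@only_integers ? :specialized : :generic) if @runs >= PROFILE_RUNS
      a + b
    when :specialized
      # Speculative fast path protected by a guard.
      unless a.is_a?(Integer) && b.is_a?(Integer)
        @state = :generic   # guard failed: switch to non-specialized code
        return add(a, b)
      end
      a + b
    else
      a + b                 # generic path without guards
    end
  end
end

block = ToyBlock.new
[[1, 2], [3, 4], [5, 6], [1.5, 2]].each do |a, b|
  result = block.add(a, b)
  puts "#{a} + #{b} = #{result} (state: #{block.state})"
end
```

In this model the block only becomes `:specialized` after the profiled runs agree on operand types, and a single guard failure sends it to `:generic` for good; a real implementation may well behave differently.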
Current state of the project
- The code is made public only for introducing CRuby developers to the project
- The code is only good enough to run optcarrot and micro-benchmarks from directory `sir-bench`
- To run benchmarks:
  - optional (for x86-64 mirjit): build and install MIR `bbv` branch with default install prefix (`/usr/local`)
    - if you have already installed MIR in `/usr/local` and want to build ruby w/o mirjit, use option `--without-mir` for ruby `configure`
  - build ruby from this branch
  - run microbenchmarks:
cd sir-bench
ruby compare.rb "<list ruby benchmarks from sir-bench>" base:../miniruby sir:'../miniruby --sir' yjit:'../miniruby --yjit-call-threshold=1' ...
  - clone [optcarrot](https://github.com/mame/optcarrot) from GitHub and run it:
cd sir-bench
../miniruby [--sir|--yjit-call-threshold=1|...] -v -Ilib -r<path-to-optcarrot>/tools/shim <path-to-optcarrot>/bin/optcarrot --benchmark [--opt] -f=3000 <path-to-optcarrot>/examples/Lan_Master.nes
- The status will be changed to reflect the project progress
The current performance of SIR interpreter and MIRJIT
- All measurements are done on Intel i7-9700K with 16GB memory under x86-64 Fedora Core 32
- I compared the following:
  - base - the base interpreter (`miniruby`)
  - sir - the interpreter with specialized insns (`miniruby --sir`)
  - yjit - YJIT (`miniruby --yjit-call-threshold=1`)
  - mjit - MJIT (`ruby --mjit-call-threshold=2`)
  - mir - MIR-based JIT (`miniruby --mirjit`)
- I used the following micro-benchmarks (see MJIT-benchmarks directory):
  - aread - reading an instance variable through attr_reader
  - aref - reading an array element
  - aset - assignment to an array element
  - awrite - assignment to an instance variable through attr_writer
  - bench - rendering
  - call - empty method calls
  - complex-mandelbrot - complex mandelbrot
  - const2 - reading Class::Const
  - const - reading Const
  - fannk - fannkuch
  - fib - fibonacci
  - ivread - reading an instance variable (@var)
  - ivwrite - assignment to an instance variable
  - mandelbrot - (non-complex) mandelbrot as CRuby v2 does not support complex numbers
  - meteor - meteor puzzle
  - nbody - modeling planet orbits
  - nest-ntimes - nested ntimes loops (6 levels)
  - nest-while - nested while loops (6 levels)
  - norm - spectral norm
  - pent - pentomino puzzle
  - red-black - Red Black trees
  - sieve - Eratosthenes sieve
  - trees - binary trees
  - while - while loop
- Each benchmark ran 3 times and minimal time (or smallest maximum resident memory) was chosen
- I also used optcarrot for more serious program performance comparison
  - I used 3000 frames to run optcarrot
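
The way the numbers in the tables below are derived (best of several runs, speedup relative to the base interpreter, geometric mean over benchmarks) can be sketched as follows. This is an illustration and not the code of `compare.rb`; the helper names and the timings are made up.

```ruby
# Best (minimal) time out of several runs of the same benchmark.
def best_time(times)
  times.min
end

# Speedup of a candidate relative to the base interpreter (more is better).
def speedup(base_time, candidate_time)
  base_time / candidate_time.to_f
end

# Geometric mean of positive numbers, computed via logarithms.
def geomean(values)
  Math.exp(values.sum { |v| Math.log(v) } / values.size)
end

# Made-up wall times in seconds (3 runs each), for illustration only.
timings = {
  "a.rb" => { base: [4.1, 4.0, 4.2], sir: [2.0, 2.1, 2.0] },
  "b.rb" => { base: [8.0, 8.3, 8.1], sir: [1.0, 1.1, 1.0] },
}

speedups = timings.map do |_name, t|
  speedup(best_time(t[:base]), best_time(t[:sir]))
end

puts speedups.inspect
puts geomean(speedups).round(2)
```

A geometric mean is the usual choice for averaging ratios such as speedups, because a 2x gain on one benchmark and a 2x loss on another then cancel out.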
Microbenchmark results (Nov. 10, 2022)
- Wall time speedup:
Elapsed time:
benchmark | base | sir | mjit | yjit | mir |
---|---|---|---|---|---|
aread.rb | 1.0 | 4.35 | 2.02 | 4.62 | 6.1 |
aref.rb | 1.0 | 3.92 | 3.54 | 4.06 | 6.05 |
aset.rb | 1.0 | 3.11 | 2.22 | 2.37 | 4.76 |
awrite.rb | 1.0 | 3.51 | 1.95 | 2.16 | 4.55 |
bench.rb | 1.0 | 1.34 | 1.04 | 1.62 | 1.33 |
call.rb | 1.0 | 2.24 | 5.75 | 3.71 | 3.34 |
complex-mandelbrot.rb | 1.0 | 1.15 | 1.11 | 1.39 | 1.15 |
const2.rb | 1.0 | 2.82 | 6.34 | 3.75 | 3.02 |
const.rb | 1.0 | 2.83 | 6.34 | 3.75 | 3.02 |
fannk.rb | 1.0 | 1.18 | 0.88 | 1.0 | 1.19 |
fib.rb | 1.0 | 2.15 | 2.52 | 3.98 | 2.77 |
ivread.rb | 1.0 | 1.83 | 3.61 | 3.16 | 3.08 |
ivwrite.rb | 1.0 | 3.41 | 4.58 | 3.47 | 3.79 |
mandelbrot.rb | 1.0 | 1.56 | 1.17 | 1.6 | 1.59 |
meteor.rb | 1.0 | 1.28 | 1.04 | 1.33 | 1.25 |
nbody.rb | 1.0 | 1.47 | 1.24 | 1.51 | 1.45 |
nest-ntimes.rb | 1.0 | 2.03 | 1.29 | 1.21 | 2.4 |
nest-while.rb | 1.0 | 4.92 | 3.22 | 4.27 | 9.51 |
norm.rb | 1.0 | 1.44 | 1.56 | 1.58 | 1.69 |
pent.rb | 1.0 | 1.06 | 1.01 | 1.04 | 0.89 |
red-black.rb | 1.0 | 1.67 | 0.87 | 3.07 | 1.9 |
sieve.rb | 1.0 | 2.46 | 2.0 | 2.36 | 2.79 |
trees.rb | 1.0 | 1.26 | 1.14 | 1.77 | 1.35 |
while.rb | 1.0 | 4.27 | 4.93 | 2.78 | 5.55 |
GeoMean. | 1.0 | 2.14 | 2.05 | 2.3 | 2.55 |
- CPU time improvements are approximately the same, except for MJIT, which has a lower CPU time improvement
- Geomean max resident memory increase relative to the base interpreter:
benchmark | base | sir | mjit | yjit | mir |
---|---|---|---|---|---|
GeoMean. | 1.0 | 1.16 | 3.61 | 1.03 | 2.18 |
Optcarrot results
- Frames per second (more is better):
benchmark | base | sir | yjit | mjit | mir |
---|---|---|---|---|---|
optcarrot | 49.5 | 73.0 | 89.5 | 119.3 | 77.2 |
optcarrot --opt | 159.4 | 240.5 | 22.2 | 164.1 | 263.1 |