Initial implementation of JIT engine for amd64 target #60

Merged 93 commits on Dec 9, 2021

@mathetake mathetake commented Nov 26, 2021

This commit adds the initial implementation of a Just-In-Time (JIT) compilation engine. The implementation is based on my experiments in https://github.com/mathetake/go-jit-exp. To avoid a massive PR, it introduces only a minimal JIT engine: only a subset of instructions is supported, just enough to execute a Fibonacci function.

Notably, this commit adds the jit package, which implements a JIT engine for WebAssembly written purely in Go.
Here are the background, the technical difficulties, and some of the design choices (the same content as wasm/jit/README.md).

General limitations on pure Go JIT engines

In a Go program, each goroutine manages its own stack, and each item on a goroutine's stack is managed by the Go runtime for garbage collection, etc.

These constraints make a JIT engine written purely in Go difficult, because we cannot use the native push/pop instructions to save and restore temporary variables spilled from registers. As a result, we cannot invoke Go functions from JITed native code with the native call instruction, since that involves stack manipulation.

TODO: it may be possible to hack the runtime so that function calls with call become possible.

How to generate native codes

Currently we rely on twitchyliquid64/golang-asm to assemble native code. The library is just a copy of the official Go compiler's assembler with modified import paths. Once we reach some maturity, we could implement our own assembler to drop this dependency, since having few dependencies is one of the primary goals of this project.

The assembled native code is represented as []byte, and the slice's memory region is marked as executable via the mmap system call.
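
As a rough illustration of the mmap step, here is a minimal sketch. It assumes Linux or Darwin and uses only the standard syscall package. The helper name mmapCodeSegment is hypothetical and is not taken from this PR, and mapping the region as read/write/execute in a single call is a simplification for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// mmapCodeSegment is a hypothetical helper (not the PR's code). It copies
// the assembled bytes into a fresh anonymous mapping that is readable,
// writable and executable.
func mmapCodeSegment(code []byte) ([]byte, error) {
	if len(code) == 0 {
		return nil, errors.New("no code to map")
	}
	seg, err := syscall.Mmap(
		-1, 0, len(code), // no backing file, offset 0
		syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
		syscall.MAP_ANON|syscall.MAP_PRIVATE,
	)
	if err != nil {
		return nil, err
	}
	copy(seg, code)
	return seg, nil
}

func main() {
	// 0xC3 is the amd64 encoding of `ret`, as seen in the listing in this PR.
	seg, err := mmapCodeSegment([]byte{0xC3})
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(seg)
	fmt.Printf("mapped %d byte(s), first byte %#x\n", len(seg), seg[0])
}
```

The sketch only allocates and fills the region. Actually jumping into it still needs the Go assembly entry point described in the next section.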

How to enter native codes

Assuming that we have native code as a []byte, it is straightforward to enter the native code region via Go assembly code. In this package, we have a function without a body called jitcall:

func jitcall(codeSegment, engine, memory uintptr)

where we pass codeSegment uintptr as the first argument. This pointer points to the first instruction to be executed. It can easily be derived from the []byte via unsafe.Pointer:

code := []byte{}
/* ...Compilation ...*/
codeSegment := uintptr(unsafe.Pointer(&code[0]))
jitcall(codeSegment, ...)

jitcall is actually implemented in jit_amd64.s as a convenience layer that complies with Go's official calling convention; the task of jumping into the code segment is delegated to that Go assembly code.

How to achieve function calls

Given that we cannot use the call instruction at all in native code, here is how we achieve function calls back and forth between Go and (JITed) Wasm native functions.

The general principle is that every function call consists of 1) emitting instructions that record the continuation program counter in engine.continuationAddressOffset, and 2) emitting a return instruction.

For example, the following Wasm code

0x3: call 1
0x5: i64.const 100

will be compiled as

mov [engine.functionCallIndex] $1 ;; Set the index of call target function to functionCallIndex field of engine.
mov [engine.continuationAddressOffset] $0x05 ;; Set the continuation address to continuationAddressOffset field of engine.
return ;; Return from the function.
mov ... $100 ;; This is the beginning of program *after* function return.

This way, the engine, which enters the native code via jitcall, can know the continuation address of the caller's function frame:

case jitStatusCallWasmFunction:
    nextFunc := e.compiledWasmFunctions[e.functionCallIndex]
    // Calculate the continuation address so
    // we can resume this caller function frame.
    currentFrame.continuationAddress = currentFrame.f.codeInitialAddress + e.continuationAddressOffset
    currentFrame.continuationStackPointer = e.currentStackPointer + nextFunc.outputNum - nextFunc.inputNum
    currentFrame.baseStackPointer = e.currentBaseStackPointer

and call into another function in the JIT engine's main loop:

    // Create the callee frame.
    frame := &callFrame{
        continuationAddress: nextFunc.codeInitialAddress,
        f:                   nextFunc,
        // Set the caller frame so we can return back to the current frame!
        caller: currentFrame,
        // Set the base pointer to the beginning of the function inputs
        baseStackPointer: e.currentBaseStackPointer + e.currentStackPointer - nextFunc.inputNum,
    }

After the callee code finishes executing, we return to the caller's code at the specified return address:

case jitStatusReturned:
    // Meaning that the current frame exits
    // so we just get back to the caller's frame.
    callerFrame := currentFrame.caller
    e.callFrameStack = callerFrame
    e.currentBaseStackPointer = callerFrame.baseStackPointer
    e.currentStackPointer = callerFrame.continuationStackPointer

To summarize, every function call is achieved by returning to Go code (engine.exec's main loop) with some continuation info and entering the callee's native code (or a host function) from there. That, of course, comes with some overhead, because each function call takes two steps (returning to the jitcall call site AND entering jitcall again) versus a single call (or jmp) instruction in ordinary native code.

Note that this mechanism is a minimal PoC implementation, so in the near future we would make function calls without returning to engine.exec's main loop and instead jmp directly to the callee's native code.
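
To make the return-and-re-enter scheme concrete, here is a small pure-Go model of it. This is only a sketch under simplifying assumptions, not the engine from this PR: native code is replaced by Go closures (one "segment" per stretch of code between call sites), the continuation is a segment index rather than an address offset, and every function takes one parameter and returns one result. The names segment, requestCall and the status constants are hypothetical.

```go
package main

import "fmt"

// status is what a simulated code segment reports when it "returns" to the engine.
type status int

const (
	statusReturned     status = iota // the current function finished
	statusCallFunction               // the current function requests a call
)

// engine carries the fields that simulated native code writes before returning.
type engine struct {
	stack             []uint64
	functionCallIndex int // index of the call target
	continuation      int // segment at which the caller resumes
}

// segment stands in for the native code between two call sites.
// base is the stack index of the function's single parameter.
type segment func(e *engine, base int) status

type callFrame struct {
	segments []segment
	next     int // continuation: which segment to enter next
	base     int
	caller   *callFrame
}

// requestCall pushes the argument, records target and continuation, then "returns".
func requestCall(e *engine, target int, arg uint64, continuation int) status {
	e.stack = append(e.stack, arg)
	e.functionCallIndex = target
	e.continuation = continuation
	return statusCallFunction
}

// fib is split at its two call sites, like the compiled code in this PR.
var fib = []segment{
	func(e *engine, base int) status {
		n := e.stack[base]
		if n <= 1 {
			e.stack[base] = 1 // the result replaces the parameter
			return statusReturned
		}
		return requestCall(e, 0, n-2, 1)
	},
	func(e *engine, base int) status {
		return requestCall(e, 0, e.stack[base]-1, 2)
	},
	func(e *engine, base int) status {
		e.stack[base] = e.stack[base+1] + e.stack[base+2]
		e.stack = e.stack[:base+1]
		return statusReturned
	},
}

// exec is the main loop: every call and return passes through here.
func (e *engine) exec(functions [][]segment, index int, arg uint64) uint64 {
	e.stack = append(e.stack[:0], arg)
	frame := &callFrame{segments: functions[index]}
	for frame != nil {
		switch frame.segments[frame.next](e, frame.base) {
		case statusCallFunction:
			frame.next = e.continuation // remember where the caller resumes
			frame = &callFrame{
				segments: functions[e.functionCallIndex],
				base:     len(e.stack) - 1,
				caller:   frame,
			}
		case statusReturned:
			frame = frame.caller
		}
	}
	return e.stack[0]
}

func main() {
	e := &engine{}
	fmt.Println(e.exec([][]segment{fib}, 0, 10)) // prints 89
}
```

Each recursive call costs one trip out of a segment and one trip back in through the loop, which is the overhead described above.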

Supported instructions

Supported instructions are

  • call
  • if
  • br_if
  • i64.const
  • i64.sub
  • i64.le_u
  • local.get

but they are enough to prove that it is actually feasible to implement the complete JIT engine purely in Go!

Example code

Here's the Fibonacci function in wat

(module
	(func $fib (export "fib") (param i64) (result i64)
	  (if (result i64) (i64.le_u (local.get 0) (i64.const 1))
		(then (i64.const 1))
		(else
		  (i64.add
			(call $fib (i64.sub (local.get 0) (i64.const 2)))
			(call $fib (i64.sub (local.get 0) (i64.const 1)))
		  )
		)
	  )
	)
)

This function will be compiled to wazeroir as follows:

.entrypoint:
	pick 0
	i64.const 1
	u64.le
	br_if .L2, .L2_else
.L2:
	i64.const 1
	br .L2_cont
.L2_else:
	pick 0
	i64.const 2
	i64.sub
	call 0
	pick 1
	i64.const 1
	i64.sub
	call 0
	i64.add
	br .L2_cont
.L2_cont:
	drop 1..1
	br .return
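
To help read the listing above, here is a hand translation into Go that replays each operation on an explicit value stack. The semantics of pick (copy the value at the given depth from the top) and of drop 1..1 (remove the value one below the top) are inferred from this example and the comments in the native code below, so treat this as a sketch rather than a definition of wazeroir. The function name fibStack is hypothetical.

```go
package main

import "fmt"

// fibStack replays the wazeroir listing op by op on an explicit value stack.
// The op semantics are inferred from the example, not taken from the compiler.
func fibStack(n uint64) uint64 {
	stack := []uint64{n} // the single parameter occupies slot 0
	push := func(v uint64) { stack = append(stack, v) }
	pop := func() uint64 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	pick := func(depth int) { push(stack[len(stack)-1-depth]) }

	// .entrypoint
	pick(0)                  // pick 0
	push(1)                  // i64.const 1
	rhs, lhs := pop(), pop() // operands of u64.le
	if lhs <= rhs {          // u64.le; br_if .L2, .L2_else
		// .L2
		push(1) // i64.const 1
	} else {
		// .L2_else
		pick(0) // pick 0
		push(2) // i64.const 2
		rhs, lhs = pop(), pop()
		push(lhs - rhs)       // i64.sub
		push(fibStack(pop())) // call 0
		pick(1)               // pick 1: the parameter is now at depth 1
		push(1)               // i64.const 1
		rhs, lhs = pop(), pop()
		push(lhs - rhs)       // i64.sub
		push(fibStack(pop())) // call 0
		rhs, lhs = pop(), pop()
		push(lhs + rhs) // i64.add
	}
	// .L2_cont
	result := pop() // drop 1..1: remove the parameter under the result
	pop()
	push(result)
	return pop() // br .return
}

func main() {
	fmt.Println(fibStack(10)) // prints 89
}
```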

Then this is compiled as the following native code:

;;;; ".entrypoint" label ;;;;
;; Initialize the reserved registers. These lines are inserted for all functions.
;; Setting up r14 register from r12 (already set by Go assembler code).
0x0000000000000000:  4D 8B 34 24                      mov    r14, qword ptr [r12]
0x0000000000000004:  49 8B 44 24 20                   mov    rax, qword ptr [r12 + 0x20]
0x0000000000000009:  48 C1 E0 03                      shl    rax, 3
0x000000000000000d:  49 01 C6                         add    r14, rax
;; "pick 0"
0x0000000000000010:  49 8B 06                         mov    rax, qword ptr [r14]
;; "i64.const 1"
0x0000000000000013:  48 C7 C1 01 00 00 00             mov    rcx, 1
;; "u64.le"
0x000000000000001a:  48 39 C8                         cmp    rax, rcx
;; "br_if"
0x000000000000001d:  76 03                            jbe    0x22  ;; To .L2 label
0x000000000000001f:  90                               nop    
0x0000000000000020:  EB 10                            jmp    0x32 ;; To .L2_else label

;;;; ".L2" label ;;;;
;; "i64.const 1"
0x0000000000000022:  48 C7 C0 01 00 00 00             mov    rax, 1
;; The beginning of "br .L2_cont". 
;; As we branch to ".L2_cont" label, we must push values on the registers
;; back to stack. In this case, we need to store the rax register holding value from `i64.const 1`
;; to the stack [r14 + 8] where r14 is the base pointer.
0x0000000000000029:  49 89 46 08                      mov    qword ptr [r14 + 8], rax
;; Now branch into .L2_cont label which starts at 0xdb
0x000000000000002d:  E9 A9 00 00 00                   jmp    0xdb

;;;; ".L2_else" label ;;;;
;; "pick 0"
0x0000000000000032:  49 8B 06                         mov    rax, qword ptr [r14]
;; "i64.const 2"
0x0000000000000035:  48 C7 C1 02 00 00 00             mov    rcx, 2
;; "i64.sub"
0x000000000000003c:  48 29 C8                         sub    rax, rcx
;; beginning of "call 0". 
;; First we set the jit status 1 to engine.jitCallStatusCode.
0x000000000000003f:  41 C7 44 24 28 01 00 00 00       mov    dword ptr [r12 + 0x28], 1
;; We set 0 from "call 0", the call target function index, to engine.functionCallIndex
0x0000000000000048:  41 C7 44 24 30 00 00 00 00       mov    dword ptr [r12 + 0x30], 0
;; Then, as the registers are caller-saved, we write all the values in the registers back to the stack.
;; In this case, the value in rax (at stack pointer 1) must go back to the stack in memory.
0x0000000000000051:  49 89 46 08                      mov    qword ptr [r14 + 8], rax
;; Then we set the continuation address offset (0x6e, calculated by compiler) 
;; to the engine.continuationAddressOffset
0x0000000000000055:  48 B8 6E 00 00 00 00 00 00 00    movabs rax, 0x6e
0x000000000000005f:  49 89 44 24 38                   mov    qword ptr [r12 + 0x38], rax
;; Write the current stack pointer (2) back to the engine.currentStackPointer.
0x0000000000000064:  49 C7 44 24 18 02 00 00 00       mov    qword ptr [r12 + 0x18], 2
;; Finally, return from this function, since function calls are handled in the Go world.
0x000000000000006d:  C3                               ret    

;; This is where we resume this function after function call.
;; Note that the address is 0x6e, which is set by movabs above.
;; After the function returns, we initialize the reserved registers just like the .entrypoint.
0x000000000000006e:  4D 8B 34 24                      mov    r14, qword ptr [r12]
0x0000000000000072:  49 8B 44 24 20                   mov    rax, qword ptr [r12 + 0x20]
0x0000000000000077:  48 C1 E0 03                      shl    rax, 3
0x000000000000007b:  49 01 C6                         add    r14, rax
;; "pick 1". We know that the value at "depth 1" will be where the stack pointer = 0.
0x000000000000007e:  49 8B 06                         mov    rax, qword ptr [r14]
;; "i64.const 1"
0x0000000000000081:  48 C7 C1 01 00 00 00             mov    rcx, 1
;; i64.sub
0x0000000000000088:  48 29 C8                         sub    rax, rcx
;; beginning of "call 0" just like above.
0x000000000000008b:  41 C7 44 24 28 01 00 00 00       mov    dword ptr [r12 + 0x28], 1
0x0000000000000094:  41 C7 44 24 30 00 00 00 00       mov    dword ptr [r12 + 0x30], 0
0x000000000000009d:  49 89 46 10                      mov    qword ptr [r14 + 0x10], rax
0x00000000000000a1:  48 B8 BA 00 00 00 00 00 00 00    movabs rax, 0xba
0x00000000000000ab:  49 89 44 24 38                   mov    qword ptr [r12 + 0x38], rax
0x00000000000000b0:  49 C7 44 24 18 03 00 00 00       mov    qword ptr [r12 + 0x18], 3
0x00000000000000b9:  C3                               ret    
;; This is where we resume this function after function call.
;; Note that the address is 0xba, which is set by movabs above.
;; After the function returns, we initialize the reserved registers just like the .entrypoint.
0x00000000000000ba:  4D 8B 34 24                      mov    r14, qword ptr [r12]
0x00000000000000be:  49 8B 44 24 20                   mov    rax, qword ptr [r12 + 0x20]
0x00000000000000c3:  48 C1 E0 03                      shl    rax, 3
0x00000000000000c7:  49 01 C6                         add    r14, rax
;; "i64.add". Retrieve the values from the stack. fib(n-1) and fib(n-2) are loaded into rax and rcx
0x00000000000000ca:  49 8B 46 10                      mov    rax, qword ptr [r14 + 0x10]
0x00000000000000ce:  49 8B 4E 08                      mov    rcx, qword ptr [r14 + 8]
;; Then add these two values.
0x00000000000000d2:  48 01 C1                         add    rcx, rax
;; We now have the value of fib(n-1)+fib(n-2) in rcx register. 
;; But before entering .L2_cont label, we must write it back to the stack.
0x00000000000000d5:  49 89 4E 08                      mov    qword ptr [r14 + 8], rcx
;; Now jump to .L2_cont label.
0x00000000000000d9:  EB 00                            jmp    0xdb

;;;; ".L2_cont" label ;;;;
;; "Drop 1..1"
0x00000000000000db:  49 8B 46 08                      mov    rax, qword ptr [r14 + 8]
0x00000000000000df:  49 89 06                         mov    qword ptr [r14], rax
;; br .return
0x00000000000000e2:  41 C7 44 24 28 00 00 00 00       mov    dword ptr [r12 + 0x28], 0
0x00000000000000eb:  49 C7 44 24 18 01 00 00 00       mov    qword ptr [r12 + 0x18], 1
0x00000000000000f4:  C3                               ret    
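
The [r12 + ...] operands in the listing address fields of the engine, so the offsets imply a particular struct layout. The sketch below is a guess at a layout that reproduces those offsets on a 64-bit target. The struct engineLayout is hypothetical and is not the PR's actual engine type. The field widths are assumptions, and the name of the field at 0x20 is inferred from how r14 is computed, while the other names come from the comments in the listing and the Go snippets earlier.

```go
package main

import (
	"fmt"
	"unsafe"
)

// engineLayout is a hypothetical struct whose field offsets match the
// [r12 + ...] operands in the listing, assuming a 64-bit target.
type engineLayout struct {
	stack                     []uint64 // 0x00: data pointer, read via [r12]
	currentStackPointer       uint64   // 0x18
	currentBaseStackPointer   uint64   // 0x20 (name inferred)
	jitCallStatusCode         uint32   // 0x28: written with a dword mov
	functionCallIndex         uint64   // 0x30: width is a guess; padding precedes it
	continuationAddressOffset uintptr  // 0x38
}

// fieldOffsets collects the offsets so they can be inspected without unsafe.
type fieldOffsets struct {
	stackPointer, baseStackPointer, status, callIndex, continuation uintptr
}

func engineOffsets() fieldOffsets {
	var e engineLayout
	return fieldOffsets{
		stackPointer:     unsafe.Offsetof(e.currentStackPointer),
		baseStackPointer: unsafe.Offsetof(e.currentBaseStackPointer),
		status:           unsafe.Offsetof(e.jitCallStatusCode),
		callIndex:        unsafe.Offsetof(e.functionCallIndex),
		continuation:     unsafe.Offsetof(e.continuationAddressOffset),
	}
}

func main() {
	o := engineOffsets()
	fmt.Printf("sp=%#x base=%#x status=%#x call=%#x cont=%#x\n",
		o.stackPointer, o.baseStackPointer, o.status, o.callIndex, o.continuation)
}
```

With such a layout, the prologue `mov r14, [r12]; mov rax, [r12 + 0x20]; shl rax, 3; add r14, rax` reads as "r14 = address of stack[currentBaseStackPointer]", since each stack slot is 8 bytes wide.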

@mathetake mathetake changed the title Initial implementation of JIT engine Initial implementation of JIT engine for amd64 target Nov 26, 2021
Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>

@codefromthecrypt codefromthecrypt left a comment

well documented.. unleashing some more comments before proceeding to unblock you a bit

@mathetake

hope addressed all comments 😄

@codefromthecrypt codefromthecrypt left a comment

OK DONE!

I can't say I made meaningful comments about the assembly translation, but I did check that your tests are there, so if there were a problem it wouldn't be shots in the dark.

This is epic progress, so happy for you to merge when you feel ready!

import "syscall"

const mmapFlags = syscall.MAP_ANONYMOUS

comment why or rename the constant to mmapFlagAnonymous

@mathetake

so for mmap stuff, I will do it in a follow-up PR as I have to understand the system call more 😄

@mathetake

merging!!!!!!!!!!!

@mathetake mathetake merged commit 297f9db into main Dec 9, 2021
@mathetake mathetake deleted the init-jit branch December 9, 2021 07:40
@mathetake

Here's the tracking issue for amd64 JIT: #65

mathetake added a commit that referenced this pull request Jan 20, 2022
This commit completes the baseline (singlepass) JIT WebAssembly compiler
for amd64 target with the implementation for br_table instruction.

The implementation passes 100% of the WebAssembly specification tests,
just like our interpreter. This is the world's first JIT compilation engine purely
written in Go, and the implementation is stable under multiple goroutines, and
never broken by the Go runtime. The JIT engine is tested on both Linux and Darwin.

Even though this passes the spectests and is implemented as "JIT", our calling convention
proxies all function calls through the non-native Go world, therefore there's some performance
overhead. This will be fixed in a following commit which allows us to make function calls
in the fully native way. See #60 and RATIONALE.md for design details.

resolves #65 #42

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>