Introduce CallEngine assigned to api.Function implementation. #761

Merged
merged 10 commits into from
Aug 24, 2022

Conversation

@mathetake (Member) commented Aug 24, 2022

This introduces the internal wasm.CallEngine type and assigns an instance to each api.Function
implementation. api.Function.Call now uses the CallEngine assigned to it
to make function calls.

Internally, when creating a CallEngine implementation, the compiler engine allocates
the call frame stack and the value stack. Previously, we allocated these stacks on every function call,
which was a severe overhead, as the benchmarks show. As a result,
this reduces memory usage (and therefore GC work) as long as the same api.Function is reused
across multiple calls.
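
The allocation pattern described above can be sketched as follows. This is a minimal illustration, not wazero's actual code: `callEngine`, `function`, `newCallEngine`, and `callAllocating` are hypothetical names, and the "call" only sums its parameters so the example stays self-contained.

```go
package main

import "fmt"

// callEngine is a hypothetical stand-in for the per-function engine state.
// Its stack is allocated once, when the engine is created.
type callEngine struct {
	valueStack []uint64
}

// newCallEngine preallocates the value stack with the given capacity.
func newCallEngine(stackSize int) *callEngine {
	return &callEngine{valueStack: make([]uint64, 0, stackSize)}
}

// function mimics a function handle that owns exactly one callEngine.
type function struct {
	ce *callEngine
}

// Call reuses the engine's preallocated stack ("after" shape).
// No new backing array is needed while len(params) fits the capacity.
func (f *function) Call(params ...uint64) uint64 {
	stack := append(f.ce.valueStack[:0], params...)
	var sum uint64
	for _, v := range stack {
		sum += v
	}
	return sum
}

// callAllocating builds a fresh stack on every call ("before" shape).
func callAllocating(stackSize int, params ...uint64) uint64 {
	stack := make([]uint64, 0, stackSize)
	stack = append(stack, params...)
	var sum uint64
	for _, v := range stack {
		sum += v
	}
	return sum
}

func main() {
	f := &function{ce: newCallEngine(1024)}
	fmt.Println(f.Call(1, 2, 3))               // reuses the same stack
	fmt.Println(callAllocating(1024, 1, 2, 3)) // allocates a stack per call
}
```

Both calls return the same result; the difference is only where the stack memory comes from, which is why the saving depends on calling the same function handle repeatedly.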

As a side effect, api.Function.Call is no longer goroutine-safe, so this adds a comment
about that to the method.
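
For callers that do need to share one function across goroutines, one option is to serialize access. The sketch below assumes a hypothetical `function` type whose `Call` mutates shared state (standing in for the reused stacks); `lockedFunction` and `runConcurrently` are illustrative names, not part of wazero's API. Giving each goroutine its own function handle would be an alternative that avoids lock contention.

```go
package main

import (
	"fmt"
	"sync"
)

// function is a hypothetical handle whose Call reuses a shared stack,
// so concurrent calls without synchronization would race.
type function struct {
	stack []uint64
}

// Call is NOT goroutine-safe: it writes into f.stack's backing array.
func (f *function) Call(params ...uint64) uint64 {
	f.stack = append(f.stack[:0], params...)
	var sum uint64
	for _, v := range f.stack {
		sum += v
	}
	return sum
}

// lockedFunction serializes calls to the wrapped function with a mutex.
type lockedFunction struct {
	mu sync.Mutex
	fn *function
}

// Call holds the lock for the whole call, so only one goroutine
// touches the shared stack at a time.
func (l *lockedFunction) Call(params ...uint64) uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.fn.Call(params...)
}

// runConcurrently calls the locked function from n goroutines and
// returns the result of each call, indexed by goroutine number.
func runConcurrently(n int) []uint64 {
	lf := &lockedFunction{fn: &function{stack: make([]uint64, 0, 16)}}
	results := make([]uint64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = lf.Call(uint64(i), 1) // each goroutine writes its own slot
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runConcurrently(8)) // prints [1 2 3 4 5 6 7 8]
}
```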

Benchmark result

before

### amd64 

goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/jit
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkAllocation/Call-32         	  105142	     11371 ns/op	    5000 B/op	      21 allocs/op


### arm64

goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/jit
BenchmarkAllocation/Call-10         	  200076	      5730 ns/op	    5048 B/op	      21 allocs/op

after

### amd64

goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/jit
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkAllocation/Call-32         	  168476	      8470 ns/op	     200 B/op	      11 allocs/op


### arm64

goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/jit
BenchmarkAllocation/Call-10         	  246650	      4895 ns/op	     200 B/op	      11 allocs/op

pprof

before

      flat  flat%   sum%        cum   cum%
     0.69s 52.67% 52.67%      0.69s 52.67%  runtime.madvise // <--- meaning majority is spent on allocation
     0.19s 14.50% 67.18%      0.19s 14.50%  runtime._ExternalCode
     0.16s 12.21% 79.39%      0.16s 12.21%  runtime.usleep
     0.07s  5.34% 84.73%      0.07s  5.34%  runtime.pthread_cond_wait
     0.05s  3.82% 88.55%      0.05s  3.82%  runtime.pthread_cond_signal
     0.03s  2.29% 90.84%      0.03s  2.29%  runtime.kevent
     0.03s  2.29% 93.13%      0.03s  2.29%  runtime.pthread_kill
     0.01s  0.76% 93.89%      0.01s  0.76%  github.com/tetratelabs/wazero/internal/engine/compiler.(*callEngine).execWasmFunction
     0.01s  0.76% 94.66%      0.01s  0.76%  runtime.asmcgocall
     0.01s  0.76% 95.42%      0.01s  0.76%  runtime.funcInfo.entry

after

      flat  flat%   sum%        cum   cum%
     320ms 42.67% 42.67%      320ms 42.67%  runtime._ExternalCode // <--- now native code execution is dominant
     270ms 36.00% 78.67%      270ms 36.00%  runtime.madvise
      20ms  2.67% 81.33%       80ms 10.67%  reflect.Value.call
      20ms  2.67% 84.00%       20ms  2.67%  reflect.directlyAssignable
      20ms  2.67% 86.67%       40ms  5.33%  reflect.funcLayout
      10ms  1.33% 88.00%       10ms  1.33%  github.com/tetratelabs/wazero/internal/integration_test/vs.(*wazeroRuntime).log
      10ms  1.33% 89.33%      130ms 17.33%  github.com/tetratelabs/wazero/internal/integration_test/vs.allocationCall
      10ms  1.33% 90.67%       10ms  1.33%  reflect.Value.Elem
      10ms  1.33% 92.00%       10ms  1.33%  runtime.getitab
      10ms  1.33% 93.33%       10ms  1.33%  runtime.pthread_kill

Signed-off-by: Takeshi Yoneda takeshi@tetrate.io
@mathetake mathetake marked this pull request as ready for review August 24, 2022 05:23
@codefromthecrypt (Contributor) left a comment

Just a few cleanups. Glad the perf is so much better, and before our beta!

@mathetake (Member, Author) commented

ok now in good shape! Thanks @codefromthecrypt
