YJIT: Profile gen_push_frame overhead #540
Comments
Thanks Kokubun, this is very good data to have. One thing that is striking is how small the percentage taken by JITed cycles is: just 12% on lobsters and 11.1% on chunky-png. Makes me wonder where most of the time is spent and whether there are obvious bottlenecks worth optimizing in C land.
Also curious to know what the profile looks like for

[Per-benchmark profiling results (collapsed): activerecord, chunky-png, erubi-rails, hexapdf, liquid-c, liquid-compile, liquid-render, lobsters, psych-load, railsbench, ruby-lsp, sequel]
Here's the result. I'm glad "JITed cycles" is almost 100% for them, which supports its accuracy.

[Results (collapsed): fib, 30k_methods]
Another relevant data point (results collapsed): cfunc_itself
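For context, cfunc_itself is a microbenchmark that isolates the cost of calling a C-implemented method, which is exactly where `C gen_push_frame` overhead shows up. The snippet below is a minimal sketch of that kind of benchmark (an assumption about its shape, not the actual yjit-bench source):

```ruby
# Minimal sketch of a cfunc-call microbenchmark (assumed shape, not the
# actual yjit-bench source): call a C-implemented method (Object#itself)
# in a tight loop so most of the JITed time goes to pushing C frames.
def call_cfunc(obj, iterations)
  i = 0
  while i < iterations
    obj.itself
    i += 1
  end
  obj
end

call_cfunc(Object.new, 10_000_000)
```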
Thanks for taking the time to do this. I agree it's good to see that JITed cycles are at 99% on the microbenchmarks. It validates that the computation is working as expected. It looks like there is a long tail of C functions for every benchmark, though there are definitely a few low-hanging fruits in there still.
The overhead is maybe less than I would have thought? This does seem to indicate that we'll need to do more to speed up function calls... or that we should be more aggressive in our pursuit of inlining next year.
I have another revision that profiles each insn instead, which I'm going to file a separate issue for. Here's the per-insn profiling result for the same benchmark: fib (results collapsed).
It spends a fair amount of time in
Hmmmm that seems a bit weird to me. The code we generate for
I would expect a bit more difference between those instructions too, but it's not super short code either. Because it takes a couple of return values, it needs a guard for both operands.

```
# Insn: 0028 opt_plus (stack_size: 2)
# guard arg0 fixnum
0x55a87618122f: test byte ptr [rbx - 8], 1
0x55a876181233: je 0x55a87618333e
# guard arg1 fixnum
0x55a876181239: test byte ptr [rbx], 1
0x55a87618123c: je 0x55a87618335c
0x55a876181242: mov rax, qword ptr [rbx - 8]
0x55a876181246: sub rax, 1
0x55a87618124a: add rax, qword ptr [rbx]
0x55a87618124d: jo 0x55a87618331d
# reg_temps: 00000000 -> 00000001
0x55a876181253: mov rsi, rax
```
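For context, in the fib microbenchmark the operands of that `opt_plus` are the return values of two recursive calls, so neither is statically known to be a fixnum and both need a guard. A minimal sketch of the benchmark shape (assumed here, not copied from yjit-bench):

```ruby
# Minimal sketch of the fib microbenchmark (assumed shape): the addition at
# the end of the method adds the return values of two recursive calls, which
# is why the generated opt_plus guards both operands for fixnum.
def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

fib(32)
```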
Filed: #541
I profiled code generated by `gen_push_frame` on headline benchmarks with `perf record -F max -e cycles` and the JIT interface. The following numbers show the number of sampled cycles during which instructions generated by `gen_push_frame` were being executed. The percentage is that number divided by JITed cycles. For example, `[JIT] C gen_push_frame 1.9% 101927670` means that 1.9% of the time was spent on JIT code for pushing a C frame.

The % roughly represents the potential performance impact of frame outlining for each frame type. For benchmarks where the `ISEQ gen_push_frame` % is larger than the `C gen_push_frame` % (e.g. railsbench, lobsters), outlining ISEQ frames might be more important. A rough sketch of how such a percentage can be computed follows the benchmark list below.

[Per-benchmark results (collapsed): activerecord, chunky-png, erubi-rails, hexapdf, liquid-c, liquid-compile, liquid-render, lobsters, mail, psych-load, railsbench, ruby-lsp, sequel]
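As a rough illustration of how such a percentage can be derived, the sketch below aggregates `perf script` samples by symbol and reports each `gen_push_frame` symbol as a share of all JITed cycles. This is a minimal sketch under assumptions: it presumes JIT symbols were already resolved (e.g. via perf's jitdump support with `perf inject --jit`), that they carry a recognizable `[JIT]` prefix, and that `gen_push_frame` regions are labelled in the symbol name. It is not the exact tooling used for the numbers above.

```ruby
# Minimal sketch (not the actual tooling used in this issue): sum sampled
# periods per symbol from `perf script`, then report gen_push_frame symbols
# as a percentage of all JIT-labelled cycles.
samples = Hash.new(0)

IO.popen(%w[perf script -F period,sym], &:readlines).each do |line|
  period, sym = line.strip.split(' ', 2)
  next if sym.nil? || sym.empty?
  samples[sym] += period.to_i
end

# Assumption: JIT-generated code shows up with a "[JIT]" prefix in the
# resolved symbol names.
jit_total = samples.sum { |sym, cycles| sym.start_with?('[JIT]') ? cycles : 0 }

samples.select { |sym, _| sym.include?('gen_push_frame') }
       .sort_by { |_, cycles| -cycles }
       .each do |sym, cycles|
  pct = jit_total.zero? ? 0.0 : 100.0 * cycles / jit_total
  puts format('%-40s %5.1f%% %d', sym, pct, cycles)
end
```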