Fix GraalVM arguments in the README #57


Merged 1 commit into scala:master on Jan 30, 2018

Conversation

vjovanov (Contributor)

No description provided.

@vjovanov force-pushed the fix-graal-arguments branch from 12bf885 to d09ae00 on January 18, 2018 17:39
retronym (Member) commented Jan 23, 2018

Thanks @vjovanov.

I'm still hoping to better understand which optimizations Graal performs that make it run scalac 30% faster than C2 (details in #35).

retronym merged commit 367811e into scala:master on Jan 30, 2018
vjovanov (Contributor, Author)

This paper could help with understanding.
This talk is also good. The paper will appear at CGO 2018: "Dominance-based Duplication Simulation (DBDS) - Code Duplication to Enable Compiler Optimizations".

retronym (Member) commented Jan 30, 2018

Thanks for the links, Vojin.

I've run this benchmark suite with Graal and C2 and collected profiles from each with a few different tools (C2 / Graal).

The forward flamegraphs (C2 / Graal) show that the Graal compiler was still quite busy when the profiler started after 100s warmup. I suppose that makes sense with Graal's more aggressive approach to inlining. We might need to give Graal longer to warm up to get a real measurement of the peak performance.

Looking at the respective reverse flamegraphs (C2 / Graal), I can immediately see less time in itable stubs, suggesting that Graal has managed to devirtualize more. We've got some known problems in the 2.12 collections related to excessive forwarder methods that exhaust C2's default inlining depth (-XX:MaxInlineLevel); we've worked around these manually in HashMap, but they are still present in LinkedHashMap.
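
To make that concrete, here's a toy sketch of the forwarder shape (hypothetical names, not the real 2.12 collections code): each layer of delegation spends one level of the inlining budget before any real work happens.

```scala
// Hypothetical sketch of the forwarder-method shape described above.
// Each layer adds one frame of inlining depth before any real work
// happens, eating into C2's budget (-XX:MaxInlineLevel, default 9).
trait HasElems[A] { def elems: List[A] }
trait Forwarder1[A] extends HasElems[A]   { def size1: Int = elems.size }
trait Forwarder2[A] extends Forwarder1[A] { def size2: Int = size1 }
trait Forwarder3[A] extends Forwarder2[A] { def size3: Int = size2 }

final class Coll[A](val elems: List[A]) extends Forwarder3[A]

object Demo extends App {
  // The hot path is size3 -> size2 -> size1 -> elems.size: several levels
  // of the inlining budget are spent before the element count is touched.
  println(new Coll(List(1, 2, 3)).size3)
}
```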

Comparing the perfnorm runs (C2 / Graal), I see a ~2x improvement with Graal on iTLB loads and misses. Graal is generally better across the other stats; Clocks Per Instruction improves from 0.819 to 0.738 (about a 10% reduction). I'm not particularly good at drawing conclusions from these counters, as I've still got a lot to learn here! I'll also note that because Graal was still rather active during the benchmark (~25%, albeit on background thread(s)), these numbers are measuring the performance of scalac and Graal itself, and I need to repeat the experiment with a longer warmup to have an apples-to-apples comparison.

I'm hopeful that further analysis and testing will help Graal (by providing a good datapoint and by flushing out some bugs), but it might also help us in the near term to find a few areas in scalac that could be hand-tuned to bridge some of the performance gap.

vjovanov (Contributor, Author)

Regarding the ongoing compilation:

  1. Although compilation runs in another thread, it can congest the memory bus as well as affect frequency scaling. I have noticed that with frequency scaling disabled, and on machines with more memory bandwidth, I get better results; this could very well be the reason. So yes, we need to increase the warmup time. Using -XX:+PrintCompilation is a good way to see when benchmarks are warmed up; I prefer this to just looking at the performance of the last round.

  2. In the flamegraph I can see that about 10% of the compilation time is also spent in the C1 compiler. This could mean that the benchmark is not completely warmed up even on HotSpot.

  3. I ran the benchmark with the flags hot -psource=scalap -wi 30 -i 5 -f1 -jvm graalvm/Contents/Home/bin/java -jvmArgs -XX:+PrintCompilation, and it seems that we have some deoptimization loops. Methods like scala.reflect.internal.tpe.TypeComparers$class::firstTry$1 get recompiled on every iteration. There are a few dozen of these that we need to investigate; see the sketch after this list.
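
As a rough illustration of how one might spot these recompilation loops, here is a small sketch that counts how often each method name appears in a -XX:+PrintCompilation log (the regex is only an approximation of the log's line shape):

```scala
// Rough sketch: count how many times each method appears in a
// -XX:+PrintCompilation log. The parsing is approximate (real log lines
// carry compile ids, tiers, and "made not entrant" markers), but methods
// with very high counts are good candidates for a deopt loop.
import scala.io.Source

object RecompileLoops extends App {
  val method = """([\w.$]+::[\w$<>]+)""".r
  val counts = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

  for (line <- Source.fromFile(args(0)).getLines(); m <- method.findFirstIn(line))
    counts(m) += 1

  // Print the 30 most frequently compiled methods, most-compiled first.
  counts.toSeq.sortBy(-_._2).take(30).foreach { case (name, n) =>
    println(f"$n%5d  $name")
  }
}
```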

Until we get rid of the background compilation, it is hard to judge the other numbers exactly. The iTLB loads do look interesting indeed.

The devirtualization results are quite interesting. I guess most of it comes from better inlining heuristics. To better understand which inlining decisions are made, which could help in manually optimizing the scalac code, you can use the flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining. Unfortunately, this prints all inlining decisions. I have opened an issue that will allow you to use the Graal logging scopes to narrow the printing down to a single method. I think it would be easier to wait a bit for better printing of inlining decisions than to use the existing flags.
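
As a crude stopgap until that issue is resolved, one could post-process the full dump. A hypothetical sketch that keeps only the indented subtree under one root method (the indentation handling is approximate):

```scala
// Crude sketch: filter a full -XX:+PrintInlining dump down to the subtree
// of one root method. PrintInlining indents callees under their caller, so
// we print from a match until the indentation returns to the root's level.
import scala.io.Source

object InliningOf extends App {
  val Array(logFile, rootMethod) = args.take(2)
  var rootIndent = -1 // -1 means "not currently inside the subtree"

  for (line <- Source.fromFile(logFile).getLines()) {
    val indent = math.max(line.indexWhere(_ != ' '), 0)
    if (rootIndent >= 0 && indent <= rootIndent) rootIndent = -1 // left the subtree
    if (rootIndent < 0 && line.contains(rootMethod)) rootIndent = indent
    if (rootIndent >= 0) println(line)
  }
}
```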

I would also include a comparison with Graal core in the graphs. For Scala, it is already doing a better job than C2. Maybe it is better to invest time in optimizing the scalac code for that compiler.

retronym (Member) commented Feb 1, 2018

We have frequency scaling disabled during benchmark runs (using this script).

Our profile runs also collect the output of -prof hs_comp, which shows, e.g., the total number of methods compiled after each iteration. That's what I tend to use to determine whether things are warm enough. Here's how it looks for Graal with twice the warmup (200s) vs C2.

I've subscribed to oracle/graal#293 and look forward to what you discover. I know that running scalac on a single thread throws up some tricky problems for the JIT compiler, e.g. type tests and other branches that look constant during typer but can take different paths when new AST nodes show up or disappear later on.

I actually went and turned the default implementations of Type.mapOver, Tree.traverse, and Tree.transform into virtual calls to avoid these problems in some areas (and to avoid the O(N) cost of a pattern match running through successive instanceOf checks).
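
For illustration only, here is a toy version of the two dispatch styles (not the actual scalac code):

```scala
// Toy illustration of the trade-off described above. The pattern-match
// version pays O(number of cases) instanceOf tests per call, and its
// branch profile shifts between compiler phases; the virtual-call version
// replaces that chain with a single dispatch the JIT can devirtualize.
sealed abstract class Tree {
  def traverse(f: Tree => Unit): Unit // virtual-dispatch version
}
final case class Ident(name: String) extends Tree {
  def traverse(f: Tree => Unit): Unit = f(this)
}
final case class Apply(fun: Tree, arg: Tree) extends Tree {
  def traverse(f: Tree => Unit): Unit = { f(this); fun.traverse(f); arg.traverse(f) }
}

object MatchVersion {
  // Pattern-match version: a chain of instanceOf checks on every call.
  def traverse(t: Tree, f: Tree => Unit): Unit = t match {
    case i @ Ident(_)        => f(i)
    case a @ Apply(fun, arg) => f(a); traverse(fun, f); traverse(arg, f)
  }
}
```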

I do plan to automate benchmarking against different JDKs (e.g. 8/9/HS+Graal/OpenJ9) soon, but it requires a bit of annoying plumbing work in our results database first.
