More, more competitors (lightweightedness is questionable of course) #2
Hi, it's great that this project is trying to create a compact JIT.
My only suggestion is: please avoid global state. Having global state makes it impossible to use the library easily.
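To illustrate the suggestion, here is a minimal sketch of a context-based API. The names (`jit_context_t`, `jit_init`, etc.) are hypothetical, not the actual MIR API: the point is that all library state lives in an object the caller creates and threads through every call, so two JIT instances can coexist in one process.

```c
#include <stdlib.h>

/* Hypothetical sketch (NOT the real MIR API): all library state lives
   in a context object instead of globals.  The caller creates it,
   passes it to every call, and destroys it when done. */
typedef struct jit_context {
  size_t code_size; /* bytes of machine code emitted so far */
  int opt_level;    /* per-instance optimization level */
} jit_context_t;

jit_context_t *jit_init (int opt_level) {
  jit_context_t *ctx = calloc (1, sizeof (*ctx));
  if (ctx != NULL) ctx->opt_level = opt_level;
  return ctx;
}

void jit_emit (jit_context_t *ctx, size_t nbytes) { ctx->code_size += nbytes; }

void jit_finish (jit_context_t *ctx) { free (ctx); }
```

With this shape, a host application can run one JIT instance per thread without any of them observing the others' state.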
Thank you for pointing this out. I have known the nanojit project for a long time; I even did some benchmarking of it. It does not fit my goals (a lightweight JIT compiler for CRuby, which recently got a GCC/LLVM-based JIT). As I wrote, I need at least 70% of the performance of code generated by GCC with -O2. I've just repeated the sieve benchmark with dmr_c's nanojit backend. The generated code is about 3.5 times slower than code generated with GCC -O2. It is even 15% slower than the code generated by GCC with -O0. That is also why I did not include the GNU lightning project. Unlike my project, nanojit is a solid project supporting a few architectures, and you are probably right that I should add it to the other JIT candidates in README.md. I am going to do this when I have spare time. As for the C compiler: dmr_c is based on Sparse, while my C compiler is not based on any existing project like tcc, 8cc, 9cc, etc. It is written completely from scratch. Big work is still needed to finish it, but I can say that its code will be at least 5 times smaller than Sparse's. Btw, currently I am working on an LLVM IR to MIR translator. I guess the initial version will be published in September or October.
Thank you for your kind words.
Yes, thank you. I will keep it in mind. The current project is not suitable for anything right now. I am focused on trying it first as a JIT for MRuby without parallel compilation. For CRuby I will need to make it suitable for multi-threaded code, because the CRuby JIT engine requires compiling code in parallel with bytecode interpretation and other compilations. The MIR project will be developed in parallel with its usage as a JIT for MRuby/CRuby. And it will only be ready when the MRuby and/or CRuby JITs are ready and proven to reach a specific level of performance.
NanoJIT was designed to be a trace compiler, and that is how it is used in Flash. It is fine for a sequence of non-branching code, but if there are branches then the register allocation cannot cope with it. I'd be interested in the program you used for testing; I can try that out myself.
Do the benefits outweigh the cost of maintaining your own? 8cc is very small too. Regards
It can be hard to remove global state later on as the code becomes larger.
I too started the various JIT projects because I wanted a JIT backend for Lua. Here's my experience, for what it's worth:
I think you may want to see whether the LLVM or GCC backends for Ruby are any good. As far as I know, they are not. In Lua I can get a 20x improvement with my backend, provided type annotations are used. If you can get a 2x improvement in Ruby you will be lucky, I think! Regards
Btw I see that you have a lot of experience writing compilers (GCC), so perhaps you can crack this! I certainly hope so, because a nice compact JIT written in C that can generate optimized code would be fantastic. In my opinion though, a new small high-performance scripting language competing with LuaJIT would be better than trying to speed up Ruby!
There is already LIBGCCJIT, written by my colleague David Malcolm. Unfortunately, it is hard to implement inlining with it because inlining is done too early. As inlining is the most important optimization for a JIT, libgccjit did not fit my goals. I used another approach, based on C-code generation and precompiled headers, to implement MJIT in CRuby. This approach has practically the same compilation speed as LIBGCCJIT and permits implementing inlining, at least on the path Ruby code -> Ruby code.
Ruby is actively used in OpenShift/OpenStack, which is a strategic area for Red Hat. So that defines my language choice (although I'd like to design a JIT compiler which could be used for other languages too). But MRuby could satisfy your criteria. Mike Pall worked roughly 10 years on LuaJIT, did an amazing job, and achieved quite a lot. To compete with LuaJIT, I would probably need 10 years too.
Thanks for sharing your experience.
Yes, implementing specialization/deoptimization is not trivial work, but it is much less work than writing a good JIT compiler.
The OMR JIT is still too complex for my goals; besides, IBM has already implemented CRuby with the OMR JIT. The results are not so good. I suspect they did not implement specialization or compilation in parallel with Ruby execution. There is a comparison of different CRuby JIT implementations in https://developers.redhat.com/blog/2018/03/22/ruby-3x3-performance-goal/
Most Ruby programs are IO-bound, and a JIT cannot help there. But a JIT could extend Ruby usage into the CPU-bound program area. For some programs, the current CRuby JIT (MJIT) can improve code close to 3 times. The problem is that MJIT makes the most widely used Ruby application, Ruby on Rails, slower until a lot of methods are compiled. That is a reason to implement tiered compilation, and one reason for the MIR project. If you are interested, more details can be found at https://www.slideshare.net/VladimirMakarov13/the-lightweightjitcompilerprojectforc-ruby-141836482
I used dmr_c with nanojit for this function (the classic sieve benchmark with `#define SieveSize 8190`, the whole computation repeated in `for (iter = 0; iter < 100000; iter++)`). Most time is spent executing the function's code. On my computer nanojit uses 7.18 CPU sec, code generated by GCC -O2 uses 2.30s, and code generated by GCC -O0 takes 6.26s.
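For reference, here is a reconstruction of the classic BYTE-magazine sieve benchmark the timings above refer to. The exact source compiled with dmr_c was not posted in the thread, so the loop details below are assumptions based on the two fragments quoted (`SieveSize 8190`, 100000 iterations); the iteration count is made a parameter so the function is cheap to test.

```c
#define SieveSize 8190

/* Reconstruction of the classic BYTE sieve benchmark (assumed, not the
   exact source used with dmr_c/nanojit).  flags[i] represents the odd
   number 2*i + 3; the function returns the number of primes found. */
int sieve (int iterations) {
  static char flags[SieveSize + 1];
  int i, k, prime, count = 0, iter;

  for (iter = 0; iter < iterations; iter++) {
    count = 0;
    for (i = 0; i <= SieveSize; i++) flags[i] = 1;
    for (i = 0; i <= SieveSize; i++)
      if (flags[i]) {
        prime = i + i + 3; /* map flag index back to the odd number */
        for (k = i + prime; k <= SieveSize; k += prime) flags[k] = 0;
        count++;
      }
  }
  return count;
}
```

Calling `sieve (100000)` reproduces the benchmarked workload; the canonical result for size 8190 is 1899 primes per iteration.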
I used LIBGCCJIT in my Lua project; I found its compilation to be very slow.
I had a look at the article. Firstly I am guessing that you haven't implemented all the functionality of Ruby? For example, can it 100% interoperate with existing Ruby libraries and code? In my experience dynamic languages tend to have features that are very JIT unfriendly. For example, in Lua, a C library can manipulate the Lua stack. A JIT has to deal with such situations. I am not familiar with Ruby but I have read that it is very hard to optimize Ruby.
Well, not if you created a new language specifically designed to be JITed efficiently. Lua has many constructs that are bad for JITing. But a language could be designed that avoids such features and therefore allows efficient JITing. I am skeptical about the Ruby efforts. I guess until you have a 100% compatible Ruby implementation that achieves a 2 or 3x improvement, it is impossible to say anything. And that could take years to implement too. Using a slow compiler like GCC or LLVM is simply not an option in the Lua world because of LuaJIT's speed. That is why I find your project very interesting: its compact size and hopefully speed of compilation would be ideal for Lua. But then we don't know how well the generated code will behave. In Lua, the stack is a heap-allocated structure, and the optimizer needs to be able to figure out when values on the stack are temporary and do not need to be stored/accessed from the heap. LLVM and GCC can do this, but at a huge cost.
LLVM and GCC can be used mostly as tier 2 JIT compilers.
No, I did not implement all the functionality, but I was pretty close. The approach and the code were adopted by the Ruby community. Takashi Kokubun adapted the code for the original CRuby VM insns and implemented the full functionality. Now the GCC/LLVM-based JIT is part of CRuby.
It is the same for Ruby. Ruby is very dynamic, practically everything can change during execution.
As I wrote, the JIT is now part of the last 2 CRuby releases. The current JIT does not use register insns as I proposed and does not implement speculative code or inlining. Still, it achieves 2 times faster code on the most widely used Ruby benchmark, optcarrot.
The same problem exists in CRuby. I did this optimization on code generated from VM insns. Most Ruby local variables were translated into C function local vars, which were then translated into hardware registers by GCC/LLVM. If a speculation turned out wrong, code saving the C local variables into (heap/stack) memory was executed, and execution of the Ruby code continued in the interpreter. For Ruby global/object/class variables, more sophisticated (escape) analysis is needed.
I did some tests with your benchmark. Timings:
Test programs

I will add the results using the LLVM JIT when I get some time. Note on Ravi perf: I think the performance is degraded by the inner for loop, which has a variable increment (prime). Currently my backend optimizes the case when the increment is a known positive integer, but here it falls back to a generic for loop. I suspect that if I optimized this case, the resulting JIT code would perform close to dmr_c with the omrjit backend. So, I am interested in the Ruby results, with and without JIT.
I also did some measurements on my machine. Fortunately, the sieve can be compiled by c2mir. Here are the results:
So c2mir + MIR-generator achieves 65% of GCC -O2. According to your measurements, that is close to OMRJIT, which achieves 70%. But I should say I am doing stupid generation right now, because I am focused on just making c2mir work. As for the Ruby JIT, I have no time now to build and benchmark my old code. But you can find sieve data at https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch#microbenchmark-results Basically, sieve is sped up 2 times by the Ruby JIT I worked on.
Hi, I had a look at those benchmarks. They are all relative figures, aren't they? So they don't give me a feel for how Ruby performs compared to the above. But not important ... I will check this out myself.
Yes, they are relative figures. I am sure Ruby, even with a JIT, will be much slower than Ravi, because it is a more dynamic language where everything can be changed during execution. Just a simple example: all integer arithmetic requires an overflow check. If there is an overflow, the value becomes a multi-precision value. You can redefine any operation, for example change integer + into integer - :). + can be defined for any values. There are also different representations of arrays and objects, etc. So there are a lot of checks even if you generate speculative code. Ruby has no type annotations. That might change in the future, as one goal of Ruby 3 is to have some type system.
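The overflow point can be made concrete with a small sketch of the guard a Ruby-like JIT must emit around every integer `+`. This is illustrative, not CRuby's actual code: the fast fixnum path runs, and on overflow the code must bail out (a real VM would deoptimize to the interpreter or promote the value to a multi-precision integer). `__builtin_add_overflow` is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Illustrative only (not CRuby's implementation): every integer add in
   a dynamic language carries a guard.  On overflow we set *deopt and
   return 0; a real VM would switch to bignum arithmetic instead. */
static int64_t ruby_like_add (int64_t a, int64_t b, int *deopt) {
  int64_t r;
  if (__builtin_add_overflow (a, b, &r)) {
    *deopt = 1; /* slow path: leave JITed code, promote to bignum */
    return 0;
  }
  *deopt = 0; /* fast path: plain machine addition */
  return r;
}
```

Even in the common case where no overflow happens, the branch is present in the generated code, which is one reason fully dynamic integer arithmetic cannot match statically typed code.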
The integer stuff sounds horrible. I watched your talk about it. I guess they were trying to avoid creating an object like Python does. |
@vnmakarov What do you think about V8 TurboFan's CodeAssembler?
I never worked on or benchmarked TurboFan, and I have limited knowledge of it (mostly from articles and presentations). But here is my opinion of the project. TurboFan has different goals, and it is in a different category than MIR. It is not a lightweight JIT compiler project; it requires more resources. TurboFan tries to squeeze out as much performance as possible and has a longer optimization pipeline. It is a very mature project developed by very smart people, with a lot of experience behind it. The MIR goal is to be as simple as possible and still generate decent code. I guess TurboFan can generate 30-40% faster code than MIR. MIR will be simpler to port than TurboFan (although TurboFan already has the major target ports). TurboFan with CodeAssembler is closer to the Oracle Graal project than to MIR. They use the same IR (sea of nodes) for optimizations and have an interface for adding new languages. MIR is a more flexible and streamable IR. I hope to use it as an interface between different language processors in the future. To simplify the project, MIR is designed to simultaneously be an interface IR, an IR for optimization, and an IR for interpretation. There are other, less important design solutions to simplify the MIR project. Actually, LLVM IR could be used for the same purposes, but it is too complicated and a bit unstable (Chris Lattner's team at Google works on an extension which could make it even more complicated). LLVM IR is very bad for interpretation, as it has SSA phi nodes, and this makes its interpretation about 100 times slower than generated code (MIR interpretation is only 6-10 times slower than MIR-generated code). TurboFan and LLVM are written in C++. I don't like this; C++ usage can be very easily abused. For example, the SLOC counts for GCC and Clang/LLVM are pretty close, but the LLVM binary code for one target is about 3 times bigger. GCC was originally written in C, although it was moved to C++ a few years ago. Still, its code is mostly C.
@vnmakarov QBE seems closest to your goals; did you test it against MIR? |
Yes, I played with it. It is very interesting code written by a talented guy, Quentin Carbonneaux, as I understand while he was doing a PhD at Yale. QBE has a good set of optimizations, some of which are absent in the MIR-generator (like alias analysis and a simple loop analysis for better RA spill heuristics). It can be considered a mini-LLVM: it has a simplified version of LLVM IR. Why I decided not to take it and work on it:
When I played with QBE on sieve, I got the impression that its generated code has practically the same quality as the MIR-generator's (maybe even better), but its compilation speed was about 5 times slower (I used valgrind --tool=lackey). Although for QBE that included parsing the IR representation and outputting assembler code. With the assembler-to-binary transformation included, QBE compilation was about 30 times slower (yes, the assembler `as` is the bottleneck). I believe it would be easier for me to write what I need than to adapt QBE to my purposes.
That's not true; phi nodes are optional in LLVM. My C front end generates alloca only. LLVM IR is very well designed IMO: it is strongly typed with extensive type checks, which makes it hard to write wrong code. It looks like you haven't used LLVM ;-)
Have you seen https://github.com/michaelforney/cproc? It is a C11 front end to QBE.
He seems to be maintaining and enhancing it over a few years ... so not sure that one can assume this. |
I sure like that you made that choice, because MIR is something I have been looking for, for 4 years now. I also hope this is not going to fizzle out as so many projects do. I can't wait to try it out.
I've been using this for some time, in a different way than you. I am writing an LLVM IR to MIR translator. I can use the reg2mem pass to remove phi-nodes; in that case I have to implement a kind of LLVM mem2reg myself to generate code with registers. Or I can remove phi-nodes during the translation to MIR. Both approaches are inconvenient, so LLVM IR is not so good for this kind of work. You use alloca generation to avoid dealing with phi-nodes, and this is convenient for you because LLVM takes care of generating efficient code after that. So for your task, LLVM IR is good. At some point in the compiler pipeline, you need to get out of SSA. But LLVM IR cannot represent non-SSA code; LLVM IR with alloca and without phi-nodes is also SSA. This form of LLVM IR is verbose (a lot of loads and stores, which complicates the code and makes it big and less readable). So LLVM had to use a machine IR (another MIR) when SSA code could not adequately represent code at later points in the compiler pipeline. That is why I wrote that an SSA IR as an interface language is not a good idea. But besides SSA, I don't like LLVM IR, specifically the syntax of its textual representation. People have different tastes in languages. That is why I wrote that "personally" I don't like it.
What is the reason for the LLVM IR to MIR interface? It seems a large piece of work, and is it not better to complete the C front-end and MIR first? To be honest, I can't see why anyone would want to generate LLVM IR and then use MIR... Regards |
There are several reasons for this work:
I can't see why anyone would generate C++ code, and then use LLVM as well as MIR.
However then the experiment is not going to be valid, as you will be relying on LLVM doing all the optimization.
As above, makes the whole experiment pointless IMO.
I don't know about Ruby but in my case I want to get rid of LLVM. It is a 20MB beast attached to my 200k language. So if I can't replace LLVM with MIR it would be pointless. Maybe Ruby is different. Personally I still don't see a good reason for this work ... anyway I do wish you success with it! |
There is probably a misunderstanding here. I am not going to generate C or C++ code and then generate MIR from it. The C or C++ code I mentioned is already written by humans, for example the C code of the standard Ruby methods. Here is a slide illustrating how I am planning to implement a CRuby JIT with MIR: https://www.slideshare.net/VladimirMakarov13/the-lightweightjitcompilerprojectforc-ruby-141836482/31 Generating C or C++ code during JIT work and translating it into MIR would be a huge waste of compilation time and memory. The whole advantage of the MIR-generator would disappear.
In my example it is only for that one method.
That is what I am going to do too. LLVM (or my C compiler) to MIR would be used only while building CRuby, not while CRuby is running.
From README:
I know @dibyendumajumdar maintains (or whatever he does to it, dibyendumajumdar/nanojit#15) https://github.com/dibyendumajumdar/nanojit because he likes that it's lightweight. He's also dissatisfied with the bloatedness of Eclipse OMR up to a level of forking it: https://github.com/dibyendumajumdar/nj .
Oh, he also has a C compiler for those JITs: https://github.com/dibyendumajumdar/dmr_c ;-)