
Question: compiling multiple modules incrementally #10

Closed
dibyendumajumdar opened this issue Oct 20, 2019 · 17 comments

@dibyendumajumdar
Contributor

Hi @vnmakarov

In a JIT environment, many modules need to be compiled incrementally, i.e. not all at the same time. I am planning to use the C2MIR component in Ravi, and my question is: is this currently supported by c2mir?

I noticed that there is a compile/link process in the c2mir test code; presumably this needs to be done for each module as it is compiled, but the compiled functions need to be kept in memory.

The model I use with LLVM or OMR is this:

Create a context - this holds all compiled objects.
Repeat as required:
Compile a new module - compiled functions from the new module are saved in the context.
When shutting down, destroy the context, thereby destroying all the compiled objects.

@dibyendumajumdar
Contributor Author

I noticed that the c2mir implementation does not expose an API at the moment, so I guess the answer to my question is no?

@vnmakarov
Owner

Currently, to execute MIR code that resides in memory (you can create it there through the API, by parsing the MIR textual representation, or by reading the MIR binary representation), you need to:

  • load a module (or modules) containing the MIR code
  • link it. This is necessary because MIR code can call code outside the loaded module(s), e.g. standard library functions such as fprintf, binary code not created from MIR, or MIR modules loaded earlier
  • call the interpreter to execute a function from a linked module; in this case all executed MIR code will be interpreted
  • or generate binary code for the function and call it; binary code for all functions called from this function, directly or indirectly, will be generated (if it has not been generated yet)

You can load a module containing a function whose name matches that of an already loaded function. In this case the new function replaces the old one. For example, we can generate different versions of a function for a Ruby method using different speculative assumptions.

There is a lot of functionality I'd like to add:

  • lazy binary code generation for a MIR function: code is generated on the first call of the function
  • executing some code in the interpreter and some code through generated binary code
  • freeing binary code and reusing its memory, e.g. if you load a new version of a function, you can free the old version's binary code
  • unloading binary code to save memory, and on a subsequent call using the interpreter or generating the binary code again

As for your scenario: right now you can create a context, create MIR code, load and link it, optionally generate binary code for it, and either interpret it or call the binary code. When you free the context, all memory allocated for this task (including binary code) will be freed. That is how libgccjit works, for example. But you are right that there is currently no way to free the memory for binary code and reuse it until the corresponding context is released.
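
For illustration, here is a minimal sketch of that flow using the generated-code path (a sketch only: m, func_item and resolve_import are placeholders, and error handling is omitted):

#include "mir.h"
#include "mir-gen.h"

/* Sketch only: m is a MIR module already created in ctx (through the API, by
   parsing MIR text, or by reading MIR binary), func_item is a MIR_func_item
   from that module, and resolve_import maps external names such as "fprintf"
   to addresses. */
static void *load_link_and_generate (MIR_context_t ctx, MIR_module_t m, MIR_item_t func_item,
                                     void *(*resolve_import) (const char *name)) {
  MIR_load_module (ctx, m);                               /* load the module */
  MIR_gen_init (ctx);
  MIR_link (ctx, MIR_set_gen_interface, resolve_import);  /* link against imports */
  void *addr = MIR_gen (ctx, func_item);                  /* generate binary code */
  MIR_gen_finish (ctx);
  return addr;                                            /* callable machine-code address */
}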

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Oct 21, 2019

Hi @vnmakarov

As for your scenario: right now you can create a context, create MIR code, load and link it, optionally generate binary code for it, and either interpret it or call the binary code. When you free the context, all memory allocated for this task (including binary code) will be freed. That is how libgccjit works, for example. But you are right that there is currently no way to free the memory for binary code and reuse it until the corresponding context is released.

Sorry, I didn't explain clearly what I am looking to do. Not being able to free compiled code is not an immediate issue.

I will try to explain more fully.

Basically, in Ravi there is both automatic compilation and user-requested compilation. Either way, I need to compile different modules at different times.
Example:
Module A at time 0.
Module B at time 3.
Module C at time 4.

All of the above modules, and the functions within them, will remain active.

And so on. For now it is not a problem if memory is not released; but how can I do the above?
Should I create a context for each module? If I want to retain the compiled functions, how can I do that?

In libgccjit I used to create only one context; same with LLVM.

I hope this is a bit clearer.

Thanks and Regards
Dibyendu

@vnmakarov
Owner

Basically, in Ravi there is both automatic compilation and user-requested compilation. Either way, I need to compile different modules at different times.
Example:
Module A at time 0.
Module B at time 3.
Module C at time 4.

All of the above modules, and the functions within them, will remain active.

And so on. For now it is not a problem if memory is not released; but how can I do the above?
Should I create a context for each module? If I want to retain the compiled functions, how can I do that?

You can use one context for your case. For example,

load Module A; link; optionally generate binary code for function(s) from Module A
... Do something, including executing code from module A
load Module B; link; optionally generate binary code for function(s) from Module B and/or A
... Do something, including executing code from modules B and A (of course,
        assuming module B does not contain an exported function with the same name as one exported from A)
...

Module B can refer to code in module A, as module A was already loaded and linked.

If module B refers to a function (or data) in module A and module A refers to a function in module B, you should link them together:

load module A; load module B; link

You can also use different contexts.

All generated code is currently retained and can be used until the corresponding context is finished.
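
In code, that incremental pattern could look roughly like this (a sketch only: module_A, module_B and resolve_import are placeholders, module creation is omitted, and binary code generation would be done as in the sketch above):

MIR_context_t ctx = MIR_init ();  /* one context for the whole session */

/* time 0: module A */
MIR_load_module (ctx, module_A);
MIR_link (ctx, MIR_set_gen_interface, resolve_import);
/* ... execute code from module A ... */

/* time 3: module B, loaded into the same context */
MIR_load_module (ctx, module_B);
MIR_link (ctx, MIR_set_gen_interface, resolve_import);  /* B can call already-loaded code from A */
/* ... execute code from modules A and B ... */

MIR_finish (ctx);  /* at shutdown: frees everything, including generated binary code */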

@dibyendumajumdar
Contributor Author

Cool, I will try this out over the next couple of weeks.

@dibyendumajumdar
Contributor Author

Hi,
I tested this out and it appears to work. I had to make some small changes in c2mir.c.
In particular, I modified the compile() function to return the module.
I also had trouble with include paths.

My simple compile front-end looks like this:

static size_t curr_char;
static const char *code;

static int t_getc (void) {
  int c = code[curr_char];

  if (c == 0)
    c = EOF;
  else
    curr_char++;
  return c;
}

static void t_ungetc (int c) {
  if (c == EOF) {
    assert (code[curr_char] == 0);
  } else {
    assert (curr_char != 0 && code[curr_char - 1] == c);
    curr_char--;
  }
}

static int other_option_func (int i, int argc, char *argv[], void *data) {
  return i;
}

static MIR_item_t find_function(MIR_module_t module, const char *func_name) {
  MIR_item_t func, main_func = NULL;
  for (func = DLIST_HEAD (MIR_item_t, module->items); func != NULL;
       func = DLIST_NEXT (MIR_item_t, func)) {
    if (func->item_type == MIR_func_item && strcmp (func->u.func->name, func_name) == 0)
      main_func = func;
  }
  return main_func;
}

void *MIR_compile_C_module(const char *inputbuffer, const char *func_name, void *(Import_resolver_func)(const char *name))
{
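  /* Note: ctx and curr_module_num are statics declared elsewhere; c2mir_init, compile_init,
     compile, compile_finish and c2mir_finish come from the (slightly modified) c2mir.c. */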
  int n = 0;
  int ret_code = 0;
  int (*fun_addr) (void *) = NULL;
  if (!ctx) {
    ctx = MIR_init ();
  }
  c2mir_init();
  code = inputbuffer;
  curr_char = 0;
  compile_init (0, NULL, t_getc, t_ungetc, other_option_func, &n);
  char module_name[80];
  snprintf(module_name, sizeof module_name, "%s_module", func_name);
  curr_module_num++;
  MIR_module_t module = compile (module_name);
  if (!module) ret_code = 1;
  compile_finish ();
  if (ret_code == 0 && module) {
    MIR_item_t main_func = find_function(module, func_name);
    if (main_func == NULL) {
      fprintf(stderr, "function %s not found\n", func_name);
      exit(1);
    }
    MIR_load_module (ctx, module);
    MIR_gen_init (ctx);
#if MIR_GEN_DEBUG
    MIR_gen_set_debug_file (ctx, stderr);
#endif
    MIR_link (ctx, MIR_set_gen_interface, Import_resolver_func);
    fun_addr = MIR_gen (ctx, main_func);
    MIR_gen_finish (ctx);
  }
  c2mir_finish ();
  return fun_addr;
}

The following demonstrates how I can use this:

static void *import_resolver(const char *name) {
  if (strcmp(name, "printf") == 0) {
    return printf;
  }
  return NULL;
}

int main(int argc, const char *argv[]) {

  const char *code1 = "extern int printf(const char *, ...);\n"
                      "int f1(void) { printf(\"hello world from f1\\n\"); return 0; }\n";
  const char *code2 = "extern int printf(const char *, ...);\n"
                      "int f2(void) { printf(\"hello world from f2\\n\"); return 0; }\n";

  int (*fp1)() = MIR_compile_C_module(code1, "f1", import_resolver);
  int (*fp2)() = MIR_compile_C_module(code2, "f2", import_resolver);

  fp1();
  fp2();

  return 0;
}

I have made these changes in my fork of your library.

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Oct 27, 2019

I think it would be useful if c2mir was also treated as a library/API, which means exposing some API for clients to call. My suggestion is something like:

MIR_module_t 
MIR_compile_C_module(MIR_context_t ctx, int argc, const char *argv[], const char *inputbuffer);

That is, given an optional buffer plus command line arguments, generate a module and return it.
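
A hypothetical call with this signature might look like the following (none of this exists yet; the argv options are just illustrative):

MIR_context_t ctx = MIR_init ();
const char *argv[] = {"c2mir", "-I", "include"};  /* illustrative c2mir-style options */
const char *source = "int answer (void) { return 42; }";
MIR_module_t m = MIR_compile_C_module (ctx, 3, argv, source);  /* proposed API, not implemented */
if (m != NULL) {
  MIR_load_module (ctx, m);
  /* link and generate code as in my earlier example */
}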

A couple of other observations:

  • I had trouble with the two header files, mirc.h and another x86-specific one, that appear to be required at runtime. Do these really need to be provided at runtime? Can they be 'built in' instead?

  • Obviously c2mir has static state at the moment. I hope this will be removed at some point.

Regards

@dibyendumajumdar
Contributor Author

Hi @vnmakarov

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

Is it possible to change the way they are found? Right now they are expected to be somewhere below "./"; ideally they should be found through the -I include path option, or maybe via an environment variable such as C2MIR_INCLUDE_PATH if available.

Even better would be to add these definitions in the code itself, rather than relying on header files.
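
As a rough illustration of the environment-variable idea (C2MIR_INCLUDE_PATH is just my proposed name; this is not how c2mir currently resolves the files):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: build the full path of a standard include file, preferring a
   C2MIR_INCLUDE_PATH environment variable over the current "./" assumption. */
static void standard_include_path (const char *file, char *buf, size_t size) {
  const char *base = getenv ("C2MIR_INCLUDE_PATH");  /* proposed variable, not implemented */

  if (base == NULL) base = ".";  /* fall back to the current behaviour */
  snprintf (buf, size, "%s/%s", base, file);
}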

@vnmakarov
Owner

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

No, they are not essential, but the macros and types defined in these files are essential for c2mir to work. Using the files was a quick way to implement a lot of predefined macros. As you propose, they could be hard-coded in the c2mir code. As I am now seriously considering implementing C as an additional input to the JIT (which is how you started to use it for Ravi), it would be nice to get rid of these files. I have put this work on my list. So thank you for your proposal.

A more general question is about configuration and installation. The current makefile and file locations are only for my current development. I have not started to work on packaging, or even on designing it (whether it would be static/dynamic libraries, how many libraries, and so on). I can only say that most probably I will use autoconf. I don't like cmake.

Is it possible to change the way they are found? Right now they are expected to be somewhere below "./"; ideally they should be found through the -I include path option, or maybe via an environment variable such as C2MIR_INCLUDE_PATH if available.

Even better would be to add these definitions in the code itself, rather than relying on header files.

@dibyendumajumdar
Contributor Author

I can only say that most probably I will use autoconf. I don't like cmake.

I guess you should go with your preference.
CMake has a couple of advantages: it generates IDE project files on Windows / Mac OS X, and it is also used by the CLion IDE.

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Nov 3, 2019

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

No, they are not essential, but the macros and types defined in these files are essential for c2mir to work.

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

@vnmakarov
Owner

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

Sorry, I was not accurate. What I meant is that compiling some C programs (at least the tests I use) needs these files. So they are necessary in the general case. But you can write code (and this is your case) where they are not necessary.

@vnmakarov
Owner

I guess you should go with your preference.
CMake has a couple of advantages: it generates IDE project files on Windows / Mac OS X, and it is also used by the CLion IDE.

There is a personal preference, but the preference also comes from the fact that neither CRuby nor CPython uses cmake; they use autoconf. I hope to have some integration of this project with the CRuby project, and using cmake would be an issue.

@dibyendumajumdar
Contributor Author

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

Sorry, I was not accurate. What I meant is that compiling some C programs (at least the tests I use) needs these files. So they are necessary in the general case. But you can write code (and this is your case) where they are not necessary.

Hi, isn't it the reverse? The general case doesn't need those headers; they are only needed to emulate GCC behaviour?

@vnmakarov
Owner

Hi, isn't it the reverse? The general case doesn't need those headers; they are only needed to emulate GCC behaviour?

I don't know about other environments, but when you use glibc, you cannot compile a lot of standard headers without some of these GCC macros. That is why Clang mimics GCC and pretends to implement all GCC features, although that is not true; for example, you still cannot compile elfutils with clang because clang does not implement nested functions, which are heavily used by elfutils.

@dibyendumajumdar
Contributor Author

I don't know about other environments, but when you use glibc, you cannot compile a lot of standard headers without some of these GCC macros. That is why Clang mimics GCC and pretends to implement all GCC features, although that is not true; for example, you still cannot compile elfutils with clang because clang does not implement nested functions, which are heavily used by elfutils.

I guess if you have always worked with gcc and glibc, that may seem like the default. But for sure, on Windows none of that applies.

I think there are two questions here:

a) Does MIR need those header files to parse/generate code? My impression is no.
b) Should MIR by default provide a GCC-like environment? Sure, I see no problem with that.

a) wasn't clear to me because I assumed those headers contained some code layout data, but when I looked at them they were just #defines and typedefs.

@vnmakarov
Owner

I think there are two questions here:

a) Does MIR need those header files to parse/generate code? My impression is no.

For the current c2mir, in some cases, yes. For example, you may include a C standard header file whose implementation uses a macro defining the endianness of the CPU. This macro is provided by GCC; for c2mir it is taken from the mentioned include file. Another case is a macro describing the architecture, like x86_64 or aarch64. These macros can be used even in files compiled on Windows, especially in a mingw environment.

But again, these c2mir header files can be removed, and the macros and definitions can be hard-coded in the c2mir sources. I am going to do this later.
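
Just for illustration, the kinds of definitions meant here look like this (GCC-style predefined macros and typedefs; not the exact contents of the c2mir header files):

/* illustrative only, not the actual contents of mirc.h / mirc-x86_64-linux.h */
#define __x86_64__ 1                             /* architecture macro */
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_BIG_ENDIAN__ 4321
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__   /* endianness of the CPU */
#define __SIZEOF_POINTER__ 8
typedef unsigned long size_t;                    /* plus typical typedefs */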

b) Should MIR by default provide a GCC-like environment? Sure, I see no problem with that.

Providing a full GCC-like environment is a huge amount of work, probably bigger than the current c2mir implementation. So for c2mir the __GNUC__ macro will never be defined (btw, clang defines this macro although, again, it does not implement all GCC features).

What I am doing is just trying to implement a minimum (mostly adding some predefined GCC macros) to compile and pass the current tests.
