
Question: compiling multiple modules incrementally #10

Closed
dibyendumajumdar opened this issue Oct 20, 2019 · 17 comments

@dibyendumajumdar
Contributor

Hi @vnmakarov

In a JIT environment, many modules need to be compiled incrementally, i.e. not all at the same time. I am planning to use the C2MIR component in Ravi, and my question is: is this currently supported by c2mir?

I noticed that there is a compile/link process in the c2mir test code; presumably this needs to be done for each module as it is compiled, but the compiled functions need to be kept in memory.

The model I use with LLVM or OMR is this:

Create a context - this holds all compiled objects.
Repeat as required:
Compile a new module - compiled functions from the new module are saved in the context.
When shutting down, destroy the context, thereby destroying all the compiled objects.

@dibyendumajumdar
Contributor Author

I noticed that the c2mir implementation does not expose an API at the moment, so I guess the answer to my question is no?

@vnmakarov
Owner

Currently, to execute MIR code that resides in memory (you can create it there through the API, by parsing the MIR textual representation, or by reading the MIR binary representation), you need to:

  • load a module (or modules) containing the MIR code
  • link it. This is necessary because MIR code can call code outside the loaded module(s), e.g. standard library functions such as fprintf, binary code not created from MIR, or MIR modules loaded earlier
  • call the interpreter to execute a function from a linked module; in this case all executed MIR code will be interpreted
  • or generate binary code for the function and call it; binary code for all functions called from this function, directly or indirectly, will be generated (if it has not been generated yet)

You can load a module containing a function whose name matches that of an already loaded function. In this case the new function replaces the old one. For example, we can generate different versions of a function for a Ruby method using different speculative assumptions.

There is a lot of functionality I'd like to add:

  • lazy binary code generation for a MIR function: code is generated on the first call of the function
  • executing some code in the interpreter and some code through generated binary code
  • freeing binary code and reusing its memory, e.g. if you load a new version of a function, you can free the old version's binary code
  • unloading binary code to save memory, and on a subsequent call using the interpreter or generating the binary code again

As for your scenario: right now you can create a context, create MIR code, load and link it, optionally generate binary code for it, and either interpret it or call the binary code. When you free the context, all memory allocated for this task (including binary code) will be freed. That is how libgccjit works, for example. But you are right that there is currently no way to free the memory for binary code and reuse it until the corresponding context is released.
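
For illustration, here is a minimal sketch of that flow using the generated-code path (a sketch only: m, func_item and resolve_import are placeholders, and error handling is omitted):

#include "mir.h"
#include "mir-gen.h"

/* Sketch only: m is a MIR module already created in ctx (through the API, by
   parsing MIR text, or by reading MIR binary), func_item is a MIR_func_item
   from that module, and resolve_import maps external names such as "fprintf"
   to addresses. */
static void *load_link_and_generate (MIR_context_t ctx, MIR_module_t m, MIR_item_t func_item,
                                     void *(*resolve_import) (const char *name)) {
  MIR_load_module (ctx, m);                               /* load the module */
  MIR_gen_init (ctx);
  MIR_link (ctx, MIR_set_gen_interface, resolve_import);  /* link against imports */
  void *addr = MIR_gen (ctx, func_item);                  /* generate binary code */
  MIR_gen_finish (ctx);
  return addr;                                            /* callable machine-code address */
}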

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Oct 21, 2019

Hi @vnmakarov

As for your scenario: right now you can create a context, create MIR code, load and link it, optionally generate binary code for it, and either interpret it or call the binary code. When you free the context, all memory allocated for this task (including binary code) will be freed. That is how libgccjit works, for example. But you are right that there is currently no way to free the memory for binary code and reuse it until the corresponding context is released.

Sorry, I didn't explain clearly what I am looking to do. Not being able to free compiled code is not an immediate issue.

I will try to explain more fully.

Basically, in Ravi there is both automatic compilation and user-requested compilation. Either way, I need to compile different modules at different times.
Example:
Module A at time 0.
Module B at time 3.
Module C at time 4.

All of the above modules, and the functions within them, will remain active.

And so on. For now it is not a problem if memory is not released; but how can I do the above?
Should I create a context for each module? If I want to retain the compiled functions, how can I do that?

In libgccjit I used to create only one context; same with LLVM.

I hope this is a bit clearer.

Thanks and Regards
Dibyendu

@vnmakarov
Owner

Basically, in Ravi there is both automatic compilation and user-requested compilation. Either way, I need to compile different modules at different times.
Example:
Module A at time 0.
Module B at time 3.
Module C at time 4.

All of the above modules, and the functions within them, will remain active.

And so on. For now it is not a problem if memory is not released; but how can I do the above?
Should I create a context for each module? If I want to retain the compiled functions, how can I do that?

You can use one context for your case. For example,

load Module A; link; optionally generate binary code for function(s) from Module A
... Do something, including executing code from module A
load Module B; link; optionally generate binary code for function(s) from Module B and/or A
... Do something, including executing code from modules B and A (of course,
        assuming module B does not contain an exported function with the same name as one exported from A)
...

Module B can refer to code in module A, as module A was already loaded and linked.

If module B refers to a function (or data) in module A and module A refers to a function in module B, you should link them together:

load module A; load module B; link

You can also use different contexts.

All generated code is currently retained and can be used until the corresponding context is finished.
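
In code, that incremental pattern could look roughly like this (a sketch only: module_A, module_B and resolve_import are placeholders, module creation is omitted, and binary code generation would be done as in the sketch above):

MIR_context_t ctx = MIR_init ();  /* one context for the whole session */

/* time 0: module A */
MIR_load_module (ctx, module_A);
MIR_link (ctx, MIR_set_gen_interface, resolve_import);
/* ... execute code from module A ... */

/* time 3: module B, loaded into the same context */
MIR_load_module (ctx, module_B);
MIR_link (ctx, MIR_set_gen_interface, resolve_import);  /* B can call already-loaded code from A */
/* ... execute code from modules A and B ... */

MIR_finish (ctx);  /* at shutdown: frees everything, including generated binary code */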

@dibyendumajumdar
Contributor Author

Cool, I will try this out over the next couple of weeks.

@dibyendumajumdar
Contributor Author

Hi,
I tested this out and it appears to work. I had to make some small changes in c2mir.c.
In particular, I modified the compile() function to return the module.
I also had trouble with include paths.

My simple compile front-end looks like this:

static size_t curr_char;
static const char *code;

static int t_getc (void) {
  int c = code[curr_char];

  if (c == 0)
    c = EOF;
  else
    curr_char++;
  return c;
}

static void t_ungetc (int c) {
  if (c == EOF) {
    assert (code[curr_char] == 0);
  } else {
    assert (curr_char != 0 && code[curr_char - 1] == c);
    curr_char--;
  }
}

static int other_option_func (int i, int argc, char *argv[], void *data) {
  return i;
}

static MIR_item_t find_function(MIR_module_t module, const char *func_name) {
  MIR_item_t func, main_func = NULL;
  for (func = DLIST_HEAD (MIR_item_t, module->items); func != NULL;
       func = DLIST_NEXT (MIR_item_t, func)) {
    if (func->item_type == MIR_func_item && strcmp (func->u.func->name, func_name) == 0)
      main_func = func;
  }
  return main_func;
}

void *MIR_compile_C_module(const char *inputbuffer, const char *func_name, void *(Import_resolver_func)(const char *name))
{
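  /* Note: ctx and curr_module_num are statics declared elsewhere; c2mir_init, compile_init,
     compile, compile_finish and c2mir_finish come from the (slightly modified) c2mir.c. */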
  int n = 0;
  int ret_code = 0;
  int (*fun_addr) (void *) = NULL;
  if (!ctx) {
    ctx = MIR_init ();
  }
  c2mir_init();
  code = inputbuffer;
  curr_char = 0;
  compile_init (0, NULL, t_getc, t_ungetc, other_option_func, &n);
  char module_name[80];
  snprintf(module_name, sizeof module_name, "%s_module", func_name);
  curr_module_num++;
  MIR_module_t module = compile (module_name);
  if (!module) ret_code = 1;
  compile_finish ();
  if (ret_code == 0 && module) {
    MIR_item_t main_func = find_function(module, func_name);
    if (main_func == NULL) {
      fprintf(stderr, "function %s not found\n", func_name);
      exit(1);
    }
    MIR_load_module (ctx, module);
    MIR_gen_init (ctx);
#if MIR_GEN_DEBUG
    MIR_gen_set_debug_file (ctx, stderr);
#endif
    MIR_link (ctx, MIR_set_gen_interface, Import_resolver_func);
    fun_addr = MIR_gen (ctx, main_func);
    MIR_gen_finish (ctx);
  }
  c2mir_finish ();
  return fun_addr;
}

The following demonstrates how I can use this:

static void *import_resolver(const char *name) {
  if (strcmp(name, "printf") == 0) {
    return printf;
  }
  return NULL;
}

int main(int argc, const char *argv[]) {

  const char *code1 = "extern int printf(const char *, ...);\n"
                      "int f1(void) { printf(\"hello world from f1\\n\"); return 0; }\n";
  const char *code2 = "extern int printf(const char *, ...);\n"
                      "int f2(void) { printf(\"hello world from f2\\n\"); return 0; }\n";

  int (*fp1)() = MIR_compile_C_module(code1, "f1", import_resolver);
  int (*fp2)() = MIR_compile_C_module(code2, "f2", import_resolver);

  fp1();
  fp2();

  return 0;
}

I have made these changes in my fork of your library.

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Oct 27, 2019

I think it would be useful if c2mir was also treated as a library/API, which means exposing some API for clients to call. My suggestion is something like:

MIR_module_t 
MIR_compile_C_module(MIR_context_t ctx, int argc, const char *argv[], const char *inputbuffer);

That is, given an optional buffer plus command line arguments, generate a module and return it.
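
A hypothetical call with this signature might look like the following (none of this exists yet; the argv options are just illustrative):

MIR_context_t ctx = MIR_init ();
const char *argv[] = {"c2mir", "-I", "include"};  /* illustrative c2mir-style options */
const char *source = "int answer (void) { return 42; }";
MIR_module_t m = MIR_compile_C_module (ctx, 3, argv, source);  /* proposed API, not implemented */
if (m != NULL) {
  MIR_load_module (ctx, m);
  /* link and generate code as in my earlier example */
}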

A couple of other observations:

  • I had trouble with the two header files, mirc.h and another x86-specific one, that appear to be required at runtime. Do these really need to be provided at runtime? Can they be 'built in' instead?

  • Obviously c2mir has static state at the moment. I hope this will be removed at some point.

Regards

@dibyendumajumdar
Contributor Author

Hi @vnmakarov

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

Is it possible to change the way they are found? Right now they are expected to be somewhere below "./"; ideally they should be found through the -I include path option, or maybe via an environment variable such as C2MIR_INCLUDE_PATH if available.

Even better would be to add these definitions in the code itself, rather than relying on header files.
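
As a rough illustration of the environment-variable idea (C2MIR_INCLUDE_PATH is just my proposed name; this is not how c2mir currently resolves the files):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: build the full path of a standard include file, preferring a
   C2MIR_INCLUDE_PATH environment variable over the current "./" assumption. */
static void standard_include_path (const char *file, char *buf, size_t size) {
  const char *base = getenv ("C2MIR_INCLUDE_PATH");  /* proposed variable, not implemented */

  if (base == NULL) base = ".";  /* fall back to the current behaviour */
  snprintf (buf, size, "%s/%s", base, file);
}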

@vnmakarov
Owner

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

No, they are not essential, but the macros and types defined in these files are essential for c2mir to work. Using the files was a quick way to implement a lot of predefined macros. As you propose, they could be hard-coded in the c2mir code. As I am now seriously considering implementing C as an additional input to the JIT (which is how you started to use it for Ravi), it would be nice to get rid of these files. I have put this work on my list. So thank you for your proposal.

A more general question is about configuration and installation. The current makefile and file locations are only for my current development. I have not started to work on packaging, or even on designing it (whether it would be static/dynamic libraries, how many libraries, and so on). I can only say that most probably I will use autoconf. I don't like cmake.

Is it possible to change the way they are found? Right now they are expected to be somewhere below "./"; ideally they should be found through the -I include path option, or maybe via an environment variable such as C2MIR_INCLUDE_PATH if available.

Even better would be to add these definitions in the code itself, rather than relying on header files.

@dibyendumajumdar
Contributor Author

I can only say that most probably I will use autoconf. I don't like cmake.

I guess you should go with your preference.
CMake has a couple of advantages: it generates IDE project files on Windows / Mac OS X, and it is also used by the CLion IDE.

@dibyendumajumdar
Contributor Author

dibyendumajumdar commented Nov 3, 2019

Regarding the two C header files that c2mir looks for at runtime:

static const char *standard_includes[]
  = {"include/mirc/mirc.h", "include/mirc/x86-64/mirc-x86_64-linux.h"};

Are these essential?

No, they are not essential, but the macros and types defined in these files are essential for c2mir to work.

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

@vnmakarov
Owner

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

Sorry, I was not accurate. What I meant is that compiling some C programs (at least the tests I use) needs these files. So they are necessary in the general case. But you can write code (and this is your case) where they are not necessary.

@vnmakarov
Owner

I guess you should go with your preference.
CMake has a couple of advantages: it generates IDE project files on Windows / Mac OS X, and it is also used by the CLion IDE.

There is a personal preference, but the preference also comes from the fact that neither CRuby nor CPython uses cmake; they use autoconf. I hope to have some integration of this project with the CRuby project, and using cmake would be an issue.

@dibyendumajumdar
Contributor Author

Well, it looks like my code doesn't need those header files. So for now I can simply avoid treating their absence as a fatal error.

Sorry, I was not accurate. What I meant is that compiling some C programs (at least the tests I use) needs these files. So they are necessary in the general case. But you can write code (and this is your case) where they are not necessary.

Hi, isn't it the reverse? The general case doesn't need those headers; they are only needed to emulate GCC behaviour?

@vnmakarov
Owner

Hi, isn't it the reverse? The general case doesn't need those headers; they are only needed to emulate GCC behaviour?

I don't know about other environments, but when you use glibc, you cannot compile a lot of standard headers without some of these GCC macros. That is why Clang mimics GCC and pretends to implement all GCC features, although that is not true; for example, you still cannot compile elfutils with clang because clang does not implement nested functions, which are heavily used by elfutils.

@dibyendumajumdar
Contributor Author

I don't know about other environments, but when you use glibc, you cannot compile a lot of standard headers without some of these GCC macros. That is why Clang mimics GCC and pretends to implement all GCC features, although that is not true; for example, you still cannot compile elfutils with clang because clang does not implement nested functions, which are heavily used by elfutils.

I guess if you have always worked with gcc and glibc, that may seem like the default. But for sure, on Windows none of that applies.

I think there are two questions here:

a) Does MIR need those header files to parse/generate code? My impression is no.
b) Should MIR by default provide a GCC-like environment? Sure, I see no problem with that.

a) wasn't clear to me because I assumed those headers contained some code layout data, but when I looked at them they were just #defines and typedefs.

@vnmakarov
Owner

I think there are two questions here:

a) Does MIR need those header files to parse/generate code? My impression is no.

For the current c2mir, in some cases, yes. For example, you may include a C standard header file whose implementation uses a macro defining the endianness of the CPU. This macro is provided by GCC; for c2mir it is taken from the mentioned include file. Another case is a macro describing the architecture, like x86_64 or aarch64. These macros can be used even in files compiled on Windows, especially in a mingw environment.

But again, these c2mir header files can be removed, and the macros and definitions can be hard-coded in the c2mir sources. I am going to do this later.
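
Just for illustration, the kinds of definitions meant here look like this (GCC-style predefined macros and typedefs; not the exact contents of the c2mir header files):

/* illustrative only, not the actual contents of mirc.h / mirc-x86_64-linux.h */
#define __x86_64__ 1                             /* architecture macro */
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_BIG_ENDIAN__ 4321
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__   /* endianness of the CPU */
#define __SIZEOF_POINTER__ 8
typedef unsigned long size_t;                    /* plus typical typedefs */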

b) Should MIR by default provide a GCC-like environment? Sure, I see no problem with that.

Providing a full GCC-like environment is a huge amount of work, probably bigger than the current c2mir implementation. So for c2mir the __GNUC__ macro will never be defined (btw, clang defines this macro although, again, it does not implement all GCC features).

What I am doing is just trying to implement a minimum (mostly adding some predefined GCC macros) to compile and pass the current tests.
