Proposal: User-defined native modules #52

Open
ChayimFriedman2 opened this issue Aug 6, 2020 · 26 comments

Comments

@ChayimFriedman2

ChayimFriedman2 commented Aug 6, 2020

Any language needs a way to run native code. As a language for embedding, Wren gives you the flexibility to control the modules you load through the bindForeignMethodFn and bindForeignClassFn config callbacks.

However, the CLI is meant for running programs, not for embedded scripting. As such, it needs a way to load user-written native modules.

I suggest the following:

  • There will be a function, named, say, Module.importNative(), which will take a normal module path.
  • When invoked, it'll search for a .dll or .so file with the same name (depending on the operating system).
  • If not found, this is an error.
  • If found, it'll load the module into memory and search for a certain function name, let's call it wrenInitModule().
  • This function will take no arguments, and will return a pointer to struct WrenNativeModule. This struct will be distributed in a .h file, and any native module will include it.
  • The struct will have four fields:
    • classes: an array of WrenNativeClass, and its length (alternatively, the array could end with an entry filled with NULLs, which is the strategy CPython uses).
    • methods: an array of WrenNativeMethod, and its length (ditto).
  • The structs will be defined as follows:
typedef struct
{
  const char* module;
  const char* className;
  WrenForeignClassMethods classMethods;
} WrenNativeClass;

typedef struct
{
  const char* module;
  const char* className;
  bool isStatic;
  const char* signature;
  WrenForeignMethodFn method;
} WrenNativeMethod;
  • Module.importNative() will append these arrays to vectors of WrenNativeClass and WrenNativeMethod, respectively.
  • Future imports of the same module will be ignored (as with the Wren import system).
  • In the implementation of bindForeignClassFn and bindForeignMethodFn, the CLI will look through its registered native methods and return the appropriate one, or NULL if not found.
    • A side note: I know that the VM caches calls to these methods, but I don't know whether it caches only successful lookups or failed ones too. If it's the latter, we need to change it.
  • This will also be a drop-in replacement for the current implementation, and we'll just implement the standard library as any other native module.
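To make the lookup step concrete, here is a minimal C sketch of how the CLI side could keep the registered modules and answer bindForeignMethodFn/bindForeignClassFn. It assumes the module struct is laid out as the two arrays plus their lengths described above; the stand-in typedefs replace wren.h so it compiles alone, and registerNativeModule, bindNativeMethod, bindNativeClass, MAX_NATIVE_MODULES and the example data are hypothetical names. Instead of copying entries into vectors, it stores a pointer to each module struct, which stays valid as long as the library remains loaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Stand-ins for the wren.h declarations, so the sketch compiles on its own.
typedef struct WrenVM WrenVM;
typedef void (*WrenForeignMethodFn)(WrenVM* vm);
typedef struct {
  WrenForeignMethodFn allocate;
  WrenForeignMethodFn finalize;
} WrenForeignClassMethods;

// The two structs from the proposal.
typedef struct {
  const char* module;
  const char* className;
  WrenForeignClassMethods classMethods;
} WrenNativeClass;

typedef struct {
  const char* module;
  const char* className;
  bool isStatic;
  const char* signature;
  WrenForeignMethodFn method;
} WrenNativeMethod;

// Assumed layout: two arrays plus their lengths.
typedef struct {
  const WrenNativeClass* classes;
  size_t classCount;
  const WrenNativeMethod* methods;
  size_t methodCount;
} WrenNativeModule;

// Hypothetical fixed-size registry; a growable array would lift the cap.
#define MAX_NATIVE_MODULES 64
static const WrenNativeModule* nativeModules[MAX_NATIVE_MODULES];
static size_t nativeModuleCount = 0;

// Records the struct returned by wrenInitModule(). Loading the same library
// again yields the same pointer, so repeated imports are ignored.
bool registerNativeModule(const WrenNativeModule* m) {
  for (size_t i = 0; i < nativeModuleCount; i++) {
    if (nativeModules[i] == m) return true;
  }
  if (nativeModuleCount == MAX_NATIVE_MODULES) return false;
  nativeModules[nativeModuleCount++] = m;
  return true;
}

// Usable as (part of) the CLI's bindForeignMethodFn; NULL means "not found".
WrenForeignMethodFn bindNativeMethod(WrenVM* vm, const char* module,
                                     const char* className, bool isStatic,
                                     const char* signature) {
  (void)vm;
  for (size_t i = 0; i < nativeModuleCount; i++) {
    const WrenNativeModule* m = nativeModules[i];
    for (size_t j = 0; j < m->methodCount; j++) {
      const WrenNativeMethod* f = &m->methods[j];
      if (f->isStatic == isStatic && strcmp(f->module, module) == 0 &&
          strcmp(f->className, className) == 0 &&
          strcmp(f->signature, signature) == 0) {
        return f->method;
      }
    }
  }
  return NULL;
}

// Same idea for bindForeignClassFn; { NULL, NULL } means "not found".
WrenForeignClassMethods bindNativeClass(WrenVM* vm, const char* module,
                                        const char* className) {
  (void)vm;
  WrenForeignClassMethods none = { NULL, NULL };
  for (size_t i = 0; i < nativeModuleCount; i++) {
    const WrenNativeModule* m = nativeModules[i];
    for (size_t j = 0; j < m->classCount; j++) {
      if (strcmp(m->classes[j].module, module) == 0 &&
          strcmp(m->classes[j].className, className) == 0) {
        return m->classes[j].classMethods;
      }
    }
  }
  return none;
}

// Example data mirroring MyMath.square(_); the stub stands in for the real body.
static void myMathSquare(WrenVM* vm) { (void)vm; }
const WrenNativeMethod exampleMethods[] = {
  { "myMath", "MyMath", true, "square(_)", myMathSquare },
};
const WrenNativeModule exampleModule = { NULL, 0, exampleMethods, 1 };
```

A linear scan seems acceptable here since, as far as I can tell, the VM asks for a foreign method when the class is defined rather than on every call.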

A complete example:

Let's say we want to implement a method that squares its argument, but natively.

Let's start with the C code:

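#include <stdbool.h>
#include <stddef.h>

#include "wren.h"
#include "wren-cli.h" // Or wherever the structs above are defined

// extern "C" is only needed (and only valid) when compiling as C++.
#ifdef __cplusplus
#define EXTERN_C extern "C"
#else
#define EXTERN_C
#endif

#if defined(_WIN32) || defined(__WIN32__)
#define EXPORT EXTERN_C __declspec(dllexport)
#else
#define EXPORT EXTERN_C
#endif

// MyMath.square(_): slot 1 holds the argument, slot 0 receives the result.
static void myMathSquare(WrenVM* vm)
{
  double arg = wrenGetSlotDouble(vm, 1);
  wrenSetSlotDouble(vm, 0, arg * arg);
}

// The method table must be its own array; the module struct points at it.
static const WrenNativeMethod methods[] = {
  { "myMath", "MyMath", true, "square(_)", myMathSquare }
};

static const WrenNativeModule module = {
  NULL, 0,     // Classes: none
  methods, 1   // Methods
};

EXPORT const WrenNativeModule* wrenInitModule(void)
{
  return &module;
}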

Now, define myMath.wren:

Module.importNative("./myMath-native")

class MyMath {
  foreign static square(a)
}

Now, compile the C code to a .dll or .so named myMath-native. Our Wren code serves as a bridge between Wren's foreign interface and the user code, so the user code can use it just like a normal Wren module:

import "myMath" for MyMath

System.print(MyMath.square(15))
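The proposal only says Module.importNative() looks for a .dll or .so "with the same name". A minimal C sketch of that name mapping, assuming the path is used as given and the platform suffix is simply appended; nativeLibraryPath and NATIVE_LIB_SUFFIX are hypothetical names, and the .dylib case for macOS is an addition beyond the proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Platform-specific suffix (the macOS case is an assumption).
#if defined(_WIN32)
#define NATIVE_LIB_SUFFIX ".dll"
#elif defined(__APPLE__)
#define NATIVE_LIB_SUFFIX ".dylib"
#else
#define NATIVE_LIB_SUFFIX ".so"
#endif

// Builds the file name Module.importNative() would look for, e.g.
// "./myMath-native" -> "./myMath-native.so" on Linux.
// Returns false if the result (including '\0') does not fit in `out`.
bool nativeLibraryPath(const char* modulePath, char* out, size_t outSize) {
  size_t baseLen = strlen(modulePath);
  size_t suffixLen = strlen(NATIVE_LIB_SUFFIX);
  if (baseLen + suffixLen + 1 > outSize) return false;
  memcpy(out, modulePath, baseLen);
  memcpy(out + baseLen, NATIVE_LIB_SUFFIX, suffixLen + 1); // copies the '\0' too
  return true;
}
```

The resulting path would then go to the platform loader, for example libuv's uv_dlopen() (the CLI already links libuv), followed by uv_dlsym() to look up wrenInitModule. Whether relative paths should instead be resolved against the importing module's directory is left open here.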

Once the discussion around this topic is finished, I would like to implement this myself.

Wren is SO AMAZING!!!!! 🎉 🎉 🎉 Best regards.

Edit:

We'll still need one special case handling in bindForeignMethodFn - the Module.importNative() function itself...

@ChayimFriedman2 changed the title from "User-defined native modules" to "Proposal: User-defined native modules" on Aug 6, 2020
@CohenArthur

I'm assuming that by "native code" you mean code that the user can write in C/C++ but use in Wren? Akin to how Python uses libraries written in C where performance is important? (Sorry, I'm unfamiliar with those terms.)

While I think it's important to eventually offer that, I'm not sure it's the team's main focus right now. Performance does not seem to be too much of a concern, as it's already pretty good. However, I'm sure that if you wrote it and submitted a PR, they would be delighted to review it. I don't think you can go wrong with this, and I agree that it would be pretty nice even when embedding Wren in other applications. I can imagine a game benefiting from an extra boost in the scripts where it's needed.

@CohenArthur

Another nice thing to work on in terms of performance would be the ability to save and load bytecode. Bypassing the front end of the interpreter would surely give a nice boost. But again, there are multiple matters at hand, and it's not as easy as it sounds.

@ChayimFriedman2
Author

ChayimFriedman2 commented Aug 6, 2020

Performance is a nice side benefit, but it's certainly not the main concern.

How do you control the mouse in Python (or Wren)? Well, there's no way except native extensions. Native code is not only for performance, but also for communicating with other native code, often that of the underlying platform (i.e. the OS), which you have no other way to talk to. And not only that: could you implement libuv in Wren, given the raw OS API? Yes, but it would effectively be reinventing the wheel. Lots of code has already been written, and any language needs a way to use it. Native code is the bridge; you can even communicate between Python and Wren through native code. Of course, we could theoretically create bindings between every pair of languages, but in practice that's not possible. The only way is to build a bridge from every language to one pre-agreed language, and the only one that meets this criterion is C (or sometimes C++, as with V8).

@ChayimFriedman2
Author

As for why I don't just send a PR, let me quote the official docs:

If nothing there suits your fancy, new ideas are welcome as well! If you have an idea for a significant change or addition, please file a proposal to discuss it before writing lots of code. Wren tries very very hard to be minimal which means often having to say “no” to language additions, even really cool ones.

This is a significant change, and moreover, it involves design choices. I don't want to take that responsibility or to design Wren myself; Bob does that job best. I'm just proposing my ideas, based on my personal and professional experience and on what other programming languages have done, and offering help with their implementation.

@ChayimFriedman2
Author

ChayimFriedman2 commented Aug 6, 2020

And a last side note: when the VM implementation is good (and it is), native code generally slows performance down. I saw V8 benchmarks where attempts to move code to native just made it slower, because the communication between the VM and the native code adds overhead. And yes, V8 does JIT and optimizes the code, so the overhead will be much smaller in Wren, but it'll still be there. Maybe native code will make things faster, but the complexity isn't worth it. Generally, it's only worthwhile for expensive, CPU-intensive operations. NumPy is a great example, but I can't see Wren stepping into this area. Maybe the problem is with my imagination, who knows 😜

@ChayimFriedman2
Author

ChayimFriedman2 commented Aug 6, 2020

I promised a last comment, but excuse me: Did you mention games? Games make use of the GPU. Can you access the GPU natively in Wren?...

@CohenArthur

How do you control the mouse in Python (or Wren)? Well, you have no way except native extensions. Native code is not only for performance, but also to communicate with another native code, often of the underlying platform (i.e. OS) that you have no other way to talk to, but not only: can you implement libuv in wren, given the raw API of the OS? yes, but it will be effectively reinventing the wheel. Many code was already written, and any language needs a way to use it. Native code is the bridge - you can communicate even between Python and Wren with native code. Of course we can theoretically create bindings for any language, but practically it's not possible. The only way is to build a bridge from every language to one pre-agreed language, and the only one that meets this criteria is C (or sometimes C++, like with V8).

That's a very good point. I haven't looked at it, but DOME and luxe most certainly have solved that issue, so it might be interesting to look at the source.

And why I don't just send a PR, let me cite you the official docs:

If nothing there suits your fancy, new ideas are welcome as well! If you have an idea for a significant change or addition, please file a proposal to discuss it before writing lots of code. Wren tries very very hard to be minimal which means often having to say “no” to language additions, even really cool ones.

This is a significant change, and moreover, it includes design choices. I don't want to take the responsibility, nor to design Wren - Bob does the job best. I'm just proposing my ideas, based on my personal and professional experience and what was made in other programming languages, and suggesting help with their implementation.

Also a good point when it comes to proving that I haven't read the documentation enough haha. Sorry about that

And a last side note: when the VM implementation is good (and it is), native code generally slow down the perf. I saw V8 benchmarks where trials to native-ize the code just made it slower. This is because the communication between the VM and the native code adds overhead. And yes, V8 does JIT and optimizes the code so the overhead will be much smaller in Wren, but it'll still be there. Maybe native code will make things faster, but the complexity doesn't worth it. Generally, it only good at expensive and CPU-intensive operations. NumPy is a great example, but I can't see Wren steps into this area. Maybe the problem is with my imagination, who knows 😜

This is very interesting. I wasn't aware that saving/loading bytecode would actually make interpreters slower. I'm going to look into it. I was almost certain that this was one of the strategies used by CPython, but I guess I understood that wrong.

I promised a last comment, but excuse me: Did you mention games? Games make use of the GPU. Can you access the GPU natively in Wren?...

That's true haha. Again, I wonder how DOME and Luxe do it.

@ChayimFriedman2
Author

@CohenArthur I took a look at DOME: it doesn't solve this issue. It does contain native modules to control the mouse etc., but it has no way to use user-defined native modules. And that's important, because obviously you can't provide a built-in module for everything the user might want to do. This is why languages ship with a bunch of useful modules and let the ecosystem fill in the rest.

And about the GPU: I don't know whether they use it, but if so, they can implement it in C code...

@avivbeeri

avivbeeri commented Aug 9, 2020

DOME does have a mechanism for this, actually, but it's not a standard feature. You can link to arbitrary compiled DLLs via libffi, but it requires you to create a special API spec for any structs and function calls you need.

Furthermore, DOME is designed to be fairly easy to modify, so you can add your own modules if you really need to.

@ChayimFriedman2
Author

OK. Now I've read DOME's docs, and you can use the ffi module for that. The advantage of this method is that you don't have to create a wrapper around your native module. The disadvantages are speed, type checking, and the inability to work with Wren's types, for example fibers.

I strongly prefer my suggestion because it is natural to the Wren user, as it's (mostly) identical to how you embed Wren, and it can be used to emulate DOME's method. For example, CPython uses C extensions similar to what I proposed, but also has the built-in module ctypes, which enables you to work with dynamically loaded libraries directly from your Python code. Such a module could be built in Wren without much difficulty, too.

@ChayimFriedman2
Author

@munificent or @ruby0x1 ?

@avivbeeri

An approach to FFI I tried in DOME before libffi was to use a DLL with a well defined interface for providing Wren module source and setting up foreign method bindings for the VM, but these are all problems a Wren VM user has to solve themselves. I'm not sure it's the job of the VM to have this built in, as it's very application specific.

A downside to this approach is you need to compile a special binding module in order to use other people's DLLs, which is less convenient.

That's why I took the libffi approach, because you could work with arbitrary DLLs.

@ChayimFriedman2
Author

ChayimFriedman2 commented Aug 12, 2020

As I said, anyone with native module support can build a module similar to libffi (Python's ctypes is an example).

I've already specified the downsides of your approach, which are why I prefer mine. A simple argument that it's better is to look at other languages: all the languages I know use the DLL approach.

Of course the VM should not handle this, which is why I opened this issue in the CLI repo 😄 It's explicitly stated in my first comment. (Moreover, if the VM exposed such a method, it would be a security hole, since a Wren script in, say, your word processor would have access to the full filesystem, etc.)

@avivbeeri

Sorry, still getting used to the CLI/VM split.

Having reviewed your suggestion again with a clearer head, it's almost exactly what I had experimented with in DOME, except a nicer interface and better explained.

I was thinking about the case where you wanted to integrate a library such as libcurl or similar. You'd need a precompiled DLL for that, and a separate one for your Wren-libcurl bindings, which felt excessive to me.

The libffi approach does away with that, but you do lose the potential for type safety.

@ChayimFriedman2
Author

I already answered this twice before: first, you can implement a libffi-style module with native modules, but not the other way around. Second, speed. Third, type safety. Fourth, access to Wren types (Fiber, for example). Last but not least, the bindings only need to be written once, and then everyone can use them forever. That's not so much work.

And thanks for the compliments 💛

@joshgoebel
Contributor

I assume this would be fully compatible with foreign classes as well since you'd just add <finalize> and <allocate> signatures to their classMethods?

This will also be a drop-in replacement for the current implementation, and we'll just implement the standard library as any other native module.

This looks pretty great. 💛 I like that it gives us a way to de-couple things a bit more and could take the pressure off of what belongs in the standard library vs what can be provided by the ecosystem. I was imagining things that "touch the OS" MUST be compiled into the CLI directly, but this entirely changes the dynamic.

I dunno about the linking/compiling/loading DLL side of this but it sounds like you know how that would work. Ideally it'd be easy to compile these modules (and eventually perhaps we'd have some tooling for installing them)... once compiled the source and binaries would just need to be placed in a place where they could be found - as in the proposal in #78, correct?

Wren is SO AMAZING!!!!!

Ditto. ❣️❣️❣️

@joshgoebel
Contributor

alternatively, we can pass an entry filled with NULLs at last.

This is how the current structures are setup in the CLI.

I'd suggest also adding a fifth field, version, that defaults to 1, so we can increment it if binary compatibility ever needs to break. The CLI could then detect incompatible libraries (and suggest upgrading) instead of crashing.
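A sketch of that version check in C. The layout (version first, then a NULL-terminated methods array as the CLI already uses) and the names WREN_NATIVE_ABI_VERSION, checkNativeModule and countNativeMethods are assumptions for illustration; classes are left out to keep it short, and the wren.h types are replaced by stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Stand-ins for the wren.h declarations.
typedef struct WrenVM WrenVM;
typedef void (*WrenForeignMethodFn)(WrenVM* vm);

typedef struct {
  const char* module;
  const char* className;
  bool isStatic;
  const char* signature;
  WrenForeignMethodFn method;
} WrenNativeMethod;

// Hypothetical ABI version this CLI build understands.
#define WREN_NATIVE_ABI_VERSION 1

// Version comes first so the CLI can read it before trusting the rest of the
// layout. Classes are omitted for brevity.
typedef struct {
  int version;
  const WrenNativeMethod* methods; // ends with an all-NULL entry
} WrenNativeModule;

// Rejects libraries built against a different ABI instead of crashing later.
bool checkNativeModule(const WrenNativeModule* m, const char* path) {
  if (m->version != WREN_NATIVE_ABI_VERSION) {
    fprintf(stderr,
            "%s uses native module ABI version %d, but this CLI expects %d; "
            "try upgrading the library or the CLI.\n",
            path, m->version, WREN_NATIVE_ABI_VERSION);
    return false;
  }
  return true;
}

// Counts methods up to the NULL sentinel entry.
size_t countNativeMethods(const WrenNativeModule* m) {
  size_t n = 0;
  while (m->methods[n].signature != NULL) n++;
  return n;
}
```

Keeping version as the first field means its offset stays the same even if later versions rearrange everything after it, so an older CLI can still read it safely.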

@joshgoebel
Contributor

joshgoebel commented Apr 29, 2021

We'll still need one special case handling in bindForeignMethodFn - the Module.importNative() function itself...

We discussed this on channel and it seems this complexity may not be necessary. We can just overload the module loader in the CLI itself by adding the library name to the module prefix:

import "networking:tcp" for TcpSocket

The library itself (tcp.wren here) would have nothing special about it (no special declarations other than normal Wren).

This statement would result in:

  • locate the networking library
  • load tcp.wren
  • load networking native module if available and register all modules

I was originally thinking that libraries would hold SINGLE modules, but it does seem more flexible if a library can consist of many modules, i.e., networking includes the modules tcp, http, etc., and the tcp module includes the classes TcpSocket, TcpServer, etc. I guess having one large DLL is more "traditional"?


So I've been looking at the minimal changes to CLI to make this happen and I have added the concept of libraries:

typedef struct 
{
  const char* name;

  ModuleRegistry (*modules)[MAX_MODULES_PER_LIBRARY];
} LibraryRegistry;

static LibraryRegistry libraries[MAX_LIBRARIES] = {
  { "core", &coreCLImodules},
  { "added", &moreModules},
  { NULL, NULL }
};

So what was modules before has become coreCLImodules, and you can register many libraries (which are searched sequentially to find modules). I imagine DLL loading would consist of something like a call to registerLibrary, which would add the library to the global registry.

So the shared library would only need to export its LibraryRegistry.
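A rough sketch of how the sequential search over LibraryRegistry entries might look, reusing the pointer-to-array field from the struct above and the { NULL, NULL } terminator. ModuleRegistry is reduced to a name here, the caps are arbitrary, and addLibrary/findModule are hypothetical names (not the registerLibrary in the branch). Passing NULL as the library name searches every library in registration order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical caps, in the spirit of the CLI's existing static limits.
#define MAX_MODULES_PER_LIBRARY 16
#define MAX_LIBRARIES 16

// Stand-in for the CLI's ModuleRegistry; only the name is used here.
typedef struct {
  const char* name;
} ModuleRegistry;

typedef struct {
  const char* name;
  ModuleRegistry (*modules)[MAX_MODULES_PER_LIBRARY];
} LibraryRegistry;

// One extra slot keeps a { NULL, NULL } terminator even when full.
static LibraryRegistry libraries[MAX_LIBRARIES + 1];
static size_t libraryCount = 0;

// Called with the registry a loaded shared library hands over.
bool addLibrary(const char* name,
                ModuleRegistry (*modules)[MAX_MODULES_PER_LIBRARY]) {
  if (libraryCount == MAX_LIBRARIES) return false;
  libraries[libraryCount].name = name;
  libraries[libraryCount].modules = modules;
  libraryCount++;
  return true;
}

// Searches libraries in registration order. A NULL libraryName means "any
// library"; a NULL module name ends each library's module list.
ModuleRegistry* findModule(const char* libraryName, const char* moduleName) {
  for (size_t i = 0; libraries[i].name != NULL; i++) {
    if (libraryName != NULL && strcmp(libraries[i].name, libraryName) != 0) {
      continue;
    }
    ModuleRegistry* mods = *libraries[i].modules;
    for (size_t j = 0; j < MAX_MODULES_PER_LIBRARY && mods[j].name != NULL; j++) {
      if (strcmp(mods[j].name, moduleName) == 0) return &mods[j];
    }
  }
  return NULL;
}
```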


Now all I need is a compilable shared native module to test this with. Throwing the ball back to you. :) It doesn't need to be much; just a class with a foreign C function that prints some output would probably be sufficient?

@ChayimFriedman2
Author

I can do that, though it's definitely not something to push into production. We need growing arrays.

I'll do that in some hours. Don't have a mind for that now, sorry.

@joshgoebel
Contributor

joshgoebel commented Apr 29, 2021

I'll do that in some hours. Don't have a mind for that now, sorry.

No rush!

I can do that, through it's definitely not something to push into production. We need growing arrays.

That's fair long term. I'm not sure what "production" means in this context... I'd say this could be merged with reasonable static limits for starters - with a todo to make it dynamic later. The CLI currently has static limits for everything. This type of dynamism has tremendous utility - even if it was capped. Would growing arrays be a reasonable requirement for 1.0? Absolutely. :-) Perhaps that's all you meant; I'm unsure. I think if this landed in 0.4 or 0.5 with limits, that'd still be great.

Perhaps some of the code for this already exists in Wren proper and we could just pull over the "magic dynamic growing array" code from there?
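For the "growing arrays" point, a standalone capacity-doubling array in C might look like the sketch below. NativeModuleEntry and the function names are hypothetical placeholders for whatever the registry ends up storing. Wren's own wren_utils.h generates similar buffers with its DECLARE_BUFFER/DEFINE_BUFFER macros, but those allocate through the VM (their functions take a WrenVM*), so a small CLI-side version may be simpler than pulling them over.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical element type; the CLI could store module or library entries.
typedef struct {
  const char* name;
} NativeModuleEntry;

typedef struct {
  NativeModuleEntry* data;
  size_t count;
  size_t capacity;
} NativeModuleArray;

// Appends one entry, doubling the capacity when full, so pushes are
// amortized O(1). On allocation failure the array is left unchanged.
bool nativeModuleArrayPush(NativeModuleArray* array, NativeModuleEntry entry) {
  if (array->count == array->capacity) {
    size_t newCapacity = array->capacity == 0 ? 8 : array->capacity * 2;
    NativeModuleEntry* grown =
        realloc(array->data, newCapacity * sizeof *grown);
    if (grown == NULL) return false;
    array->data = grown;
    array->capacity = newCapacity;
  }
  array->data[array->count++] = entry;
  return true;
}

// Releases the storage, e.g. before the CLI exits.
void nativeModuleArrayFree(NativeModuleArray* array) {
  free(array->data);
  array->data = NULL;
  array->count = 0;
  array->capacity = 0;
}
```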

I feel like what I'm doing here is helping validate your proof of concept. I really don't love C (though I'm getting more familiar with it again). I'm not really productive/effective in C. My module resolver is all Wren. :-) I think the interesting/hard piece here is the actual DLL loading/registration piece. If a Wren resolver isn't workable then someone can write the whole thing in C - it just won't be me. :-)

I'll muddle through the DLL loading and registration though, if it's clear cut, and then at least we'll have something to test/play around with. This also touches on #78, as we need somewhere to load these libraries from - though I suppose they could also be local.

@ChayimFriedman2
Author

I can do that when I'll find time. It also has some design difficulties (for example, exposing libuv API). But yours is good as a PoC.

@joshgoebel
Contributor

It also has some design difficulties (for example, exposing libuv API). But yours is good as a PoC.

It doesn't need to be exposed (as in foreign), all the native module stuff (or at least the DLL/registration pieces) would be purely on the C side of things. Unless I'm misunderstanding you?

@joshgoebel
Contributor

joshgoebel commented May 1, 2021

@ChayimFriedman2 It was all much easier than I thought (thanks to premake). I have the whole stack working. I'm loading a compiled dynamic library into the CLI dynamically. And if we assume (for now) that the module code also lives inside the DLL then they are 100% self-contained - so you don't need to have any .wren files at all, just the dynamic library. We'll still have to figure out where the CLI looks for libs (~/.wren probably), but after I clean it up it's probably not even that large of a patch.

// import time module from the essentials library
import "essentials:time" for Time
System.print(Time.now())
System.print(Time.highResolution())

https://github.com/joshgoebel/wren-essentials

You can build the library. I will try to clean up and push my wren-cli branch over the weekend.

@joshgoebel
Contributor

joshgoebel commented May 1, 2021

Ok, I removed all the work I was doing on custom resolvers and left just the binary library stuff:

https://github.com/joshgoebel/wren-cli/tree/binary-libs

The key portions of interest in vm.c:

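  // Where modules are loaded (excerpt): e.g. "essentials:time" names a library
  if (pathIncludesLibrary(module)) {
    loadSharedLibrary(libName);
    // ...
  }

void loadSharedLibrary(char* libName) {
    uv_lib_t *lib = (uv_lib_t*) malloc(sizeof(uv_lib_t));
    if (lib == NULL) return;
    // Path is hard coded for now; libName is not used to find the file yet
    if (uv_dlopen("lib/libwren_essentials.dylib", lib) != 0) {
        fprintf(stderr, "dlopen error: %s\n", uv_dlerror(lib));
        uv_dlclose(lib);  // frees the stored error message
        free(lib);
        return;
    }
    registryGiverFunc registryGiver;
    if (uv_dlsym(lib, "returnRegistry", (void **) &registryGiver) != 0) {
        fprintf(stderr, "dlsym error: %s\n", uv_dlerror(lib));
        uv_dlclose(lib);
        free(lib);
        return;  // don't call an uninitialized function pointer
    }
    ModuleRegistry* m = registryGiver();
    registerLibrary(libName, m);
}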

The path is hard coded now, but it all works. I've been using it inside my wren-essentials directory. Someone with more C-fu than me needs to run with this and clean it up the rest of the way:

  • obviously libName should be used to find and load the correct library
  • find a nicer way (if there is one) to do the string manipulation around the :
  • we need to save these extra pointers somewhere so we can free them before termination (perhaps module Registry needs a few extra fields?)
  • how can we make C crash/abort when we hit those errors? exit() ? Actually this may not be necessary because the VM will stop anyways when it fails to load the modules (because the dylib couldn't be loaded).
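For the library-name and string-manipulation points above, one possible approach in C is to split the import path at the first ':' with strchr and build the file name from the library part. splitLibraryImport and libraryFilePath are hypothetical helpers, and the lib/libwren_<name>.dylib pattern only mirrors the hard-coded macOS path in the snippet; a real version would pick the suffix per platform and search whatever library directory is agreed on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Splits "essentials:time" into the library name ("essentials", copied into
// lib) and the module name ("time", pointing into path). Returns false for
// plain module paths without ':' or with an empty part.
bool splitLibraryImport(const char* path, char* lib, size_t libSize,
                        const char** module) {
  const char* colon = strchr(path, ':');
  if (colon == NULL) return false;
  size_t libLen = (size_t)(colon - path);
  if (libLen == 0 || libLen >= libSize || colon[1] == '\0') return false;
  memcpy(lib, path, libLen);
  lib[libLen] = '\0';
  *module = colon + 1;
  return true;
}

// Builds the library file name. The pattern is an assumption that mirrors
// the hard-coded path; returns false if it does not fit in `out`.
bool libraryFilePath(const char* lib, char* out, size_t outSize) {
  int n = snprintf(out, outSize, "lib/libwren_%s.dylib", lib);
  return n >= 0 && (size_t)n < outSize;
}
```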

@joshgoebel
Contributor

Following a long conversation over on wren-lang/wren#868, I've ported @mhermier's amazing mirror module work into my proof-of-concept wren-essentials dynamic binary library. Mostly I just wanted to get a feel for what was involved in such a port and how hard it might be to maintain going forward. If you're curious what was involved, you can see my last post over on that thread. wren-lang/wren#868 (comment)

@joshgoebel
Contributor

I do wonder whether the "module registry" data-structure approach used in the CLI or the more direct bindForeignMethodFn and config.bindForeignClassFn API is better... Context: porting libraries that also need to live as Core patches (people might want to use them with embedded Wren in addition to Wren CLI). Perhaps instead of exposing a potentially size-limited "module/class/method registry", binary libraries could just expose bindForeignMethodFn and config.bindForeignClassFn hooks. Then it wouldn't matter whether there was 1 foreign method or 200.

This would be very similar to the approach taken by Core for its optional modules. I guess nothing mandates they do it this way, but it's the pattern that both random and meta use.
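A sketch of the hook-based alternative: each library exports its own bind function, and the CLI's bindForeignMethodFn asks them in order until one answers. The function-pointer typedefs mirror wren.h's WrenForeignMethodFn and WrenBindForeignMethodFn, redeclared so the sketch stands alone; addMethodHook, bindForeignMethodChained, exampleLibraryBind, MAX_LIBRARY_HOOKS and the "time" module string are hypothetical. bindForeignClassFn could be chained the same way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Stand-ins matching wren.h's foreign-method typedefs.
typedef struct WrenVM WrenVM;
typedef void (*WrenForeignMethodFn)(WrenVM* vm);
typedef WrenForeignMethodFn (*WrenBindForeignMethodFn)(
    WrenVM* vm, const char* module, const char* className, bool isStatic,
    const char* signature);

// Hypothetical cap on the number of loaded libraries.
#define MAX_LIBRARY_HOOKS 32
static WrenBindForeignMethodFn methodHooks[MAX_LIBRARY_HOOKS];
static size_t methodHookCount = 0;

// Called with the hook a shared library exports (e.g. found via uv_dlsym).
bool addMethodHook(WrenBindForeignMethodFn hook) {
  if (methodHookCount == MAX_LIBRARY_HOOKS) return false;
  methodHooks[methodHookCount++] = hook;
  return true;
}

// The CLI's bindForeignMethodFn: ask each library in turn; first match wins.
WrenForeignMethodFn bindForeignMethodChained(WrenVM* vm, const char* module,
                                             const char* className,
                                             bool isStatic,
                                             const char* signature) {
  for (size_t i = 0; i < methodHookCount; i++) {
    WrenForeignMethodFn fn =
        methodHooks[i](vm, module, className, isStatic, signature);
    if (fn != NULL) return fn;
  }
  return NULL;
}

// What a library might export; the stub stands in for a real Time.now().
// The module string is assumed to be "time" (it might be "essentials:time").
static void timeNow(WrenVM* vm) { (void)vm; }

WrenForeignMethodFn exampleLibraryBind(WrenVM* vm, const char* module,
                                       const char* className, bool isStatic,
                                       const char* signature) {
  (void)vm;
  if (strcmp(module, "time") == 0 && strcmp(className, "Time") == 0 &&
      isStatic && strcmp(signature, "now()") == 0) {
    return timeNow;
  }
  return NULL;
}
```

With this shape the CLI never needs to know how many methods a library has; the trade-off is that each library carries its own lookup code.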


4 participants