
Save (and load) compiled bytecode? #535

Open
jockm opened this issue Apr 27, 2018 · 51 comments
Labels: api (The VM's embedding API), enhancement (New feature or refinement to existing one), vm (The core Wren bytecode engine)

Comments

@jockm

jockm commented Apr 27, 2018

I am looking to use Wren on a microcontroller. I have done a quick and dirty test to prove that it runs and that reasonably sized scripts can run within the limited memory. However, since I want to load scripts from an SD card, I am left with an issue:

  • I don't want to waste precious RAM loading the script into memory just to have it compiled (at run time on an 80 MHz CPU) before it can run.

I am giving the VM 50K (out of 64), but that means I don't have a lot of space left over for buffers and strings and such.

So what I would really like is a way to stream compiled bytecode into the system.


I may try and tackle this myself. It occurs to me that the "cheap and easy" way to tackle this might be to load and compile the source, and then save the VM as a binary that can be loaded. This isn't as ideal as having a wrenc compiler (à la luac), but it would be a useful first step.

Are there any impediments to this working, such as address fixups when saving/loading or the like?

@munificent added the enhancement (New feature or refinement to existing one) label on Apr 28, 2018
@munificent
Member

Ah, I was wondering when this would come up. :)

I definitely think it's reasonable to want to separate out compilation from execution, especially for limited memory devices. There are a couple of challenges with it:

  1. The bytecode format is currently an implementation detail of the VM, and isn't a public, stable API. This is nice because it gives us the freedom to change the instruction set whenever we want to add new features, optimize, etc. If we support compiling to external bytecode and loading it, that will put pressure on us to keep that format stable. We can declare that it's not supposed to be used as a long-term serialization format for code, but I expect users will still try to do that even if we tell them not to.

  2. I'm somewhat worried about the implementation complexity. Right now, we gain a lot of simplicity by assuming that the compiler has access to the runtime while it's compiling. Things like constants can be turned into runtime values right at compile time and stored in constant tables. We have access to the already-compiled runtime representation of imported modules, etc.

  3. This is kind of an extension of point 2, but the REPL very much takes advantage of the fact that we can execute new code in the context of an existing module. Putting a firmer firewall between compilation and execution may make that difficult.

I'm going to leave this open because I do think there are valid use cases for it, and it would be cool to have. We could speed up start-up time of the VM if Wren's own core libraries were compiled ahead of time to bytecode and loaded directly from that.

I am giving the VM 50K (out of 64), but that means I don't have a lot of space left over for buffers and strings and such.

I wonder how much of that is used by debug information: function names and line number buffers. We could potentially save memory by being able to turn those off (at the expense of losing stack traces at runtime).

Also, the current representation of line info is very much not optimal.

@munificent added the vm (The core Wren bytecode engine) label on Apr 28, 2018
@jockm
Author

jockm commented Apr 28, 2018

We can declare that it's not supposed to be used as a long-term serialization format for code, but I expect users will still try to do that even if we tell them not to.

Yeah, for some use cases this is more or less essential.

Wren is such a good fit for what I am trying to do, but the performance and memory overhead of needing to compile at run time essentially quashes my ability to use it, which is a shame.

We can declare that it's not supposed to be used as a long-term serialization format for code, but I expect users will still try to do that even if we tell them not to.

Because sometimes there is no other choice ;)

I would encourage you to adopt a version numbering scheme for the VM where minor versions are improvements that don't break compatibility and major versions do break it. The binary format for compiled code would carry this version so the VM could check it at load time. Java does something like this.


I can tell you that if I decide to go down the road of serializing the state of the VM to disk, I will have to write the version number to that image so that the VM rejects loading an image with a mismatched version.

Even that, with all the limitations it implies, would be a vast improvement over shipping source and compiling it every single time the applet in question needs to be run.
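
A minimal sketch of the kind of version check described above, assuming a hypothetical image header layout (the struct, field names, and magic value are invented for illustration; Wren defines no such format today):

#include <stdint.h>
#include <string.h>

// Hypothetical header prepended to a serialized image.
typedef struct {
  char     magic[4];      // e.g. "WRNI"
  uint16_t versionMajor;  // breaking changes bump this
  uint16_t versionMinor;  // compatible improvements bump this
} ImageHeader;

// Returns 1 if the image can be loaded by a VM reporting vmMajor.vmMinor.
static int imageIsCompatible(const ImageHeader* header,
                             uint16_t vmMajor, uint16_t vmMinor) {
  if (memcmp(header->magic, "WRNI", 4) != 0) return 0;
  if (header->versionMajor != vmMajor) return 0;  // major mismatch: reject
  return header->versionMinor <= vmMinor;         // older or equal minor is fine
}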

@jockm
Author

jockm commented Apr 29, 2018

I just realized the situation of using Wren in a low-memory environment is even worse than I first thought.

On the page wren.io/embedding, you write:

You execute a string of Wren source code like so:

WrenInterpretResult result = wrenInterpret(vm,
    "System.print(\"I am running in a VM!\")");

The string is a series of one or more statements separated by newlines. Wren copies the string, so you can free it after calling this.

So if I have an 8K source file that I read into memory, this will consume 16K, since you are making a copy of it.

This is exactly the behavior I want in a desktop or server environment, but it is not what you want to hear when you are running in a memory-restricted environment.

@munificent
Member

Yeah, that API would be easy to change (or we could add another one). There's nothing fundamental about why Wren copies the source string; it just needs to ensure the passed-in string has the right lifetime. But we could support an API where you pass ownership of the string, and then it doesn't need to copy.
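
For instance, an ownership-transferring variant could look something like this; wrenInterpretOwned and readWholeFileFromSdCard are hypothetical names used only to sketch the idea, not existing Wren API:

// Hypothetical: the VM takes ownership of 'source' (allocated with the
// allocator the VM was configured with) and frees it after compiling,
// instead of making its own copy.
WrenInterpretResult wrenInterpretOwned(WrenVM* vm, char* source);

// Embedder sketch: read the script into a single heap buffer and hand it
// over, so the source and a copy of it never sit in RAM at the same time.
char* source = readWholeFileFromSdCard("main.wren");  // hypothetical helper
WrenInterpretResult result = wrenInterpretOwned(vm, source);
// 'source' now belongs to the VM; do not free it here.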

@jockm
Author

jockm commented Apr 29, 2018

I would personally vote for an additional "unsafe" API. Outside of resource-constrained environments, the existing API is what you want.

@mhermier
Contributor

Argument 1 is a false excuse. If you take a look at Lua, it makes zero promises that the VM opcodes will stay consistent. So technically only the scripts are valid, precious source/data.

Arguments 2 and 3 are only true because of Meta.eval. Having the REPL built in is nice, but maybe that should be optional. It would help to split the VM and a minimal runtime from the rather noisy compilation infrastructure.

For wrenInterpret, I think we should move to wrenInterpret(WrenVM*, const char*, unsigned flags) in the general API; this would also bring opportunities to pass other parameters. Along with the string-lifetime control flag, I could see some extra options like compiler output control, optional disassembly, or enabling a debugger.
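
To make that concrete, the flags could be a simple bit set; these names are invented for illustration and are not part of Wren:

// Hypothetical interpretation flags, combined with bitwise OR.
typedef enum {
  WREN_INTERPRET_DEFAULT       = 0,
  WREN_INTERPRET_TAKE_SOURCE   = 1 << 0,  // VM takes ownership, no copy made
  WREN_INTERPRET_NO_DEBUG_INFO = 1 << 1,  // drop function names/line numbers
  WREN_INTERPRET_DISASSEMBLE   = 1 << 2,  // dump bytecode after compiling
  WREN_INTERPRET_ENABLE_DEBUG  = 1 << 3   // keep extra state for a debugger
} WrenInterpretFlags;

// Hypothetical extended entry point, as suggested above.
WrenInterpretResult wrenInterpretEx(WrenVM* vm, const char* source,
                                    unsigned flags);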

@munificent
Member

Argument 1 is a false excuse. If you take a look at Lua, it makes zero promises that the VM opcodes will stay consistent.

Fair enough.

Arguments 2 and 3 are only true because of Meta.eval. Having the REPL built in is nice, but maybe that should be optional.

The meta module is already optional. What I meant here was more things like:

  • When compiling a literal, we produce the runtime value for it and store that directly in the constant table for the function being compiled. We don't have to define a serialization mechanism for constants so that they can be loaded later.

  • Constants, especially for things like functions, can refer to other objects in memory, and we can take advantage of the fact that the GC can find and trace all of those objects.

  • When compiling a method call, we can look up the method symbol in the VM's global table of method IDs.

The REPL does affect things, because when compiling REPL expressions, the compiler takes advantage of being able to tell what top-level variables are already in scope. I consider that an important use case for Wren in general.

Of course, we don't need to support that for offline compiling. We could say that if you're going to save and load bytecode, you have to do an entire module at a time.

For wrenInterpret, I think we should move to wrenInterpret(WrenVM*, const char*, unsigned flags)

Yeah, I expect that API to get a little more sophisticated over time.

@jockm
Author

jockm commented Apr 29, 2018

The meta module is already optional. What I meant here was more things like:...

I already mentioned this with my quick and dirty approach (I am still digging in to look at feasibility), but I think the idea of saving and loading the VM's configuration, state, and memory image would be a good approach, and it is very similar to how most implementations of Smalltalk work.

So imagine we could issue the command:

./wren --freeze [imageFileName] wrenFile {wrenFile...}

Wren would then load and compile all the source and imported modules into memory (perhaps without the debugging information you mentioned previously) and save that image out to a file.

This would mean the user could write and test as much as possible on the desktop, then make an image that would be loaded in the final environment.

This also means that the issue of having compiled bytecode with mismatched versions would be less of a problem. It would be a good and useful step along the way, and would be immediately useful for people who don't want to distribute their core Wren source for whatever reason.
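
On the target side, loading such a frozen image might look roughly like this; wrenLoadImage is a hypothetical API invented purely for this sketch:

#include <stdio.h>
#include "wren.h"

// Hypothetical target-side loader: stream a frozen image from the SD card
// into the VM's heap so the Wren source never has to be held in RAM.
WrenVM* loadFrozenImage(const char* path, WrenConfiguration* config) {
  FILE* file = fopen(path, "rb");
  if (file == NULL) return NULL;

  // Hypothetical API: rebuild a VM from a previously saved image, reading
  // in chunks so only a small buffer is live at any one time.
  WrenVM* vm = wrenLoadImage(config, file);  // does not exist in Wren today

  fclose(file);
  return vm;  // NULL if the image version did not match this VM
}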

@mhermier
Contributor

mhermier commented Apr 29, 2018 via email

@jockm
Author

jockm commented Apr 29, 2018

@mhermier I did call out the similarity to Smalltalk and said it should be a step along the way, not the endpoint.

As for what the command-line option should be called, that is semantics, and I trust @munificent to pick something appropriate.

@munificent added the api (The VM's embedding API) label on Apr 30, 2018
@mhermier
Contributor

Looked at some WebAssembly videos and found some interesting ideas. We are close to being able to produce binary modules. We need three changes:

  • Don't hard-code injection of the core symbols in compileInModule; if we make them regular imports instead, it would allow serializing module code (see the sketch below).
  • Make the compiler do the same trick for module-level globals that it does for method binding, and patch the methods' get/set global offsets afterwards, so module code is consistent and not load-order dependent.
  • Do the serialization before patching, evidently :)
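
To make the first point concrete, a serialized module blob might begin with a header along these lines; the layout is invented purely for illustration and is not something Wren emits:

#include <stdint.h>

// Illustrative layout only: imports are recorded by name, and global and
// method references are left as symbolic entries to be patched at load time.
typedef struct {
  uint32_t importCount;    // module names to import before this one
  uint32_t constantCount;  // serialized constant-pool entries
  uint32_t globalCount;    // module-level variables, resolved at load
  uint32_t methodCount;    // method signatures, bound to symbols at load
  uint32_t codeSize;       // number of bytecode bytes following the tables
} ModuleBlobHeader;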

@Xed89

Xed89 commented Jun 18, 2020

Is there any news or plan on this issue? I was wondering if I could use Wren on a microcontroller too and would be interested :)

@ruby0x1
Member

ruby0x1 commented Jun 18, 2020

Nothing about the compilation process has changed at this time, so the above discussion is where it's at for now.

@jockm
Author

jockm commented Jun 19, 2020 via email

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@jockm
Author

jockm commented Jun 19, 2020

@mhermier You would compile the bytecode on a computer and then load it on the microcontroller. The bytecode is smaller and faster to load, and also (and sometimes most importantly) you aren't distributing your source code.

Also, as Wren is implemented now, you have to load the entire source into memory and then compile it, so you need enough RAM for both the bytecode and the source in memory at startup. That very well might not be possible in some embedded environments.

Additionally, you don't have the compile-time overhead at startup.

Remember that this isn't a feature request in a vacuum: MicroPython already does all of this and runs in as little as 8K of SRAM; eLua is similar; there are multiple JavaScript interpreters that run in less than 64K (and compile to and load from bytecode); Squirrel has been made to run on some microcontrollers (see the Electric Imp); Tessel cross-compiles JavaScript to Lua, then compiles the Lua to bytecode and executes the bytecode on the microcontroller; etc.

Nor is it uncommon in the desktop environment to be able to load and distribute just the compiled bytecode with scripting languages, because again, sometimes you don't want to distribute the source.

For example, I do a lot of projects with microcontrollers that have sub-1K of SRAM, but also ones with 32K-256K of SRAM.

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@jockm
Author

jockm commented Jun 19, 2020

@mhermier Sadly no. Because of the need to distribute source, combined with the fact that the source has to be loaded into memory to compile it, I deemed Wren unfit for purpose back in 2018, after I raised this issue and it was clear @munificent wasn't interested in implementing this feature in the short term.

At the time I could show that the need to load all of the source into memory would break the project I was working on. I switched to MicroPython, with a couple of projects using Duktape (a JavaScript interpreter); both have other features that make them useful for running in memory-constrained environments.

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@jockm
Author

jockm commented Jun 19, 2020

@mhermier I hear you, but that person isn't me. I haven't looked at Wren in years now. I only saw the updates to this issue because I forgot to turn notifications off.

I appreciate your data driven approach but to my mind there are only two facts you need to know to make this a compelling feature:

  1. A heavily commented program could be larger than the available memory but compile to a fraction of the original source size
  2. Not everyone wants to distribute their source

As for the second, who cares if it benefits embedded or not?

Most of the projects I work on (embedded or not) for clients that use a scripting language cannot distribute with source.

Wren is cool, but not more compelling than Python, or JavaScript, or... if it doesn't meet my needs.

@Xed89

Xed89 commented Jun 19, 2020

I sense I might be of some help here; maybe in the coming months I'll catch up with the inner workings and see what I can contribute :D

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@jockm
Author

jockm commented Jun 19, 2020

@mhermier Well, MicroPython (note the lack of a dash) is written in C, so there are no exceptions at the interpreter level per se. However, you will get an error code back and can generate a stack trace using the C API.

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@jockm
Author

jockm commented Jun 19, 2020

Yes, though it isn't as useful since positions aren't known, just as in regular Python.

@jockm
Author

jockm commented Jun 19, 2020

Remember that bytecode compilation and saving is common in many "full-sized" languages as well. Nor does compilation have to happen on a file-by-file basis. Duktape, for example, lets you load all of your source and then save the compiled state of everything to be loaded later. You suggested this as an option back in '18, IIRC.

This model might suit Wren better. There are multiple ways of solving the problem.

@mhermier
Contributor

mhermier commented Jun 19, 2020 via email

@hiperiondev

hiperiondev commented Jun 9, 2022

Hi to all,
I have started a port of Wren for the ESP32. The first test works fine. (https://github.com/hiperiondev/wren_esp32)

Has there been any progress on shipping only the bytecode by compiling the program externally? It would also be interesting, for this application, if the compiler itself could be optional, since resources are scarce on a microcontroller.

@mhermier
Contributor

mhermier commented Jun 9, 2022 via email

@hiperiondev

It saddens me to hear that.
Without these features, it is unlikely that Wren can be adopted for low-resource developments.
The repo is this (I had written it wrong before):
https://github.com/hiperiondev/wren_esp32

@jockm
Author

jockm commented Sep 28, 2022

There are indeed various strategies. But until someone really stress-tests the implementation, everything still needs to be done.

Look, I understand and empathize with that statement, but it is also deeply frustrating to hear, especially when @munificent went into various spaces to say that he thought Wren would be ideal for embedded devices... apparently without much understanding of the need, or of why one wouldn't want to distribute source or have users endure runtime compilation on a microcontroller-based device.

And while everything always needs to be done, the need to always distribute source means that Wren is disqualified from so many potential uses, and still is, more than four years after I first brought this up.

There is a chicken-and-egg problem with FLOSS projects: if Wren had been able to load precompiled bytecode (or a saved VM state), then I could have used it and would have been much more likely to test it, contribute to it, and make it better.

I don't know what decisions went into the triage of what to address first, but it feels like a case of letting perfect be the enemy of good enough.

@mhermier
Contributor

@jockm Please describe your use case. Embedded devices can range from a system with 1 KB of RAM to megabytes of RAM, with various clock speeds and other considerations.

That said, I updated wrenalyzer so it translates a (not finished) dialect close to official Wren that suits my needs, and transpiles it to Wren (for now). I don't plan to provide full compilation for now, and even if I get there, it will not be public for some time.

Compiling ahead of time is far from an impossible task, but unless you bring solid arguments for it, it will not magically happen because you think it is better. It should be tested to see whether the benefits are marginal or not. Where are the numbers on the cost of runtime compilation? Can we have a better description of what we/you are trying to improve?

@ruby0x1
Member

ruby0x1 commented Sep 28, 2022

thought Wren would be ideal for embedded devices

Just wanted to note that I've seen people say this when Wren is ideal for embedding. Embedded != embedding in this context: where Wren is really good is being embedded in other applications and games for scripting. This can apply to portable and embedded stuff, but as mentioned, it can vary wildly depending on the hardware.

Here's Wren running on an Arduino M4 Grand Central board and the code for it that went into getting it to fit. Here's a thread on the sizes of things in the VM and how they brought it down to 50% and down to 39k etc.

I mention this specifically because it shows that Wren is not designed for embedded: it makes trade-offs to favour speed over memory use, and simplicity and hackability over complexity, and it relies on 64-bit doubles, etc. This fits a wide range of uses (including some embedded), and it is primarily designed to be embedded in other applications, not designed for embedded hardware per se 💯

So while bytecode would be neat, it's just one part of the puzzle. It would almost surely make embedded stuff more possible, but since that's not really where Wren shines, there'll be a bunch more stuff coming up immediately after, and it's not going to be a straight path a lot of the time.

@jockm
Author

jockm commented Sep 28, 2022

@jockm Please describe your use case. Embedded devices can range from a system with 1 KB of RAM to megabytes of RAM, with various clock speeds and other considerations.

Please refer to the original issue

But again, even without space/performance issues: the need to distribute source is a showstopper for most non-FLOSS embedded devices, no?

@ruby0x1
Member

ruby0x1 commented Sep 28, 2022

Not really, no. You don't have to distribute the plain source; you can obfuscate/encrypt the contents first (I do this, but typically they're also packed into binary pack files anyway).

The issue of the client having the code, and it being in client memory, will never go away. This means they can trivially dump the code from memory, whether it's source or bytecode. Writing a bytecode-to-Wren converter is also trivial. It's a trade-off on how much time you're willing to spend on something like that; for me it's quite low.
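
For what it's worth, a minimal sketch of that obfuscation approach, decoding the packed script just before handing it to the VM. The XOR scheme and helper name are illustrative only, and the commented call uses the two-argument wrenInterpret form quoted from the 2018 docs earlier in this thread:

#include <stdlib.h>

// Illustrative only: XOR "encryption" is trivially reversible; it hides the
// source from casual inspection, not from anyone willing to dump memory.
static char* deobfuscate(const unsigned char* packed, size_t length,
                         unsigned char key) {
  char* source = (char*)malloc(length + 1);
  if (source == NULL) return NULL;
  for (size_t i = 0; i < length; i++) source[i] = (char)(packed[i] ^ key);
  source[length] = '\0';
  return source;
}

// Usage sketch:
//   char* source = deobfuscate(packedScript, packedLength, 0x5A);
//   WrenInterpretResult result = wrenInterpret(vm, source);
//   free(source);  // Wren copies the string, so it can be freed here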

@mhermier
Contributor

Do you have a better description of the hardware? An emulator or something else? 64 KB is really constrained; I'm already surprised it fits there. I would analyze the binary to see where memory is really used, and start making trade-offs from there.

@jockm
Author

jockm commented Sep 28, 2022 via email

@jockm
Author

jockm commented Sep 28, 2022

@ruby0x1

Not really, no. You don't have to distribute the plain source; you can obfuscate/encrypt the contents first (I do this, but typically they're also packed into binary pack files anyway).

That is still distributing source, and I may have multiple reasons, including requirements from my clients, that prevent that. Wanting to distribute only bytecode is a perfectly reasonable request, as is not wanting to go through startup compilation every time.

@ruby0x1
Member

ruby0x1 commented Sep 28, 2022

It's not unreasonable (I don't remember suggesting that :)). I want it to exist; it's on the list of things that would be nice to have.

It's a lot of change to get there; the way the VM + compiler works doesn't allow it as is.

@jockm
Author

jockm commented Dec 1, 2022

Do you have a better description of the hardware? An emulator or something else? 64 KB is really constrained; I'm already surprised it fits there. I would analyze the binary to see where memory is really used, and start making trade-offs from there.

There are multiple microcontrollers that fit the description I gave. At the time I first filed this issue I was using an STM32L432 microcontroller: 256K of flash, 64K of SRAM, running at 80 MHz. The device I was making for the client also had a microSD card that held data files, MicroPython bytecode, etc.

@aosenkidu

aosenkidu commented Jan 30, 2023

I can tell you that if I decide to go down the road of serializing the state of the VM to disk, I will have to write the version number to that image so that the VM rejects loading an image with a mismatched version.

I'm working, in a half-assed way, on serializing/suspending the running image to disk and being able to restart it later. Think Smalltalk images. But it is a slow work in progress, as I suffer a bit from post-COVID "mind fog".

What I did so far (on macOS only, no make/CMake support yet) is build the https://github.com/wren-lang/wren-cli project with C++ compiler settings, so it treats the *.c files as C++. That caused a few hundred errors about assigning void pointers (plus a few simple pointer-type mistakes) to lhs pointers of a well-defined type, so I added casts. I wanted to script that with awk or perl, but was too foggy (see above), so I did it manually. AFAICT it all works fine. I did not run the built-in tests yet.

My reason for switching to C++ is to use a kind of smart pointer (implemented as a template in C++) that addresses VM memory relative to the VM memory base address.
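
The base-relative addressing idea can be sketched in plain C as well (the paragraph above proposes a C++ template; this approximation just shows the concept, with invented names):

#include <stdint.h>
#include <stddef.h>

// Illustrative only: store offsets from the VM memory base instead of
// absolute pointers, so a saved image can be reloaded at another address
// and still resolve its references.
typedef struct {
  uint32_t offset;  // byte offset from the VM base; 0 means "null"
} VmRelPtr;

static inline void* vmRelResolve(void* vmBase, VmRelPtr ptr) {
  return ptr.offset == 0 ? NULL : (void*)((uint8_t*)vmBase + ptr.offset);
}

static inline VmRelPtr vmRelMake(void* vmBase, void* absolute) {
  VmRelPtr ptr;
  ptr.offset = (absolute == NULL)
      ? 0
      : (uint32_t)((uint8_t*)absolute - (uint8_t*)vmBase);
  return ptr;
}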

I'm not sure how I personally would treat source code; I actually find the idea of keeping all sources (and probably even making them modifiable and recompilable) kind of neat. Would a special garbage collection pass, or a special memory region for the compiler, do the trick?

I also upgraded the Wren CLI to load an arbitrary number of files on the command line instead of only one, executing all of them (in the same VM, obviously) and staying in REPL mode afterwards.

@aosenkidu

Most of the projects I work on (embedded or not) for clients that use a scripting language cannot distribute with source.

I'm pretty sure it can. It is just that your customer, or their customer, does not want to. If part of the logic on the embedded device can be done in a very high-level language, it helps everyone.

There is really nothing easier than reverse-engineering compiled code on a microcontroller. Most of the time you even know which C compiler was used, and the "macros" used to compile the code.

Converting a binary into a readable C program on a small device is at most a few days' work. Giving the source out usually does not give away any "trade secret". However, mixing in a scripting engine and delivering only bytecode might increase the effort needed for reversal.

What I want to say is: not giving access to the source on a small device does not really protect a trade secret. A contract, however, will.

@jespa007

Hi,
First, my apologies if I am posting a question that has nothing to do with this thread. I wonder why an interpreted language such as Wren, Lua, MicroPython, etc. is really needed on constrained hardware as described in this thread (i.e. 64K ROM and 20 KB RAM). Even if an interpreted language fits in the ROM and executes (like MicroPython), it will not manage memory and execution speed as well as C does. Why not do it in C directly?

@jockm
Author

jockm commented Aug 17, 2023 via email

@jespa007

Why not? If pure speed isn't a concern, then why not work in a language that is more pleasant and less prone to memory issues and other landmines? In my case I often use MicroPython with a native core, and it allows for much more rapid experimentation.

I very much agree with you that an interpreted language boosts productivity, so I would use one too.

But why do you think C is magically better? Why do you assume speed is of paramount importance?

I don't say that C is magically better, but if the main program is a for/while loop that runs forever in an interpreted language, it will consume more CPU cycles and more memory than if it were native code... so in the long term it will consume more power and resources, and so affect the lifetime of the component.

But if you are telling me that the main loop is done in C and other parts are done in an interpreted language to load or test some parts of the program, that is perfect :-)

I posted this question out of ignorance; I've never tried to execute an interpreted language on limited hardware.

@jockm
Author

jockm commented Aug 17, 2023

Your wording really came off as presupposing that C was better, at least to me. But you also seem to be defining "better" as "fast". Very few embedded programs are compute-bound. Most programs spend most of their time waiting for input, either from another part of the system or from a user, and if it is from a user, then the time between inputs is glacial to the CPU.

The microcontroller I used in my example was an ARM Cortex-M4F, which delivers around 1.25 DMIPS/MHz. So even if a user hammered on a button 100 times a second, the vast majority of the time would still be spent idle, waiting on an interrupt.

The next most intensive thing it did was write to a display, but that was just setting up a DMA transfer and sending it out, so it effectively happened in the background. So the compute-bound tasks were building the display contents and a bunch of decision logic. While it would be faster in C, the user and the overall system would never see the difference between compiled and interpreted code.

Finally, on the topic of performance, I just want to share some information to give context on speed. The Xerox Alto, the machine some of the first GUIs were written for, had 64 kilowords of RAM (so 128K) and ran at about 1/500th the speed of that M4F (if I am doing my math right). Speed is relative. Would I want to compute pi to 10,000 digits in MicroPython? No. But then I wouldn't want to use a microcontroller for that task either.

@mhermier
Contributor

mhermier commented Aug 17, 2023 via email

@jockm
Author

jockm commented Aug 17, 2023

@mhermier Not off the top of my head, and it gets a bit complicated. Since we are talking about cores that are in-order, have little to no speculative execution, no burst, are non-superscalar, etc., most of the power consumption comes from the I/O subsystems, how much current is being pulled by the GPIO pins, and how long the CPU is allowed to sleep waiting for an interrupt.

Computational load doesn't really matter for current consumption as much as the duration between sleeps. Sleeping wouldn't be a function of the interpreter, but a native call run by the program when it wants to sleep, though I have seen some interpreters get clever and set a timer and then sleep for things like delay().

So in theory an interpreted program would run longer between calls to put the system to sleep, but in the grand scheme of things you aren't talking about that much difference.

This also assumes the embedded system is trying to optimize for power consumption, which is often far from the case.

To give you a real-world example, the microcontroller in my original example was pulling about 10 mA most of the time when active, and that included the display. Something like a microSD card can pull as much as 500 mA when active, but most I tested draw less than 5 mA on read.

But something like the Nordic nRF53 series (a very popular Bluetooth microcontroller) pulls 0.5 mA at 64 MHz when the UART is on, and much less than that when it is off.

@mhermier
Contributor

mhermier commented Aug 17, 2023 via email

@jockm
Author

jockm commented Aug 17, 2023 via email

@whoozle

whoozle commented Feb 23, 2024

👍 I'm really keen to have this as well; I'd really like to ship binary code to production.
