Dynamic Module Loading #2746

Closed
zephyrbot opened this issue Nov 10, 2016 · 33 comments
Labels
area: ARC ARC Architecture area: ARM ARM (32-bit) Architecture area: NIOS2 NIOS2 Architecture area: X86 x86 Architecture (32-bit) area: Xtensa Xtensa Architecture Feature A planned feature with a milestone

Comments

@zephyrbot
Collaborator

zephyrbot commented Nov 10, 2016

Reported by XiaoHua Yang:

As a user, I would like Zephyr to support dynamic loading and linking of modules at run-time. This is useful in applications whose behavior is intended to change after deployment. The module loader can load, relocate, and link standard ELF files, which can optionally be stripped of their debugging symbols to keep their size down.

(Imported from Jira ZEP-1264)

@zephyrbot zephyrbot added area: ARC ARC Architecture area: ARM ARM (32-bit) Architecture area: NIOS2 NIOS2 Architecture area: X86 x86 Architecture (32-bit) Enhancement Changes/Updates/Additions to existing features labels Sep 23, 2017
@nashif nashif added Feature Request A request for a new feature and removed Enhancement Changes/Updates/Additions to existing features labels Oct 20, 2017
@nashif nashif added this to Parking lot in Release Plan Feb 7, 2018
@nashif nashif moved this from Parking lot to v1.14 in Release Plan Mar 10, 2018
@nashif nashif moved this from v1.14 to Parking lot (1 year outlook) in Release Plan Jun 19, 2018
@hongshui3000
Contributor

Is there any plan to support this on the future roadmap?

@mgiaco

mgiaco commented Feb 20, 2020

I am wondering about this too.

@carlescufi carlescufi removed this from Parking lot (1 year outlook) in Release Plan Mar 10, 2020
@dcpleung
Member

Adding the ability to dynamically allocate stack should be relatively easy at this point.

As for memory space, with MMU, each module can be assigned a virtual memory region. At runtime, the MMU takes care of the mapping. Without MMU, it is going to be tricky.

@hongshui3000
Contributor

hongshui3000 commented Sep 22, 2021

Older versions of contiki-ng included an ELF loader. It no longer seems to be supported in versions after contiki-ng 4.0.
The old implementation can serve as a reference:
https://github.com/contiki-ng/contiki-ng/releases/tag/3.x
contiki-ng-3.x.zip\contiki-ng-3.x\core\loader

@nashif nashif added this to LTS3 in Release Plan Sep 22, 2021
@galak
Collaborator

galak commented Sep 22, 2021

I think it would be good to explain the use cases a bit more. For example do you need to unload the module and load other modules? Do you need to load more than one module?

There could be some simplifying assumptions that we could make vs having a general module loader.

@0Grit

0Grit commented Sep 22, 2021

I have always wanted to use PIE/PIC (Position Independent Executable / Position Independent Code) with microcontrollers.
Regardless, on an MCU some form of dynamic linking will be needed if this use case is expected to work purely within a C/C++ runtime, since we are working with a non-virtualized, sometimes non-contiguous address space.

My initial use case for dynamic linking as a POC would be to create a sort of 3rd stage bootloader that can manage flash partitions, link, and launch generically formatted firmware images.

@galak
Collaborator

galak commented Sep 22, 2021

@attzonko how much portability of "modules" is needed? Would it need to be possible to build a module that runs against any Zephyr build?

@yonsch
Contributor

yonsch commented Sep 23, 2021

I tried doing something similar a few weeks ago. In my case, I wanted a way to run "scripts" on a running MCU - that is, to reserve a buffer in RAM for the code and data, write some code, compile it, send it to the MCU in some way (e.g. UDP) and receive the result of running that code. I did it mostly for ease of development - If I want to experiment with something (e.g. I'm writing a driver for a new device), I can write code and load it, without having to recompile and reflash everything. Similar to experimenting with IPython before writing actual code. (This could also be taken further, to implement a REPL on the MCU).
Another use for this is to debug a running system (without access to a debugger). If something doesn't work, I can use the dynamic module to retrieve some information (that I didn't know I would need when compiling), like reading the value of some register.
With a slightly more complicated loader that allows for storing a few modules in flash and loading them, I can think of two more use cases:

  • Separating the application into one main application and a few modules, so that if one module needs an upgrade, it can be upgraded without requiring the other modules and the main application to be upgraded (useful for OTA upgrades).
  • Allowing a user to write plugins for a product.

@hongshui3000
Contributor

hongshui3000 commented Sep 24, 2021

Let me add one point:
Dynamic loading can also reduce RAM usage in some applications. Take an audio player that needs to encode and decode multiple audio formats: if the system supports dynamic loading of modules, it can load a given audio codec only when needed, instead of loading all the codecs into instruction RAM at once.

@paulrouget

Would using wasm help in any way?

https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/build_wamr.md

@nashif nashif added Feature A planned feature with a milestone and removed Feature Request A request for a new feature labels Oct 29, 2021
@chen-png
Collaborator

As for memory space, with MMU, each module can be assigned a virtual memory region. At runtime, the MMU takes care of the mapping. Without MMU, it is going to be tricky.

Hi @dcpleung, when I load a module into memory, how do I allocate memory for it on platforms without an MMU?
x86 and some ARM platforms have an MMU, so we could use k_mem_map to allocate virtual memory. Without an MMU, could I reserve a buffer at build time instead, i.e. define a buffer and link it into a RWE section to load modules into? Is this OK?

@0Grit

0Grit commented Nov 23, 2021

Probably would need to implement some advanced dynamic memory allocation service?

@dcpleung
Member

One way to do this is to dedicate the free memory into a mem_slab and allocate blocks from there.

@chen-png
Collaborator

One way to do this is to dedicate the free memory into a mem_slab and allocate blocks from there.

If we plan to use mem_slab, then I think all platforms could support it, with no need to care about the MMU.
Also, I think using a heap or a mem_slab comes to the same thing, since both reserve memory statically at build time. A heap may be better, because some sections have address alignment requirements, and a heap can allocate aligned memory using the k_heap_aligned_alloc API.
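
The idea of reserving memory statically at build time and handing out aligned blocks from it can be sketched in portable C. This is a host-side illustration rather than Zephyr code: `module_pool`, `pool_alloc_aligned`, `pool_reset` and the pool size are hypothetical names picked for the example. On Zephyr the same role would presumably be played by a `k_heap` together with `k_heap_aligned_alloc`.

```c
#include <stddef.h>
#include <stdint.h>

#define MODULE_POOL_SIZE 4096u

/* Statically reserved region, fixed at build time (hypothetical name). */
static uint8_t module_pool[MODULE_POOL_SIZE];
static size_t module_pool_used;

/* Bump-allocate `size` bytes aligned to `align` (must be a power of two).
 * Returns NULL when the pool is exhausted or the arguments are invalid. */
void *pool_alloc_aligned(size_t align, size_t size)
{
    if (align == 0 || (align & (align - 1)) != 0 || size == 0) {
        return NULL;
    }

    uintptr_t base = (uintptr_t)module_pool;
    /* Round the next free address up to the requested alignment. */
    uintptr_t start = (base + module_pool_used + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(start - base);

    if (offset > MODULE_POOL_SIZE || size > MODULE_POOL_SIZE - offset) {
        return NULL;
    }

    module_pool_used = offset + size;
    return (void *)start;
}

/* Release everything at once, e.g. when the loaded module is discarded. */
void pool_reset(void)
{
    module_pool_used = 0;
}
```

A loader built this way would call `pool_alloc_aligned(section_align, section_size)` once per ELF section. A bump allocator cannot free individual blocks, which is acceptable only if modules are unloaded all together; a real heap avoids that restriction.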

@paulrouget

FYI I managed to implement dynamic loading (very similar to dlopen()) from a blob on Flash thanks to this little tool: https://github.com/bogdanm/udynlink

@beriberikix

@paulrouget nice! Any sample code you can share?

@chen-png
Collaborator

Hi all, I have put together a draft PR, #41700, for this. Could you help review it and give some suggestions? Comments on the feature requirements or the technical details are all welcome. Thanks.

@erik-ks

erik-ks commented Mar 6, 2022

Hi - Some thoughts on (initial) use cases and feature requirements. They assume base kernel/system is non-PIC.

Use cases - leverage dynamic module to implement

  • a reduced base image size (i.e. the part that needs to be dual-banked)
  • a factory self-test that is discarded after successful completion
  • load/update of a new AI model
  • alternate codecs
  • alternate control interfaces
  • TBD if additional storage can be implemented as a dynamic module

Requirements

  1. system shall be able to operate even if all dynamic modules fail to load. Motivation: All modules could be single-banked and updated/retrieved after base system has been upgraded.
  2. system shall provide a method for securely validating modules before they are loaded. May be performed at time module is installed.
  3. system shall optionally support remotely retrieving and loading a new dynamic module. Only pre-loading or asynchronous loading required.
  4. Shall support systems both with and without MMU, where systems without MMU shall minimally meet all non-optional requirements (or is it sufficient to meet at least one of identified use cases?).
  5. TBD if (some) device drivers may be implemented as dynamic modules
  6. Performance req: implementation shall not require an interpreter. (Need a more succinct definition. Intent is that code execution of a dynamic module is close to that of non-PIC code)

Note: This likely needs to be broken up into several features.

@povergoing
Member

Hi all,
Out of curiosity, is there any timeline, plan, or deadline for this feature?
Also, is there any conflict between this dynamic modules feature and the description here:

Security Functionality with a focus on cryptographic algorithms and protocols. Support for cryptographic hardware is scoped for future releases. The Zephyr runtime architecture is a monolithic binary and removes the need for dynamic loaders, thereby reducing the exposed attack surface.

@rockdevice

So, after reading all of the above: is there any plan, or ongoing work, for this "dynamic ELF module loader" feature? Thanks. It would be really useful for smart-watch and smart-home IoT devices, which need to update firmware functionality from time to time.

@vChavezB
Contributor

vChavezB commented Dec 9, 2022

In case you do support this, there is a good starting point here (ARM):

https://github.com/bogdanm/udynlink

Dynamic loading for MCUs requires some scripting to generate the modules with PIE (e.g. a custom image header + binary data).

In addition, another question to think about is whether they would be run as kernel modules or user applications (i.e. privileged or non-privileged mode). If the modules need access to the Zephyr API, all the symbols that the "module" can access need to be defined and linked at run-time. Also, if the modules access the Zephyr API, you would probably need to define which version of the API they use, as function prototypes might change, and this can create undefined behaviour if modules call APIs from distinct versions. However, if the modules are intended just for algorithms without access to any type of API, then it would be simpler.

Oh, and let's not forget about debugging our dynamic "modules". You would need a Python GDB script to load the correct debugging files (DWARF) automatically so you could access the external modules while debugging.

My personal opinion: I'm not sure how this would scale as Zephyr supports more architectures. I'm sure you would need to write assembly for each port (ARM, Xtensa, RISC-V) to load the modules correctly. And then the modules would need to be compiled depending on the architecture they are going to be loaded on (perhaps with some metadata + CRC + signature).

Just my 2 cents on somewhat my experience with ARM dynamic loading.
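
The "custom image header + binary data" and "metadata + CRC" points can be made concrete with a small sketch. Everything here is an assumption for illustration: the `module_header` layout, the `MODULE_MAGIC` value and the `module_check` function are hypothetical, fields are read in host byte order, and a CRC only detects corruption, so a real loader would still need a signature check to detect tampering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODULE_MAGIC 0x4D4F4431u /* "MOD1", hypothetical */

/* Hypothetical header placed in front of the position-independent payload. */
struct module_header {
    uint32_t magic;        /* identifies the image format */
    uint32_t api_version;  /* host API version the module was built against */
    uint32_t payload_size; /* number of payload bytes following the header */
    uint32_t payload_crc;  /* CRC-32 of the payload */
};

enum module_status {
    MODULE_OK = 0,
    MODULE_ERR_TRUNCATED,
    MODULE_ERR_MAGIC,
    MODULE_ERR_API_VERSION,
    MODULE_ERR_CRC,
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib). */
uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Validate an image before any of its code is relocated or executed. */
enum module_status module_check(const uint8_t *image, size_t len,
                                uint32_t host_api_version)
{
    struct module_header hdr;

    if (len < sizeof(hdr)) {
        return MODULE_ERR_TRUNCATED;
    }
    memcpy(&hdr, image, sizeof(hdr)); /* avoids unaligned struct access */

    if (hdr.magic != MODULE_MAGIC) {
        return MODULE_ERR_MAGIC;
    }
    if (hdr.api_version != host_api_version) {
        return MODULE_ERR_API_VERSION;
    }
    if (hdr.payload_size > len - sizeof(hdr)) {
        return MODULE_ERR_TRUNCATED;
    }
    if (crc32_calc(image + sizeof(hdr), hdr.payload_size) != hdr.payload_crc) {
        return MODULE_ERR_CRC;
    }
    return MODULE_OK;
}
```

Rejecting a module whose `api_version` differs from the host's is the strictest possible policy; a range check or per-symbol versioning would be alternatives if prototypes change rarely.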

@hongshui3000
Contributor

hongshui3000 commented Jan 7, 2023

PIE/PIC (Position Independent Executable / Position Independent Code)
Reference implementation: https://www.tockos.org/blog/2016/dynamic-loading/#fn:1
https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/

@teburd teburd self-assigned this Apr 20, 2023
@teburd teburd added the area: Xtensa Xtensa Architecture label Apr 20, 2023
@teburd
Collaborator

teburd commented Apr 20, 2023

I intend on taking this on and continuing the great work started by @chen-png

@EternityForest

I think this would still be highly useful even if it only supported one or two platforms, since right now, this kind of functionality isn't really available on MCUs at all. I'm actually not even sure position independent code is needed at all, everything could be done with a configurable fixed number of "Slots".

Imagine you wanted to make a game console. You would just need 2 slots: the main one, which could be either a launcher or a complete game (I'm assuming there would be a way for a program to ask the OS to replace it with another file), and a supervisory slot that one could use to return to the main menu or handle any other tasks that had to be always on.

Even the supervisory slot could be just built into the firmware.

Or, suppose one wanted to make a smart outlet. You'd probably just need the main "App" that talks to the smart platform, that you'd probably want to be able to switch out to avoid obsolescence, and maybe one extra accessory app to add some fancy extra timing thing.

Even if there was only one module you could load at a time it would still be rather useful for a large number of tasks, because you could just put everything you want to do right into that module.

Imagine if it was as easy as making an Arduino sketch and exporting as a module file you could save on an SD card!

@yonsch
Contributor

yonsch commented Jul 12, 2023

I tried doing something similar a few weeks ago. In my case, I wanted a way to run "scripts" on a running MCU - that is, to reserve a buffer in RAM for the code and data, write some code, compile it, send it to the MCU in some way (e.g. UDP) and receive the result of running that code.

So today I finally got back to this idea and got a POC working.

In my application, I have a large buffer plugin_buffer, together with functions for loading it with code and running it. The plugin is compiled (see below), loaded into the buffer, and executed. Execution is performed by casting the buffer into a function pointer and then calling it.
To compile the plugin, I have a script that analyzes the running executable (zephyr.elf) and generates a linker script, that is used to compile the plugin. The linker script provides the address of the plugin buffer, but also the addresses of all symbols that are currently in the executable. This means that the plugin can call functions that are already inside the executable such as z_impl_k_sleep or printf. For illustration:

MEMORY
  {
    RAM (rwx) : ORIGIN = 0x2001400, LENGTH = 1024
  }
...
PROVIDE(z_impl_k_sleep = 0x800a003);
...
This can be seen as a RAM function that is linked separately from the rest of the application.
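
The generation step can be sketched as follows. This is not the script described above, only a minimal C illustration of its output stage: it assumes the symbol names and addresses have already been extracted from zephyr.elf (for example with a tool such as `nm`), and `host_symbol` and `gen_plugin_ldscript` are hypothetical names.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One symbol of the running image: its name and absolute address. */
struct host_symbol {
    const char *name;
    uint32_t addr;
};

/* Write a plugin linker script into `out`: a MEMORY block describing the
 * plugin buffer, then one PROVIDE() line per host symbol.
 * Returns the number of characters written, or 0 if `out` is too small. */
size_t gen_plugin_ldscript(char *out, size_t out_len,
                           uint32_t buf_addr, uint32_t buf_len,
                           const struct host_symbol *syms, size_t nsyms)
{
    int n = snprintf(out, out_len,
                     "MEMORY\n{\n  RAM (rwx) : ORIGIN = 0x%" PRIx32
                     ", LENGTH = %" PRIu32 "\n}\n",
                     buf_addr, buf_len);
    if (n < 0 || (size_t)n >= out_len) {
        return 0;
    }
    size_t used = (size_t)n;

    for (size_t i = 0; i < nsyms; i++) {
        n = snprintf(out + used, out_len - used,
                     "PROVIDE(%s = 0x%" PRIx32 ");\n",
                     syms[i].name, syms[i].addr);
        if (n < 0 || (size_t)n >= out_len - used) {
            return 0;
        }
        used += (size_t)n;
    }
    return used;
}
```

Because the addresses are baked into the script, a plugin linked this way is only valid for the exact host build the symbols were taken from.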

It works nicely as a POC, but it's a bit problematic:

  • Security - This is remote code execution by definition
  • Safety - A faulty plugin can crash the system
  • Portability - There are many challenges to solve for every platform this runs on - caches, MPUs, long jumps, etc.
  • Complexity - Compiling the plugin requires knowledge of the original executable and how it was compiled. Without it compilation may be impossible.

On the other hand, it works, and it has a very small overhead - no dynamic loading and linking is needed. This is very useful for constrained devices.
Let me know what you think, if you find it worthwhile I might submit this as a pull request.

@eanderlind

For the security and safety aspects, would it be possible to scan the code during the compile/link phases to, e.g., assign a security class, and then define a signing mechanism that the running host could verify before deciding whether to load it?
(This assumes the host has a dynamic loader that can verify the image, or at least invoke such a call.)
One could, e.g., limit the ability to perform long jumps to within the RAM buffer you defined. The script could potentially also scan the code for other unsafe calls (e.g. no direct calls to kernel code).

A while back I saw a paper that uses indirection pointers instead of direct jumps. The idea is that instead of deciding jump addresses during the build phase, one could reserve memory locations and populate them during the load phase, or on first invocation, using a symbol mapping table. This would, e.g., permit a loaded module to call out to two alternative host images (at different FLASH/RAM offsets) depending on which slot in a dual-slot system is active.
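
The indirection idea can be sketched in a few lines of C. This is an illustration under assumptions, not the scheme from the paper mentioned: `host_export`, `import_slot` and `resolve_imports` are hypothetical names, the two `host_*_version` functions stand in for the same API living at different addresses in two host images, and lookup is a plain linear search by name.

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);
typedef int (*version_fn)(void);

/* Entry in a host image's symbol mapping table. */
struct host_export {
    const char *name;
    generic_fn fn;
};

/* Slot reserved by the module at build time; `fn` is filled in at load time. */
struct import_slot {
    const char *name;
    generic_fn fn;
};

/* Stand-ins for the same API at different addresses in two host images. */
int host_a_version(void) { return 1; }
int host_b_version(void) { return 2; }

const struct host_export host_a_exports[] = {
    { "host_version", (generic_fn)host_a_version },
};
const struct host_export host_b_exports[] = {
    { "host_version", (generic_fn)host_b_version },
};

/* Resolve every import slot against the active host's export table.
 * Returns 0 on success, or -1 if any symbol is missing (slot left NULL). */
int resolve_imports(struct import_slot *slots, size_t nslots,
                    const struct host_export *exports, size_t nexports)
{
    int status = 0;

    for (size_t i = 0; i < nslots; i++) {
        slots[i].fn = NULL;
        for (size_t j = 0; j < nexports; j++) {
            if (strcmp(slots[i].name, exports[j].name) == 0) {
                slots[i].fn = exports[j].fn;
                break;
            }
        }
        if (slots[i].fn == NULL) {
            status = -1;
        }
    }
    return status;
}
```

The module never embeds a host address: it calls through its slot, e.g. `((version_fn)slots[0].fn)()`, so the same module binary works against whichever host image was used to resolve the slots.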

@ngm0
Contributor

ngm0 commented Jul 31, 2023

Let me know what you think, if you find it worthwhile I might submit this as a pull request.

@yonsch sure, it would be great to see how your POC works, even if it is not ready for merging. Securing the environment will obviously take a bunch more work, but it's possible to foresee a "loader" that checks code signatures etc. before executing any code.

@ujur007

ujur007 commented Aug 2, 2023

Nice discussion, and some initial results as well. I came here to see whether OpenCV models can be loaded onto tiny MCU platforms. Because such a model is large, dynamic linking might reduce the overall size.

@mkschreder

As for memory space, with MMU, each module can be assigned a virtual memory region. At runtime, the MMU takes care of the mapping. Without MMU, it is going to be tricky.

hi @dcpleung when I load a module into memory, how do I allocate a memory space for those platforms without MMU? for x86 and some arm platforms, they have MMU and we could use k_mem_map to allocate a virtual memory, but if without MMU, how do I allocate a memory space? Could I reserve a buffer at build time, i.e, I want to define a buffer and link it into a RWE section to load modules, is this OK?

MMU is not needed. Without MMU these will have to be compiled as position independent code that never uses absolute references and can be placed anywhere in the physical ram. It just needs to be linked at runtime to kernel symbols.

@mkschreder

Replying to @yonsch's proof of concept above (the plugin buffer linked against the running executable's symbols):

Very nice. Can you link to code?

Regarding your concerns, unfortunately without a memory management unit in hardware it is not possible to achieve complete isolation. This is the benefit of an MMU. On the other hand, all code running on a system with raw physical memory should be trusted code. You may as well write that code into external flash and then execute in place (XIP). I would very much like to see your solution and see if it can be adopted for such a use case.

@nashif
Member

nashif commented Dec 19, 2023

@teburd let's close this as done.

@teburd teburd closed this as completed Dec 19, 2023
@kinsamanka

If anyone is looking for a udynlink POC implementation, here's mine:

https://github.com/kinsamanka/udynlink-app

@vChavezB
Contributor

vChavezB commented Jan 7, 2024

Just for your info @kinsamanka, dynamic module loading has already been implemented in subsys as llext:

https://docs.zephyrproject.org/latest/services/llext/index.html#linkable-loadable-extensions-llext

regards
