
Long living engine consumes endless memory when modules are loaded from file #4377

Open
webmaster128 opened this issue Dec 28, 2023 · 10 comments
Labels: bug · 📦 lib-api · 📦 lib-engine · priority-medium · project-confio

@webmaster128
Contributor

Describe the bug

When using a single long-running headless engine in combination with many loads from a file system cache, memory usage increases constantly. If modules are just instantiated from in-memory Module instances, this does not happen. This was observed by CosmWasm users.

The issue can be reproduced using the following adapted Wasmer example: master...webmaster128:demo-memory-growth. What it does is (a rough sketch follows the list):

  1. Compile a module with a compiling engine and store it to disk (Singlepass vs. Cranelift makes no difference). Drop that engine.
  2. Create a headless engine for executing modules.
  3. Load modules from disk and drop them later.
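
For readers who don't want to open the linked branch, here is a minimal sketch of that loop. It assumes wasmer's sys API (Engine::headless, Module::serialize_to_file, Module::deserialize_from_file); the path, the tiny module, and the iteration count are made up for illustration and are not taken from the actual example:

use wasmer::{Engine, Module};

fn main() -> anyhow::Result<()> {
    // Hypothetical on-disk cache location.
    let path = "/tmp/module.wasmu";
    let wat = r#"(module (func (export "test")))"#;

    // 1. Compile once with a compiling engine, persist the artifact, drop the engine.
    {
        let compiling_engine = Engine::default();
        let module = Module::new(&compiling_engine, wat.as_bytes())?;
        module.serialize_to_file(path)?;
    }

    // 2. One long-lived headless engine that only executes precompiled modules.
    let runtime_engine = Engine::headless();

    // 3. Repeatedly load the artifact from disk and drop the resulting Module.
    //    Memory usage keeps growing even though every Module is dropped.
    for i in 0..10_000u32 {
        let module = unsafe { Module::deserialize_from_file(&runtime_engine, path)? };
        drop(module);
        if i % 1_000 == 0 {
            println!("deserialized {i} times");
        }
    }

    Ok(())
}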

Steps to reproduce

Expected behavior

The memory usage of the example stays below a reasonable bound because the deserialized Modules are dropped.

Actual behavior

Running the above example leads to the following memory profile:
[Screenshot: memory profile of the example run, 2023-12-28 09:30:19]

A few notes here:

  • This shows allocation tracking via xcrun xctrace record --template 'Allocations' --launch ./target/release/examples/engine-headless
  • This run peaks at 27 GB memory usage, but it can get arbitrarily high by increasing the number of module deserializations
  • Where the blue line is, the system starts to swap
  • There is a 5s pause at the beginning and end of the program. Once the runtime engine is dropped, memory usage is low again.

A different memory profiling approach shows the route through which the majority of allocations are created:

[Screenshot: allocation call tree from a different profiler, 2023-12-23 10:30:03]

Additional context

OS:

  • reproducible example is developed on ARM-based macOS
  • same problem occurs on x86_64-based GNU Linux systems

Wasmer version:

commit 04600507844bae978239531c8e6265d410aefe6d (HEAD -> master, tag: v4.2.5, wasmerio/master, wasmerio/HEAD, origin/master)
Merge: 1b803c93a7 04c49aee4e
Author: Arshia001 <Arshia001@users.noreply.github.com>
Date:   Sat Dec 23 13:30:03 2023 +0400

    Merge pull request #4374 from wasmerio/release-4.2.5
    
    Release 4.2.5

This problem started when we refactored our caching solution such that we only use a single long-lived runtime engine instead of having to cache (Module, Engine) pairs. It might have been in Wasmer for a long time.


@webmaster128
Contributor Author

webmaster128 commented Dec 30, 2023

It turns out the static global FRAME_INFO grows every time a module is deserialized from file (through register_frame_info, a.k.a. fn register in lib/compiler/src/engine/trap/frame_info.rs).

The following debug logs

diff --git a/lib/compiler/src/engine/trap/frame_info.rs b/lib/compiler/src/engine/trap/frame_info.rs
index 4222acc1ad..7d714ea746 100644
--- a/lib/compiler/src/engine/trap/frame_info.rs
+++ b/lib/compiler/src/engine/trap/frame_info.rs
@@ -239,5 +239,15 @@ pub fn register(
         },
     );
     assert!(prev.is_none());
+
+    let frame_info_ranges = info.ranges.len();
+    let total_functions: usize = info.ranges.values().map(|mifi| mifi.functions.len()).sum();
+    let total_frame_info: usize = info
+        .ranges
+        .values()
+        .map(|mifi| mifi.frame_infos.len())
+        .sum();
+    println!("Elements of FRAME_INFO.ranges: {frame_info_ranges}, total functions: {total_functions}, total frame infos: {total_frame_info}");
+
     Some(GlobalFrameInfoRegistration { key: max })
 }

print the output

[...]
Creating headless Universal engine 9992...
Elements of FRAME_INFO.ranges: 9992, total functions: 4606312, total frame infos: 4606312
Creating headless Universal engine 9993...
Elements of FRAME_INFO.ranges: 9993, total functions: 4606773, total frame infos: 4606773
Creating headless Universal engine 9994...
Elements of FRAME_INFO.ranges: 9994, total functions: 4607234, total frame infos: 4607234
Creating headless Universal engine 9995...
Elements of FRAME_INFO.ranges: 9995, total functions: 4607695, total frame infos: 4607695
Creating headless Universal engine 9996...
Elements of FRAME_INFO.ranges: 9996, total functions: 4608156, total frame infos: 4608156
Creating headless Universal engine 9997...
Elements of FRAME_INFO.ranges: 9997, total functions: 4608617, total frame infos: 4608617
Creating headless Universal engine 9998...
Elements of FRAME_INFO.ranges: 9998, total functions: 4609078, total frame infos: 4609078
Creating headless Universal engine 9999...
Elements of FRAME_INFO.ranges: 9999, total functions: 4609539, total frame infos: 4609539
Creating headless Universal engine 10000...
Elements of FRAME_INFO.ranges: 10000, total functions: 4610000, total frame infos: 4610000

This probably means the GlobalFrameInfoRegistration instance is not dropped:

/// An RAII structure used to unregister a module's frame information when the
/// module is destroyed.
pub struct GlobalFrameInfoRegistration {

Debugging the drop implementation of GlobalFrameInfoRegistration shows that all the unregistrations are done when the engine is dropped, not when the artifact is dropped. This explains why we have a low memory usage at the end of the program. This is probably due to this code:

        artifact
            .internal_register_frame_info()
            .map_err(|e| DeserializeError::CorruptedBinary(format!("{:?}", e)))?;
        if let Some(frame_info) = artifact.internal_take_frame_info_registration() {
            engine_inner.register_frame_info(frame_info);
        }
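
To make the ownership question concrete, here is a simplified, self-contained sketch; it is not wasmer's code, and REGISTRY and RegistrationGuard are just stand-ins for FRAME_INFO and GlobalFrameInfoRegistration. Whoever ends up owning the RAII guard decides when the global registry shrinks:

use std::collections::BTreeMap;
use std::sync::Mutex;

// Stand-in for the global FRAME_INFO registry (illustration only).
static REGISTRY: Mutex<BTreeMap<usize, Vec<u8>>> = Mutex::new(BTreeMap::new());

// Stand-in for GlobalFrameInfoRegistration: the entry is only removed in Drop.
struct RegistrationGuard {
    key: usize,
}

impl Drop for RegistrationGuard {
    fn drop(&mut self) {
        REGISTRY.lock().unwrap().remove(&self.key);
    }
}

fn register(key: usize, data: Vec<u8>) -> RegistrationGuard {
    REGISTRY.lock().unwrap().insert(key, data);
    RegistrationGuard { key }
}

fn main() {
    // The guards are moved into a long-lived owner, mimicking
    // engine_inner.register_frame_info(frame_info) above.
    let mut owned_by_engine = Vec::new();
    for key in 0..3 {
        owned_by_engine.push(register(key, vec![0u8; 1024]));
    }
    assert_eq!(REGISTRY.lock().unwrap().len(), 3);

    // Only dropping the "engine" releases the entries; dropping the modules
    // that triggered the registrations would not.
    drop(owned_by_engine);
    assert_eq!(REGISTRY.lock().unwrap().len(), 0);
}

If the artifact kept the guard instead, dropping the artifact would shrink the registry immediately.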

Now I wonder if this is a bug or the desired behaviour.

@Arshia001 Arshia001 self-assigned this Jan 2, 2024
@Arshia001 Arshia001 added bug Something isn't working 📦 lib-api About wasmer 📦 lib-engine About wasmer-engine priority-medium Medium priority issue labels Jan 2, 2024
@Arshia001
Member

@webmaster128 thank you for reporting this issue. I will look into this.

@Arshia001 Arshia001 added this to the v4.3 milestone Jan 2, 2024
@webmaster128
Contributor Author

Do you have initial thoughts on whether and how this can be fixed? Do you think we need to adapt our codebase to use one engine per Module to work around the issue?

@Arshia001
Member

Optimally, I want to fix this in time for the next release.

Do you think we need to adapt our codebase to use one engine per Module to work around the issue?

Tagging @theduke and @Michael-F-Bryan to answer this; I don't have a full picture of the implications.

@jost-s

jost-s commented Jan 5, 2024

I'm tracing your steps here, @webmaster128 ;-) I'm running into the same issue during benchmarks with a long-lived runtime engine. Thanks for reporting this; the reproduction is spot on, too. By the way, it doesn't need to be deserialization from a file; the same happens for modules kept in memory.

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use wasmer::{Engine, Module};

fn create_module(runtime_engine: &Engine) -> anyhow::Result<Module> {
    let compiling_engine = Engine::default();
    let wat = r#"
    (module
        (func (export "test"))
    )"#;
    let mut module = Module::new(&compiling_engine, wat.as_bytes())?;
    let serialized_module = module.serialize()?;

    for _ in 0..100 {
        module = unsafe { Module::deserialize(runtime_engine, serialized_module.clone())? };
    }

    Ok(module)
}

pub fn wasm_module(c: &mut Criterion) {
    let mut group = c.benchmark_group("wasm_module");

    let runtime_engine = Engine::headless();
    group.bench_function(BenchmarkId::new("wasm_module", "module"), |b| {
        b.iter(|| create_module(&runtime_engine))
    });

    group.finish()
}

criterion_group!(benches, wasm_module,);

criterion_main!(benches);

Do you think we need to adapt our codebase to use one engine per Module to work around the issue?

It seems to me that if you're deserializing modules that are then dropped, like in the above example, it's a memory leak to hold on to artifacts that are no longer accessible.

@webmaster128
Contributor Author

it's a memory leak to hold on to artifacts that are no longer accessible

The artifacts are dropped as expected at the right time (which I also tested along the way). The problem is that for each deserialization there is a bunch of memory that is owned by the engine instead of the artifact, so it lives as long as the engine. Once the engine is dropped, everything is nice and clean.

@jost-s

jost-s commented Jan 6, 2024

Okay, then the memory should be freed when the artifact is dropped, shouldn't it? What else is it kept for? Alternatively, the memory could be owned by the artifact instead, so it gets dropped along with the artifact.

@webmaster128
Contributor Author

I can confirm that caching a (Module, Engine) pair is a workaround for this issue, because then the Engine does not live much longer than the artifact. We did this here: CosmWasm/cosmwasm#1982. Ideally we can revert that once this is solved in Wasmer.
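
For reference, a rough sketch of the shape of that workaround; the cache structure and names here are made up for illustration, see the linked PR for the real change. Each cache entry owns its own headless Engine next to the Module, so dropping the entry also drops the engine together with its frame info registrations:

use std::collections::HashMap;
use wasmer::{Engine, Module};

// Hypothetical cache entry: the engine lives exactly as long as the module
// that registered frame info into it.
struct CachedModule {
    engine: Engine,
    module: Module,
}

struct InMemoryCache {
    modules: HashMap<String, CachedModule>,
}

impl InMemoryCache {
    fn load_or_insert(
        &mut self,
        checksum: &str,
        serialized: &[u8],
    ) -> anyhow::Result<&CachedModule> {
        if !self.modules.contains_key(checksum) {
            // One fresh headless engine per module instead of a shared long-lived one.
            let engine = Engine::headless();
            let module = unsafe { Module::deserialize(&engine, serialized)? };
            self.modules
                .insert(checksum.to_string(), CachedModule { engine, module });
        }
        Ok(self.modules.get(checksum).unwrap())
    }
}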

@jost-s

jost-s commented Jan 12, 2024

I realize that is not a workaround for my scenario, where I keep the module and thus the engine for the entire time the system is running. It wouldn't make a difference whether the engine is stored with the module or not, since the module is never dropped.

On a positive note, in my use case deserialization is only done once at startup and there are fewer than 20 modules, so memory usage shouldn't go through the roof.
