Excessive compile time for "simple" module

OSS-Fuzz has had an [open bug for this](https://oss-fuzz.com/testcase-detail/4971399285899264) for quite some time but I'm only just now getting around to filing an issue about this. The gist of the fuzz bug is that Wasmtime times out in the `compile` test target where the only thing that target does is compile a module. A timeout here means that the 60s time limit is exceeded, and the time limit here is with sanitizers and parallelism disabled on OSS-Fuzz infrastructure. This can be roughly approximated by running locally in release mode with `-Cparallel-compilation=n` and multiplying the result by ~30.

The module in question is:

```wasm
(module 
    (func)
    (func)
    ;; ... 119248 times ...
    (func)
)
```

In other words, this is just a giant module containing a lot of empty functions. [foo.wasm.gz](https://github.com/user-attachments/files/19739342/foo.wasm.gz) is the compressed version of this module.
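For anyone who would rather regenerate the input than download it, here is a minimal sketch that prints the text form of such a module. The helper name `empty_funcs_module` and the default count are assumptions for illustration; the exact function count in the attached file may differ slightly from the default used here.

```rust
/// Build the text (WAT) form of a module containing `count` empty functions.
/// Hypothetical helper, not part of Wasmtime.
fn empty_funcs_module(count: usize) -> String {
    let mut wat = String::with_capacity(16 + count * 11);
    wat.push_str("(module\n");
    for _ in 0..count {
        wat.push_str("    (func)\n");
    }
    wat.push_str(")\n");
    wat
}

fn main() {
    // Optional first argument overrides the number of functions.
    // The default is an assumption based on the elided listing above.
    let count = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(119_250);
    print!("{}", empty_funcs_module(count));
}
```

The output can be redirected to a `.wat` file. Whether a given `wasmtime` build accepts text input directly or needs a conversion to binary first depends on how it was built.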

Locally I see:

```
$ time wasmtime compile -C parallel-compilation=n foo.wasm
wasmtime compile -C parallel-compilation=n foo.wasm  1.05s user 0.15s system 99% cpu 1.205 total
```

which, for a bunch of empty functions, is quite a lot! There are a lot of functions in this module, but ~10 microseconds per empty function feels a bit excessive regardless. While optimizing this probably won't help much in the long term, it's perhaps still worthwhile to improve, if only for OSS-Fuzz timeouts and fuzzing.

A profile of the compilation looks [like this](https://share.firefox.dev/4ikGBoK), which notably spends the most "self" time in `memmove`. There's also a fair amount of allocation traffic. I'm not sure how to improve the regalloc2 parts myself, but I do know how to improve the `memmove` part.

## Moving `MachBuffer` less

The basic problem I've seen is that we're pretty liberal in Cranelift about moving data structures by ownership between phases, notably the `MachBuffer<T>`. This type is very large (lots of `SmallVec`) and is created/moved quite a lot throughout a compilation. I believe this adds up to quite a lot of `memmove` cost.

The movements I've seen are:

* [`MachBuffer::finish`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/cranelift/codegen/src/machinst/buffer.rs#L1539)
* [`TargetIsa::compile_function`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/cranelift/codegen/src/isa/x64/mod.rs#L76-L96)
* [`Context::compile_stencil`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/cranelift/codegen/src/context.rs#L124-L143)
* [`Context::compile`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/cranelift/codegen/src/context.rs#L203-L217)
* [`compile_uncached`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/crates/cranelift/src/compiler.rs#L725-L733)
* [`Compiler::finish_with_info`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/crates/cranelift/src/compiler.rs#L1020)

In general I don't think rustc/LLVM are even capable of eliding most of these copies, which means that for each function we're copying this very large structure ~6 times (ish). Multiply that by ~100k and the size of the structure and that's a lot of memory moving around, which can probably explain at least a good portion of the second of compile time for this module.
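To make the cost concrete, here is a small sketch of the pattern. `BigBuffer` is a hypothetical stand-in with inline arrays loosely imitating a struct made of many `SmallVec`s; it is not Cranelift's `MachBuffer`, and the layer functions are invented for illustration. Whether the compiler elides any particular copy depends on inlining and optimization level, so this shows the shape of the problem rather than a measurement.

```rust
use std::mem::size_of;

/// Hypothetical large struct with inline storage (not Cranelift's type).
#[allow(dead_code)]
struct BigBuffer {
    code: [u8; 1024],
    relocs: [u64; 64],
    len: usize,
}

impl BigBuffer {
    fn new() -> Self {
        BigBuffer { code: [0; 1024], relocs: [0; 64], len: 0 }
    }
}

// By-value chain: every layer takes ownership and returns the value.
// Each hand-off may turn into a copy of size_of::<BigBuffer>() bytes.
fn layer1(mut buf: BigBuffer) -> BigBuffer {
    buf.len += 1;
    buf
}
fn layer2(buf: BigBuffer) -> BigBuffer {
    layer1(buf)
}
fn layer3(buf: BigBuffer) -> BigBuffer {
    layer2(buf)
}

// In-place chain: only a pointer crosses each layer.
fn layer1_in_place(buf: &mut BigBuffer) {
    buf.len += 1;
}
fn layer2_in_place(buf: &mut BigBuffer) {
    layer1_in_place(buf)
}
fn layer3_in_place(buf: &mut BigBuffer) {
    layer2_in_place(buf)
}

fn main() {
    println!("BigBuffer is {} bytes", size_of::<BigBuffer>());

    let moved = layer3(BigBuffer::new());

    let mut in_place = BigBuffer::new();
    layer3_in_place(&mut in_place);

    // Both chains do the same logical work.
    assert_eq!(moved.len, in_place.len);
}
```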

Ideally we would refactor Cranelift to require much less movement of the `MachBuffer` type. In an ideal world we could even reuse `MachBuffer` structures between compiling functions too. In any case we can probably get a long-ish way by restructuring how the `MachBuffer` is passed around and who owns it.
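As a rough illustration of the reuse idea, the sketch below keeps one scratch buffer alive across functions and resets it between them. `ScratchBuffer` and `compile_one` are hypothetical names, not Cranelift APIs, and a real `MachBuffer` carries far more state than a single `Vec`.

```rust
/// Hypothetical scratch buffer (not Cranelift's `MachBuffer`).
struct ScratchBuffer {
    bytes: Vec<u8>,
}

impl ScratchBuffer {
    fn new() -> Self {
        ScratchBuffer { bytes: Vec::new() }
    }

    /// Reset for the next function. `Vec::clear` drops the contents
    /// but keeps the allocation, so later functions reuse it.
    fn reset(&mut self) {
        self.bytes.clear();
    }
}

/// Hypothetical per-function step: emit into a borrowed buffer and
/// return a view of the result instead of moving the buffer out.
fn compile_one<'a>(buf: &'a mut ScratchBuffer, body: &[u8]) -> &'a [u8] {
    buf.reset();
    buf.bytes.extend_from_slice(body);
    buf.bytes.push(0xC3); // x86-64 `ret`, purely for illustration
    &buf.bytes
}

fn main() {
    let mut buf = ScratchBuffer::new();
    let mut total = 0;
    for _ in 0..3 {
        // The same buffer is borrowed for each "function".
        let code = compile_one(&mut buf, &[]);
        total += code.len();
    }
    println!("emitted {total} bytes using one buffer");
}
```

The design choice here is that the caller owns the buffer and each compilation only borrows it, so nothing large is moved between phases.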

## Other compiler structures

I've seen other compiler structures in the profile that are quite large and moved around a lot, such as [`CompilerContext`](https://github.com/bytecodealliance/wasmtime/blob/e6e03c34cfb4f616e437bfcaa67575c3ab3a2e0c/crates/cranelift/src/compiler.rs#L46-L51). Ideally we could `Box` up some of these contexts and/or otherwise make moving them cheaper.
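A minimal sketch of why boxing helps: moving a `Box<T>` moves only a pointer, regardless of how large `T` is. `BigContext` is a hypothetical stand-in, not the real `CompilerContext`, and the trade-off is one heap allocation per box.

```rust
use std::mem::size_of;

/// Hypothetical large context (not the real `CompilerContext`).
#[allow(dead_code)]
struct BigContext {
    scratch: [u8; 4096],
}

// Moving the value itself may copy all of its bytes.
fn pass_inline(ctx: BigContext) -> BigContext {
    ctx
}

// Moving the box only moves a pointer to the heap allocation.
fn pass_boxed(ctx: Box<BigContext>) -> Box<BigContext> {
    ctx
}

fn main() {
    println!("inline move size: {} bytes", size_of::<BigContext>());
    println!("boxed move size:  {} bytes", size_of::<Box<BigContext>>());

    let _inline = pass_inline(BigContext { scratch: [0; 4096] });
    let _boxed = pass_boxed(Box::new(BigContext { scratch: [0; 4096] }));
}
```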

---

I'm sure there are other parts of the profile to dig into as well, but I wanted to at least file an issue in case anyone's interested in chipping away at some pieces here.
