Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(1365): CPU bound instances don't spread on all CPU cores #1376

Merged
merged 6 commits into from
Feb 15, 2024

Conversation

JT117
Copy link
Contributor

@JT117 JT117 commented Jan 23, 2024

Feature or Problem

Fix the cpu spread on all cores.
For me in my mind:
The change for the actor is to put in a spawn the wasmtime module clone, so the clone can happen on any core
The change on the rpc bus is to handle each nats event in a separate spawn in order to spread on all core

Related Issues

#1365

Release Information

next

Consumer Impact

Instances will use 100% of cpu

Testing

none

Unit Test(s)

none

Acceptance or Integration

none

Manual Verification

1/ set the wasmcloud RPC timeout high enough to actually wait for the response :
export WASMCLOUD_RPC_TIMEOUT_MS=10000
2/ start wasmcloud 0.81.0
wash up
3/ deploy the wadm file
wash app deploy hello-world/wadm.yaml
4/ bench with any kind of app
observe the cpu core loads

…ats event. Also encapsulate the .clone on the wasmtime module.

After this two modification the workload spread on all core of the CPU.
Relates to issue wasmCloud#1365

Signed-off-by: Julien Teruel <julien.teruel@gmail.com>
Signed-off-by: Julien Teruel <julien.teruel@gmail.com>
@JT117 JT117 marked this pull request as ready for review January 24, 2024 10:04
@JT117 JT117 requested a review from a team as a code owner January 24, 2024 10:04
@JT117 JT117 marked this pull request as draft January 24, 2024 10:05
Copy link
Contributor

@thomastaylor312 thomastaylor312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments I noticed while looking at this. If we do choose to go down this route, I'd really prefer to have comments explaining why this works. I suspect it is because spawn forces the future to possibly run on a different worker queue, but as far as I could tell, await should do this too. We're using the multithread tokio runtime, and according to the docs, if a task is on the queue (which is what I assume happens when it yields with await) it should have the possibility of being stolen by another local queue.

Can you remind me, what operating system does this problem happen on and what are its specs? Edit: found in issue

crates/host/src/wasmbus/mod.rs Show resolved Hide resolved
crates/host/src/wasmbus/mod.rs Outdated Show resolved Hide resolved
crates/runtime/src/actor/module/mod.rs Outdated Show resolved Hide resolved
@thomastaylor312
Copy link
Contributor

I still don't quite understand why this is happening because for_each_concurrent uses FuturesUnordered which as far as I can tell should yield properly and cause tokio to spread the work between work queues. However, I think that we might as well just get the fix in so I'd be ok with merging this once marked ready for review and comments are addressed!

@thomastaylor312
Copy link
Contributor

@JT117 Even though we aren't quite sure why this works, it does work. Would you mind addressing the comments, rebasing, and marking as ready for review? Then we can get this in to the next patch release

Signed-off-by: Julien Teruel <julien.teruel@gmail.com>
…ly the instance instead of wrapping the instance's components in a future

Signed-off-by: Julien Teruel <julien.teruel@gmail.com>
@JT117 JT117 marked this pull request as ready for review February 14, 2024 08:20
Copy link
Contributor

@thomastaylor312 thomastaylor312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for adding the comments as well!

@thomastaylor312 thomastaylor312 enabled auto-merge (rebase) February 15, 2024 00:49
@thomastaylor312 thomastaylor312 merged commit c6fa704 into wasmCloud:main Feb 15, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants