This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Action plugin execution blocks daemon plugins #2436

Merged 1 commit on Dec 11, 2020

Conversation

@jgrund (Member) commented Dec 11, 2020

A long-running action plugin will block the polling of all daemon plugins until it completes.

This is because the Sessions.message method takes a read lock here (iml-agent/src/http_comms/session.rs, line 154):

if let State::Active(active) = state.read().await.deref() {

That read lock is then held for the entire plugin invocation, because the plugin lives inside that RwLock.

At the same time, the poller attempts a write lock on each plugin here (iml-agent/src/poller.rs, line 118):

Arc::clone(&state).write().await.deref_mut(),

When the polling loop tries to take the write lock on the plugin that is currently handling the message, it blocks until the read lock is released, so polling of the other daemon plugins stalls as well.

Depending on what the plugin is doing, this could take many hours.
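
The contention described above can be reproduced in a few lines. This is only a minimal sketch: it uses the synchronous std::sync::RwLock rather than the async lock the agent uses, and `Plugin` and `probe_write_lock` are hypothetical names that do not appear in the codebase. It shows that a write attempt cannot succeed while a read guard is still alive, and succeeds once the guard is dropped.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a daemon plugin stored inside a lock.
struct Plugin {
    polls: u32,
}

/// Returns (writer was blocked during the read, writer succeeded after it).
fn probe_write_lock() -> (bool, bool) {
    let state = RwLock::new(Plugin { polls: 0 });

    // Message path: a read guard is taken and held for the whole plugin call.
    let read_guard = state.read().unwrap();
    let _ = read_guard.polls; // the "plugin call" would run here

    // Poller path: a write attempt cannot succeed while the read guard lives.
    // `try_write` is used so the sketch reports the conflict instead of hanging.
    let blocked_during = state.try_write().is_err();

    // The plugin call finishes and the read guard is released.
    drop(read_guard);

    // Now the poller can take the write lock and update the plugin.
    let ok_after = match state.try_write() {
        Ok(mut plugin) => {
            plugin.polls += 1;
            true
        }
        Err(_) => false,
    };

    (blocked_during, ok_after)
}
```

With a blocking `write()` (or an awaited async write), the poller would simply wait at that point for as long as the plugin call runs.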

To solve this, we should remove the dependency on the RwLock at the Session level altogether. This can be done by making sure every DaemonPlugin is Clone. Most plugins keep all their state in Arc<Mutex<...>>, so this is not too big of a change.

In addition, when calling Sessions.message, we should clone the plugin before invoking on_message, so that the call does not rely on the read lock for its duration.
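
A rough sketch of the clone-before-invoke idea follows. It is not the code from this PR: `CounterPlugin`, `message_with_clone` and the `while_running` callback are hypothetical, and std::sync::RwLock stands in for the async lock. The point it illustrates is that when a plugin's state sits behind Arc<Mutex<...>>, a clone is a cheap handle to the same state, so the read guard only needs to live long enough to make the clone.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical plugin: all mutable state is behind Arc<Mutex<..>>,
/// so cloning yields a handle to the same underlying state.
#[derive(Clone)]
struct CounterPlugin {
    handled: Arc<Mutex<u32>>,
}

impl CounterPlugin {
    fn new() -> Self {
        CounterPlugin {
            handled: Arc::new(Mutex::new(0)),
        }
    }

    /// Stand-in for a long-running `on_message`. `while_running` is invoked
    /// during the "work" so the caller can probe the outer lock.
    fn on_message(&self, while_running: impl FnOnce()) {
        while_running();
        *self.handled.lock().unwrap() += 1;
    }

    fn handled(&self) -> u32 {
        *self.handled.lock().unwrap()
    }
}

/// Clones the plugin under a short-lived read lock, then invokes it with no
/// lock held. Returns whether a writer could take the lock during the call.
fn message_with_clone(slot: &RwLock<CounterPlugin>) -> bool {
    // The read guard is a temporary that is dropped at the end of this statement.
    let plugin: CounterPlugin = slot.read().unwrap().clone();

    let mut writer_got_lock = false;
    plugin.on_message(|| {
        // Poller path: the write lock is available while the message runs.
        writer_got_lock = slot.try_write().is_ok();
    });
    writer_got_lock
}
```

Because the clone shares its counter with the original, work done through the clone is still visible to whoever later locks the original plugin.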

Fixes #2435.

Signed-off-by: Joe Grund <jgrund@whamcloud.io>



@jgrund added the bug label Dec 11, 2020
@jgrund requested a review from a team December 11, 2020 16:15
@jgrund self-assigned this Dec 11, 2020
@jgrund merged commit 8bf2a50 into master Dec 11, 2020
@jgrund deleted the fix-message-locking branch December 11, 2020 21:15
3 participants