pm: device_runtime: Use its own queue #87496

ceolin · 2025-03-21T20:22:36Z

Device runtime uses the system workqueue for operations that mostly block (e.g. suspending a device). This should not happen.

This commit defines a queue for the device runtime async operations.

The test for this API assumed the system workqueue priority (which is cooperative), so we need to change the test priority to be lower than the device runtime workqueue priority.

ceolin · 2025-03-21T20:23:24Z

@bjarki-andreasen thanks for pointing this problem out

bjarki-andreasen

Really nice!

JordanYates · 2025-03-22T03:43:08Z

Should this be optional? Asynchronous PM is not necessarily used at all in an application, but this will incur the additional thread regardless. I'm really not a fan of these "one job" workqueues. What is the use case that triggered this change?

bjarki-andreasen · 2025-03-22T06:09:36Z

> Should this be optional? Asynchronous PM is not necessarily used at all in an application, but this will incur the additional thread regardless. I'm really not a fan of these "one job" workqueues. What is the use case that triggered this change?

Async will become optional as well.

A use case that triggered this change is the modem subsystem, which is async and built around the system workqueue. The work queue is not allowed to block, otherwise it can't be shared; work has to be async. pm_device_runtime_get() and pm_device_runtime_put() are blocking actions, yet users often call them from system work queue items, either directly from their app or indirectly through pm_device_runtime_put_async(). This breaks drivers and subsystems which use the system workqueue internally: the queue can't both be blocked waiting for the modem driver to suspend and do the work required to suspend the modem driver, so it deadlocks.

A design decision that triggered the change is simply that no blocking API may be called from a shared work queue, so pretty much all device driver APIs and pm actions are not allowed.

JordanYates · 2025-03-22T09:55:39Z

> A design decision that triggered the change is simply that no blocking API may be called from a shared work queue, so pretty much all device driver APIs and pm actions are not allowed.

Hard disagree on this one. Bluetooth tried to enforce such a black and white rule and it caused many problems, see #84282.
Please provide the RFC or architecture meeting notes for the decision; it is certainly not documented on https://docs.zephyrproject.org/latest/kernel/services/threads/workqueue.html#system-workqueue

ceolin

> Should this be optional? Asynchronous PM is not necessarily used at all in an application, but this will incur the additional thread regardless. I'm really not a fan of these "one job" workqueues. What is the use case that triggered this change?

I am not a fan either. The reason that triggered it is what @bjarki-andreasen explained. There is a resource penalty in having a dedicated queue, but it makes this code less error prone. To minimize the drawbacks, I plan to make async support optional, since a lot of applications may not need it.

bjarki-andreasen · 2025-03-23T08:47:15Z

> > A design decision that triggered the change is simply that no blocking API may be called from a shared work queue, so pretty much all device driver APIs and pm actions are not allowed.
>
> Hard disagree on this one, Bluetooth tried to enforce such a black and white rule and it caused many problems, see #84282.
> Please provide the RFC or architecture meeting notes for the decision, it is certainly not documented on https://docs.zephyrproject.org/latest/kernel/services/threads/workqueue.html#system-workqueue

The system work queue is serviced by a single thread. It can only execute one work item at a time. If item "a" blocks until item "b" is executed, item "b" is never executed. Since any module can use the sys work queue internally, there is no guarantee that item "a" will not end up blocked by item "b".

If the above statement is correct, there is nothing to disagree with: the system work queue simply can't be used safely with blocking APIs.
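The starvation described above can be reproduced in a small model. What follows is a minimal sketch in plain C, not Zephyr code: `model_queue`, `submit()`, `run_one()` and `run_scenario()` are hypothetical names, each queue stands in for one work queue thread, and a bounded spin stands in for blocking forever. Item "a" waits for item "b", which always lives on the modelled system queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS  4
#define SPIN_LIMIT 1000 /* stands in for "blocks forever" */

/* Hypothetical model: a FIFO of handlers drained one at a time. */
struct model_queue {
	void (*items[MAX_ITEMS])(void);
	size_t head;
	size_t tail;
};

static struct model_queue sysq;     /* models the system work queue */
static struct model_queue pmq;      /* models a dedicated PM queue */
static struct model_queue *a_queue; /* queue that item "a" runs on */
static bool b_done;
static bool a_starved;

static void submit(struct model_queue *q, void (*fn)(void))
{
	if (q->tail < MAX_ITEMS) {
		q->items[q->tail++] = fn;
	}
}

/* Execute one pending item, if any. Returns true if one was executed. */
static bool run_one(struct model_queue *q)
{
	if (q->head == q->tail) {
		return false;
	}
	void (*fn)(void) = q->items[q->head];
	q->head++;
	fn();
	return true;
}

static void item_b(void)
{
	b_done = true;
}

/* Item "a" waits for item "b". Other queues keep being serviced while
 * "a" waits; a's own queue is not, because its thread is inside "a". */
static void item_a(void)
{
	for (int spins = 0; spins < SPIN_LIMIT && !b_done; spins++) {
		if (a_queue != &sysq) {
			run_one(&sysq);
		}
	}
	a_starved = !b_done;
}

/* Returns true if "a" saw "b" complete, false if "a" starved. */
bool run_scenario(bool dedicated_queue)
{
	sysq = (struct model_queue){0};
	pmq = (struct model_queue){0};
	b_done = false;
	a_starved = false;
	a_queue = dedicated_queue ? &pmq : &sysq;

	submit(a_queue, item_a);
	submit(&sysq, item_b);

	while (run_one(&pmq) || run_one(&sysq)) {
	}
	return !a_starved;
}
```

In this model `run_scenario(false)` returns false, because "b" is queued behind "a" and only runs after "a" has given up, while `run_scenario(true)` returns true, because "b" is serviced by a different queue than the one "a" occupies.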

I will create a PR to add this to the docs. We can discuss from there :)

bjarki-andreasen · 2025-03-23T09:23:43Z

@JordanYates There is an entry in the docs here warning about the deadlock https://docs.zephyrproject.org/latest/kernel/services/threads/workqueue.html#work-item-lifecycle:

"A handler function can use any kernel API available to threads. However, operations that are potentially blocking (e.g. taking a semaphore) must be used with care, since the workqueue cannot process subsequent work items in its queue until the handler function finishes executing."

It's not that strongly worded, "must be used with care" is essentially what I'm referring to. Calling pm_device_action_run() is not using it with care; it's hoping the driver does not depend on a work item internally to perform the action...

A user has no way of knowing why calling a pm device API on a particular device suddenly makes seemingly random parts of the system stop responding. It's a particularly unintuitive behavior which is really hard to narrow down.

bjarki-andreasen · 2025-03-23T14:20:49Z

PR for documenting and asserting safe use of system work queue #87522

nashif · 2025-03-24T19:05:58Z

Can this be made a choice, i.e. a choice between using the system workq and a dedicated workq? This way, depending on the application and resources available, you can use the existing system workq, or have one dedicated to PM if the system workq becomes crowded with blocking consumers and you are able to afford another thread?

ceolin · 2025-03-24T22:47:41Z

> Can this be made a choice, i.e. a choice between using the system workq and a dedicated workq? This way, depending on the application and resources available, you can use the existing system workq, or have one dedicated to PM if the system workq becomes crowded with blocking consumers and you are able to afford another thread?

It can be done. There are possible issues associated with this, but we can make it optional, with the dedicated queue as the default.

bjarki-andreasen · 2025-03-25T06:49:57Z

> Can this be made a choice, i.e. a choice between using the system workq and a dedicated workq? This way, depending on the application and resources available, you can use the existing system workq, or have one dedicated to PM if the system workq becomes crowded with blocking consumers and you are able to afford another thread?

Not safely, it breaks drivers which use the system workqueue internally. It's not a question of "crowded", it's a question of deadlocking :)

ceolin · 2025-04-02T22:49:30Z

@nashif I have added a new commit making the async operation optional (and, consequently, the need for the queue).

@bjarki-andreasen Would you mind taking another look? @JordanYates ^

bjarki-andreasen

Love it!

JordanYates · 2025-04-03T03:48:38Z

This doesn't allow using pm_device_runtime_async without using this new dedicated queue.
So existing users, that operate perfectly fine without the dedicated queue, will incur the RAM penalty.
e.g.

zephyr/drivers/flash/spi_nor.c, lines 818 to 819 in 29dd014:

    /* Release flash power requirement */
    (void)pm_device_runtime_put_async(dev, K_MSEC(ACTIVE_DWELL_MS));

There are also hidden RAM costs for extra threads, for example coredumps. Extra RAM needs to be reserved there for each thread in the system.

My understanding is that the dedicated queue is only needed when the PM actions themselves rely on the system workqueue. That seems like a relatively rare situation?

ceolin · 2025-04-03T04:48:27Z

> This doesn't allow using pm_device_runtime_async without using this new dedicated queue. So existing users, that operate perfectly fine without the dedicated queue, will incur the RAM penalty. e.g.

Yes, that is a drawback indeed. I am debating whether to allow continued use of the system work queue, but I am afraid that doing so will hide possible issues. It is going to be hard for an application to know whether devices perform blocking operations or not.

> zephyr/drivers/flash/spi_nor.c, lines 818 to 819 in 29dd014:
>
>     /* Release flash power requirement */
>     (void)pm_device_runtime_put_async(dev, K_MSEC(ACTIVE_DWELL_MS));
>
> There are also hidden RAM costs for extra threads as well, for example coredumps. Extra RAM needs to be reserved there for each thread in the system.

There is definitely a cost associated with the new thread. The question is whether or not it is acceptable in favor of a more robust solution.

> My understanding is that the dedicated queue is only needed when the PM actions themselves rely on the system workqueue. That seems like a relatively rare situation?

The other case is when the pm action itself blocks because it blocks the system work queue.

JordanYates · 2025-04-03T05:27:44Z

> The other case is when the pm action itself blocks because it blocks the system work queue.

That is only problematic if the blocking is "unreasonably long". Obviously the threshold is rather vague, but personally I don't count blocking the work queue for 2ms while waiting for an "IDLE" SPI command to complete as unreasonable. In my own usage of runtime PM, I haven't come across many drivers that have complex SUSPEND actions.

The only case where it is a clear problem is

> when the PM actions themselves rely on the system workqueue

ceolin

@bjarki-andreasen @JordanYates can you take another look, please?

JordanYates

I must have missed something, why is this PR suddenly making the entire async PM infrastructure optional (with no docs)?

On the actual topic of this PR:

> We could have a choice to use the system workqueue or the dedicated one. I think the default should be the dedicated one, so we can add a note to the choice stating that if you use the system workqueue and some PM ops suddenly time out, that may be why :)

Wouldn't it make more sense to only enable the dedicated workqueue in drivers that are known to be problematic when run from the system workqueue?

bjarki-andreasen · 2025-04-22T06:40:08Z

> I must have missed something, why is this PR suddenly making the entire async PM infrastructure optional (with no docs)?

This should be its own PR yeah :)

> On the actual topic of this PR:
>
> > We could have a choice to use the system workqueue or the dedicated one. I think the default should be the dedicated one, so we can add a note to the choice stating that if you use the system workqueue and some PM ops suddenly time out, that may be why :)
>
> Wouldn't it make more sense to only enable the dedicated workqueue in drivers that are known to be problematic when run from the system workqueue?

As in, select the dedicated workqueue option if drivers use the system workqueue internally for PM, and otherwise default to the sys workqueue? That could work as well :)

ceolin · 2025-04-24T22:06:18Z

> > I must have missed something, why is this PR suddenly making the entire async PM infrastructure optional (with no docs)?
>
> This should be its own PR yeah :)

Sorry for this guys, it grew from the original and I missed it. Going to add the documentation.

> > On the actual topic of this PR:
> >
> > > We could have a choice to use the system workqueue or the dedicated one. I think the default should be the dedicated one, so we can add a note to the choice stating that if you use the system workqueue and some PM ops suddenly time out, that may be why :)
> >
> > Wouldn't it make more sense to only enable the dedicated workqueue in drivers that are known to be problematic when run from the system workqueue?
>
> As in, select the dedicated workqueue option if drivers use the system workqueue internally for PM, and otherwise default to the sys workqueue? That could work as well :)

I can't see much benefit in this. If you have one driver that needs it, you have already added the overhead of a dedicated work queue to the system. If none of the drivers you are using need it, you can just use the system work queue.

I think it will add more complexity without a major advantage. Am I missing something?

JordanYates · 2025-04-25T01:47:31Z

> I think it will add more complexity without a major advantage. Am I missing something?

That if you don't use one of the rare drivers that requires the system workqueue in the PM action, you don't add the extra workqueue by default?

ceolin · 2025-05-14T05:51:25Z

> > I think it will add more complexity without a major advantage. Am I missing something?
>
> That if you don't use one of the rare drivers that requires the system workqueue in the PM action, you don't add the extra workqueue by default?

I don't get the point. The work queue is in the subsystem and not per driver. If you have only one device that needs to block, you will pay the work queue cost. If we do one per driver, it is even worse.... If you don't have any driver that needs it, you can opt to use the system work queue, and there will be no penalty.

@JordanYates @bjarki-andreasen I have updated the PR, adding documentation about it. Please chime in if you find it incomplete or if something is wrong.

JordanYates · 2025-05-15T00:32:45Z

> I don't get the point. The work queue is in the subsystem and not per driver. If you have only one device that needs to block, you will pay the work queue cost. If we do one per driver, it is even worse.... If you don't have any driver that needs it, you can opt to use the system work queue, and there will be no penalty.

I'm not suggesting to create a workqueue per driver. I'm suggesting that the default should be to use the system workqueue for the PM options, with the dedicated workqueue only used if a problematic driver is enabled. e.g.

config PM_DRIVER_NEEDS_DEDICATED_WORKQ
    bool

choice PM_WORKQUEUE
    default PM_WORKQUEUE_DEDICATED if PM_DRIVER_NEEDS_DEDICATED_WORKQ
    default PM_WORKQUEUE_SYS

config PM_WORKQUEUE_SYS

config PM_WORKQUEUE_DEDICATED

endchoice

config SOME_COMPLICATED_DRIVER
    select PM_DRIVER_NEEDS_DEDICATED_WORKQ

By making the dedicated workqueue opt-in, we only incur the RAM penalty when it is actually needed.
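The proposal relies on how Kconfig resolves `default` lines in a choice: the first default whose condition is satisfied wins, so the unconditional `default PM_WORKQUEUE_SYS` only applies when no enabled driver has selected the helper symbol. A minimal sketch of that resolution in plain C, assuming the symbol names from the fragment above (the enum and the `default_pm_workqueue()` function are hypothetical and only model the decision, they are not part of Zephyr):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two members of the proposed choice. */
enum pm_workqueue {
	PM_WORKQUEUE_SYS,
	PM_WORKQUEUE_DEDICATED,
};

/* Models the two `default` lines of the choice, evaluated in order.
 * `driver_needs_dedicated_workq` stands for the value of
 * PM_DRIVER_NEEDS_DEDICATED_WORKQ, which becomes true as soon as any
 * enabled driver selects it. */
enum pm_workqueue default_pm_workqueue(bool driver_needs_dedicated_workq)
{
	if (driver_needs_dedicated_workq) {
		return PM_WORKQUEUE_DEDICATED; /* first default, conditional */
	}
	return PM_WORKQUEUE_SYS; /* second default, unconditional fallback */
}
```

This only models the default; because the symbols are members of a choice, a user could still pick the other option explicitly.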

ceolin · 2025-05-16T17:24:17Z

> > I don't get the point. The work queue is in the subsystem and not per driver. If you have only one device that needs to block, you will pay the work queue cost. If we do one per driver, it is even worse.... If you don't have any driver that needs it, you can opt to use the system work queue, and there will be no penalty.
>
> I'm not suggesting to create a workqueue per driver. I'm suggesting that the default should be to use the system workqueue for the PM options, with the dedicated workqueue only used if a problematic driver is enabled. e.g.
>
> config PM_DRIVER_NEEDS_DEDICATED_WORKQ
>     bool
>
> choice PM_WORKQUEUE
>     default PM_WORKQUEUE_DEDICATED if PM_DRIVER_NEEDS_DEDICATED_WORKQ
>     default PM_WORKQUEUE_SYS
>
> config PM_WORKQUEUE_SYS
>
> config PM_WORKQUEUE_DEDICATED
>
> endchoice
>
> config SOME_COMPLICATED_DRIVER
>     select PM_DRIVER_NEEDS_DEDICATED_WORKQ
>
> By making the dedicated workqueue opt-in, we only incur the RAM penalty when it is actually needed.

I still think that a dedicated queue is safer, but I understand the point here. I am OK with this approach, especially because this is the current behavior and it is not like we are seeing a lot of issues with it.

ceolin · 2025-05-16T22:19:49Z

@JordanYates done :)

Can you review it again, please?

doc/releases/release-notes-4.2.rst

subsys/pm/device_runtime.c

ceolin

    * :kconfig:option:`CONFIG_PM_DEVICE_DRIVER_NEEDS_DEDICATED_WQ`

The symbol is actually CONFIG_PM_DRIVER_NEEDS_DEDICATED_WQ. To be accurate, it should be CONFIG_PM_DEVICE_RUNTIME_ASYNC_NEEDS_DEDICATED_WQ; that is a really long symbol name, but I prefer it over CONFIG_PM_DEVICE_DRIVER_NEEDS_DEDICATED_WQ. What do you think?

Device runtime uses the system workqueue for operations that mostly block (e.g. suspending a device). This should not happen. This commit adds an option to use a dedicated queue for the device runtime async operations. The test for this API assumed the system workqueue priority (which is cooperative), so we need to change the test priority to be lower than the device runtime workqueue priority.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>

Async now uses its own work queue, which means it consumes more resources. Since not all applications need the async API, we can make it optional without any penalty for those applications.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>

Add new configuration options for runtime PM to the release notes, including stack size, priority, and system work queue usage. Update the runtime PM documentation to explain the implications of using the system work queue and disabling asynchronous operations. Include a new version of the sequence diagram for asynchronous operations.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>

ceolin · 2025-05-27T00:27:22Z

@JordanYates I believe I have addressed all comments, can you take another look, please?
@bjarki-andreasen ^

sonarqubecloud · 2025-05-27T01:17:07Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

bjarki-andreasen

Given the context of ASYNC being the reason we need to choose how to delegate the suspend action, I think it makes sense to include the option of excluding it entirely in this PR despite my previous comment. Nice work :)

zephyrbot added the area: Power Management label Mar 21, 2025

zephyrbot requested review from bjarki-andreasen, JordanYates, nashif, teburd and tmleman March 21, 2025 20:23

zephyrbot assigned ceolin and bjarki-andreasen Mar 21, 2025

ceolin force-pushed the pm/device-runtime/async branch from 2d0a7ec to 71ed1f3 Compare March 21, 2025 20:50

bjarki-andreasen previously approved these changes Mar 22, 2025


ceolin commented Mar 23, 2025


ceolin dismissed bjarki-andreasen’s stale review via ebef785 April 2, 2025 22:48

ceolin force-pushed the pm/device-runtime/async branch from 71ed1f3 to ebef785 Compare April 2, 2025 22:48

bjarki-andreasen previously approved these changes Apr 3, 2025


ceolin dismissed bjarki-andreasen’s stale review via afd53d5 April 3, 2025 04:48

ceolin force-pushed the pm/device-runtime/async branch from ebef785 to afd53d5 Compare April 3, 2025 04:48

ceolin commented Apr 20, 2025


JordanYates requested changes Apr 21, 2025


ceolin force-pushed the pm/device-runtime/async branch from 24400a1 to f1eedcf Compare May 14, 2025 05:46

github-actions bot added the Release Notes label May 14, 2025

github-actions bot requested review from danieldegrasse, dkalowsk and kartben May 14, 2025 05:47

ceolin force-pushed the pm/device-runtime/async branch 2 times, most recently from 8de63ec to 9087802 Compare May 14, 2025 16:26

ceolin force-pushed the pm/device-runtime/async branch 3 times, most recently from c8096f4 to e6d5c71 Compare May 16, 2025 20:14

JordanYates requested changes May 16, 2025

doc/releases/release-notes-4.2.rst (outdated)

subsys/pm/device_runtime.c

ceolin commented May 17, 2025


ceolin added 3 commits May 26, 2025 17:19

pm: device_runtime: Make async optional

7176ce2

Async now uses its own work queue, which means it consumes more resources. Since not all applications need the async API, we can make it optional without any penalty for those applications.

Signed-off-by: Flavio Ceolin <flavio@hubblenetwork.com>

ceolin force-pushed the pm/device-runtime/async branch from e6d5c71 to 46a7cc6 Compare May 27, 2025 00:21

bjarki-andreasen approved these changes May 27, 2025


JordanYates approved these changes May 27, 2025


kartben merged commit 287984f into zephyrproject-rtos:main May 27, 2025
26 checks passed
