Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subsys: pm: add check for device busy in policy #61726

Merged
merged 1 commit into from
Sep 15, 2023

Conversation

leifuzhao
Copy link
Contributor

Should add check for device busy here because one or more devices may still in busy, otherwise problem may happen due to system go into low power mode with some devices still in busy. Issue is observed in Intel ISH platform.

Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check is already done here

if (!device_is_ready(dev) || pm_device_is_busy(dev) ||

@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks)
uint8_t num_cpu_states;
const struct pm_state_info *cpu_states;

#if CONFIG_PM_DEVICE
if (pm_device_is_any_busy()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, take a look at the state locking API. It's the device driver code that needs to prevent system sleep if it's busy doing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out Intel ISH platform requires all devices are not busy to enter into system low power. Add state locking in every device driver may not be a suitable solution. Is it possible to add a Kconfig option in pm that enable such property, for example config NEED_ALL_DEVICES_UNBUSY, if this config is y then runs this pm_device_is_any_busy() checking in policy.c?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a bad case:

  • a driver sets device busy, starts one transaction, then waits the completion semaphore with a long timeout or even FOREVER.
  • Zephyr runs to idle thread and gets a long idle time, so PM policy returns a deep sleep state.
  • PM suspends all other devices, then SoC goes into very low PM state, maybe suspend to disk.
  • SoC detects there's device interrupt pending, so wakes up immediately.
  • do all resume works.
  • device interrupt gets served.
    in such a case, low-power states should be blocked. Any driver should avoid it. It's like a common thing. It seems not good for all drivers to call pm state lock APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your device is waiting on something, and needs to prevent deep sleep, simply lock deep sleep states while waiting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that most drivers have similar working model:

  • set register to trigger a transaction.
  • wait complete semaphore which will be given in ISR.
  • complete the transaction.
    all those drivers should call PM state lock APIs to lock all low power states after step 1. as a common driver, I've afraid they should call PM state lock API multi times to lock PM_STATE_SUSPEND_TO_IDLE, PM_STATE_STANDBY, PM_STATE_SUSPEND_TO_RAM and PM_STATE_SUSPEND_TO_DISK, then call the API multi times to unlock them after transaction done.
    this seems not good at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current solution doesn't solve the problems I mentioned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine for me having this option wrapped around a Kconfig option if that is a common needed for devices in a platform. If it is particular to one device, I do stay with the state lock solution, but if that is common thing for the platform, I think this solution is reasonable, it less error prone.

Copy link
Collaborator

@kwd-doodling kwd-doodling Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all ISH devices need it, except three on AON domain, HPET/RTC/IPC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM.

agree, why not use a custom policy instead of change the default policy. PM_NEED_ALL_DEVICES_IDLE seems to be very specialized and soon enough we will be wondering what is the usecase behind it without too much context and details in the documentation. See https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/subsys/pm/power_mgmt/src/main.c#L222C29-L222C49 for an example of a custom policy.

@leifuzhao
Copy link
Contributor Author

check is already done here

if (!device_is_ready(dev) || pm_device_is_busy(dev) ||

This check is not enough for system level suspend. As can be seen from this code below, when one device is busy, the code runs continue statement, it means this device busy doesn't prevent system suspend which is wrong because system level suspend should happen when no device is busy, otherwise will surely create problem which has already been seen in Intel ISH platform.

	if (!device_is_ready(dev) || pm_device_is_busy(dev) ||
	    pm_device_state_is_locked(dev) ||
	    pm_device_wakeup_is_enabled(dev) ||
	    pm_device_runtime_is_enabled(dev)) {
		continue;
	}

@gmarull
Copy link
Member

gmarull commented Aug 23, 2023

check is already done here

if (!device_is_ready(dev) || pm_device_is_busy(dev) ||

This check is not enough for system level suspend. As can be seen from this code below, when one device is busy, the code runs continue statement, it means this device busy doesn't prevent system suspend which is wrong because system level suspend should happen when no device is busy, otherwise will surely create problem which has already been seen in Intel ISH platform.

	if (!device_is_ready(dev) || pm_device_is_busy(dev) ||
	    pm_device_state_is_locked(dev) ||
	    pm_device_wakeup_is_enabled(dev) ||
	    pm_device_runtime_is_enabled(dev)) {
		continue;
	}

Then just use the state locking APIs as suggested above.

@ceolin
Copy link
Member

ceolin commented Aug 23, 2023

check is already done here

if (!device_is_ready(dev) || pm_device_is_busy(dev) ||

This check is not enough for system level suspend. As can be seen from this code below, when one device is busy, the code runs continue statement, it means this device busy doesn't prevent system suspend which is wrong because system level suspend should happen when no device is busy, otherwise will surely create problem which has already been seen in Intel ISH platform.

	if (!device_is_ready(dev) || pm_device_is_busy(dev) ||
	    pm_device_state_is_locked(dev) ||
	    pm_device_wakeup_is_enabled(dev) ||
	    pm_device_runtime_is_enabled(dev)) {
		continue;
	}

Then just use the state locking APIs as suggested above.

Agree, if that is the case, the right way to prevent the system to sleep is putting a constraint.

subsys/pm/Kconfig Outdated Show resolved Hide resolved
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks)
uint8_t num_cpu_states;
const struct pm_state_info *cpu_states;

#if defined(CONFIG_PM_DEVICE) && defined(CONFIG_PM_NEED_ALL_DEVICES_IDLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pollutes CONFIG_PM code with PM_DEVICE related stuff, so to me this is a -1. As explained, there are other more precise means to solve this issue, that is, with state locking (maybe refining its implementation to accommodate new requirements). There's also no need to enable CONFIG_PM_DEVICE to modify the behavior of system PM, this is why we have policy APIs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarull I have no knowledge of other SoCs/platforms, but Intel ISH's SoC core low-power mode and device/peripheral power do have relationship. When CPU goes into deep sleep, devices' power will be gated.
So ISH does need this change, appreciate your understanding.

Copy link
Collaborator

@kwd-doodling kwd-doodling Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarull Hi Gerald, we do need this patch for production. could you approve it? or if you have any more concern, let me know. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expressed my concerns, and they have not been addressed. Let me know if this patch has further updates.

Copy link
Collaborator

@kwd-doodling kwd-doodling Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarull Hi Gerald, just get your point. Previously I thought CONFIG_PM_DEVICE is a sub-config of CONFIG_PM because CONFIG_PM_DEVICE has prefix CONFIG_PM, but it seems they're independent from subsystem/pm/Kconfig.
but this seems weird to me. Shouldn't Device PM be part of PM (Power Management)? @ceolin @nashif
Do you know there is any SoC that needs only CONFIG_PM_DEVICE but no CONFIG_PM? I wonder if there's any such need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarull I am not following what you are objecting to exactly, the code has changed since this:

This pollutes CONFIG_PM code with PM_DEVICE related stuff, so to me this is a -1

@leifuzhao leifuzhao force-pushed the mainout1 branch 2 times, most recently from 32d275e to 54a51bf Compare September 7, 2023 06:04
kwd-doodling
kwd-doodling previously approved these changes Sep 7, 2023
Copy link
Collaborator

@tmleman tmleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking since it's hidden behind #ifdef.

@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks)
uint8_t num_cpu_states;
const struct pm_state_info *cpu_states;

#if CONFIG_PM_DEVICE
if (pm_device_is_any_busy()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM.

ceolin
ceolin previously approved these changes Sep 11, 2023
Copy link
Member

@ceolin ceolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is an acceptable constraint / behavior for the policy. Devices and SoC are not completely disconnect things in some cases, so having it protected by a build option is ok for me.

subsys/pm/Kconfig Outdated Show resolved Hide resolved
subsys/pm/Kconfig Outdated Show resolved Hide resolved
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks)
uint8_t num_cpu_states;
const struct pm_state_info *cpu_states;

#if CONFIG_PM_DEVICE
if (pm_device_is_any_busy()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM.

agree, why not use a custom policy instead of change the default policy. PM_NEED_ALL_DEVICES_IDLE seems to be very specialized and soon enough we will be wondering what is the usecase behind it without too much context and details in the documentation. See https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/subsys/pm/power_mgmt/src/main.c#L222C29-L222C49 for an example of a custom policy.

Add check for device busy when CONFIG_PM_NEED_ALL_DEVICES_IDLE is
set to y because one or more devices may still in busy and causes
problem when system go into low power in Intel ISH platform.

Signed-off-by: Leifu Zhao <leifu.zhao@intel.com>
@kwd-doodling
Copy link
Collaborator

We've considered CONFIG_PM_POLICY_CUSTOM. That patch is just a copy of current policy function with changes in this PR. Leifu wondered weather it's acceptable by community for duplicated code.
@ceolin @nashif either way we're OK. Please give your point which is better.

@ceolin
Copy link
Member

ceolin commented Sep 12, 2023

We've considered CONFIG_PM_POLICY_CUSTOM. That patch is just a copy of current policy function with changes in this PR. Leifu wondered weather it's acceptable by community for duplicated code. @ceolin @nashif either way we're OK. Please give your point which is better.

I recommend using custom policy when you have very unique needs, like actions depending on specific cores, specialized constraints, ... This one didn't look a super unique case for me. The drawback for custom policy is that you have to account state and latency constraints in your policy for those APIs work correctly.
That's said, since it looks controversial and you are ok in have a custom policy, that is probably the easiest/best approach now.

@leifuzhao
Copy link
Contributor Author

We've considered CONFIG_PM_POLICY_CUSTOM. That patch is just a copy of current policy function with changes in this PR. Leifu wondered weather it's acceptable by community for duplicated code. @ceolin @nashif either way we're OK. Please give your point which is better.

I recommend using custom policy when you have very unique needs, like actions depending on specific cores, specialized constraints, ... This one didn't look a super unique case for me. The drawback for custom policy is that you have to account state and latency constraints in your policy for those APIs work correctly. That's said, since it looks controversial and you are ok in have a custom policy, that is probably the easiest/best approach now.

Agree. POLICY_CUSTOM is not feasible solution at least right now. It is not simple as just copy and modify as imagined. Just copy pm_policy_next_state() and add check_all_busy() doesn't work since there are several variables defined as static outside the pm_policy_next_state() function in subsys/pm/policy.c, such as next_event_cyc, max_latency_cyc. If copy whole subsys/pm/policy.c file and add check_all_busy(), it also doesn't work. This is becasue subsys/pm/policy.c defines several API functions such as pm_policy_latency_request_update() etc, then there will be multiple definitions of same function. But if remove them in the new copied file, for example remove pm_policy_latency_request_update(), then pm_policy_latency_request_update() in subsys/pm/policy.c will call static function update_max_latency() which manipulates and updates local copy of max_latency_cyc. The max_latency_cyc variable used by pm_policy_next_state() in new copied file doesn't get updated which is wrong. So we have to change and add code to deal with and this brings risk to already working well ish PM.

@kwd-doodling
Copy link
Collaborator

could #62550 be merged? I think it's good to solve the issue Leifu mentioned when implementing POLICY_CUSTOM.

@kwd-doodling
Copy link
Collaborator

could #62550 be merged? I think it's good to solve the issue Leifu mentioned when implementing POLICY_CUSTOM.
@ceolin @nashif is it possible to get #62550 merged quickly? ISH really needs it.

@nashif
Copy link
Member

nashif commented Sep 15, 2023

could #62550 be merged? I think it's good to solve the issue Leifu mentioned when implementing POLICY_CUSTOM.

was trying to show possible solutions, I know it is not great, and I think we can do better than that introducing hooks rather than reimplementation of complete functions.

Copy link
Member

@nashif nashif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will have a followup proposal to improve use of custom policies without duplicating much code...

@nashif nashif dismissed gmarull’s stale review September 15, 2023 14:06

no response, please revisit.

@ceolin
Copy link
Member

ceolin commented Sep 15, 2023

Hi, I'll re-enforce the +1 here. I think this is a reasonable constraint and it is protected under a build option, it is nothing crazy. The custom policy is supposed (or at least documented) to be used by the application, so doing it here would not allow applications to override its behavior. I was thinking about it ... much of the logic inside the default policy should be available to be used by custom policies ... the way it is now is, at least. ... painful

@nashif nashif merged commit 61ab3a8 into zephyrproject-rtos:main Sep 15, 2023
17 checks passed
@gmarull
Copy link
Member

gmarull commented Sep 18, 2023

So yet another abuse of powers mergings PRs, and then we are opening proposals like this #61358, I guess they're a joke.

gmarull added a commit to teslabs/zephyr that referenced this pull request Sep 18, 2023
I disagree on the way the subsystem is maintained. For more context
refer to zephyrproject-rtos#61726 or
https://discord.com/channels/720317445772017664/997527108844798012/
1153288380231208960

Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
nashif pushed a commit that referenced this pull request Sep 18, 2023
I disagree on the way the subsystem is maintained. For more context
refer to #61726 or
https://discord.com/channels/720317445772017664/997527108844798012/
1153288380231208960

Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
coreboot-org-bot pushed a commit to coreboot/zephyr-cros that referenced this pull request Sep 19, 2023
I disagree on the way the subsystem is maintained. For more context
refer to zephyrproject-rtos/zephyr#61726 or
https://discord.com/channels/720317445772017664/997527108844798012/
1153288380231208960

(cherry picked from commit 4ba97c2)

Original-Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
GitOrigin-RevId: 4ba97c2
Change-Id: I2cf897da2b65b4068a111ec0cd3016ad2d351f5b
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/4875292
Reviewed-by: Al Semjonovs <asemjonovs@google.com>
Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com>
Commit-Queue: Al Semjonovs <asemjonovs@google.com>
Tested-by: Al Semjonovs <asemjonovs@google.com>
Dat-NguyenDuy pushed a commit to nxp-zephyr/zephyr that referenced this pull request Sep 20, 2023
I disagree on the way the subsystem is maintained. For more context
refer to zephyrproject-rtos#61726 or
https://discord.com/channels/720317445772017664/997527108844798012/
1153288380231208960

Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
coran21 pushed a commit to coran21/zephyr that referenced this pull request Sep 21, 2023
I disagree on the way the subsystem is maintained. For more context
refer to zephyrproject-rtos#61726 or
https://discord.com/channels/720317445772017664/997527108844798012/
1153288380231208960

Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Power Management platform: Intel ISH Intel Corporation, Integrated Sensor Hub platform: X86 x86 and x86-64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants