Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][SDW][CML] clock stop prepare issue - Speaker cannot work after suspend/resume #2606

Closed
bardliao opened this issue Dec 7, 2020 · 27 comments
Assignees
Labels
bug Something isn't working Clock Stop prepare failed SoundWire Clock Stop prepare failed CML Applies to Comet Lake platform P3 Low-impact bugs or features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake)

Comments

@bardliao
Copy link
Collaborator

bardliao commented Dec 7, 2020

Canonical reported an issue that speaker can't work on CML + dual rt1308 laptop.
The issue happens only once so far.
The error kernel log is

rt1308 sdw:1:25d:1308:0: Clock Stop prepare failed for slave: -61
rt1308 sdw:1:25d:1308:0: pre-prepare failed:-61
intel-master sdw-master-1: prepare clock stop failed -61
intel-master sdw-master-1: cannot enable clock stop on suspend
...
rt1308 sdw:1:25d:1308:0: Unable to sync register 0xc030. -22
intel-master sdw-master-1: pm_runtime_get_sync failed in intel_startup, ret -22
intel-master sdw-master-1: ASoC: can't open DAI SDW1 Pin2: -22
SDW1-Playback: ASoC: BE open failed -22
SDW1-speakers: ASoC: failed to start some BEs -22

The issue is that somehow we didn't get ACK from codec and then cause clock stop prepare failed.
The question is that do we have a way to recover from no ack received?

@plbossart
Copy link
Member

can we have a longer dmesg trace @bardliao ?

What we could do in this case is just to mark the device as not functional, suspend anyways and reset the bus on startup.

But that looks like a corner case for now.

@kv2019i kv2019i added bug Something isn't working SDW Applies to SoundWire bus for codec connection labels Dec 8, 2020
@plbossart plbossart changed the title [BUG][SDW][CML] Speaker cannot work after suspend/resume [BUG][SDW][CML] clock stop prepare issue - Speaker cannot work after suspend/resume Dec 11, 2020
@mengdonglin mengdonglin added CML Applies to Comet Lake platform P2 Critical bugs or normal features Clock Stop prepare failed SoundWire Clock Stop prepare failed labels Dec 14, 2020
@mengdonglin
Copy link
Collaborator

@bardliao @plbossart It seems this issue can be reproduced with SDCA codec ALC1316 on TGL-U laptop device. Please check CI report 1118 on 2020-12-10 (UTC). Failed case: check-suspend-resume-with-playback-25.sh

dmesg

kernel: [   61.208533] PM: suspend exit
kernel: [   61.683079] sof-audio-pci 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG
kernel: [   61.683106] sof-audio-pci 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG
kernel: [   62.183129] sof-audio-pci 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG
kernel: [   62.183153] sof-audio-pci 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG
kernel: [   64.876111] intel-sdw intel-sdw.2: Msg ignored for Slave 1
kernel: [   64.876120] rt1316-sdca sdw:2:25d:1316:1: Clock Stop prepare failed for slave: -61
kernel: [   64.876130] rt1316-sdca sdw:2:25d:1316:1: pre-prepare failed:-61
kernel: [   64.876135] intel-sdw intel-sdw.2: prepare clock stop failed -61
kernel: [   64.876141] intel-sdw intel-sdw.2: cannot enable clock stop on suspend

@mengdonglin mengdonglin added TGL Applies to Tiger Lake platform P1 Blocker bugs or important features duplicate This issue or pull request already exists and removed P2 Critical bugs or normal features labels Dec 14, 2020
@mengdonglin
Copy link
Collaborator

mengdonglin commented Dec 14, 2020

@plbossart @bardliao Can we take this issue as a duplicate of #2621 ?

@mengdonglin mengdonglin added P2 Critical bugs or normal features and removed P1 Blocker bugs or important features labels Dec 14, 2020
@plbossart
Copy link
Member

@mengdonglin please keep issues separate. it's not useful if we have the same bugs for CML+SoundWire 1.1 devices and TGL+SDCA. We don't know if the issues are the same or not. For now let's have a different bug for each sighting, thank you.

@mengdonglin mengdonglin added P1 Blocker bugs or important features P2 Critical bugs or normal features and removed duplicate This issue or pull request already exists TGL Applies to Tiger Lake platform P2 Critical bugs or normal features P1 Blocker bugs or important features labels Dec 21, 2020
@bardliao
Copy link
Collaborator Author

@shumingfan said that there are only three scenarios that codec will not ack to the command.

  1. Device number is wrong
  2. Codec is powered down
  3. Trying to write a read only register.

We have checked device number by #2634 and the device number is correct. And we don't think it is case 3 unless there is something wrong in transportation. So the highest possibility is case 2.

@plbossart
Copy link
Member

@bardliao I wonder if there's something in the clock stop prepare that turns something off, and it's not restored on resume?
Alternatively, what can happen is that the codec loses sync on clock stop, and that would cause the clock stop to fail.

@bardliao
Copy link
Collaborator Author

@bardliao I wonder if there's something in the clock stop prepare that turns something off, and it's not restored on resume?
Alternatively, what can happen is that the codec loses sync on clock stop, and that would cause the clock stop to fail.

@plbossart But shouldn't we get -ETIMEDOUT instead of -ENODATA if the codec loses sync?

@plbossart
Copy link
Member

@bardliao the 'COMMAND_IGNORED' reply is provided immediately, at the end of the 48-bit command word. It's really a 'no reply' and it's clearly not deferred as a -ETIMEDOUT hint at.

@bardliao
Copy link
Collaborator Author

I tried to read SDW_SCP_STAT (0x44) in some places to check when codec starts to not ack. Based on the test result now, we can read SDW_SCP_STAT properly at the beginning of intel_suspend_runtime(), and -ENODATA can be seen before sdw_bus_prep_clk_stop call sdw_slave_clk_stop_callback. So I think we lost codec in the sdw_cdns_clock_stop() function and I added some sdw_bread_no_pm_unlocked(&cdns->bus, 1, SDW_SCP_STAT); in sdw_cdns_clock_stop() to see if I can find the exactly command that cause the problem. The test is still on going hope I can get the result in 24 hr.

@bardliao bardliao self-assigned this Jan 14, 2021
@bardliao
Copy link
Collaborator Author

The -61 error happened after cdns_config_update. I am wondering if any cdns configuration is different between error and normal cases.
I added cdns register 0x0 to 0x1c dump before cdns_config_update. Hope I can see something in the test.

plbossart added a commit to plbossart/sound that referenced this issue Jun 23, 2021
Where the SoundWire manager is pm_runtime suspended with the
clock-stop mode enabled, we currently do nothing on Intel platforms
before entering system suspend.

This is problematic since the hardware does not seem to fully reset
during system resume, we have e.g. been tracking issues related to
this clock stop mode for the last 6 months.

The power management framework already defines a
DPM_FLAG_SMART_SUSPEND flag. This patch exposes a bus .suspend
callback that will handle a pm_runtime_resume when this flag is not
set.

FIXME: need to find a way to set this flag or not depending on parent
capabilities.

BugLink: thesofproject#2606
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jun 25, 2021
Where the SoundWire manager is pm_runtime suspended with the
clock-stop mode enabled, we currently do nothing on Intel platforms
before entering system suspend.

This is problematic since the hardware does not seem to fully reset
during system resume, we have e.g. been tracking issues related to
this clock stop mode for the last 6 months.

The power management framework already defines a
DPM_FLAG_SMART_SUSPEND flag. This patch exposes a bus .suspend
callback that will handle a pm_runtime_resume when this flag is not
set.

FIXME: need to find a way to set this flag or not depending on parent
capabilities.

BugLink: thesofproject#2606
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jun 25, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jun 28, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jun 28, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Jun 29, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jul 6, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@plbossart
Copy link
Member

let's close this issue, we haven't seen any problems except on CML_RVP so far, and we already track clock stop issues in #3012

plbossart added a commit to plbossart/sound that referenced this issue Jul 13, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jul 15, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jul 16, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jul 20, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jul 26, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Jul 27, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jul 27, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
plbossart added a commit that referenced this issue Aug 2, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Aug 11, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Aug 12, 2021
Intel validation reported an issue where the HW_RST self-clearing bit
is not cleared in hardware, which as a ripple effect creates issues
with the clock stop mode.

This happens is a specific sequence where the Intel manager is
pm_runtime suspended with the clock-stop mode enabled. During the
system suspend, we currently do nothing, which can lead to potential
issues on system resume and the following pm_runtime suspend,
depending on the hardware state.

This patch suggests a full resume (parent+child devices) if the
clock-stop mode is used. This may require extra time but will make the
suspend/resume flows completely symmetric. This also removes a race
condition where we could not access SHIM registers if the parent was
suspended as well. Resuming the link also resumes the parent by
construction.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
plbossart added a commit that referenced this issue Aug 16, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
ujfalusi pushed a commit that referenced this issue Aug 17, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: #2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Aug 18, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
roxell pushed a commit to roxell/linux that referenced this issue Aug 24, 2021
Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210818024954.16873-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
@marc-hb marc-hb added the suspend resume Issues related to suspend resume (e.g. rtcwake) label Jul 19, 2022
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 2, 2022
…tem suspend

Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject/linux#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210818024954.16873-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit 029bfd1)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I7d32151bc2e378c5a6a3923e9c57fccbbbca1972
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 4, 2022
…tem suspend

Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject/linux#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210818024954.16873-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit 029bfd1)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I7d32151bc2e378c5a6a3923e9c57fccbbbca1972
sboyanex pushed a commit to aketi-subbu/Backport-series that referenced this issue Sep 12, 2022
…tem suspend

Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject/linux#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210818024954.16873-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit 029bfd1)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I1c5c43bdade09f58ed0b9e208d67790596fa0acf
sboyanex pushed a commit to BijinaSafiyaa/Backport-series-jairaj that referenced this issue Sep 13, 2022
…tem suspend

Intel stress tests reported issues with the clock stop mode,
specifically when trying to do a system suspend while the link is
already pm_runtime suspended.

In this case, we need to disable the shim wake, but when the PCI
parent device is also pm_runtime suspended the SHIM registers are not
accessible.

Since this is an invalid corner case, this patch suggests a pm_runtime
resume of the entire bus to full power (parent+child devices) before
the system suspend so that the shim wake can be disabled.

Unlike the suspend operation, the .prepare callbacks are propagated
from root device to leaf devices. By adding a .prepare callback at the
SoundWire link level, we can double-check the pm_runtime status of the
device as well as its parent PCI device. When the problematic
configuration is detected, the device is pm_runtime resumed - which by
construction also resume its parent.

An additional loop is added to resume all child devices. In theory we
only need to restart the link, but doing so will also cause the
physical devices to synchronize and re-initialize, while their Linux
devices remain pm_runtime suspended. It's simpler to make sure the
codec devices are fully resumed so that we don't have to deal with
zombie states.

This additional loop could have been avoided by adding a .prepare
callback in SoundWire codec drivers. Functionally this would have been
equivalent. The rationale for implementing a loop at the link level is
only to reduce the amount of code required to deal at the codec level
with an Intel corner case - in other words keep codec drivers
independent from Intel platform-specific programming sequences.

BugLink: thesofproject/linux#2606
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20210818024954.16873-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit 029bfd1)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I2de96f33061b828a3da947e0508390a8c224aab3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Clock Stop prepare failed SoundWire Clock Stop prepare failed CML Applies to Comet Lake platform P3 Low-impact bugs or features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake)
Projects
None yet
Development

No branches or pull requests

5 participants