New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SoundWire: TGL: IO error in rt711_jack_detect_handler #3459
Comments
not sure if this is related to #3401 |
Not able to reproduce with short tests on Dell SKU 0A3E that uses RT711 on link1
@bardliao can you try to see if this bug can be reproduced on your TGL devices with IPC3? |
@plbossart I tested it on my Dell SKU 0A3E and it passed. I run the test twice and both iterations are passed. |
A race condition can happen when the peripheral devices are already suspended but a hardware peripheral attaches to the link before we turn it off. In that case, the rt711 driver schedules workqueues, which can lead to transactions happening when the hardware is turned off. This patch makes sure the workqueues are scheduled only when the peripheral devices have completed their resume routine, or are already at full power. BugLink: thesofproject#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
A race condition can happen when the peripheral devices are already suspended but a hardware peripheral attaches to the link before we turn it off. In that case, the rt711 driver schedules workqueues, which can lead to transactions happening when the hardware is turned off. This patch makes sure the workqueues are scheduled only when the peripheral devices have completed their resume routine, or are already at full power. BugLink: thesofproject#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
P1 because it's happening consistently: https://sof-ci.01.org/sofpr/PR5485/build12238/devicetest/?model=TGLU_RVP_SDW&testcase=check-suspend-resume-with-playback-5 These test failures happened on different units. What changed recently? Big kernel merge? Interesting that searching for something as specific as |
@marc-hb can you help find the first report of a failure? I am down to a spurious pm_runtime resume after a system suspend, no sure what is happening. |
I think this came with 5.17, maybe with According to the TCR database, the very first check-suspend-playback/capture failure on TGLU_RVP_SDW happened on January 29th: This was with
Starting from January 29th I see one of two daily-tests failing: PASS, FAIL, PASS, FAIL,... There's no branch information in the TCR database but it looks like the From February 2nd to February 24th, daily tests are consistently passing again. After February 24th, daily tests are consistently failing except this one (which was on 5.16):
The first SOF Pull Requests started failing these two tests on TGLU_RVP_SDW on March 1st and then they almost never passed. #3461 was merged on Feb 28th |
@marc-hb, @bardliao, @plbossart, I think the reason for the error is:
When it fails:
So in case of error print the When there is no error we also receive interrupt, which is expected or not, I don't know. We might be missing a cancel work calls in case the sdw is disabling the link or we need a check if the link is up at all? |
@plbossart @marc-hb @ujfalusi It took me more than 6 hours to bisect this issue, lucky me, I found the bad commit:
I tried to revert this commit on tip of sof-dev and verified this issue again, 3*30 iterations passed, no reproductions. |
@keqiaozhang, wow! What I have found that the issue is really easy to reproduce on tglu-rvp-sdw:
Don't forget to
If you would try Can you see if this is working as well? For me this is 100% reproduction rate. |
@keqiaozhang, I have checked as well and indeed the kernel with revered e38f9ff is fine. Let me attach the two dmesg, which is recorded before executing the rtcwake. @plbossart, @bardliao FYI. Thanks @keqiaozhang !! |
yes, the issue is when we are playing on the speakers while the jack out path controlled by rt711 is suspended. |
https://en.wikipedia.org/wiki/Butterfly_effect |
@plbossart, #3509 will show what is skipped after that patch, when the results are in just: The patch should exclude some CIDs in case the HID is not set for the device, right? So those CIDs are not going to be The issue I think is that the rt711 got woken resumed without sdw resume. Later on I think it is a bit different since there sdw is resuming and then the rt711. Or not. Let's wait for the results and check the dmesg, I hope it will give some hints. Bdw: is this only happening with RVP? Do we have other device with rt711 (not the rt711-sdca)? Can it be broken ACPI table? |
Is it worth comparing the output of acpidump with and without e38f9ff maybe? I diff-ed two acpidumps recently for a totally different reason and that wasn't too hard even with my ACPI ignorance. |
We have a list on TGLU_RVP_SDW device:
These are not |
I think this is the problem, the only two devices on RVP that don't have an _HID are audio things (SoundWire and USB Audio offload).
In theory it's not valid to have both _ADR and _CID, but that has been the practice for many moons already. |
@ujfalusi things are fine because CML_RVP_SDW uses a different RT700 codec where we don't schedule a jack detection workqueue. It's a combination of a) doing a pm_runtime_resume before suspend (which is really not necessary if the codec is already pm_runtime suspended) |
Here's the TGL_RVP_SDW DSDT for reference. |
This failure has been impacting every single pull request and daily run for about 10 days. It's happening only for a specific test on a very specific product. While it seems far from resolution, its reproduction seems to be under control, it even has been bisected. Can we SKIP testing suspend/resume only on that particular product until this is fixed? We already know it keeps failing every time. |
Right, but there is ADLP_RVP_SDW, which have the exact rt711 and it schedules jack detection, yet it passes.
Hmm, but the driver cancels the delayed works on suspend. What we see in the logs is that the hardware got initialized, jack is scheduled then all sdw links got disabled, there could be a runtime suspend after this, it is not visible, but it would leave the works in flight.
It is not that uncommon to try to detect the jack state after resume (the codec was off and the user could inserted / removed / changed the connected head*). |
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit fd18fb38d6a4c726390835d45aaef08e0ef6ccd9) Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
BugLink: https://bugs.launchpad.net/bugs/1982968 [ Upstream commit 6d9f2da ] commit e38f9ff ("ACPI: scan: Do not add device IDs from _CID if _HID is not valid") exposes a race condition on a TGL RVP device leading to a timeout. The detailed analysis shows the RT711 codec driver scheduling a jack detection workqueue while attaching during a spurious pm_runtime resume, and the work function happens to be scheduled after the manager device is suspended. The direct link between this ACPI patch and a spurious pm_runtime resume is not obvious; the most likely explanation is that a change in the ACPI device linked list management modifies the order in which the pm_runtime device status is checked and exposes a race condition that was probably present for a very long time, but was not identified. We already have a check in the .prepare stage, where we will resume to full power from specific clock-stop modes. In all other cases, we don't need to resume to full power by default. Adding the SMART_SUSPEND flag prevents the spurious resume from happening. BugLink: thesofproject/linux#3459 Fixes: 029bfd1 ("soundwire: intel: conditionally exit clock stop mode on system suspend") Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
[ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e557bca49b812908f380c56b5b4b2f273848b676 ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3249b1 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org>
Source: Kernel.org MR: 127843 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: 7d949774e7c1a93980db4961b898313ec09744c3 Description: [ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
Source: Kernel.org MR: 127843 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: 7d949774e7c1a93980db4961b898313ec09744c3 Description: [ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
Source: Kernel.org MR: 127843 Type: Integration Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y ChangeID: 7d949774e7c1a93980db4961b898313ec09744c3 Description: [ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Armin Kuster <akuster@mvista.com>
[ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 7996facaf0ee3829386aa388183f6fffab95e1af) Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
stable inclusion from stable-v5.10.190 commit 7d949774e7c1a93980db4961b898313ec09744c3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I928UI Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7d949774e7c1a93980db4961b898313ec09744c3 -------------------------------- [ Upstream commit e557bca ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>
stable inclusion from stable-5.10.190 commit 7d949774e7c1a93980db4961b898313ec09744c3 category: bugfix issue: #I9D31L CVE: NA Signed-off-by: wanxiaoqing <wanxiaoqing@huawei.com> --------------------------------------- [ Upstream commit e557bca49b812908f380c56b5b4b2f273848b676 ] In typical use cases, the peripheral becomes pm_runtime active as a result of the ALSA/ASoC framework starting up a DAI. The parent/child hierarchy guarantees that the manager device will be fully resumed beforehand. There is however a corner case where the manager device may become pm_runtime active, but without ALSA/ASoC requesting any functionality from the peripherals. In this case, the hardware peripheral device will report as ATTACHED and its initialization routine will be executed. If this initialization routine initiates any sort of deferred processing, there is a possibility that the manager could suspend without the peripheral suspend sequence being invoked: from the pm_runtime framework perspective, the peripheral is *already* suspended. To avoid such disconnects between hardware state and pm_runtime state, this patch adds an asynchronous pm_request_resume() upon successful attach/initialization which will result in the proper resume/suspend sequence to be followed on the peripheral side. BugLink: thesofproject/linux#3459 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220420023241.14335-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Stable-dep-of: c40d6b3249b1 ("soundwire: fix enumeration completion") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: wanxiaoqing <wanxiaoqing@huawei.com>
@bardliao @shumingfan FYI, that's a new one.
https://sof-ci.01.org/linuxpr/PR3454/build7233/devicetest/?model=TGLU_RVP_SDW&testcase=check-suspend-resume-with-capture-5
dmesg.log
The text was updated successfully, but these errors were encountered: