Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CML][BUG]Occurred "rt715 sdw:3:25d:715:0: Clock Stop prepare failed for slave: -61" during suspend/resume with aplay0,5 stress test. #1678

Closed
YvonneYang2 opened this issue Jan 9, 2020 · 15 comments
Labels
bug Something isn't working CML Applies to Comet Lake platform SDW_DEV SDW bug reproduced by developers, fix WIP SDW Applies to SoundWire bus for codec connection

Comments

@YvonneYang2
Copy link

YvonneYang2 commented Jan 9, 2020

Describe the bug
Occurred "rt715 sdw:3:25d:715:0: Clock Stop prepare failed for slave: -61" during suspend/resume with aplay0,5 stress test. When issue occurred, arecord0,4(RT715) cannot work.

This issue seems same as #1636, but issue cannot be reproduced in 500 iterations of suspend/resume without playback or with aplay0,0 playback stress test.

To Reproduce

  1. Boot up with options snd-sof-intel-hda-common sdw_clock_stop_quirks=0x8
  2. Run command " aplay -Dhw:0,5 -r48000 -c2 -fs16_le /dev/zero & sudo ./stress-s3.sh 500 1 1"
  3. Check logs after stress test.
    stress-s3.sh.txt

Reproduced rate
Round1: run 500 times, failed at 412th time.
Round2: run 500 times, failed at 308th time.

Expected result
No error occurred during stress test.

Actual result
Occurred "rt715 sdw:3:25d:715:0: Clock Stop prepare failed for slave: -61" during suspend/resume with aplay0,5 stress test. Arecord0,4 cannot work after issue occurred.

Test recipe:
kernel: https://github.com/thesofproject/linux/tree/integration/soundwire-latest commit: 07348fa
FW: https://github.com/thesofproject/sof/commits/cml-010-drop-stable commit: 5e5f69e
tplg: Same with FW branch, sof-cml-rt711-rt1308-mono-rt715.tplg
platform: CML-U Laptop with codec ALC711, RT1308, RT715 in SDW mode

dmesg:

[ 5413.575747] intel-sdw sdw-master-3: sdw_handle_slave_status: start
[ 5413.575751] intel-sdw sdw-master-3: Slave attached, programming device number
[ 5413.576348] intel-sdw sdw-master-3: Msg ignored for Slave 1
[ 5413.576359] rt715 sdw:3:25d:715:0: Clock Stop prepare failed for slave: -61
[ 5413.576367] rt715 sdw:3:25d:715:0: pre-prepare failed:-61
[ 5413.576374] intel-sdw sdw-master-3: prepare clock stop failed -61
[ 5413.576380] intel-sdw sdw-master-3: cannot enable clock stop on suspend
[ 5413.576572] intel-sdw sdw-master-3: SDW Slave Addr: 21025d071500
[ 5413.576578] intel-sdw sdw-master-3: SDW Slave class_id 0, part_id 715, mfg_id 25d, unique_id 1, version 2
[ 5413.576581] intel-sdw sdw-master-3: in sdw_assign_device_num
[ 5413.576584] intel-sdw sdw-master-3: in sdw_assign_device_num

test_412.log
s3_test_412_logger_all.log

@YvonneYang2 YvonneYang2 added SDW Applies to SoundWire bus for codec connection bug Something isn't working CML Applies to Comet Lake platform labels Jan 9, 2020
@YvonneYang2
Copy link
Author

YvonneYang2 commented Jan 9, 2020

Will check more times in suspend/resume with all kinds of pipline playback and without playback test.

@plbossart
Copy link
Member

This looks like a case where the RT715 becomes UNATTACHED during the clock stop sequence. Not sure how this can happen, but it's really tricky to debug

plbossart added a commit to plbossart/sound that referenced this issue Jan 9, 2020
Intel QA reported a race condition when a Slave can become UNATTACHED
during a clock stop sequence, which leads to timeouts and failed
suspend sequences.

This patch suppresses the handling of all Slave events to see if this
'fixes' the issue.

A better solution would be to delay the suspend operation or retry
later. In case the Slave does need to remain functional while in clock
stop mode, it's probably not a good idea to ignore the fact that an
error occurred.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@plbossart
Copy link
Member

I am a bit puzzled by this bug

a) It's not clear to me why keeping audio playback (be it with hw:0,0 with SoundWire of hw:0,5 with HDMI) would have an impact. In the clock_quirks=0x0 option, we play with pm_runtime reference counts so that the DSP remains powered. @YvonneYang2 can you clarify if using aplay is a requirement to see this issue?

b) This seems to be a case where we suspend, so in theory we lose state, the DSP is not powered, etc. It's not clear to me if the Slave becomes UNATTACHED on its own, or as a result of the clock being touched. It'd be interesting to have longer traces to see what happened prior to the error (full dmesg)

c) I suggested a tentative work-around to prevent the Slave from being re-enumerated, but I am not fully happy with the solution. Still if PR #1682 makes the problem go away it'd be a good way to analyze further.

@plbossart plbossart added the SDW_DEV SDW bug reproduced by developers, fix WIP label Jan 9, 2020
@RanderWang
Copy link

@YvonneYang2 Can you help to attach test_411.log ? this is important for debugging

@sinahuang
Copy link

Test 1000 times on CML-H with soudwire-latest(commit:07348fa), issue can't be reproduced.

@YvonneYang2
Copy link
Author

@RanderWang
test_411.log is as attached.

test_441.log

@RanderWang
Copy link

@RanderWang
test_411.log is as attached.

test_441.log

thanks. But it is 441.log, how about 411.log

@YvonneYang2
Copy link
Author

@RanderWang
test_411.log is as attached.
test_441.log

thanks. But it is 441.log, how about 411.log

Sorry for the wrong log. Here you are test_411.log
test_411.log

@YvonneYang2
Copy link
Author

YvonneYang2 commented Jan 10, 2020

I am a bit puzzled by this bug

a) It's not clear to me why keeping audio playback (be it with hw:0,0 with SoundWire of hw:0,5 with HDMI) would have an impact. In the clock_quirks=0x0 option, we play with pm_runtime reference counts so that the DSP remains powered. @YvonneYang2 can you clarify if using aplay is a requirement to see this issue?

Many sorry for wrong typo in bug description.. Issue is reproduced with clock_quirks=0x8 option, not clock_quirks=0x0 option. Aplay with hw:0,5 is the requirement to see this issue. Only aplay0,5 can reproduce the issue. Suspend/resume with aplay0,0 or without playback cannot reproduce it.

Update: Issue can be reproduced with aplay0,0.

@plbossart
Copy link
Member

Thanks @YvonneYang2, it actually makes more sense with the quirk 0x8. if the parent device remains active, but SoundWire is suspended things could go sideways indeed.

@YvonneYang2
Copy link
Author

Tested it 1000 times with latest integration/soundwire-latest + #1683 , issue cannot be reproduced.

@YvonneYang2
Copy link
Author

Tested it 1000 times with latest integration/soundwire-latest + #1682 , issue cannot be reproduced.

plbossart added a commit that referenced this issue Jan 10, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 10, 2020
Intel QA reported a very rare case, possibly hardware-dependent, where
a Slave can become UNATTACHED during a clock stop sequence, which
leads to timeouts and failed suspend sequences.

This patch suppresses the handling of all Slave events while this
transition happens. The two cases that matter are:

a) alerts: if the Slave wants to signal an alert condition, it can do
so using the in-band wake, so there's almost no impact with this
patch.

b) sync loss or imp-def reset: in those cases, bringing back the Slave
to functional state requires a complete re-enumeration. It's better to
just ignore this case and restart cleanly, rather than attempt a
'clean' suspend.

Validation results show the timeouts no longer visible with this patch.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 10, 2020
Intel QA reported a very rare case, possibly hardware-dependent, where
a Slave can become UNATTACHED during a clock stop sequence, which
leads to timeouts and failed suspend sequences.

This patch suppresses the handling of all Slave events while this
transition happens. The two cases that matter are:

a) alerts: if the Slave wants to signal an alert condition, it can do
so using the in-band wake, so there's almost no impact with this
patch.

b) sync loss or imp-def reset: in those cases, bringing back the Slave
to functional state requires a complete re-enumeration. It's better to
just ignore this case and restart cleanly, rather than attempt a
'clean' suspend.

Validation results show the timeouts no longer visible with this patch.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 11, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 11, 2020
Intel QA reported a very rare case, possibly hardware-dependent, where
a Slave can become UNATTACHED during a clock stop sequence, which
leads to timeouts and failed suspend sequences.

This patch suppresses the handling of all Slave events while this
transition happens. The two cases that matter are:

a) alerts: if the Slave wants to signal an alert condition, it can do
so using the in-band wake, so there's almost no impact with this
patch.

b) sync loss or imp-def reset: in those cases, bringing back the Slave
to functional state requires a complete re-enumeration. It's better to
just ignore this case and restart cleanly, rather than attempt a
'clean' suspend.

Validation results show the timeouts no longer visible with this patch.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@YvonneYang2
Copy link
Author

Tested it 300 times with topic/sof-dev(921ce33) + #1692 , issue cannot be reproduced.

plbossart added a commit to plbossart/sound that referenced this issue Jan 13, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Jan 13, 2020
Intel QA reported a very rare case, possibly hardware-dependent, where
a Slave can become UNATTACHED during a clock stop sequence, which
leads to timeouts and failed suspend sequences.

This patch suppresses the handling of all Slave events while this
transition happens. The two cases that matter are:

a) alerts: if the Slave wants to signal an alert condition, it can do
so using the in-band wake, so there's almost no impact with this
patch.

b) sync loss or imp-def reset: in those cases, bringing back the Slave
to functional state requires a complete re-enumeration. It's better to
just ignore this case and restart cleanly, rather than attempt a
'clean' suspend.

Validation results show the timeouts no longer visible with this patch.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 15, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 15, 2020
Intel QA reported a very rare case, possibly hardware-dependent, where
a Slave can become UNATTACHED during a clock stop sequence, which
leads to timeouts and failed suspend sequences.

This patch suppresses the handling of all Slave events while this
transition happens. The two cases that matter are:

a) alerts: if the Slave wants to signal an alert condition, it can do
so using the in-band wake, so there's almost no impact with this
patch.

b) sync loss or imp-def reset: in those cases, bringing back the Slave
to functional state requires a complete re-enumeration. It's better to
just ignore this case and restart cleanly, rather than attempt a
'clean' suspend.

Validation results show the timeouts no longer visible with this patch.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jan 16, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
ranj063 pushed a commit that referenced this issue Jun 30, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Jul 1, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit that referenced this issue Jul 3, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jul 7, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
kv2019i pushed a commit to kv2019i/linux that referenced this issue Jul 10, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
kv2019i pushed a commit that referenced this issue Jul 15, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jul 17, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jul 21, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
ranj063 pushed a commit that referenced this issue Aug 3, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
ranj063 pushed a commit that referenced this issue Aug 12, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit to plbossart/sound that referenced this issue Aug 14, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 17, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 17, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 17, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
plbossart added a commit that referenced this issue Aug 18, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Aug 20, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Aug 20, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Aug 27, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Aug 31, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Sep 1, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Sep 1, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Sep 1, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: #1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Sep 2, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Sep 2, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Sep 4, 2020
If we enabled the clock stop mode and suspend, we need to disable the
shim wake. We do so only if the parent is pm_runtime active due to
power rail dependencies.

GitHub issue: thesofproject#1678
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20200901150556.19432-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CML Applies to Comet Lake platform SDW_DEV SDW bug reproduced by developers, fix WIP SDW Applies to SoundWire bus for codec connection
Projects
None yet
Development

No branches or pull requests

4 participants