Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] [SDW] IO error in rt700_jack_detect_handler on CML_RVP_SDW #2943

Closed
XiaoyunWu6666 opened this issue May 27, 2021 · 27 comments
Closed

[BUG] [SDW] IO error in rt700_jack_detect_handler on CML_RVP_SDW #2943

XiaoyunWu6666 opened this issue May 27, 2021 · 27 comments
Labels
bug Something isn't working CML Applies to Comet Lake platform Intel Daily tests This issue can be found in Intel internal daily tests P2 Critical bugs or normal features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake)

Comments

@XiaoyunWu6666
Copy link

XiaoyunWu6666 commented May 27, 2021

Describe the bug
Appeared on May 27 internal daily test , 4235 . in case check-suspend-resume-50
IO error in rt700_jack_detect_handler on CML_RVP_SDW

To Reproduce
TPLG=sof-cml-rt700-4ch.tplg ~/sof-test/test-case/check-suspend-resume.sh -l 50

Reproduction Rate
100%

Model
CML_RVP_SDW

Environment
Kernel Branch: topic/sof-dev
Kernel Commit: 6dca03a
SOF Branch: main
SOF Commit: d736888

Screenshots or console output
[console]

2021-06-01 03:03:18 UTC [INFO] ===== Round(1/50) =====
2021-06-01 03:03:18 UTC [COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Jun  1 03:03:24 2021
2021-06-01 03:03:26 UTC [COMMAND] sleep for 5
2021-06-01 03:03:31 UTC [INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1622516593"
2021-06-01 03:03:31 UTC [ERROR] Caught kernel log error
===========================>>
[  526.518862] kernel: IO error in rt700_jack_detect_handler, ret -16
<<===========================
2021-06-01 03:03:31 UTC [ERROR] Caught error in kernel log

@lgirdwood lgirdwood transferred this issue from thesofproject/sof May 27, 2021
@plbossart
Copy link
Member

dmesg.txt

@plbossart
Copy link
Member

@XiaoyunWu6666 is this really 100% reproducible? If yes, can you try reverting PR #2941 to see if this makes the problem go away?

@plbossart plbossart added CML Applies to Comet Lake platform SDW Applies to SoundWire bus for codec connection labels May 27, 2021
@plbossart plbossart changed the title [BUG] IO error in rt700_jack_detect_handler on CML_RVP_SDW [BUG] [SDW] IO error in rt700_jack_detect_handler on CML_RVP_SDW May 27, 2021
@plbossart
Copy link
Member

@shumingfan @bardliao I wonder if there is a potential for a race condition. The set_jack callback is invoked from the machine driver level, and I wonder if we should have a pm_runtime_get_sync()/ pm_runtime_put_autosuspend() protection when we try to initialize the jack. The current test on hw_init is a bit flaky IMHO. I also wonder if we should set the jack to NULL when we suspend, as done for a lot of cards.

@shumingfan
Copy link

@shumingfan @bardliao I wonder if there is a potential for a race condition. The set_jack callback is invoked from the machine driver level, and I wonder if we should have a pm_runtime_get_sync()/ pm_runtime_put_autosuspend() protection when we try to initialize the jack. The current test on hw_init is a bit flaky IMHO. I also wonder if we should set the jack to NULL when we suspend, as done for a lot of cards.

@plbossart I also think it is possible that the unexpected interrupt happened after the codec suspended.
@XiaoyunWu6666 Could you test with PR #2949 and turn on the debug log of the codec driver?
If the test result still fails, please share the kernel log.

@XiaoyunWu6666
Copy link
Author

@XiaoyunWu6666 Could you test with PR #2949 and turn on the debug log of the codec driver?
If the test result still fails, please share the kernel log.

Tested with PR #2949 and the test result still failed.
The kernel log is as the attachment.
dmesg.txt

@shumingfan
Copy link

shumingfan commented May 31, 2021

Tested with PR #2949 and the test result still failed.
The kernel log is as the attachment.
dmesg.txt

@XiaoyunWu6666 Could you enable the debug log of the codec driver and share the kernel log again?

@mengdonglin mengdonglin added the bug Something isn't working label May 31, 2021
@keqiaozhang keqiaozhang added the P2 Critical bugs or normal features label May 31, 2021
@XiaoyunWu6666
Copy link
Author

@shumingfan
Copy link

@shumingfan Sure
dmesg_with_codeclog.txt

@bardliao @plbossart I checked this log. The system is so quick going to suspend from the resume state when the issue happened.

[ 170.958907] rt700 sdw:1:025d:0700:00: in rt700_dev_resume jd enable
[ 170.958987] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3501 <= 0003
[ 170.959013] soundwire_intel soundwire_intel.link.1: Slave status change: 0x40
[ 170.959079] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3501 <= 0000
[ 170.959312] rt700 sdw:1:025d:0700:00: Slave impl defined interrupt
[ 170.959316] rt700 sdw:1:025d:0700:00: rt700_interrupt_callback control_port_stat=4
[ 170.959407] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3819 <= 0000
[ 170.959676] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3821 <= 0000
[ 170.959761] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3855 <= 0000
[ 170.959765] rt700 sdw:1:025d:0700:00: in rt700_dev_suspend jd disable
[ 170.959934] rt700 sdw:1:025d:0700:00: [rt700_sdw_write] 3501 <= 0003
[ 170.959954] soundwire_intel soundwire_intel.link.1: Slave status change: 0x20
[ 170.960045] soundwire_intel soundwire_intel.link.1: intel_link_power_down: powering down all links
[ 170.960081] soundwire_intel soundwire_intel.link.0: intel_suspend: pm_runtime status: suspended
[ 170.960202] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
[ 170.960235] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
[ 170.960479] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x40010000: GLB_PM_MSG: CTX_SAVE
[ 170.960624] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
[ 170.960656] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
[ 170.960667] sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 1
[ 170.960923] sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000010 at 00000044
[ 170.960944] sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
[ 170.960949] sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D3
[ 170.963055] e1000e: EEE TX LPI TIMER: 00000011
[ 171.212657] IO error in rt700_jack_detect_handler, ret -16

There is a Slave impl defined interrupt when the resume state, however, the system went to suspend state before the interrupt handler finishes.
In the bus driver, the sdw_handle_slave_alerts function should wake up the codec driver and call the interrupt callback function.
Could we add the pm_runtime_mark_last_busy in the front of sdw_handle_slave_alerts function that keeps the codec driver active?

@bardliao
Copy link
Collaborator

bardliao commented Jun 1, 2021

It is weird. We have pm_runtime_mark_last_busy(&slave->dev); in sdw_handle_slave_alerts. rt700_dev_suspend is supposed to triggered after 3 sec.

@bardliao
Copy link
Collaborator

bardliao commented Jun 1, 2021

@XiaoyunWu6666 Can you try below patch?

diff --git a/sound/soc/codecs/rt700-sdw.c b/sound/soc/codecs/rt700-sdw.c
index ff9c081fd52a..5df68de7eac1 100644
--- a/sound/soc/codecs/rt700-sdw.c
+++ b/sound/soc/codecs/rt700-sdw.c
@@ -9,6 +9,7 @@
 #include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/mod_devicetable.h>
+#include <linux/pm_runtime.h>
 #include <linux/soundwire/sdw.h>
 #include <linux/soundwire/sdw_type.h>
 #include <linux/soundwire/sdw_registers.h>
@@ -419,6 +420,7 @@ static int rt700_interrupt_callback(struct sdw_slave *slave,
                "%s control_port_stat=%x", __func__, status->control_port);

        if (status->control_port & 0x4) {
+               pm_runtime_mark_last_busy(&slave->dev);
                mod_delayed_work(system_power_efficient_wq,
                        &rt700->jack_detect_work, msecs_to_jiffies(250));
        }

@bardliao
Copy link
Collaborator

bardliao commented Jun 1, 2021

@XiaoyunWu6666 Sorry, I meant test with #2949 + my patch

@XiaoyunWu6666
Copy link
Author

XiaoyunWu6666 commented Jun 1, 2021

@XiaoyunWu6666 Sorry, I meant test with #2949 + my patch

oh I misunderstood. Let me check it again :)

It failed again @bardliao

@shumingfan
Copy link

@XiaoyunWu6666 Would you please check the node as below after the issue happened?
$ cat /sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power/audosuspend_delay_ms
$ cat /sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power/control
$ cat /sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power/runtime_status

@XiaoyunWu6666
Copy link
Author

@shumingfan

ubuntu@sh-cml-rvp-sdw-04:/sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power$ cat autosuspend_delay_ms
3000
ubuntu@sh-cml-rvp-sdw-04:/sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power$ cat control
auto
ubuntu@sh-cml-rvp-sdw-04:/sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power$ cat runtime_status
suspended
ubuntu@sh-cml-rvp-sdw-04:/sys/bus/soundwire/drivers/rt700/sdw:1:025d:0700:00/power$

@plbossart
Copy link
Member

@bardliao @shumingfan this is a nice bug...

static int sdw_handle_slave_alerts(struct sdw_slave *slave)
{
	sdw_modify_slave_status(slave, SDW_SLAVE_ALERT);

	ret = pm_runtime_get_sync(&slave->dev);

	do {
		slave_notify = false;

		// check interrupts 


		/* Update the Slave driver */
		if (slave_notify && slave->ops &&
		    slave->ops->interrupt_callback) {
			slave_intr.sdca_cascade = sdca_cascade;
			slave_intr.control_port = clear;
			memcpy(slave_intr.port, &port_status,
			       sizeof(slave_intr.port));

			slave->ops->interrupt_callback(slave, &slave_intr);
		}

		// recheck interrupt status

		/* we can get alerts while processing so keep retrying */
	} while (stat != 0 && count < SDW_READ_INTR_CLEAR_RETRY);

	if (count == SDW_READ_INTR_CLEAR_RETRY)
		dev_warn(&slave->dev, "Reached MAX_RETRY on alert read\n");

io_err:
	pm_runtime_mark_last_busy(&slave->dev);
	pm_runtime_put_autosuspend(&slave->dev);

	return ret;

We already state the mark_last_busy at the end of the interrupts handling, so adding one more in the codec driver will do nothing.

I think we need more traces to see what happens

@plbossart
Copy link
Member

plbossart commented Jun 3, 2021

Cannot reproduce this bug on my side, added a test PR #2967 to try to get more traces. I've run 120+ cycles of suspend-resume and counting, no error

@XiaoyunWu6666 please re-run this PR if you don't mind.

@plbossart
Copy link
Member

Changing test devices, I can reproduce this bug. This is really nasty, we suspend immediately after a resume and have an interrupt after the end of the suspend without resuming.

[  316.423170] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_resume: end
[  316.423173] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_suspend: start
[  316.423175] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_suspend: end
[  316.423325] kernel: rt700 sdw:1:025d:0700:00: Slave impl defined interrupt
[  316.423326] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_interrupt_callback: start
[  316.423327] kernel: rt700 sdw:1:025d:0700:00: rt700_interrupt_callback control_port_stat=4
[  316.423329] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_interrupt_callback: end
[  316.423599] kernel: soundwire_intel soundwire_intel.link.1: intel_link_power_down: powering down all links
[  316.423670] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
[  316.423681] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
[  316.423997] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x40010000: GLB_PM_MSG: CTX_SAVE
[  316.424105] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
[  316.424137] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
[  316.424147] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 1
[  316.424403] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000010 at  00000044
[  316.424424] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
[  316.424426] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D3

[  316.678395] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_jack_detect_handler: start
[  316.678547] kernel: IO error in rt700_jack_detect_handler, ret -16

@plbossart
Copy link
Member

It turns out this is a case where we do a system_suspend immediately after a pm_runtime resume, and we get an interrupt AFTER the system suspend.

This is really interesting because we disable interrupts handling at the master level, but because of the parent-child hierarchy the child has to be suspended first, so there's a window where the child is system-suspended, but the parent isn't.

[   73.219827] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_resume: end
[   73.219828] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_pm_runtime_resume: end

[   73.219831] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_system_suspend: start
[   73.219832] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_suspend: start 
[   73.219834] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_suspend: end 
[   73.219835] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_dev_system_suspend: end

[   73.219978] kernel: rt700 sdw:1:025d:0700:00: Slave impl defined interrupt
[   73.219980] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_interrupt_callback: start
[   73.219981] kernel: rt700 sdw:1:025d:0700:00: rt700_interrupt_callback control_port_stat=5
[   73.219982] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_interrupt_callback: end

[   73.220256] kernel: soundwire_intel soundwire_intel.link.1: intel_link_power_down: powering down all links
[   73.220325] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
[   73.220336] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
[   73.220874] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x40010000: GLB_PM_MSG: CTX_SAVE
[   73.220990] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
[   73.221022] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
[   73.221032] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 1
[   73.221289] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000010 at  00000044
[   73.221310] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
[   73.221312] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D3
[   73.228789] kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
[   73.228868] kernel: sd 2:0:0:0: [sda] Stopping disk

[   73.472818] kernel: rt700 sdw:1:025d:0700:00: plb: rt700_jack_detect_handler: start
[   73.472970] kernel: IO error in rt700_jack_detect_handler, ret -16

If we have an interrupt while doing a pm_runtime suspend, there is a pm_runtime_get_sync() that will force a resume, but if we do a system suspend this will have no effect.

I think the only solution is to disable interrupts when we do a system suspend.

Wow, this is a nice one that could happen on all devices that can send an interrupt.

@bardliao @shumingfan comments welcome.

dmesg.txt

@plbossart
Copy link
Member

Updated PR #2967 with tentative fix to prevent the interrupt_callback from dealing with jack detection. Not completely happy about a custom solution when this can happen for other devices as well.

@bardliao
Copy link
Collaborator

bardliao commented Jun 4, 2021

I think the only solution is to disable interrupts when we do a system suspend.

Wow, this is a nice one that could happen on all devices that can send an interrupt.

@bardliao @shumingfan comments welcome.

I don't have a better solution, but if we disable the interrupt we will lose the wake up feature, right? For example, a product may want to wake up the system when the headset button is pressed.

@shumingfan
Copy link

I think the only solution is to disable interrupts when we do a system suspend.

Wow, this is a nice one that could happen on all devices that can send an interrupt.

@bardliao @shumingfan comments welcome.

Is it possible to remove the SET_SYSTEM_SLEEP_PM_OPS?

plbossart added a commit that referenced this issue Jun 14, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: #2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: #2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT5682-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The Fixes tag points to a 5.10 commit, there's no need to propagate
this change to earlier upstream versions.

BugLink: #2943
Fixes: 4a55000 ('ASoC: codecs: rt*.c: remove useless pointer cast')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: #2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: #2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: #2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT5682-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The Fixes tag points to a 5.10 commit, there's no need to propagate
this change to earlier upstream versions.

BugLink: #2943
Fixes: 4a55000 ('ASoC: codecs: rt*.c: remove useless pointer cast')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
plbossart added a commit that referenced this issue Jun 14, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: #2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
@plbossart
Copy link
Member

Fixed by #2978

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 16, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject#2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 16, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject#2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 16, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT5682-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The Fixes tag points to a 5.10 commit, there's no need to propagate
this change to earlier upstream versions.

BugLink: thesofproject#2943
Fixes: 4a55000 ('ASoC: codecs: rt*.c: remove useless pointer cast')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 16, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 22, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject#2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 22, 2021
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject#2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 22, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT5682-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The Fixes tag points to a 5.10 commit, there's no need to propagate
this change to earlier upstream versions.

BugLink: thesofproject#2943
Fixes: 4a55000 ('ASoC: codecs: rt*.c: remove useless pointer cast')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-5-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jun 22, 2021
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
@marc-hb marc-hb added the suspend resume Issues related to suspend resume (e.g. rtcwake) label Jul 19, 2022
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 2, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 60888ef)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I3f31ad47668e0451c29a43e333817364ef5bb0aa
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 2, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 1823637)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I97b4b14b87356353414ff5d05ace277c830164e3
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 2, 2022
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject/linux#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d2bf75f)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I4758f513765c538b0a6339a0f6ad52320f1a2cb0
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 4, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 60888ef)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I3f31ad47668e0451c29a43e333817364ef5bb0aa
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 4, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 1823637)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I97b4b14b87356353414ff5d05ace277c830164e3
sboyanex pushed a commit to dineshXadireddi/Backport-series that referenced this issue Aug 4, 2022
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject/linux#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d2bf75f)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: sboyane <shankarx.boyane@intel.com>
Change-Id: I4758f513765c538b0a6339a0f6ad52320f1a2cb0
sboyanex pushed a commit to aketi-subbu/Backport-series that referenced this issue Sep 12, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 5f2df2a ('ASoC: rt700: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 60888ef)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I3d86aaf6ee4e56abaa2cfac93044d886fedf8437
sboyanex pushed a commit to aketi-subbu/Backport-series that referenced this issue Sep 12, 2022
In previous commits we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

BugLink: thesofproject/linux#2943
Fixes: 501ef01 ('ASoC: rt711: wait for the delayed work to finish when the system suspends')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-4-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 1823637)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I25e6d185cfdc6867659200a9059c72059f1268d4
sboyanex pushed a commit to aketi-subbu/Backport-series that referenced this issue Sep 12, 2022
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject/linux#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d2bf75f)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I8d3dba2e69266e00d7dd6692aa4f33f020b4dab0
sboyanex pushed a commit to BijinaSafiyaa/Backport-series-jairaj that referenced this issue Sep 13, 2022
In the initial driver we cancelled deferred work, but there is still a
window of time where a new interrupt could result in new deferred work
executed after the link is disabled, leading to an IO error. While we
did not see this IO error on RT711-sdca-based platforms, the code pattern
is similar to the RT700 case where the IO error was noted, so the fix
is added for consistency.

This patch uses an 'disable_irq_lock' mutex to prevent new interrupts
from happening after the start of the system suspend. The choice of a
mutex v. a spinlock is mainly due to the time required to clear
interrupts, which requires a command to be transmitted by the
SoundWire host IP and acknowledged with an interrupt. The
'interrupt_callback' routine is also not meant to be called from an
interrupt context.

An additional 'disable_irq' flag prevents race conditions where the
status changes before the interrupts are disabled, but the workqueue
handling status changes is scheduled after the completion of the
system suspend. On resume the interrupts are re-enabled already by the
io_init routine so we only clear the flag.

The code is slightly different from the other codecs since the
interrupt callback deals with the SDCA interrupts, leading to a much
larger section that's protected by the mutex. The SoundWire interrupt
scheme requires a read after clearing a status, it's not clear from
the specifications what would happen if SDCA interrupts are disabled
in the middle of the sequence, so the entire interrupt status
read/write is kept as is, even if in the end we discard the
information.

BugLink: thesofproject/linux#2943
Fixes: 7ad4d23 ('ASoC: rt711-sdca: Add RT711 SDCA vendor-specific driver')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20210614180815.153711-6-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d2bf75f)

BUG=b:238403794
TEST=Test Audio use cases.

Signed-off-by: SHANKAR BOYANE <shankarx.boyane@intel.com>
Change-Id: I99ccdd5e5a0ae43dd012aaf1477f3e1324169194
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CML Applies to Comet Lake platform Intel Daily tests This issue can be found in Intel internal daily tests P2 Critical bugs or normal features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants