[BUG] [rt700] SDW Alert happens before codec is enumerated #2344

Closed
bardliao opened this issue Aug 5, 2020 · 6 comments
Assignees
Labels
bug Something isn't working CML Applies to Comet Lake platform P2 Critical bugs or normal features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake)

Comments

@bardliao
Collaborator

bardliao commented Aug 5, 2020

Describe the bug
We see SoundWire (sdw) IO transfer timeouts when we run the suspend test. The timeout occurs because an alert is raised before the codec is enumerated.

To Reproduce

run sudo rtcwake -m mem -s 5.

Reproduce rate
more than 50%

Expected result
No issue on the suspend test.

Actual result
See IO transfer timed out errors in dmesg.

jf-cml-rvp-sdw-2 kernel: [ 7295.502075] rt700 sdw:1:25d:700:0: sdw_modify_slave_status: initializing completion for Slave 1
...
jf-cml-rvp-sdw-2 kernel: [ 7296.545696] intel-sdw intel-sdw.1: IO transfer timed out, cmd 2 device 1 addr 40 len 1
jf-cml-rvp-sdw-2 kernel: [ 7296.545702] soundwire sdw-master-0: trf on Slave 1 failed:-110
jf-cml-rvp-sdw-2 kernel: [ 7296.545706] soundwire sdw-master-0: SDW_SCP_INT1 read failed:-110
jf-cml-rvp-sdw-2 kernel: [ 7296.545708] soundwire sdw-master-0: Slave 1 alert handling failed: -110
jf-cml-rvp-sdw-2 kernel: [ 7296.545730] intel-sdw intel-sdw.1: Slave status change
jf-cml-rvp-sdw-2 kernel: [ 7296.545764] soundwire sdw-master-0: Slave attached, programming device number
...
jf-cml-rvp-sdw-2 kernel: [ 7296.546367] rt700 sdw:1:25d:700:0: sdw_modify_slave_status: signaling completion for Slave 1

dmesg_cml_rvp.txt
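
The "-110" in the log lines above is a negative errno value; on Linux, errno 110 is ETIMEDOUT, which matches the "IO transfer timed out" message. A minimal sketch of decoding such a return code in user space follows. The helper names are hypothetical and are not part of the SoundWire code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a kernel-style negative return code is a timeout. */
bool is_timeout_error(int ret)
{
	return ret == -ETIMEDOUT;
}

/* Hypothetical helper: human-readable text for a kernel-style return code. */
const char *describe_error(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}

/* Usage: a timeout code is recognized, other codes are not. */
void check_log_error_code(void)
{
	assert(is_timeout_error(-ETIMEDOUT));
	assert(!is_timeout_error(-EIO));
	assert(strcmp(describe_error(0), "success") == 0);
}
```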

@bardliao bardliao added the P2 Critical bugs or normal features label Aug 5, 2020
@bardliao bardliao self-assigned this Aug 5, 2020
@bardliao
Collaborator Author

bardliao commented Aug 5, 2020

@plbossart Should we ignore alerts that arrive before the codec is enumerated?
Does the change below make sense?

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index d0aecf995c4f..219592cfee6f 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -1711,11 +1711,20 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
                        break;

                case SDW_SLAVE_ALERT:
+                       if (!completion_done(&slave->initialization_complete))
+                               break;
                        ret = sdw_handle_slave_alerts(slave);

@plbossart
Member

plbossart commented Aug 5, 2020

@bardliao it's not possible to have an alert if the DeviceNumber is zero and the interrupt masks are not initialized. I wonder if this is a codec problem where on hard reset the interrupt masks are not reset.

Edit: on resume, the master driver should first go through sdw_clear_slave_status() to mark the status as UNATTACHED, so that enumeration is dealt with first. I think we have a race condition here where the codec resumes before the link is reset and its status is marked as UNATTACHED.
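
The ordering argument above can be sketched as a toy user-space model: after the status is cleared, no alert is expected until enumeration has assigned a non-zero device number and the interrupt masks are programmed. Every name here is hypothetical. The model also resets the device number, which stands for the device-side state after a hard reset and is not a claim about what the real sdw_clear_slave_status() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one peripheral; all names are hypothetical. */
enum model_status { MODEL_UNATTACHED, MODEL_ATTACHED };

struct model_slave {
	enum model_status status;
	int dev_num;        /* 0 means "not enumerated yet" */
	bool int_mask_set;  /* interrupt masks programmed by the driver */
};

/* Resume path: forget the old state so that enumeration is handled first. */
void model_clear_status(struct model_slave *s)
{
	s->status = MODEL_UNATTACHED;
	s->dev_num = 0;
	s->int_mask_set = false;
}

/* Enumeration: assign a non-zero device number, then program the masks. */
void model_enumerate(struct model_slave *s, int dev_num)
{
	assert(dev_num != 0);
	s->dev_num = dev_num;
	s->status = MODEL_ATTACHED;
	s->int_mask_set = true;
}

/* An alert is only legitimate once the device is enumerated and unmasked. */
bool model_alert_expected(const struct model_slave *s)
{
	return s->dev_num != 0 && s->int_mask_set;
}
```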

@mengdonglin mengdonglin added bug Something isn't working CML Applies to Comet Lake platform labels Aug 6, 2020
@bardliao
Collaborator Author

@plbossart I think I now know the reason for the IO transfer timeout. We get an interrupt immediately after rt700 is enumerated, but rt700 goes into suspend right after that. See the log below.

[  820.942039] soundwire sdw-master-0: bard: sdw_clear_slave_status 1
[  820.942043] rt700 sdw:1:25d:700:0: sdw_modify_slave_status: initializing completion for Slave 1
[  820.942274] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume
[  820.961605] intel-sdw intel-sdw.1: Slave status change
[  820.961622] soundwire sdw-master-0: Slave attached, programming device number
[  820.961924] soundwire sdw-master-0: SDW Slave Addr: 10025d070000
[  820.961929] soundwire sdw-master-0: SDW Slave class_id 0, part_id 700, mfg_id 25d, unique_id 0, version 1
[  820.961931] soundwire sdw-master-0: Slave already registered, reusing dev_num:1
[  820.962166] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962167] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962168] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962170] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962171] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962172] intel-sdw intel-sdw.1: Msg Ack not received
[  820.962174] intel-sdw intel-sdw.1: Msg ignored for Slave 0
[  820.962176] soundwire sdw-master-0: No more devices to enumerate
[  820.962204] intel-sdw intel-sdw.1: Slave status change
[  820.962212] rt700 sdw:1:25d:700:0: sdw_modify_slave_status: signaling completion for Slave 1
[  820.962450] intel-sdw intel-sdw.1: Slave status change
[  820.962458] soundwire sdw-master-0: bard: Slave 1 status 1
[  820.962485] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync start
[  820.967985] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync end
[  820.967992] rt700 sdw:1:25d:700:0: bard: rt700_dev_suspend

Now look at sdw_handle_slave_alerts(). We call pm_runtime_get_sync() just as rt700 is being suspended.
I don't think rt700 has actually resumed when pm_runtime_get_sync() returns, so sdw_read(slave, SDW_SCP_INT1) is called while rt700 is suspended, and the IO transfer times out.

[  820.962450] intel-sdw intel-sdw.1: Slave status change
[  820.962458] soundwire sdw-master-0: bard: Slave 1 status 1
[  820.962485] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync start
[  820.967985] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync end
[  820.967992] rt700 sdw:1:25d:700:0: bard: rt700_dev_suspend
[  820.967998] rt700 sdw:1:25d:700:0: bard: sdw_handle_slave_alerts pm_runtime_get_sync ret 1
...
[  821.933210] intel-sdw intel-sdw.1: bard: intel_resume
[  821.933211] intel-sdw intel-sdw.1: intel_link_power_up: powering up all links
[  821.933212] intel-sdw intel-sdw.1: intel_link_power_up: first link up, programming SYNCPRD
[  821.933466] soundwire sdw-master-0: bard: sdw_clear_slave_status 1
[  821.933468] rt700 sdw:1:25d:700:0: sdw_modify_slave_status: initializing completion for Slave 1
[  821.933811] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume
...
[  822.991662] intel-sdw intel-sdw.1: IO transfer timed out, cmd 2 device 1 addr 40 len 1
[  822.991669] soundwire sdw-master-0: trf on Slave 1 failed:-110
[  822.991673] rt700 sdw:1:25d:700:0: bard: sdw_handle_slave_alerts sdw_read ret -110
[  822.991675] soundwire sdw-master-0: SDW_SCP_INT1 read failed:-110
[  822.991678] soundwire sdw-master-0: Slave 1 alert handling failed: -110
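
The "pm_runtime_get_sync ret 1" line in the log is worth a note: a return value of 1 means the runtime-PM status was already active, so no resume callback ran. The toy model below sketches how that can coexist with a timed-out read. It assumes, for illustration only, that system suspend stops bus traffic without changing the runtime-PM status. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy device state; field and function names are hypothetical. */
struct model_dev {
	bool rpm_active;  /* runtime-PM status as the PM core sees it */
	bool bus_usable;  /* false once system suspend has quiesced the link */
};

/* Mimics the return convention: 1 if already active, 0 if it had to resume. */
int model_pm_get_sync(struct model_dev *d)
{
	if (d->rpm_active)
		return 1;
	d->rpm_active = true;
	d->bus_usable = true;  /* a real resume brings the link back up */
	return 0;
}

/* A register read only completes when the bus can still carry traffic. */
int model_read(const struct model_dev *d)
{
	return d->bus_usable ? 0 : -ETIMEDOUT;
}

/* Assumption: system suspend stops the bus but leaves rpm_active untouched. */
void model_system_suspend(struct model_dev *d)
{
	d->bus_usable = false;
}

/* Usage: the sequence suggested by the log above. */
void model_race_demo(void)
{
	struct model_dev d = { true, true };

	model_system_suspend(&d);
	assert(model_pm_get_sync(&d) == 1);    /* "ret 1": no resume happened */
	assert(model_read(&d) == -ETIMEDOUT);  /* read fails with -110 on Linux */
}
```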

dmesg_cml_rvp_2.txt

log_diff.txt

@plbossart
Member

Great work @bardliao

[  820.962485] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync start
[  820.967985] rt700 sdw:1:25d:700:0: bard: rt700_dev_resume cache sync end
[  820.967992] rt700 sdw:1:25d:700:0: bard: rt700_dev_suspend

I think the last suspend is a system suspend.

It looks like we have a race condition: a system suspend happens immediately while we are still dealing with a device interrupt and have a pending transaction. On the next system resume there is a timeout and an error, but that doesn't seem to be a real problem. The error happens in a workqueue.

I think we should use cancel_work_sync() on the master level to make sure current transactions are completed, and no alert can be handled while the master suspends.

plbossart added a commit to plbossart/sound that referenced this issue Aug 11, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends, so on
resume the previous transaction times out.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this should not happen. When doing a system suspend, or
when disabling interrupts, we should make sure the current transaction
can complete, and prevent new work from being queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@plbossart
Member

@bardliao can you try #2354, not sure if this helps?

@plbossart plbossart added the SDW Applies to SoundWire bus for codec connection label Aug 11, 2020
@bardliao
Collaborator Author

@bardliao can you try #2354, not sure if this helps?

Thanks @plbossart, it works. But the condition below should be if (cdns->interrupt_enable): we need to schedule the work when the interrupt is enabled, not when it is disabled.

/* if the interrupt disable is in process, don't schedule a workqueue */
		if (!cdns->interrupt_enable)
			schedule_work(&cdns->work);
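
A single-threaded user-space sketch of the intended behavior: the interrupt path queues work only while interrupts are enabled, and the disable path clears the flag before dropping anything still queued. Dropping queued work stands in for the role cancel_work_sync() plays in the kernel. The struct and function names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded toy model; struct and function names are hypothetical. */
struct model_cdns {
	bool interrupt_enabled;
	bool work_pending;
	int alerts_handled;
};

/* IRQ path: queue the alert work only while interrupts are enabled. */
void model_irq(struct model_cdns *c)
{
	if (c->interrupt_enabled)
		c->work_pending = true;
}

/* Work item: handle one queued alert. */
void model_run_work(struct model_cdns *c)
{
	if (c->work_pending) {
		c->work_pending = false;
		c->alerts_handled++;
	}
}

/*
 * Suspend path: clear the flag first so nothing new is queued, then drop
 * whatever is still queued (the role cancel_work_sync() plays in the kernel).
 */
void model_disable_interrupts(struct model_cdns *c)
{
	c->interrupt_enabled = false;
	c->work_pending = false;
}

/* Usage: an alert that arrives after the disable is never queued. */
void model_suspend_demo(void)
{
	struct model_cdns c = { true, false, 0 };

	model_irq(&c);
	model_run_work(&c);
	assert(c.alerts_handled == 1);

	model_disable_interrupts(&c);
	model_irq(&c);
	assert(!c.work_pending);
}
```

With the inverted condition quoted above, model_irq() would queue work only after interrupts were disabled, which is the opposite of what the comment in the quoted code describes.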

plbossart added a commit to plbossart/sound that referenced this issue Aug 12, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
bardliao pushed a commit that referenced this issue Aug 13, 2020
plbossart added a commit to plbossart/sound that referenced this issue Aug 14, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 17, 2020
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
bardliao pushed a commit to bardliao/linux that referenced this issue Aug 18, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
bardliao pushed a commit that referenced this issue Aug 18, 2020
plbossart added a commit that referenced this issue Aug 18, 2020
fengguang pushed a commit to 0day-ci/linux that referenced this issue Aug 18, 2020
plbossart added a commit that referenced this issue Aug 20, 2020
plbossart added a commit that referenced this issue Aug 27, 2020
plbossart added a commit that referenced this issue Aug 31, 2020
plbossart added a commit that referenced this issue Sep 1, 2020
bardliao pushed a commit that referenced this issue Sep 7, 2020
jkysela-rh pushed a commit to jkysela-rh/linux-soundwire that referenced this issue Sep 7, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/alsa-devel/20200817222340.18042-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Jaroslav Kysela <jkysela@redhat.com>
ruscur pushed a commit to ruscur/linux that referenced this issue Sep 10, 2020
…ce alerts

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20200817222340.18042-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
woodsts pushed a commit to woodsts/linux-stable that referenced this issue Oct 29, 2020
…ce alerts

[ Upstream commit d2068da ]

In system suspend stress cases, the SOF CI reports timeouts. The root
cause is that an alert is generated while the system suspends. The
interrupt handling generates transactions on the bus that will never
be handled because the interrupts are disabled in parallel.

As a result, the transaction never completes and times out on resume.
This error doesn't seem too problematic since it happens in a work
queue, and the system recovers without issues.

Nevertheless, this race condition should not happen. When doing a
system suspend, or when disabling interrupts, we should make sure the
current transaction can complete, and prevent new work from being
queued.

BugLink: thesofproject/linux#2344
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20200817222340.18042-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w pushed a commit to frank-w/BPI-Router-Linux that referenced this issue Nov 3, 2020
@marc-hb marc-hb added the suspend resume Issues related to suspend resume (e.g. rtcwake) label Jul 19, 2022