
SoundWire: rt711-sdca: transfers timed out on component remove/probe #3650

Closed
plbossart opened this issue May 12, 2022 · 19 comments


@plbossart
Member

In bind/unbind driver tests, register transfers can fail with a timeout during .set_jack_detect:

[   89.930339] rt711-sdca sdw:0:025d:0711:01: rt711_sdca_set_jack_detect: start
[   90.430633] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   90.430648] soundwire sdw-master-0: trf on Slave 1 failed:-5 read addr 803c count 0
[   90.934652] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   90.934671] rt711-sdca sdw:0:025d:0711:01: Failed to get private value: 610003c => 467bb425 ret=-5
[   91.438629] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   91.438643] soundwire sdw-master-0: trf on Slave 1 failed:-5 read addr 8038 count 0
[   91.942625] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   91.942643] rt711-sdca sdw:0:025d:0711:01: Failed to get private value: 6100038 => 467bb425 ret=-5
[   92.446623] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   92.446638] soundwire sdw-master-0: trf on Slave 1 failed:-5 write addr a03d count 0
[   92.950620] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   92.950639] rt711-sdca sdw:0:025d:0711:01: Failed to set private value: 610003d <= ffff ret=-5
[   93.454619] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   93.454634] soundwire sdw-master-0: trf on Slave 1 failed:-5 write addr a03f count 0
[   93.958625] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   93.958646] rt711-sdca sdw:0:025d:0711:01: Failed to set private value: 610003f <= 0f12 ret=-5
[   94.462619] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   94.462635] soundwire sdw-master-0: trf on Slave 1 failed:-5 write addr a035 count 0
[   94.966613] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   94.966632] rt711-sdca sdw:0:025d:0711:01: Failed to set private value: 6100035 <= 0c60 ret=-5
[   95.470618] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   95.470633] soundwire sdw-master-0: trf on Slave 1 failed:-5 read addr 8008 count 0
[   95.974635] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   95.974651] rt711-sdca sdw:0:025d:0711:01: Failed to get private value: 2000008 => 0c60 ret=-5
[   96.478610] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   96.478625] soundwire sdw-master-0: trf on Slave 1 failed:-5 write addr a009 count 0
[   96.982613] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   96.982631] rt711-sdca sdw:0:025d:0711:01: Failed to set private value: 2000009 <= 302b ret=-5
[   97.486609] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   97.486623] soundwire sdw-master-0: trf on Slave 1 failed:-5 write addr a011 count 0
[   97.990595] soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[   97.990614] rt711-sdca sdw:0:025d:0711:01: Failed to set private value: 2000011 <= 047a ret=-5
[   98.494605] soundwire_intel soundwire_intel.link.0: IO transfer timed out, cmd 3 device 1 addr 5c len 1
[   98.494622] soundwire sdw-master-0: trf on Slave 1 failed:-110 write addr 5c count 0
[   98.998600] soundwire_intel soundwire_intel.link.0: IO transfer timed out, cmd 3 device 1 addr 5d len 1
[   98.998619] soundwire sdw-master-0: trf on Slave 1 failed:-110 write addr 5d count 0
[   98.998629] rt711-sdca sdw:0:025d:0711:01: in rt711_sdca_jack_init enable
[   98.998639] rt711-sdca sdw:0:025d:0711:01: rt711_sdca_set_jack_detect: end
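The timestamps above are regular enough for a quick sanity check. The sketch below hard-codes the 18 "timed out" timestamps from this log (16 SCP and 2 IO timeouts) and computes their average spacing and the total time between the start and end of rt711_sdca_set_jack_detect. The helper names are hypothetical. The result is consistent with each failed transfer waiting out a fixed timeout of roughly 500 ms; that is an inference from the log, not something stated in this issue.

```c
#include <assert.h>

/* Timestamps (seconds) of the "timed out" lines in the log above:
 * 16 "SCP Msg trf timed out" plus 2 "IO transfer timed out". */
static const double timeout_ts[] = {
	90.430633, 90.934652, 91.438629, 91.942625, 92.446623, 92.950620,
	93.454619, 93.958625, 94.462619, 94.966613, 95.470618, 95.974635,
	96.478610, 96.982613, 97.486609, 97.990595, 98.494605, 98.998600,
};

#define N_TIMEOUTS (sizeof(timeout_ts) / sizeof(timeout_ts[0]))

/* Average spacing between consecutive timeouts, in seconds. */
static double mean_timeout_spacing(void)
{
	return (timeout_ts[N_TIMEOUTS - 1] - timeout_ts[0]) /
	       (double)(N_TIMEOUTS - 1);
}

/* Time spent between the "start" and "end" debug lines of
 * rt711_sdca_set_jack_detect, in seconds. */
static double jack_detect_duration(void)
{
	const double start = 89.930339, end = 98.998639;

	assert(end > start);
	return end - start;
}
```

With these values the spacing comes out at about 0.504 s and the total at about 9.07 s, so a single bind/unbind cycle on an affected device can stall for roughly nine seconds.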
@plbossart
Member Author

plbossart commented May 12, 2022

@shumingfan @bardliao I am not sure what is happening here.

First, I don't understand why the component remove callback changes the regcache mode.

static void rt711_sdca_remove(struct snd_soc_component *component)
{
	struct rt711_sdca_priv *rt711 = snd_soc_component_get_drvdata(component);
	
	regcache_cache_only(rt711->regmap, true);
	regcache_cache_only(rt711->mbq_regmap, true);
}

Why is this necessary?

Unfortunately, removing this doesn't seem to solve the problem.

I am starting to wonder whether there are device timing restrictions here, or something that needs to be turned on before the vendor-specific registers can be accessed.
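To make the regcache question concrete, here is a deliberately simplified model of what cache-only mode means for a register map. It is not the kernel regmap code and the toy_* names are hypothetical; it only assumes the documented behavior of regcache_cache_only(): while the flag is set, writes are recorded in the cache (which is marked dirty) and never reach the bus, and reads of a register with no cached value fail with -EBUSY, as in regmap. One possible reading of the remove callback, not confirmed in this thread, is that it keeps register accesses off the bus once the component is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NREGS 16

/* Hypothetical, simplified register map with a cache (NOT regmap). */
struct toy_regmap {
	unsigned int hw[TOY_NREGS];    /* stands in for codec registers */
	unsigned int cache[TOY_NREGS];
	bool cached[TOY_NREGS];        /* does the cache hold a value? */
	bool cache_only;               /* never touch the bus when set */
	bool cache_dirty;              /* cache has writes not yet on hw */
	int bus_accesses;              /* counts simulated bus transfers */
};

static void toy_init(struct toy_regmap *m) { memset(m, 0, sizeof(*m)); }

static int toy_write(struct toy_regmap *m, unsigned int reg, unsigned int val)
{
	if (reg >= TOY_NREGS)
		return -EINVAL;
	m->cache[reg] = val;
	m->cached[reg] = true;
	if (m->cache_only) {
		m->cache_dirty = true; /* hardware needs a sync later */
		return 0;
	}
	m->hw[reg] = val;
	m->bus_accesses++;
	return 0;
}

static int toy_read(struct toy_regmap *m, unsigned int reg, unsigned int *val)
{
	assert(val != NULL);
	if (reg >= TOY_NREGS)
		return -EINVAL;
	if (m->cached[reg]) {
		*val = m->cache[reg];
		return 0;
	}
	if (m->cache_only)
		return -EBUSY;         /* no cached value, bus off-limits */
	*val = m->hw[reg];
	m->bus_accesses++;
	m->cache[reg] = *val;
	m->cached[reg] = true;
	return 0;
}

/* Leave cache-only mode and replay dirty values, loosely modeled on
 * regcache_cache_only(map, false) followed by regcache_sync(). */
static void toy_sync(struct toy_regmap *m)
{
	m->cache_only = false;
	if (!m->cache_dirty)
		return;
	for (unsigned int r = 0; r < TOY_NREGS; r++) {
		if (m->cached[r] && m->hw[r] != m->cache[r]) {
			m->hw[r] = m->cache[r];
			m->bus_accesses++;
		}
	}
	m->cache_dirty = false;
}
```

In this toy, cache-only mode never produces a bus timeout because no transfer is attempted. As a hedged inference, that suggests the timed-out transfers in the log happen while the maps are not in cache-only mode, which would fit the observation that dropping the regcache calls does not change the failure.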

@plbossart
Member Author

Commit 8b527e9 proposes removing those regcache changes, but that doesn't fix the issue.

@plbossart
Member Author

@bardliao can I ask you to test #3642 (commit d2e9a6d)?

This needs to be built without HDaudio link/codec support (disabled in make menuconfig).

test.sh.txt

I tested on SKU 0A32 (TGL SDCA)

@plbossart
Member Author

@bardliao you also need thesofproject/sof#5813, with the HDMI part commented out.

@bardliao
Collaborator

@plbossart I tested it on SKU 0A5E and found no issue. BTW, thesofproject/sof#5813 does not seem to work for me; I modified it as below.

--- a/tools/topology/topology1/sof-icl-rt711-rt1308-rt715-hdmi.m4
+++ b/tools/topology/topology1/sof-icl-rt711-rt1308-rt715-hdmi.m4
@@ -32,7 +32,7 @@ ifdef(`MIC_LINK',`',
 `define(MIC_LINK, `3')')

 # uncomment to remove HDMI support
-#define(NOHDMI, `1')
+define(NOHDMI, `1')

 # UAJ ID: 0, 1
 # AMP ID: 2, 3 (if EXT_AMP_REF defined)

I had to remove the HDMI part manually.

I also tried with SKU 0A32, but somehow snd_soc_sof_sdw is already in use after boot, so I can't run the test on my SKU 0A32 device.

Module                  Size  Used by
snd_soc_sof_sdw        65536  2

@plbossart
Member Author

Sorry @bardliao, yes, the HDMI part has to be removed manually in the file, as you did. This is just for tests: I don't want to maintain a new topology here, just a quick hook to remove HDMI in one shot.

For the tests, I removed PulseAudio; otherwise the card is marked as in use.

@bardliao
Collaborator

No, sorry @plbossart, I didn't make it clear. I meant that the change didn't work for me: I had to remove the HDMI-related code from sof-icl-rt711-rt1308-rt715-hdmi.m4 manually, like this:

-ifdef(`NOHDMI', `'
-`
-# PCM5 ---> volume ----> iDisp1
-# PCM6 ---> volume ----> iDisp2
-# PCM7 ---> volume ----> iDisp3
-')
-

and so on.

With PulseAudio removed, I can run the test, and I found no issue on SKU 0A32 either.

@plbossart
Member Author

@bardliao I forgot to push a local change, now fixed with thesofproject/sof@ef9883d

@bardliao
Collaborator

It works now :)

@plbossart
Member Author

@bardliao can you reproduce the issue with the timeout?

@bardliao
Collaborator

No, I didn't see any issue with #3642. I will test without #3642.

@bardliao
Collaborator

@plbossart If I apply only 861d91f and d2e9a6d, I see the error below.

[   71.371408] rt1316-sdca sdw:1:025d:1316:01: ASoC: Unregistered DAI 'sdw:1:025d:1316:01'
[   72.412144] BUG: unable to handle page fault for address: ffffffffc0a4b7a8
[   72.412154] #PF: supervisor read access in kernel mode
[   72.412158] #PF: error_code(0x0000) - not-present page
[   72.412161] PGD 10c427067 P4D 10c427067 PUD 10c429067 PMD 10ebe1067 PTE 0
[   72.412170] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   72.412175] CPU: 2 PID: 188 Comm: kworker/2:2 Not tainted 5.18.0-rc2+ #732
[   72.412180] Hardware name: Dell Inc. Latitude 9420/, BIOS 99.1.37 07/30/2020
[   72.412183] Workqueue: pm pm_runtime_work
[   72.412194] RIP: 0010:sdw_bus_prep_clk_stop+0x6b/0x160 [soundwire_bus]
[   72.412207] Code: ff 49 39 c5 74 7d 66 83 bb 04 0b 00 00 00 74 e3 8b 83 a0 04 00 00 83 e8 01 83 f8 01 77 d5 48 8b 83 b0 04 00 00 48 85 c0 74 22 <48> 8b 40 28 48 85 c0 74 19 31 d2 31 f6 48 89 df ff d0 0f 1f 00 83
[   72.412213] RSP: 0018:ffffae42c0d5fd10 EFLAGS: 00010282
[   72.412218] RAX: ffffffffc0a4b780 RBX: ffffa35541600000 RCX: 0000000000000000
[   72.412221] RDX: 0000000000000000 RSI: ffffffff9d5f7e28 RDI: ffffa35545fcd830
[   72.412224] RBP: ffffa35545fcd830 R08: 00000010dc196060 R09: 0000000000000000
[   72.412226] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[   72.412229] R13: ffffa35545fcd848 R14: 0000000000000008 R15: 0000000000000000
[   72.412232] FS:  0000000000000000(0000) GS:ffffa358bec00000(0000) knlGS:0000000000000000
[   72.412235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.412238] CR2: ffffffffc0a4b7a8 CR3: 000000010c424006 CR4: 0000000000770ee0
[   72.412241] PKRU: 55555554
[   72.412244] Call Trace:
[   72.412248]  <TASK>
[   72.412253]  sdw_cdns_clock_stop+0xaf/0x1b0 [soundwire_cadence]
[   72.412264]  intel_suspend_runtime+0x5e/0x120 [soundwire_intel]
[   72.412272]  ? dpm_sysfs_remove+0x60/0x60
[   72.412278]  __rpm_callback+0x41/0x160
[   72.412285]  ? dpm_sysfs_remove+0x60/0x60
[   72.412290]  rpm_callback+0x59/0x70
[   72.412296]  ? dpm_sysfs_remove+0x60/0x60
[   72.412300]  rpm_suspend+0x103/0x610
[   72.412306]  ? lock_acquired+0xe2/0x3b0
[   72.412314]  pm_runtime_work+0xa0/0xb0
[   72.412320]  process_one_work+0x29c/0x530
[   72.412329]  worker_thread+0x4d/0x3c0
[   72.412335]  ? process_one_work+0x530/0x530
[   72.412341]  kthread+0xf0/0x120
[   72.412346]  ? kthread_complete_and_exit+0x20/0x20
[   72.412351]  ret_from_fork+0x1f/0x30
[   72.412360]  </TASK>

@plbossart
Member Author

plbossart commented May 18, 2022

@bardliao the error above happens when you remove the rt1316 driver, right? That is already reported in #3531 and fixed by commit e850e40 ('soundwire: revisit driver bind/unbind and callbacks').

However, I have not seen the error

 rt1316-sdca sdw:1:025d:1316:01: ASoC: Unregistered DAI 'sdw:1:025d:1316:01'

What sequence did you use to see it?

EDIT: sorry, this was just a dev_dbg message
sound/soc/soc-core.c: dev_dbg(dai->dev, "ASoC: Unregistered DAI '%s'\n", dai->name);

@plbossart
Member Author

@bardliao @shumingfan I can reproduce the error, even with #3657 applied on top.

The test has to be run in a loop, though; the error doesn't happen on the first run.

while true; do bash ./test.sh; done

@plbossart
Member Author

@bardliao please retry with the rebased code in #3645 (commit 676ec0f).

@bardliao
Collaborator

@plbossart I ran test.sh in a loop with #3642 and got no error in more than one hour.
dmesg_mod_reload.txt
However, if I apply only #3645, I get "BUG: kernel NULL pointer dereference, address: 00000000000003f0" when I run sudo modprobe -r snd-soc-sof-sdw.
dmesg_remove_machine.txt

@shumingfan

@plbossart I also ran test.sh on Dell SKU 0A5D (non-SDCA) with #3642. The 'timed out' messages do not appear, and the test ran for over two hours.

@plbossart
Member Author

The test fails only on the SDCA SKU, @bardliao @shumingfan; it's related to the rt711-sdca codec driver.

@plbossart
Member Author

Cannot reproduce this with the changes in #3642

plbossart added a commit to plbossart/sound that referenced this issue May 20, 2022
Make sure that the bus and codecs are pm_runtime active when the card
is registered/created. This avoids timeouts when accessing registers.

BugLink: thesofproject#3651
BugLink: thesofproject#3650
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
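The idea in this commit message can be illustrated with a toy model of runtime-PM reference counting. The toy_* names are hypothetical and the model is simplified: real runtime PM tracks active children separately from usage counts and supports autosuspend delays. In kernel code the corresponding calls would be along the lines of pm_runtime_resume_and_get() and pm_runtime_put_autosuspend(), but this sketch does not reproduce the actual patch. It only shows why register accesses made while the codec or its link is suspended fail, and why holding a reference across card registration avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a runtime-PM usage count. */
struct toy_dev {
	struct toy_dev *parent; /* e.g. codec -> SoundWire link */
	int usage_count;
	bool suspended;
};

/* Take a reference, resuming the parent chain first if needed. */
static void toy_rpm_get(struct toy_dev *d)
{
	if (d->parent)
		toy_rpm_get(d->parent);
	if (d->usage_count++ == 0)
		d->suspended = false;
}

/* Drop a reference; suspend when unused, then release the parent. */
static void toy_rpm_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->suspended = true; /* no autosuspend delay in this toy */
	if (d->parent)
		toy_rpm_put(d->parent);
}

/* A register write succeeds only if the codec and every parent are
 * active; otherwise it "times out", like the transfers in the log at
 * the top of this issue. */
static int toy_reg_write(const struct toy_dev *codec)
{
	for (const struct toy_dev *d = codec; d != NULL; d = d->parent)
		if (d->suspended)
			return -ETIMEDOUT;
	return 0;
}

/* Hypothetical card-registration step that touches codec registers
 * (jack detection setup, for instance). */
static int toy_register_card(struct toy_dev *codec, bool hold_pm_ref)
{
	int ret;

	if (hold_pm_ref)
		toy_rpm_get(codec);
	ret = toy_reg_write(codec);
	if (hold_pm_ref)
		toy_rpm_put(codec);
	return ret;
}
```

Starting from suspended devices, toy_register_card(&codec, false) returns -ETIMEDOUT, while toy_register_card(&codec, true) succeeds and leaves both devices suspended again with zero usage counts.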
plbossart added a commit to plbossart/sound that referenced this issue May 23, 2022
plbossart added a commit to plbossart/sound that referenced this issue Jun 2, 2022
plbossart added a commit that referenced this issue Jun 15, 2022
Make sure that the bus and codecs are pm_runtime active when the card
is registered/created. This avoids timeouts when accessing registers.

BugLink: #3651
BugLink: #3650
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
plbossart added a commit that referenced this issue Jun 16, 2022
Make sure that the bus and codecs are pm_runtime active when the card
is registered/created. This avoids timeouts when accessing registers.

BugLink: #3651
BugLink: #3650
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
plbossart added a commit that referenced this issue Jun 17, 2022

3 participants