[BUG] MTL: sound distortion for SDW devices on SOF topology2 #8846

Closed
mstrozek opened this issue Feb 8, 2024 · 18 comments
Labels: bug (Something isn't working as expected), MTL (Applies to Meteor Lake platform), P1 (Blocker bugs or important features), SDW (SoundWire)

@mstrozek

mstrozek commented Feb 8, 2024

Describe the bug
On an MTL laptop running Fedora 40 (most recent rawhide as of 06.02.2024) with the following topology:

  • Link0: CS42L43 Jack and mics
  • Link2: 2x CS35L56 Speaker (amps 3 and 4, right)
  • Link3: 2x CS35L56 Speaker (amps 1 and 2, left)

the audio occasionally becomes strongly distorted (see the attached file audio_example.tar.gz).
The distortion can appear after waking the laptop from suspend, though it has also been observed after restarting the gnome sound settings GUI. The sound can return to normal after some time or after another suspend/wake cycle, but no clear pattern was observed.
The same distortion can be heard through both the CS42L43 (jack) and the CS35L56 (speakers). Routing audio to bypass the CS35L56's firmware was also tried and made no difference to the distortion. This seems to suggest that the corruption happens before the sound is processed by the CS42L43/CS35L56.

To Reproduce
Try any audio output (tested mainly with gnome sound settings GUI) after waking the laptop from suspend.

Reproduction Rate
Should encounter the distortion after 5-10 attempts.

Expected behavior
No distortion should be present

Impact
Major

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
  3. Name of the platform(s) on which the bug is observed.
    • Platform: Meteor Lake laptop
  4. ALSA UCM config
  5. Cirrus FW

Screenshots or console output
sdw_reg_dump.tar.gz

mstrozek added the bug, SDW, and MTL labels Feb 8, 2024
@plbossart
Member

We have to split variables here, @mstrozek; there are just too many possible sources of issues.

a) please first use the SOF development kernel, to rule out any backport issues.

b) avoid system suspend for now and please disable pm_runtime for the SOF PCI device and SoundWire links. This can be done with these options:

options snd-sof-pci sof_pci_debug=0x01
options soundwire_intel sdw_md_flags=0x01010101

@plbossart
Member

plbossart commented Feb 8, 2024

@mstrozek could you also test the low-level parts with the options above disabled to see if we have any obvious kernel/firmware issues.

First copy this file, sof-dyndbg.conf.txt, to /etc/modprobe.d/sof-dyndbg.conf.

git clone https://github.com/thesofproject/sof-test.git
dnf install python3-construct
dnf install python3-graphviz
dnf install python3-numpy python3-scipy
dnf install octave octave-signal
dnf install expect
dnf install octave octave-io

TPLG=/lib/firmware/intel/sof-ace-tplg/sof-mtl-cs42l43-l0-cs35l56-l23.tplg   ./sof-test/test-case/run-all-tests.sh -l1

If all the tests pass, then the issue is in an area not tested by the Intel CI, such as the PipeWire integration.

@plbossart
Member

Another check would be to see what happens on the dmesg log when you open the Sound settings UI.

What I see on my side is that there are no sound events after the application starts. What this means is that userspace opens an audio stream and mixes all other sounds into that stream. So if we start getting bad sound, it's either

a) the application is confused about where the ALSA read/write pointers are, which could be related to some delay issues we're investigating.
b) somehow there's an xrun. That would show up in the dmesg log as this sort of trace:

[ 1399.157954] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: trigger stream 2 dir 0 cmd 0

c) that could also be an application problem. We'd need to extract PipeWire logs to see if anything goes boink at that level.

If you see any changes on the dmesg log, it might indicate that the app is lost or there was an xrun somehow.
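
One way to check for such changes is to pull the sof_pcm_trigger lines out of a saved dmesg log and decode them. The following is a minimal sketch, assuming the log lines have exactly the format of the trace quoted above; the function name `parse_triggers` is hypothetical, and the command and direction numbers are decoded with the SNDRV_PCM_TRIGGER_* and SNDRV_PCM_STREAM_* values from the ALSA UAPI header.

```python
import re

# Matches the sof_pcm_trigger dynamic-debug line format quoted in this thread.
TRIGGER_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+snd_sof:sof_pcm_trigger:.*"
    r"trigger stream (?P<stream>\d+) dir (?P<dir>\d+) cmd (?P<cmd>\d+)"
)

# SNDRV_PCM_TRIGGER_* values from include/uapi/sound/asound.h.
TRIGGER_CMDS = {
    0: "STOP", 1: "START", 3: "PAUSE_PUSH", 4: "PAUSE_RELEASE",
    5: "SUSPEND", 6: "RESUME", 7: "DRAIN",
}
# SNDRV_PCM_STREAM_* values from the same header.
DIRECTIONS = {0: "playback", 1: "capture"}


def parse_triggers(dmesg_text):
    """Return one dict per sof_pcm_trigger line found in dmesg_text."""
    events = []
    for line in dmesg_text.splitlines():
        m = TRIGGER_RE.search(line)
        if m is None:
            continue
        cmd = int(m.group("cmd"))
        direction = int(m.group("dir"))
        events.append({
            "timestamp": float(m.group("ts")),
            "stream": int(m.group("stream")),
            "direction": DIRECTIONS.get(direction, str(direction)),
            "command": TRIGGER_CMDS.get(cmd, str(cmd)),
        })
    return events
```

Feeding it the output of `dmesg` captured while the Sound settings UI is open gives a timeline of start/stop events per stream; a run of unexpected STOP and START pairs on the playback stream would be the thing to look for.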

@mstrozek
Author

mstrozek commented Feb 9, 2024

Hi @plbossart, thank you for your instructions!
I'm now using the kernel from branch topic/sof-dev (with a patch for broken ACPI entries on some MTL laptops: https://lore.kernel.org/lkml/20240209111840.1543630-1-rf@opensource.cirrus.com/T/#u, corresponding to the patches from the archive I attached earlier), with those two options put into /etc/modprobe.d/alsa.conf. The distortion still happens in the same way in this configuration. I'm also attaching a dmesg log, dmesg_sof_kernel.log (with log_level=7), where the gnome sound settings UI was opened somewhere between the 20th and 100th second, followed by repeated suspend/wake cycles until I noticed the audio distortion. I did not see any errors related to sof_pcm_trigger (unless I need to enable some debug options/config first?). I will look at the PipeWire debug information next.

I also tried the sof-tests with the /etc/modprobe.d/alsa.conf options disabled; please see the attached output of those tests here: sof-tests-output-normal.log. It looks like the sound settings UI was open during these tests and was causing some "Device or resource busy" errors, so I re-ran the tests with the UI closed (output here: sof-tests-output-normal-no-soundUI.log). That broke something, leaving no soundcard available anymore:
mstrozek@fedora:~$ aplay -l
aplay: device_list:279: no soundcards found...

and after a reboot it started showing HDMI audio devices, even though no HDMI was connected?
mstrozek@fedora:~$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: sofsoundwire [sof-soundwire], device 0: Jack Out (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofsoundwire [sof-soundwire], device 2: Speaker (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofsoundwire [sof-soundwire], device 5: HDMI1 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofsoundwire [sof-soundwire], device 6: HDMI2 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofsoundwire [sof-soundwire], device 7: HDMI3 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofsoundwire [sof-soundwire], device 31: Deepbuffer Jack Out (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Not sure what the tests are doing that makes the soundcard disappear. Including a dmesg captured during the tests without the sound UI here:
dmesg-during-tests.log

Also, should I also run these tests after I notice the audio distortion?

@mstrozek
Author

mstrozek commented Feb 9, 2024

Hi @plbossart, an update: following a suggestion from @charleskeepax I tried increasing PipeWire's buffer size with:
pw-metadata -n settings 0 clock.force-quantum 2048
and that has helped avoid the audio distortion. I can't say it will definitely not happen with this setting, but I did not notice any distortion during testing that was long enough to encounter the issue multiple times without the buffer size change. So it looks like the issue is caused by an overrun, with the system not recovering from it correctly?
Do you have any suggestions for further debug/next steps?

@plbossart
Member

@mstrozek the sof-test suite uses the hw: level, so you want all applications, including PipeWire, to be idle. The tests do add/remove modules, so it's possible that the card was removed. Just reboot and re-run the tests once you see that the PCI device is in pm_suspend mode.

@mstrozek
Author

mstrozek commented Feb 9, 2024

@plbossart Not sure if I checked it correctly: I ran cat /sys/devices/pci0000\:00/0000\:00\:1f.3/power/runtime_status until it printed "suspended", then re-ran the tests. Output is here:
sof-tests-output-normal-suspended.log
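
The manual check above can be automated by polling the same sysfs file. This is only a sketch: the helper name `wait_for_runtime_suspend`, the timeout, and the poll interval are assumptions made for illustration, and the path is the one quoted in this comment (the PCI address may differ on other systems).

```python
import time
from pathlib import Path

# sysfs path taken from the comment above; adjust the PCI address if needed.
RUNTIME_STATUS = Path(
    "/sys/devices/pci0000:00/0000:00:1f.3/power/runtime_status"
)


def wait_for_runtime_suspend(path=RUNTIME_STATUS, timeout_s=60.0, poll_s=0.5):
    """Poll a runtime_status file until it reads 'suspended'.

    Returns True as soon as the device reports 'suspended', or False if
    timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if Path(path).read_text().strip() == "suspended":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling `wait_for_runtime_suspend()` before launching run-all-tests.sh would make the "device is idle" precondition explicit. A False return suggests some application (PipeWire, for instance) is still holding the device open.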

@plbossart
Member

The tests look OK; you just have errors with the module load/unload because the Cirrus codecs are not included in the list:

rmmod: ERROR: Module regmap_sdw is in use by: cs42l43_sdw

I had a PR to try and fix this in thesofproject/sof-test#1110. If you could apply those patches and retest, the problems should go away.

At any rate, the problem does seem to come from PipeWire's use of the ALSA API in a way that's not tested by the Intel CI. @ujfalusi and I reproduced a similar issue on a different platform and are looking into this.

@plbossart
Member

@mstrozek I am able to reproduce a sound distortion on TGL and MTL devices without Cirrus Logic codecs.
The recipe is simple: open the 'Test Speakers' UI and do monkey-testing with left and right speakers, usually I get a distortion in less than 20s.

This seems to happen after an xrun, since at the driver level we see the stream being closed and reconfigured. This happens with the speaker output (pcm2)

root@fedora:/proc/asound/card0/pcm2p/sub0# more hw_params 
access: MMAP_INTERLEAVED
format: S32_LE
subformat: STD
channels: 2
rate: 48000 (48000/1)
period_size: 1024
buffer_size: 8192

That's not really a tight latency; 21 ms periods are classic and should just work.
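
For reference, the period figure follows directly from the hw_params dump above: 1024 frames at 48000 Hz is roughly 21.3 ms, and the 8192-frame buffer holds 8 such periods, or roughly 170.7 ms. A small sketch of that arithmetic (the helper names `parse_hw_params` and `timing_ms` are made up for illustration):

```python
# hw_params text as shown in /proc/asound/card0/pcm2p/sub0/hw_params above.
HW_PARAMS = """\
access: MMAP_INTERLEAVED
format: S32_LE
subformat: STD
channels: 2
rate: 48000 (48000/1)
period_size: 1024
buffer_size: 8192
"""


def parse_hw_params(text):
    """Turn 'key: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def timing_ms(params):
    """Convert period and buffer sizes (in frames) to milliseconds."""
    rate = int(params["rate"].split()[0])  # "48000 (48000/1)" -> 48000
    period = int(params["period_size"])
    buffer = int(params["buffer_size"])
    return {
        "period_ms": 1000.0 * period / rate,
        "buffer_ms": 1000.0 * buffer / rate,
        "periods": buffer // period,
    }


if __name__ == "__main__":
    print(timing_ms(parse_hw_params(HW_PARAMS)))
```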

@plbossart
Member

plbossart commented Feb 9, 2024

Definitely an issue with xruns and recovery: every time the sound goes boink I see this sort of PipeWire log:

journalctl -xf | grep pipewire
Feb 09 15:43:00 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (7 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:05 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (9 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:08 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (3 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:16 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (9 suppressed) snd_pcm_avail after recover: Broken pipe

Edit: in the same monkey testing with TGL/IPC3, I never see this sort of log, so it's a double-whammy:
a) an xrun situation is detected
b) the recovery does not reset pointers correctly
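
To compare runs (for example MTL against TGL/IPC3, or before and after a kernel patch), the recovery failures can be tallied from a saved journal. This is a rough sketch, assuming the message format shown in the log excerpt above and assuming the "(N suppressed)" prefix counts earlier rate-limited copies of the same message; `count_recover_failures` is a hypothetical helper name.

```python
import re

# Matches the spa.alsa message from the journal excerpt in this comment.
RECOVER_RE = re.compile(
    r"spa\.alsa: (?P<dev>\S+): (?:\((?P<n>\d+) suppressed\) )?"
    r"snd_pcm_avail after recover: Broken pipe"
)

SAMPLE_LOG = """\
Feb 09 15:43:00 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (7 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:05 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (9 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:08 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (3 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 09 15:43:16 fedora pipewire[2004]: spa.alsa: hw:sofsoundwire,2p: (9 suppressed) snd_pcm_avail after recover: Broken pipe
"""


def count_recover_failures(log_text):
    """Count logged and suppressed recovery failures per ALSA device."""
    counts = {}
    for line in log_text.splitlines():
        m = RECOVER_RE.search(line)
        if m is None:
            continue
        entry = counts.setdefault(
            m.group("dev"), {"logged": 0, "suppressed": 0}
        )
        entry["logged"] += 1
        # Lines without a "(N suppressed)" prefix contribute zero here.
        entry["suppressed"] += int(m.group("n") or 0)
    return counts


if __name__ == "__main__":
    print(count_recover_failures(SAMPLE_LOG))
```

A count of zero after a fix is a stronger signal than listening alone, since a working recovery can mask xruns that are still happening.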

softwarecki self-assigned this Feb 13, 2024
@plbossart
Member

thesofproject/linux#4816 seems to improve the xrun recovery in my tests with the Gnome Sound Settings/Test speaker cases.

The question of why we have an xrun remains open.

@mstrozek
Author

@plbossart I can confirm it improves the recovery. I can still see snd_pcm_avail after recover: Broken pipe but now the sound doesn't stay corrupted afterwards.

@plbossart
Member

Thanks for testing, @mstrozek. We're still working on this xrun recovery; the initial PR breaks other test cases, so it will need improvements/refinements.

lgirdwood added this to the v2.9 milestone Feb 19, 2024
abonislawski added the P1 (Blocker bugs or important features) label Feb 20, 2024
@ranj063
Collaborator

ranj063 commented Feb 21, 2024

> Thanks for testing @mstrozek. we're still working on this xrun recovery, the initial PR breaks other test cases so it will need improvements/refinements.

@mstrozek could you please try Linux PR thesofproject/linux#4829 to test the xrun recovery. We've fixed the breaking test cases.

@mstrozek
Author

@ranj063 I merged the branch ranj063:pr4816 and it looks like the issue is fixed! I did not hear any distortion during playback, and I also can't see any snd_pcm_avail after recover: Broken pipe messages and did not experience any lag in playback, so it looks like the xruns no longer happen.

@ranj063
Collaborator

ranj063 commented Feb 21, 2024

> @ranj063 merged the branch ranj063:pr4816 and looks like the issue is fixed! Did not hear any distortion during playback, but also can't see any snd_pcm_avail after recover: Broken pipe messages and did not experience any lag in playback, so looks like the xruns do not happen anymore.

@mstrozek thanks for testing!

@abonislawski
Member

@mstrozek @ranj063 do you think we can close this issue now?

@mstrozek
Author

mstrozek commented Mar 5, 2024

@abonislawski ok for me
