[BUG] Failed to start DSP on Resume (Failure to boot DSP Firmware after resume) #3609
Comments
@ujfalusi @plbossart fyi - looks like we fail to alloc the code loader DMA buffer ? |
Indeed, this looks like a memory allocation error. Also: did this happen before 5.17.3, or did it start with this kernel version? |
I am going to see if I can have a script log free -h before and after standby, to get a good idea of the memory state. If I remember correctly, though, I saw something like 3 or 4 GB of buff/cache (which should count as available) around that time, but I will find out for sure. As for suspend/resume cycles, I would say around 4 or 5; I will know for sure when I get the memory log. I am not 100% sure about kernel versions: I don't believe I saw the issue a month ago, but I wasn't actively keeping track of the versions. After getting the memory logs, I can possibly try downgrading to 5.16.x. |
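One way to capture free -h around each suspend is a systemd sleep hook. This is a minimal sketch, not something taken from this thread: it assumes a systemd-based distro where executables in /usr/lib/systemd/system-sleep/ are called with "pre" or "post" as the first argument and the sleep action as the second. The log path and the SLEEP_MEM_LOG variable are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical log location (an assumption); override with SLEEP_MEM_LOG.
SLEEP_MEM_LOG="${SLEEP_MEM_LOG:-/var/log/sleep-mem.log}"

# Append a timestamped memory snapshot, tagged with the sleep phase
# ("pre" or "post") and the action ("suspend", "hibernate", ...).
log_mem_state() {
    local phase="$1" action="$2"
    {
        echo "=== $(date '+%F %T') ${phase} ${action} ==="
        if command -v free >/dev/null 2>&1; then
            free -h
        else
            echo "free(1) not available"
        fi
    } >> "$SLEEP_MEM_LOG"
}

# systemd-sleep passes the phase as $1 and the action as $2.
# Without arguments (for example when sourced), nothing is logged.
if [ "$#" -ge 2 ]; then
    log_mem_state "$1" "$2"
fi
```

Installed as an executable file in the sleep hook directory, this would leave one snapshot before and one after every suspend, which can then be lined up against the timestamps of the sof-audio errors.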
@dblaber, thanks. |
@ujfalusi @lgirdwood We've had a number of 'out of memory issues' recently. That's starting to be a pattern I am afraid, even with newer kernels. |
@plbossart, I see. I will build a 5.17 stable kernel on Monday with memory leak debugging enabled (Arch/Artix kernels do not have this enabled, but they are close to upstream) and see if it finds anything. #3584 looks like tgl, but the other two are cnl (which I don't have around). |
Just reproduced it now.
Sorry, the columns don't align and there is no fixed-width font/mode. This was after 11 sleep/resume cycles, with an uptime of 13 hours and 15 minutes. Although not much is free, I imagine over 1 GB should not cause issues (a lot is swapped, and buff/cache is 3.6 GB, which I assume is usable). Ironically, I also see an iwlwifi microcode error right after the sof-audio error: [15560.989477] iwlwifi 0000:00:14.3: Microcode SW error detected. Restarting 0x0. It seems iwlwifi somehow recovered by itself eventually, though. |
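The totals from free do not show how fragmented the free memory is. One possibility, which this thread does not confirm, is that a large contiguous allocation can fail even with more than 1 GB free. The sketch below assumes the usual /proc/buddyinfo layout ("Node N, zone NAME" followed by one free-block count per order, starting at order 0) and sums the free blocks at or above a given order. The function name is made up for illustration.

```shell
#!/bin/bash
# Sum the free blocks of order >= $1 from /proc/buddyinfo-style input
# on stdin. Expected line shape (an assumption about the format):
#   Node 0, zone   Normal   c0 c1 c2 ... c10
free_blocks_at_or_above() {
    local min_order="$1"
    awk -v min="$min_order" '
        $1 == "Node" && $3 == "zone" {
            # Fields 5 and up hold the counts for order 0, 1, 2, ...
            for (i = 5; i <= NF; i++)
                if (i - 5 >= min) total += $i
        }
        END { print total + 0 }
    '
}
```

For example, `free_blocks_at_or_above 4 < /proc/buddyinfo` prints how many free blocks of at least 16 contiguous pages remain. Logging this next to free after each resume would show whether large blocks run out before the failure.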
Attached full kernel log |
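@dblaber, thanks for the report!
#!/bin/bash
# Suspend to RAM for 5 seconds per cycle and stop on the first failure.
test="rtcwake -m mem -s5"
iteration=0
while :
do
    $test || break
    ((iteration++))
    echo "Iteration: ${iteration}"
    sleep 2
    # Print the header and the Mem row of free(1) after each resume.
    free | grep -B1 Mem
    sleep 3
done
echo "Failed at iteration ${iteration}"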
These ran overnight while I was not around, so I don't have exact matches, but 2987/1365 suspend/resume cycles is quite massive. Unfortunately, the distro kernel does not have debugging modules, so I went to a custom kernel. Still, we are looking at Long story short: I'm no closer to finding the reason for these allocation errors :( |
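The stress loop above only stops when rtcwake itself fails, so a DSP boot failure could go unnoticed while the loop keeps running. A small sketch of a check that could be added to such a loop follows. The function name is hypothetical, and the patterns are taken from the error lines quoted in this issue rather than from an exhaustive list of SOF messages.

```shell
#!/bin/bash
# Return 0 if the kernel log text on stdin contains one of the SOF
# failure messages quoted in this issue, 1 otherwise.
has_sof_boot_failure() {
    grep -q -E \
        -e 'sof-audio-pci-intel-[a-z]+ .*memory alloc failed' \
        -e 'sof-audio-pci-intel-[a-z]+ .*failed to boot DSP firmware after resume'
}

# Possible use inside the loop (reading dmesg may require root):
#   if dmesg | has_sof_boot_failure; then
#       echo "SOF failure at iteration ${iteration}"
#       break
#   fi
```

Stopping at the first match keeps the memory snapshot from that iteration next to the failure, instead of leaving it buried among later cycles.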
FYI: https://lore.kernel.org/alsa-devel/20220425122814.751-1-peter.ujfalusi@linux.intel.com/ |
@dblaber, can you check whether 5.17.4 fixes your issue? A fallback patch got backported from mainline; I think that will fix it. |
Sure, will give that a shot. I won't have results until Wednesday EOD EST, as I can only seem to reproduce it when working from home.
|
Hi, dear SOF project team. I have this issue too (after resuming from suspend to disk, the audio driver is stopped). |
@AVAtarMod you need to upgrade to 5.17.4 per the comments above and check if this is fixed. |
@plbossart I'll check later, because of a kernel ACPI bug that gives me a kernel panic (kernel version is 5.17.4). |
So far, I have not reproduced this on 5.17.4. I will give it another week of testing, given the intermittent nature of the issue. |
@dblaber, thank you for the information! Let's wait a bit more time to make sure it is gone. |
I upgraded the kernel to 5.17.4, and the issue is gone. Thanks! |
I am not sure if this is a separate issue or a regression, or if I should be opening a new bug report for this, but I am now on kernel 5.18.3-arch1-1 and am seeing this again: Not 100% sure, but I think this was after a resume from suspend to disk. |
@dblaber, it is likely due to IMR boot failure when coming out from hibernate (suspend to disk), like this: |
Hello all, please let me know if I have not provided the right information (I can provide any docs or additional debugging information). I am having this issue after multiple sleep (suspend-to-RAM) and resume cycles, where my audio devices are completely gone. It is intermittent: it does not happen on every resume, but it happens often. I see this in the kernel logs when the issue occurs.
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: error: memory alloc failed: -12
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: error: dma prepare for fw loading failed
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump start ]------------
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: Failed to start DSP
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_IN_PROGRESS (2)
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: extended rom status: 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump end ]------------
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed to boot DSP firmware after resume -12
4/21/22 7:09 AM kernel PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -12
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: PM: failed to resume async: error -12
4/21/22 7:09 AM kernel usb 1-8: reset high-speed USB device number 3 using xhci_hcd
4/21/22 7:09 AM kernel OOM killer enabled.
4/21/22 7:09 AM kernel Restarting tasks ...
4/21/22 7:09 AM kernel pci_bus 0000:05: Allocating resources
4/21/22 7:09 AM kernel pci_bus 0000:2b: Allocating resources
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed to load widget HDA0.OUT
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed setting up DAI widget HDA0.OUT
4/21/22 7:09 AM kernel sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_dai_prepare on Analog CPU DAI: -19
4/21/22 7:09 AM kernel Analog Playback and Capture: ASoC: __soc_pcm_prepare() failed (-19)
4/21/22 7:09 AM kernel HDA Analog: ASoC: dpcm_be_dai_prepare() failed (-19)
4/21/22 7:09 AM kernel HDA Analog: ASoC: dpcm_fe_dai_prepare() failed (-19)
When this occurs, nothing seems to restore the audio; reloading the sof-audio-pci-intel-cnl kernel module does not help. I am not sure how to interpret the error (whether it is related to the failed memory allocation, something with the firmware, etc.).
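As a reading aid for the log above: the negative numbers are Linux errno values. -12 is ENOMEM and -19 is ENODEV, which fits an allocation failure during firmware loading followed by the audio device being unavailable to the PCM layer. The helper below is only an illustrative sketch covering the two codes that appear in this log. The function name is made up.

```shell
#!/bin/bash
# Map the kernel error codes seen in this log to their errno names.
# Only the two values from this issue are covered; the full list is in
# include/uapi/asm-generic/errno-base.h in the kernel source.
errno_name() {
    case "${1#-}" in            # strip a leading minus sign, if any
        12) echo "ENOMEM (out of memory)" ;;
        19) echo "ENODEV (no such device)" ;;
        *)  echo "unknown (see errno-base.h)" ;;
    esac
}
```

For example, `errno_name -12` prints the name for the code in "failed to boot DSP firmware after resume -12".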
Here are some additional details:
dmb@thinkpad> uname -a ~
Linux thinkpad.da4.org 5.17.3-arch1-1 #1 SMP PREEMPT Thu, 14 Apr 2022 01:18:36 +0000 x86_64 GNU/Linux
[1] dmb@thinkpad> lspci|grep Audio ~
00:1f.3 Audio device: Intel Corporation Comet Lake PCH-LP cAVS
Laptop info (from lshw):
description: Notebook
product: 20S4CTO1WW (LENOVO_MT_20S4_BU_Think_FM_ThinkPad P14s Gen 1)
vendor: LENOVO
version: ThinkPad P14s Gen 1
Here is what initialization looks like when freshly starting:
[130] dmb@thinkpad> sudo dmesg|grep sof-audio-pci-intel-cnl ~
[ 15.744016] sof-audio-pci-intel-cnl 0000:00:1f.3: DSP detected with PCI class/subclass/prog-if info 0x040380
[ 15.746741] sof-audio-pci-intel-cnl 0000:00:1f.3: Digital mics found on Skylake+ platform, using SOF driver
[ 15.746753] sof-audio-pci-intel-cnl 0000:00:1f.3: enabling device (0004 -> 0006)
[ 15.748115] sof-audio-pci-intel-cnl 0000:00:1f.3: DSP detected with PCI class/subclass/prog-if 0x040380
[ 17.121006] sof-audio-pci-intel-cnl 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 17.240874] sof-audio-pci-intel-cnl 0000:00:1f.3: use msi interrupt mode
[ 17.276673] sof-audio-pci-intel-cnl 0000:00:1f.3: hda codecs found, mask 5
[ 17.276676] sof-audio-pci-intel-cnl 0000:00:1f.3: using HDA machine driver skl_hda_dsp_generic now
[ 17.276680] sof-audio-pci-intel-cnl 0000:00:1f.3: DMICs detected in NHLT tables: 2
[ 17.279078] sof-audio-pci-intel-cnl 0000:00:1f.3: Firmware info: version 2:1:1-3964a
[ 17.279080] sof-audio-pci-intel-cnl 0000:00:1f.3: Firmware: ABI 3:21:0 Kernel ABI 3:18:0
[ 17.279082] sof-audio-pci-intel-cnl 0000:00:1f.3: warn: FW ABI is more recent than kernel
[ 17.279088] sof-audio-pci-intel-cnl 0000:00:1f.3: unknown sof_ext_man header type 3 size 0x30
[ 17.412434] sof-audio-pci-intel-cnl 0000:00:1f.3: Firmware info: version 2:1:1-3964a
[ 17.412442] sof-audio-pci-intel-cnl 0000:00:1f.3: Firmware: ABI 3:21:0 Kernel ABI 3:18:0
[ 17.412446] sof-audio-pci-intel-cnl 0000:00:1f.3: warn: FW ABI is more recent than kernel
[ 17.452639] sof-audio-pci-intel-cnl 0000:00:1f.3: Topology: ABI 3:21:0 Kernel ABI 3:18:0
[ 17.452646] sof-audio-pci-intel-cnl 0000:00:1f.3: warn: topology ABI is more recent than kernel
thesofproject/sof#5168 seems to be a very similar issue. However, it is marked closed with a newer kernel version, and I am using 5.17.x, whereas that bug refers to 5.16.
Let me know if there is anything else I can provide.