Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oops, 5.17.0-rc8, sof-audio-pci-intel-cnl 0000:00:1f.3: error: memory alloc failed: -12 #3530

Closed
cmurf opened this issue Mar 21, 2022 · 8 comments

Comments

@cmurf
Copy link

cmurf commented Mar 21, 2022

This seems to be a one off, I haven't seen it before. I only noticed it because a VM that should be running got clobbered somehow.

[ 6173.499684] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6173.499687] #PF: supervisor read access in kernel mode
[ 6173.499690] #PF: error_code(0x0000) - not-present page
[ 6173.499692] PGD 0 P4D 0 
[ 6173.499695] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6173.499699] CPU: 0 PID: 1702 Comm: pipewire Not tainted 5.17.0-0.rc8.123.fc36.x86_64+debug #1
[ 6173.499703] Hardware name: LENOVO 20QDS3E200/20QDS3E200, BIOS N2HET66W (1.49 ) 11/10/2021
[ 6173.499705] RIP: 0010:dma_free_noncontiguous+0x20/0xe0
[ 6173.499712] Code: 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 56 41 55 41 89 cd 41 54 49 89 f4 55 48 89 fd 53 4c 8b b7 50 04 00 00 48 89 d3 <48> 8b 33 8b 52 0c 4d 85 f6 74 28 e8 00 4b 00 00 49 8b 46 28 48 85
[ 6173.499715] RSP: 0018:ffffa0b4033ff848 EFLAGS: 00010246
[ 6173.499718] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 6173.499720] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8934075190d0
[ 6173.499722] RBP: ffff8934075190d0 R08: 0000000000000001 R09: 0000000000000001
[ 6173.499724] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 6173.499726] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6173.499728] FS:  00007f6b92f9a740(0000) GS:ffff89376ca00000(0000) knlGS:0000000000000000
[ 6173.499730] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6173.499733] CR2: 0000000000000000 CR3: 00000001b4148003 CR4: 00000000003726f0
[ 6173.499735] Call Trace:
[ 6173.499737]  <TASK>
[ 6173.499742]  cl_stream_prepare.constprop.0.cold+0x2f/0x78 [snd_sof_intel_hda_common]
[ 6173.499754]  hda_dsp_cl_boot_firmware+0x84/0x780 [snd_sof_intel_hda_common]
[ 6173.499765]  snd_sof_run_firmware+0xb7/0x290 [snd_sof]
[ 6173.499781]  sof_resume.isra.0+0xfa/0x1e0 [snd_sof]
[ 6173.499793]  ? __pci_pme_active+0x62/0x70
[ 6173.499799]  ? pci_pme_active+0x1a/0x190
[ 6173.499804]  pci_pm_runtime_resume+0xa7/0xc0
[ 6173.499808]  ? pci_pm_freeze_noirq+0xe0/0xe0
[ 6173.499811]  __rpm_callback+0x45/0x1b0
[ 6173.499816]  ? pci_pm_freeze_noirq+0xe0/0xe0
[ 6173.499820]  rpm_callback+0x5a/0x70
[ 6173.499825]  ? pci_pm_freeze_noirq+0xe0/0xe0
[ 6173.499828]  rpm_resume+0x5e2/0x810
[ 6173.499834]  __pm_runtime_resume+0x47/0x80
[ 6173.499838]  snd_soc_pcm_component_pm_runtime_get+0x30/0xb0 [snd_soc_core]
[ 6173.499866]  __soc_pcm_open+0x66/0x520 [snd_soc_core]
[ 6173.499888]  ? lock_release+0x151/0x460
[ 6173.499893]  dpcm_be_dai_startup+0x11c/0x230 [snd_soc_core]
[ 6173.499917]  dpcm_fe_dai_open+0xe7/0x880 [snd_soc_core]
[ 6173.499946]  snd_pcm_open_substream+0x523/0x880 [snd_pcm]
[ 6173.499960]  snd_pcm_open.part.0+0xaf/0x1f0 [snd_pcm]
[ 6173.499972]  ? wake_up_q+0x90/0x90
[ 6173.499978]  snd_pcm_capture_open+0x54/0x80 [snd_pcm]
[ 6173.499989]  chrdev_open+0xb1/0x210
[ 6173.499994]  ? cdev_device_add+0x80/0x80
[ 6173.499997]  do_dentry_open+0x1dc/0x380
[ 6173.500002]  path_openat+0x7b4/0xcd0
[ 6173.500007]  do_filp_open+0xa1/0x130
[ 6173.500013]  ? lock_release+0x151/0x460
[ 6173.500017]  ? _raw_spin_unlock+0x29/0x40
[ 6173.500022]  ? alloc_fd+0x14b/0x1f0
[ 6173.500027]  do_sys_openat2+0x76/0x130
[ 6173.500032]  __x64_sys_openat+0x5c/0x70
[ 6173.500035]  do_syscall_64+0x37/0x80
[ 6173.500041]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6173.500045] RIP: 0033:0x7f6b930c2c43
[ 6173.500076] Code: 1f 84 00 00 00 00 00 44 89 54 24 0c e8 36 79 f8 ff 44 8b 54 24 0c 89 da 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 79 79 f8 ff 8b 44
[ 6173.500079] RSP: 002b:00007ffd76746380 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 6173.500082] RAX: ffffffffffffffda RBX: 0000000000080802 RCX: 00007f6b930c2c43
[ 6173.500085] RDX: 0000000000080802 RSI: 00007ffd76746420 RDI: 00000000ffffff9c
[ 6173.500087] RBP: 00007ffd76746420 R08: 0000000000000000 R09: 0000000000000069
[ 6173.500089] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000004
[ 6173.500091] R13: 000055a8d994c348 R14: 00007ffd76746420 R15: 0000000000000000
[ 6173.500099]  </TASK>
[ 6173.500100] Modules linked in: binfmt_misc tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp bridge stp llc uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda intel_tcc_cooling snd_sof_pci snd_sof_xtensa_dsp x86_pkg_temp_thermal snd_sof soundwire_bus intel_powerclamp snd_soc_skl coretemp snd_soc_hdac_hda mei_pxp mei_hdcp snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp
[ 6173.500159]  iTCO_wdt snd_soc_acpi_intel_match intel_pmc_bxt iTCO_vendor_support snd_soc_acpi kvm_intel mei_wdt snd_soc_core kvm snd_compress iwlmvm intel_rapl_msr irqbypass ac97_bus snd_pcm_dmaengine rapl snd_hda_intel mac80211 intel_cstate snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec libarc4 intel_uncore snd_hda_core snd_hwdep snd_seq think_lmi btusb uvcvideo firmware_attributes_class snd_seq_device btrtl intel_wmi_thunderbolt videobuf2_vmalloc btbcm snd_pcm iwlwifi wmi_bmof videobuf2_memops btintel i2c_i801 videobuf2_v4l2 processor_thermal_device_pci_legacy snd_timer btmtk i2c_smbus processor_thermal_device iwlmei videobuf2_common processor_thermal_rfim thunderbolt videodev bluetooth processor_thermal_mbox cfg80211 joydev processor_thermal_rapl mc mei_me intel_rapl_common idma64 ecdh_generic mei thinkpad_acpi intel_soc_dts_iosf intel_pch_thermal ledtrig_audio platform_profile rfkill snd int3403_thermal soundcore int340x_thermal_zone acpi_pad int3400_thermal acpi_thermal_rel
[ 6173.500225]  acpi_tad vfat fat zram i915 hid_multitouch crct10dif_pclmul crc32_pclmul nvme crc32c_intel ucsi_acpi ghash_clmulni_intel typec_ucsi serio_raw e1000e nvme_core ttm typec wmi i2c_hid_acpi i2c_hid video pinctrl_cannonlake ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse
[ 6173.500253] CR2: 0000000000000000
[ 6173.500256] ---[ end trace 0000000000000000 ]---
[ 6173.500258] RIP: 0010:dma_free_noncontiguous+0x20/0xe0
[ 6173.500263] Code: 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 56 41 55 41 89 cd 41 54 49 89 f4 55 48 89 fd 53 4c 8b b7 50 04 00 00 48 89 d3 <48> 8b 33 8b 52 0c 4d 85 f6 74 28 e8 00 4b 00 00 49 8b 46 28 48 85
[ 6173.500265] RSP: 0018:ffffa0b4033ff848 EFLAGS: 00010246
[ 6173.500268] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 6173.500270] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8934075190d0
[ 6173.500272] RBP: ffff8934075190d0 R08: 0000000000000001 R09: 0000000000000001
[ 6173.500274] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 6173.500276] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6173.500278] FS:  00007f6b92f9a740(0000) GS:ffff89376ca00000(0000) knlGS:0000000000000000
[ 6173.500281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6173.500284] CR2: 0000000000000000 CR3: 00000001b4148003 CR4: 00000000003726f0

full journal
journal.log

@andersk
Copy link

andersk commented Apr 2, 2022

I have been seeing this too on my ThinkPad X1 Yoga (Gen 4).

The NULL dereference is fixed by torvalds/linux@b7fb0ae, but even with that fix, sound still breaks and never recovers.

sof-audio-pci-intel-cnl 0000:00:1f.3: error: memory alloc failed: -12
sof-audio-pci-intel-cnl 0000:00:1f.3: error: dma prepare for fw loading failed
sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump start ]------------
sof-audio-pci-intel-cnl 0000:00:1f.3: Failed to start DSP
sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_IN_PROGRESS (2)
sof-audio-pci-intel-cnl 0000:00:1f.3: extended rom status:  0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff
sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump end ]------------
sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed to boot DSP firmware after resume -12
sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_pm_runtime_get on 0000:00:1f.3: -12
 Analog Playback and Capture: __soc_pcm_open() failed (-12)
 HDA Analog: ASoC: dpcm_be_dai_startup() failed at Analog Playback and Capture (-12)
 HDA Analog: dpcm_fe_dai_startup() failed (-12)
sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_pm_runtime_get on 0000:00:1f.3: -22
 Analog Playback and Capture: __soc_pcm_open() failed (-22)
 HDA Analog: ASoC: dpcm_be_dai_startup() failed at Analog Playback and Capture (-22)
 HDA Analog: dpcm_fe_dai_startup() failed (-22)
sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_pm_runtime_get on 0000:00:1f.3: -22
 Analog Playback and Capture: __soc_pcm_open() failed (-22)
 HDA Analog: ASoC: dpcm_be_dai_startup() failed at Analog Playback and Capture (-22)
 HDA Analog: dpcm_fe_dai_startup() failed (-22)

@ranj063
Copy link
Collaborator

ranj063 commented Apr 4, 2022

@andersk could you please share the full logs? It looks you have a memory allocation error and it will be good to see what lead to this.

@plbossart
Copy link
Member

There is no way we can recover here if there's not enough memory. We should really look into why there's no memory left, that's puzzling given than audio uses very little memory compared to video or graphics. Either we have a memory leak on the audio side, or other systems use way too much.

@andersk
Copy link

andersk commented Apr 4, 2022

Here’s the kernel log from that boot, but there’s nothing in the five minutes before “error: memory alloc failed: -12”. I think it happened (in the host) when a virtual machine I’d booted in virt-manager tried to play a sound.

@plbossart
Copy link
Member

Humm, this is not a random allocation failure but something that would happen when failing to allocate DMA pages.
Could it be a permission issue or a case where the virtualization uses the 'wrong' driver?

	/* allocate DMA buffer */
	ret = snd_dma_alloc_pages(SNDRV_DMA_TYPE_DEV_SG, &pci->dev, size, dmab);
	if (ret < 0) {
		dev_err(sdev->dev, "error: memory alloc failed: %d\n", ret);
		goto out_put;
	}
	/* prepare DMA for code loader stream */
	hext_stream = hda_cl_stream_prepare(sdev, HDA_CL_STREAM_FORMAT,
					    stripped_firmware.size,
					    &dmab, SNDRV_PCM_STREAM_PLAYBACK);
	if (IS_ERR(hext_stream)) {
		dev_err(sdev->dev, "error: dma prepare for fw loading failed\n");
		return PTR_ERR(hext_stream);
	}

@ujfalusi
Copy link
Collaborator

@cmurf , can you check if 5.17.4 is fixing the issue? A fallback implementation of a memory allocator got backported which hopefully going to solve the issue.
Or with latest mainline?

@plbossart
Copy link
Member

@ujfalusi can we close this? It's been a while since we've since this issue and the upstream fallback should address it.

@ujfalusi
Copy link
Collaborator

@plbossart, yes, we can close this. The fixes are in upstream and they are even backported to stable, afaik.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants