[lineage-18.1] 4.4.261 #23

derfelot · 2021-03-14T02:49:19Z

No description provided.

This patch and problem analysis is specific for 4.4 LTS, due to incomplete backporting of other fixes. Later LTS series have different backports. Since v4.4.257 when CONFIG_PROVE_LOCKING=y the following triggers right after reboot of our pre-life systems which equal our production setup: Mar 03 11:27:33 icpu-test-bap10 kernel: ================================= Mar 03 11:27:33 icpu-test-bap10 kernel: [ INFO: inconsistent lock state ] Mar 03 11:27:33 icpu-test-bap10 kernel: 4.4.259-rc1-grsec+ #730 Not tainted Mar 03 11:27:33 icpu-test-bap10 kernel: --------------------------------- Mar 03 11:27:33 icpu-test-bap10 kernel: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. Mar 03 11:27:33 icpu-test-bap10 kernel: apache2-ssl/9310 [HC0[0]:SC0[0]:HE1:SE1] takes: Mar 03 11:27:33 icpu-test-bap10 kernel: (&p->pi_lock){?.-.-.}, at: [<ffffffff810abb68>] pi_state_update_owner+0x51/0xd7 Mar 03 11:27:33 icpu-test-bap10 kernel: {IN-HARDIRQ-W} state was registered at: Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81088c4a>] __lock_acquire+0x3a7/0xe4a Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81089b01>] lock_acquire+0x18d/0x1bc Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff8170151c>] _raw_spin_lock_irqsave+0x3e/0x50 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810719a5>] try_to_wake_up+0x2c/0x210 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81071bf3>] default_wake_function+0xd/0xf Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81083588>] autoremove_wake_function+0x11/0x35 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810830b2>] __wake_up_common+0x48/0x7c Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff8108311a>] __wake_up+0x34/0x46 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff814c2a23>] megasas_complete_int_cmd+0x31/0x33 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff814c60a0>] megasas_complete_cmd+0x570/0x57b Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff814d05bc>] complete_cmd_fusion+0x23e/0x33d Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff814d0768>] megasas_isr_fusion+0x67/0x74 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81091ae5>] handle_irq_event_percpu+0x134/0x311 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81091cf5>] handle_irq_event+0x33/0x51 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810948b9>] handle_edge_irq+0xa3/0xc2 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81005f7b>] handle_irq+0xf9/0x101 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81005700>] do_IRQ+0x80/0xf5 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81702228>] ret_from_intr+0x0/0x20 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff8100cab0>] arch_cpu_idle+0xa/0xc Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81083a5a>] default_idle_call+0x1e/0x20 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81083b9d>] cpu_startup_entry+0x141/0x22f Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff816fb853>] rest_init+0x135/0x13b Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81d5ce99>] start_kernel+0x3fa/0x40a Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81d5c2af>] x86_64_start_reservations+0x2a/0x2c Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81d5c3d0>] x86_64_start_kernel+0x11f/0x12c Mar 03 11:27:33 icpu-test-bap10 kernel: irq event stamp: 1457 Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last enabled at (1457): [<ffffffff81042a69>] get_user_pages_fast+0xeb/0x14f Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last disabled at (1456): [<ffffffff810429dd>] get_user_pages_fast+0x5f/0x14f Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last enabled at (1446): [<ffffffff815e127d>] release_sock+0x142/0x14d Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last disabled at (1444): [<ffffffff815e116f>] release_sock+0x34/0x14d Mar 03 11:27:33 icpu-test-bap10 kernel: other info that might help us debug this: Mar 03 11:27:33 icpu-test-bap10 kernel: Possible unsafe locking scenario: Mar 03 11:27:33 icpu-test-bap10 kernel: CPU0 Mar 03 11:27:33 icpu-test-bap10 kernel: ---- Mar 03 11:27:33 icpu-test-bap10 kernel: lock(&p->pi_lock); Mar 03 11:27:33 icpu-test-bap10 kernel: <Interrupt> Mar 03 11:27:33 icpu-test-bap10 kernel: lock(&p->pi_lock); Mar 03 11:27:33 icpu-test-bap10 kernel: *** DEADLOCK *** Mar 03 11:27:33 icpu-test-bap10 kernel: 2 locks held by apache2-ssl/9310: Mar 03 11:27:33 icpu-test-bap10 kernel: #0: (&(&(__futex_data.queues)[i].lock)->rlock){+.+...}, at: [<ffffffff810ae4e6>] do Mar 03 11:27:33 icpu-test-bap10 kernel: whatawurst#1: (&lock->wait_lock){+.+...}, at: [<ffffffff810ae53a>] do_futex+0x639/0x809 Mar 03 11:27:33 icpu-test-bap10 kernel: stack backtrace: Mar 03 11:27:33 icpu-test-bap10 kernel: CPU: 13 PID: 9310 UID: 99 Comm: apache2-ssl Not tainted 4.4.259-rc1-grsec+ #730 Mar 03 11:27:33 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019 Mar 03 11:27:33 icpu-test-bap10 kernel: 0000000000000000 ffff883fb79bfc00 ffffffff816f8fc2 ffff883ffa66d300 Mar 03 11:27:33 icpu-test-bap10 kernel: ffffffff8eaa71f0 ffff883fb79bfc50 ffffffff81088484 0000000000000000 Mar 03 11:27:33 icpu-test-bap10 kernel: 0000000000000001 0000000000000001 0000000000000002 ffff883ffa66db58 Mar 03 11:27:33 icpu-test-bap10 kernel: Call Trace: Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff816f8fc2>] dump_stack+0x94/0xca Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81088484>] print_usage_bug+0x1bc/0x1d1 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81087d76>] ? check_usage_forwards+0x98/0x98 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810885a5>] mark_lock+0x10c/0x203 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81088cb9>] __lock_acquire+0x416/0xe4a Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81089b01>] lock_acquire+0x18d/0x1bc Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81089b01>] ? lock_acquire+0x18d/0x1bc Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81700d12>] _raw_spin_lock+0x2a/0x39 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810abb68>] pi_state_update_owner+0x51/0xd7 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810ae5af>] do_futex+0x6ae/0x809 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff810ae83d>] SyS_futex+0x133/0x143 Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff8100158a>] ? syscall_trace_enter_phase2+0x1a2/0x1bb Mar 03 11:27:33 icpu-test-bap10 kernel: [<ffffffff81701848>] tracesys_phase2+0x90/0x95 Bisecting detects 47e452f in the above specific scenario using apache-ssl, but apparently the missing *_irq() was introduced in 34c8e1c. However, just reverting the old _irq() variants to a similar status than before 34c8e1c, or using _irqsave() / _irqrestore() as some other backports are doing in various places, would not really help. The fundamental problem is the following violation of the assertion lockdep_assert_held(&pi_state->pi_mutex.wait_lock) in pi_state_update_owner(): Mar 03 12:50:03 icpu-test-bap10 kernel: ------------[ cut here ]------------ Mar 03 12:50:03 icpu-test-bap10 kernel: WARNING: CPU: 37 PID: 8488 at kernel/futex.c:844 pi_state_update_owner+0x3d/0xd7() Mar 03 12:50:03 icpu-test-bap10 kernel: Modules linked in: xt_time xt_connlimit xt_connmark xt_NFLOG xt_limit xt_hashlimit veth ip_set_bitmap_port xt_DSCP xt_multiport ip_set_hash_ip xt_owner xt_set ip_set_hash_net xt_state xt_conntrack nf_conntrack_ftp mars lz4_decompress lz4_compress ipmi_devintf x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul hed ipmi_si ipmi_msghandler processor crc32c_intel ehci_pci ehci_hcd usbcore i40e usb_common Mar 03 12:50:03 icpu-test-bap10 kernel: CPU: 37 PID: 8488 UID: 99 Comm: apache2-ssl Not tainted 4.4.259-rc1-grsec+ #737 Mar 03 12:50:03 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019 Mar 03 12:50:03 icpu-test-bap10 kernel: 0000000000000000 ffff883f863f7c70 ffffffff816f9002 0000000000000000 Mar 03 12:50:03 icpu-test-bap10 kernel: 0000000000000009 ffff883f863f7ca8 ffffffff8104cda2 ffffffff810abac7 Mar 03 12:50:03 icpu-test-bap10 kernel: ffff883ffbfe5e80 0000000000000000 ffff883f82ed4bc0 00007fc01c9bf000 Mar 03 12:50:03 icpu-test-bap10 kernel: Call Trace: Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff816f9002>] dump_stack+0x94/0xca Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff8104cda2>] warn_slowpath_common+0x94/0xad Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810abac7>] ? pi_state_update_owner+0x3d/0xd7 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff8104ce5f>] warn_slowpath_null+0x15/0x17 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810abac7>] pi_state_update_owner+0x3d/0xd7 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810abea8>] free_pi_state+0x2d/0x73 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810abf0b>] unqueue_me_pi+0x1d/0x31 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810ad735>] futex_lock_pi+0x27a/0x2e8 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff81088bca>] ? __lock_acquire+0x327/0xe4a Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810ae6a9>] do_futex+0x784/0x809 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810cfa9a>] ? seccomp_phase1+0xde/0x1e7 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810a4503>] ? current_kernel_time64+0xb/0x31 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810d23c3>] ? current_kernel_time+0xb/0xf Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff810ae861>] SyS_futex+0x133/0x143 Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff8100158a>] ? syscall_trace_enter_phase2+0x1a2/0x1bb Mar 03 12:50:03 icpu-test-bap10 kernel: [<ffffffff81701888>] tracesys_phase2+0x90/0x95 Mar 03 12:50:03 icpu-test-bap10 kernel: ---[ end trace 968f95a458dea951 ]--- In order to both (1) prevent the self-deadlock, and (2) to satisfy the assertion at pi_state_update_owner(), some locking with irq disable is needed, at least in the specific call stack. Interestingly, there existed a suchalike locking just before f08a4af. This is just a quick hotfix, resurrecting some previous locks at the old places, but now using ->wait_lock in place of the previous ->pi_lock (which was in place before f08a4af). The ->pi_lock is now also taken, by the new code which had been introduced in 34c8e1c. When this patch is applied, both the above splats are no longer triggering at my prelife machines. Without this patch, I cannot ensure stable production at 1&1 Ionos. Hint for further work: I have not yet tested other call paths, since I am under time pressure for security reasons. Hint for further hardening of 4.4.y and probably some more LTS series: Probably some more systematic testing with CONFIG_PROVE_LOCKING (and probably some more options) should be invested in order to make the 4.4 LTS series really "stable" again. Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Lee Jones <lee.jones@linaro.org> Fixes: f08a4af ("futex: Use pi_state_update_owner() in put_pi_state()") Fixes: 34c8e1c ("futex: Provide and use pi_state_update_owner()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This patch and problem analysis is specific for 4.4 LTS, due to incomplete backporting of other fixes. Later LTS series have different backports. The following is obviously incorrect: static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, struct futex_hash_bucket *hb) { [...] raw_spin_lock(&pi_state->pi_mutex.wait_lock); [...] raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); [...] } The 4.4-specific fix should probably go in the direction of b4abf91, making everything irq-safe. Probably, backporting of b4abf91 to 4.4 LTS could thus be another good idea. However, this might involve some more 4.4-specific work and require thorough testing: > git log --oneline v4.4..b4abf91 -- kernel/futex.c kernel/locking/rtmutex.c | wc -l 10 So this patch is just an obvious quickfix for now. Hint: the lock order is documented in 4.9.y and later. A similar documenting is missing in 4.4.y. Please somebody either backport also, or write a new description, if there would be some differences I cannot easily see at the moment. Without reliable docs, inspection of the locking correctness may become a pain. Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Lee Jones <lee.jones@linaro.org> Fixes: 394fc49 ("futex: Rework inconsistent rt_mutex/futex_q state") Fixes: 6510e4a ("futex,rt_mutex: Provide futex specific rt_mutex API") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 26a9630c72ebac7c564db305a6aee54a8edde70e ] Currently the mask operation on variable conf is just 3 bits so the switch statement case value of 8 is unreachable dead code. The function daio_mgr_dao_init can be passed a 4 bit value, function dao_rsc_init calls it with conf set to: conf = (desc->msr & 0x7) | (desc->passthru << 3); so clearly when desc->passthru is set to 1 then conf can be at least 8. Fix this by changing the mask to 0xf. Fixes: 8cc7236 ("ALSA: SB X-Fi driver merge") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210227001527.1077484-1-colin.king@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 77516d25f54912a7baedeeac1b1b828b6f285152 ] The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT to the user if it can't complete the copy. The "st" variable only holds zero on success or negative error codes on failure so the type should be int. Fixes: 36f988e ("rsxx: Adding in debugfs entries.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit a4c8dd9c2d0987cf542a2a0c42684c9c6d78a04e upstream. According to the definition of dm_iterate_devices_fn: * This function must iterate through each section of device used by the * target until it encounters a non-zero return code, which it then returns. * Returns zero if no callout returned non-zero. For some target type (e.g. dm-stripe), one call of iterate_devices() may iterate multiple underlying devices internally, in which case a non-zero return code returned by iterate_devices_callout_fn will stop the iteration in advance. No iterate_devices_callout_fn should return non-zero unless device iteration should stop. Rename dm_table_requires_stable_pages() to dm_table_any_dev_attr() and elevate it for reuse to stop iterating (and return non-zero) on the first device that causes iterate_devices_callout_fn to return non-zero. Use dm_table_any_dev_attr() to properly iterate through devices. Rename device_is_nonrot() to device_is_rotational() and invert logic accordingly to fix improper disposition. [jeffle: backport notes] No stable writes. Also convert the no_sg_merge capability check, which is introduced by commit 200612e ("dm table: propagate QUEUE_FLAG_NO_SG_MERGE"), and removed since commit 2705c93 ("block: kill QUEUE_FLAG_NO_SG_MERGE") in v5.1. Fixes: c3c4555 ("dm table: clear add_random unless all devices have it set") Fixes: 4693c96 ("dm table: propagate non rotational flag") Cc: stable@vger.kernel.org Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 39aa009 ] Add a new force_caps module parameter to allow overriding the drivers builtin capability detection mechanism. This can be used to for example: -Disable rfkill functionality on devices where there is an AA OEM DMI record advertising non functional rfkill switches -Force loading of the driver on devices with a missing AA OEM DMI record Note that force_caps is -1 when unset, this allows forcing the capability field to 0, which results in acer-wmi only providing WMI hotkey handling while disabling all other (led, rfkill, backlight) functionality. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20201019185628.264473-4-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 0599837 ] Add function 1 DMA alias quirk for Marvell 88SS9215 PCIe SSD Controller. Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c135 Link: https://lore.kernel.org/r/20201110220516.697934-1-helgaas@kernel.org Reported-by: John Smith <LK7S2ED64JHGLKj75shg9klejHWG49h5hk@protonmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Jason Self <jason@bluehome.net> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/r/20210310132319.155338551@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Changes in 4.4.261: (8 commits) futex: fix irq self-deadlock and satisfy assertion futex: fix spin_lock() / spin_unlock_irq() imbalance ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits rsxx: Return -EFAULT if copy_to_user() fails dm table: fix iterate_devices based device capability checks platform/x86: acer-wmi: Add new force_caps module parameter PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller Linux 4.4.261

[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... Flamefire#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Flamefire#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] Flamefire#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] Flamefire#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] Flamefire#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] Flamefire#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] Flamefire#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] Flamefire#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] Flamefire#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 Flamefire#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 Flamefire#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 Flamefire#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf Flamefire#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 Flamefire#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 Flamefire#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 Flamefire#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff Flamefire#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f Flamefire#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... whatawurst#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 whatawurst#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] whatawurst#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] whatawurst#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] whatawurst#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] whatawurst#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] whatawurst#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] whatawurst#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] whatawurst#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 whatawurst#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 whatawurst#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 whatawurst#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf whatawurst#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 whatawurst#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 whatawurst#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 whatawurst#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff whatawurst#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f whatawurst#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

schoebel and others added 9 commits March 11, 2021 13:46

derfelot merged commit 099adae into whatawurst:lineage-18.1 Mar 14, 2021

derfelot deleted the lineage-18.1_update branch March 14, 2021 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lineage-18.1] 4.4.261 #23

[lineage-18.1] 4.4.261 #23

derfelot commented Mar 14, 2021

[lineage-18.1] 4.4.261 #23

[lineage-18.1] 4.4.261 #23

Conversation

derfelot commented Mar 14, 2021