Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lineage-18.1] 4.4.261 #23

Merged
merged 9 commits into from
Mar 14, 2021

Conversation

derfelot
Copy link
Member

No description provided.

schoebel and others added 9 commits March 11, 2021 13:46
This patch and problem analysis is specific for 4.4 LTS, due to incomplete
backporting of other fixes. Later LTS series have different backports.

Since v4.4.257 when CONFIG_PROVE_LOCKING=y
the following triggers right after reboot of our pre-life systems
which equal our production setup:

Mar 03 11:27:33 icpu-test-bap10 kernel: =================================
Mar 03 11:27:33 icpu-test-bap10 kernel: [ INFO: inconsistent lock state ]
Mar 03 11:27:33 icpu-test-bap10 kernel: 4.4.259-rc1-grsec+ #730 Not tainted
Mar 03 11:27:33 icpu-test-bap10 kernel: ---------------------------------
Mar 03 11:27:33 icpu-test-bap10 kernel: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
Mar 03 11:27:33 icpu-test-bap10 kernel: apache2-ssl/9310 [HC0[0]:SC0[0]:HE1:SE1] takes:
Mar 03 11:27:33 icpu-test-bap10 kernel:  (&p->pi_lock){?.-.-.}, at: [<ffffffff810abb68>] pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel: {IN-HARDIRQ-W} state was registered at:
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81088c4a>] __lock_acquire+0x3a7/0xe4a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81089b01>] lock_acquire+0x18d/0x1bc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff8170151c>] _raw_spin_lock_irqsave+0x3e/0x50
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff810719a5>] try_to_wake_up+0x2c/0x210
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81071bf3>] default_wake_function+0xd/0xf
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81083588>] autoremove_wake_function+0x11/0x35
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff810830b2>] __wake_up_common+0x48/0x7c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff8108311a>] __wake_up+0x34/0x46
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff814c2a23>] megasas_complete_int_cmd+0x31/0x33
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff814c60a0>] megasas_complete_cmd+0x570/0x57b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff814d05bc>] complete_cmd_fusion+0x23e/0x33d
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff814d0768>] megasas_isr_fusion+0x67/0x74
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81091ae5>] handle_irq_event_percpu+0x134/0x311
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81091cf5>] handle_irq_event+0x33/0x51
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff810948b9>] handle_edge_irq+0xa3/0xc2
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81005f7b>] handle_irq+0xf9/0x101
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81005700>] do_IRQ+0x80/0xf5
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81702228>] ret_from_intr+0x0/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff8100cab0>] arch_cpu_idle+0xa/0xc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81083a5a>] default_idle_call+0x1e/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81083b9d>] cpu_startup_entry+0x141/0x22f
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff816fb853>] rest_init+0x135/0x13b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81d5ce99>] start_kernel+0x3fa/0x40a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81d5c2af>] x86_64_start_reservations+0x2a/0x2c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [<ffffffff81d5c3d0>] x86_64_start_kernel+0x11f/0x12c
Mar 03 11:27:33 icpu-test-bap10 kernel: irq event stamp: 1457
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last  enabled at (1457): [<ffffffff81042a69>] get_user_pages_fast+0xeb/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last disabled at (1456): [<ffffffff810429dd>] get_user_pages_fast+0x5f/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last  enabled at (1446): [<ffffffff815e127d>] release_sock+0x142/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last disabled at (1444): [<ffffffff815e116f>] release_sock+0x34/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel:
                                        other info that might help us debug this:
Mar 03 11:27:33 icpu-test-bap10 kernel:  Possible unsafe locking scenario:
Mar 03 11:27:33 icpu-test-bap10 kernel:        CPU0
Mar 03 11:27:33 icpu-test-bap10 kernel:        ----
Mar 03 11:27:33 icpu-test-bap10 kernel:   lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:   <Interrupt>
Mar 03 11:27:33 icpu-test-bap10 kernel:     lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:
                                         *** DEADLOCK ***
Mar 03 11:27:33 icpu-test-bap10 kernel: 2 locks held by apache2-ssl/9310:
Mar 03 11:27:33 icpu-test-bap10 kernel:  #0:  (&(&(__futex_data.queues)[i].lock)->rlock){+.+...}, at: [<ffffffff810ae4e6>] do
Mar 03 11:27:33 icpu-test-bap10 kernel:  whatawurst#1:  (&lock->wait_lock){+.+...}, at: [<ffffffff810ae53a>] do_futex+0x639/0x809
Mar 03 11:27:33 icpu-test-bap10 kernel:
                                        stack backtrace:
Mar 03 11:27:33 icpu-test-bap10 kernel: CPU: 13 PID: 9310 UID: 99 Comm: apache2-ssl Not tainted 4.4.259-rc1-grsec+ #730
Mar 03 11:27:33 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
Mar 03 11:27:33 icpu-test-bap10 kernel:  0000000000000000 ffff883fb79bfc00 ffffffff816f8fc2 ffff883ffa66d300
Mar 03 11:27:33 icpu-test-bap10 kernel:  ffffffff8eaa71f0 ffff883fb79bfc50 ffffffff81088484 0000000000000000
Mar 03 11:27:33 icpu-test-bap10 kernel:  0000000000000001 0000000000000001 0000000000000002 ffff883ffa66db58
Mar 03 11:27:33 icpu-test-bap10 kernel: Call Trace:
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff816f8fc2>] dump_stack+0x94/0xca
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81088484>] print_usage_bug+0x1bc/0x1d1
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81087d76>] ? check_usage_forwards+0x98/0x98
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810885a5>] mark_lock+0x10c/0x203
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81088cb9>] __lock_acquire+0x416/0xe4a
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81089b01>] lock_acquire+0x18d/0x1bc
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81089b01>] ? lock_acquire+0x18d/0x1bc
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81700d12>] _raw_spin_lock+0x2a/0x39
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810abb68>] ? pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810abb68>] pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810ae5af>] do_futex+0x6ae/0x809
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff810ae83d>] SyS_futex+0x133/0x143
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff8100158a>] ? syscall_trace_enter_phase2+0x1a2/0x1bb
Mar 03 11:27:33 icpu-test-bap10 kernel:  [<ffffffff81701848>] tracesys_phase2+0x90/0x95

Bisecting detects 47e452f
in the above specific scenario using apache-ssl,
but apparently the missing *_irq() was introduced in
34c8e1c.

However, just reverting the old _irq() variants to a similar status
than before 34c8e1c,
or using _irqsave() / _irqrestore() as some other backports are doing
in various places, would not really help.

The fundamental problem is the following violation of the assertion
lockdep_assert_held(&pi_state->pi_mutex.wait_lock) in pi_state_update_owner():

Mar 03 12:50:03 icpu-test-bap10 kernel: ------------[ cut here ]------------
Mar 03 12:50:03 icpu-test-bap10 kernel: WARNING: CPU: 37 PID: 8488 at kernel/futex.c:844 pi_state_update_owner+0x3d/0xd7()
Mar 03 12:50:03 icpu-test-bap10 kernel: Modules linked in: xt_time xt_connlimit xt_connmark xt_NFLOG xt_limit xt_hashlimit veth ip_set_bitmap_port xt_DSCP xt_multiport ip_set_hash_ip xt_owner xt_set ip_set_hash_net xt_state xt_conntrack nf_conntrack_ftp mars lz4_decompress lz4_compress ipmi_devintf x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul hed ipmi_si ipmi_msghandler processor crc32c_intel ehci_pci ehci_hcd usbcore i40e usb_common
Mar 03 12:50:03 icpu-test-bap10 kernel: CPU: 37 PID: 8488 UID: 99 Comm: apache2-ssl Not tainted 4.4.259-rc1-grsec+ #737
Mar 03 12:50:03 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
Mar 03 12:50:03 icpu-test-bap10 kernel:  0000000000000000 ffff883f863f7c70 ffffffff816f9002 0000000000000000
Mar 03 12:50:03 icpu-test-bap10 kernel:  0000000000000009 ffff883f863f7ca8 ffffffff8104cda2 ffffffff810abac7
Mar 03 12:50:03 icpu-test-bap10 kernel:  ffff883ffbfe5e80 0000000000000000 ffff883f82ed4bc0 00007fc01c9bf000
Mar 03 12:50:03 icpu-test-bap10 kernel: Call Trace:
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff816f9002>] dump_stack+0x94/0xca
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff8104cda2>] warn_slowpath_common+0x94/0xad
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810abac7>] ? pi_state_update_owner+0x3d/0xd7
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff8104ce5f>] warn_slowpath_null+0x15/0x17
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810abac7>] pi_state_update_owner+0x3d/0xd7
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810abea8>] free_pi_state+0x2d/0x73
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810abf0b>] unqueue_me_pi+0x1d/0x31
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810ad735>] futex_lock_pi+0x27a/0x2e8
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff81088bca>] ? __lock_acquire+0x327/0xe4a
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810ae6a9>] do_futex+0x784/0x809
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810cfa9a>] ? seccomp_phase1+0xde/0x1e7
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810a4503>] ? current_kernel_time64+0xb/0x31
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810d23c3>] ? current_kernel_time+0xb/0xf
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff810ae861>] SyS_futex+0x133/0x143
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff8100158a>] ? syscall_trace_enter_phase2+0x1a2/0x1bb
Mar 03 12:50:03 icpu-test-bap10 kernel:  [<ffffffff81701888>] tracesys_phase2+0x90/0x95
Mar 03 12:50:03 icpu-test-bap10 kernel: ---[ end trace 968f95a458dea951 ]---

In order to both (1) prevent the self-deadlock, and (2) to satisfy the assertion
at pi_state_update_owner(), some locking with irq disable is needed,
at least in the specific call stack.

Interestingly, there existed a suchalike locking just before
f08a4af.

This is just a quick hotfix, resurrecting some previous
locks at the old places, but now using ->wait_lock in place
of the previous ->pi_lock (which was in place before
f08a4af).

The ->pi_lock is now also taken, by the new code
which had been introduced in
34c8e1c.

When this patch is applied, both the above splats are
no longer triggering at my prelife machines.

Without this patch, I cannot ensure stable production at
1&1 Ionos.

Hint for further work: I have not yet tested other call paths,
since I am under time pressure for security reasons.

Hint for further hardening of 4.4.y and probably some more LTS series:
Probably some more systematic testing with CONFIG_PROVE_LOCKING
(and probably some more options) should be invested
in order to make the 4.4 LTS series really "stable" again.

Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lee Jones <lee.jones@linaro.org>
Fixes: f08a4af ("futex: Use pi_state_update_owner() in put_pi_state()")
Fixes: 34c8e1c ("futex: Provide and use pi_state_update_owner()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch and problem analysis is specific for 4.4 LTS, due to incomplete
backporting of other fixes. Later LTS series have different backports.

The following is obviously incorrect:

static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
             struct futex_hash_bucket *hb)
{
[...]
	raw_spin_lock(&pi_state->pi_mutex.wait_lock);
[...]
	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
[...]
}

The 4.4-specific fix should probably go in the direction of
b4abf91,
making everything irq-safe.

Probably, backporting of b4abf91
to 4.4 LTS could thus be another good idea.

However, this might involve some more 4.4-specific work and
require thorough testing:

> git log --oneline v4.4..b4abf91 -- kernel/futex.c kernel/locking/rtmutex.c | wc -l
10

So this patch is just an obvious quickfix for now.

Hint: the lock order is documented in 4.9.y and later. A similar
documenting is missing in 4.4.y. Please somebody either backport also,
or write a new description, if there would be some differences I cannot
easily see at the moment. Without reliable docs,
inspection of the locking correctness may become a pain.
 
Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lee Jones <lee.jones@linaro.org>
Fixes: 394fc49 ("futex: Rework inconsistent rt_mutex/futex_q state")
Fixes: 6510e4a ("futex,rt_mutex: Provide futex specific rt_mutex API")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 26a9630c72ebac7c564db305a6aee54a8edde70e ]

Currently the mask operation on variable conf is just 3 bits so
the switch statement case value of 8 is unreachable dead code.
The function daio_mgr_dao_init can be passed a 4 bit value,
function dao_rsc_init calls it with conf set to:

     conf = (desc->msr & 0x7) | (desc->passthru << 3);

so clearly when desc->passthru is set to 1 then conf can be
at least 8.

Fix this by changing the mask to 0xf.

Fixes: 8cc7236 ("ALSA: SB X-Fi driver merge")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210227001527.1077484-1-colin.king@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77516d25f54912a7baedeeac1b1b828b6f285152 ]

The copy_to_user() function returns the number of bytes remaining but
we want to return -EFAULT to the user if it can't complete the copy.
The "st" variable only holds zero on success or negative error codes on
failure so the type should be int.

Fixes: 36f988e ("rsxx: Adding in debugfs entries.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a4c8dd9c2d0987cf542a2a0c42684c9c6d78a04e upstream.

According to the definition of dm_iterate_devices_fn:
 * This function must iterate through each section of device used by the
 * target until it encounters a non-zero return code, which it then returns.
 * Returns zero if no callout returned non-zero.

For some target type (e.g. dm-stripe), one call of iterate_devices() may
iterate multiple underlying devices internally, in which case a non-zero
return code returned by iterate_devices_callout_fn will stop the iteration
in advance. No iterate_devices_callout_fn should return non-zero unless
device iteration should stop.

Rename dm_table_requires_stable_pages() to dm_table_any_dev_attr() and
elevate it for reuse to stop iterating (and return non-zero) on the
first device that causes iterate_devices_callout_fn to return non-zero.
Use dm_table_any_dev_attr() to properly iterate through devices.

Rename device_is_nonrot() to device_is_rotational() and invert logic
accordingly to fix improper disposition.

[jeffle: backport notes]
No stable writes. Also convert the no_sg_merge capability check,
which is introduced by commit 200612e ("dm table: propagate
QUEUE_FLAG_NO_SG_MERGE"), and removed since commit 2705c93 ("block:
kill QUEUE_FLAG_NO_SG_MERGE") in v5.1.

Fixes: c3c4555 ("dm table: clear add_random unless all devices have it set")
Fixes: 4693c96 ("dm table: propagate non rotational flag")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 39aa009 ]

Add a new force_caps module parameter to allow overriding the drivers
builtin capability detection mechanism.

This can be used to for example:
-Disable rfkill functionality on devices where there is an AA OEM DMI
 record advertising non functional rfkill switches
-Force loading of the driver on devices with a missing AA OEM DMI record

Note that force_caps is -1 when unset, this allows forcing the
capability field to 0, which results in acer-wmi only providing WMI
hotkey handling while disabling all other (led, rfkill, backlight)
functionality.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201019185628.264473-4-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0599837 ]

Add function 1 DMA alias quirk for Marvell 88SS9215 PCIe SSD Controller.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c135
Link: https://lore.kernel.org/r/20201110220516.697934-1-helgaas@kernel.org
Reported-by: John Smith <LK7S2ED64JHGLKj75shg9klejHWG49h5hk@protonmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Jason Self <jason@bluehome.net>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/r/20210310132319.155338551@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.4.261: (8 commits)
        futex: fix irq self-deadlock and satisfy assertion
        futex: fix spin_lock() / spin_unlock_irq() imbalance
        ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
        rsxx: Return -EFAULT if copy_to_user() fails
        dm table: fix iterate_devices based device capability checks
        platform/x86: acer-wmi: Add new force_caps module parameter
        PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
        Linux 4.4.261
@derfelot derfelot merged commit 099adae into whatawurst:lineage-18.1 Mar 14, 2021
@derfelot derfelot deleted the lineage-18.1_update branch March 14, 2021 20:59
bananafunction referenced this pull request in bananafunction/android_kernel_sony_msm8998-1 May 16, 2022
[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     Flamefire#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    Flamefire#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    Flamefire#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    Flamefire#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    Flamefire#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    Flamefire#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    Flamefire#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    Flamefire#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    Flamefire#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    Flamefire#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    Flamefire#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    Flamefire#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    Flamefire#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    Flamefire#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    Flamefire#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    Flamefire#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    Flamefire#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    Flamefire#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Flamefire referenced this pull request in Flamefire/android_kernel_sony_msm8998 May 17, 2022
[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Amy07i pushed a commit to Amy07i/android_kernel_sony_msm8998 that referenced this pull request Jun 12, 2022
[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     whatawurst#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    whatawurst#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    whatawurst#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    whatawurst#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    whatawurst#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    whatawurst#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    whatawurst#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    whatawurst#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    whatawurst#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    whatawurst#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    whatawurst#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    whatawurst#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    whatawurst#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    whatawurst#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    whatawurst#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    whatawurst#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    whatawurst#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    whatawurst#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ariffjenong pushed a commit to ariffjenong/android_kernel_sony_msm8998 that referenced this pull request Sep 4, 2022
[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     whatawurst#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    whatawurst#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    whatawurst#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    whatawurst#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    whatawurst#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    whatawurst#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    whatawurst#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    whatawurst#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    whatawurst#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    whatawurst#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    whatawurst#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    whatawurst#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    whatawurst#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    whatawurst#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    whatawurst#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    whatawurst#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    whatawurst#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    whatawurst#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants