Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create voh98 #119

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Create voh98 #119

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Sep 14, 2014

No description provided.

mikey pushed a commit to mikey/linux that referenced this pull request Sep 18, 2014
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   torvalds#6   torvalds#7
[    0.603005] .... node   #1, CPUs:     torvalds#8   torvalds#9  torvalds#10  torvalds#11  torvalds#12  torvalds#13  torvalds#14  torvalds#15
[    1.200005] .... node   #2, CPUs:    torvalds#16  torvalds#17  torvalds#18  torvalds#19  torvalds#20  torvalds#21  torvalds#22  torvalds#23
[    1.796005] .... node   #3, CPUs:    torvalds#24  torvalds#25  torvalds#26  torvalds#27  torvalds#28  torvalds#29  torvalds#30  torvalds#31
[    2.393005] .... node   #4, CPUs:    torvalds#32  torvalds#33  torvalds#34  torvalds#35  torvalds#36  torvalds#37  torvalds#38  torvalds#39
[    2.996005] .... node   #5, CPUs:    torvalds#40  torvalds#41  torvalds#42  torvalds#43  torvalds#44  torvalds#45  torvalds#46  torvalds#47
[    3.600005] .... node   torvalds#6, CPUs:    torvalds#48  torvalds#49  torvalds#50  torvalds#51  #52  #53  torvalds#54  torvalds#55
[    4.202005] .... node   torvalds#7, CPUs:    torvalds#56  torvalds#57  #58  torvalds#59  torvalds#60  torvalds#61  torvalds#62  torvalds#63
[    4.811005] .... node   torvalds#8, CPUs:    torvalds#64  torvalds#65  torvalds#66  torvalds#67  torvalds#68  torvalds#69  #70  torvalds#71
[    5.421006] .... node   torvalds#9, CPUs:    torvalds#72  torvalds#73  torvalds#74  torvalds#75  torvalds#76  torvalds#77  torvalds#78  torvalds#79
[    6.032005] .... node  torvalds#10, CPUs:    torvalds#80  torvalds#81  torvalds#82  torvalds#83  torvalds#84  torvalds#85  torvalds#86  torvalds#87
[    6.648006] .... node  torvalds#11, CPUs:    torvalds#88  torvalds#89  torvalds#90  torvalds#91  torvalds#92  torvalds#93  torvalds#94  torvalds#95
[    7.262005] .... node  torvalds#12, CPUs:    torvalds#96  torvalds#97  torvalds#98  torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103
[    7.865005] .... node  torvalds#13, CPUs:   torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111
[    8.466005] .... node  torvalds#14, CPUs:   torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119
[    9.073006] .... node  torvalds#15, CPUs:   torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
@Elizafox
Copy link

Elizafox commented Jan 8, 2015

@Johvoo Not only is this totally worthless, Linus doesn't accept pull requests from GitHub. Sorry!

EctoCosmonaut pushed a commit to EctoCosmonaut/linux that referenced this pull request Jan 13, 2016
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Feb 8, 2016
When boot arm64 kernel with KASAN enabled, the below error is reported by
kasan:

BUG: KASAN: out-of-bounds in unwind_frame+0xec/0x260 at addr ffffffc064d57ba0
Read of size 8 by task pidof/499
page:ffffffbdc39355c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 2 PID: 499 Comm: pidof Not tainted 4.5.0-rc1 torvalds#119
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffffffc00008d078>] dump_backtrace+0x0/0x290
[<ffffffc00008d32c>] show_stack+0x24/0x30
[<ffffffc0006a981c>] dump_stack+0x8c/0xd8
[<ffffffc0002e4400>] kasan_report_error+0x558/0x588
[<ffffffc0002e4958>] kasan_report+0x60/0x70
[<ffffffc0002e3188>] __asan_load8+0x60/0x78
[<ffffffc00008c92c>] unwind_frame+0xec/0x260
[<ffffffc000087e60>] get_wchan+0x110/0x160
[<ffffffc0003b647c>] do_task_stat+0xb44/0xb68
[<ffffffc0003b7730>] proc_tgid_stat+0x40/0x50
[<ffffffc0003ac840>] proc_single_show+0x88/0xd8
[<ffffffc000345be8>] seq_read+0x370/0x770
[<ffffffc00030aba0>] __vfs_read+0xc8/0x1d8
[<ffffffc00030c0ec>] vfs_read+0x94/0x168
[<ffffffc00030d458>] SyS_read+0xb8/0x128
[<ffffffc000086530>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc064d57a80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4
 ffffffc064d57b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc064d57b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                  ^
 ffffffc064d57c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc064d57c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Since the shadow byte pointed by the report is 0, so it may mean it is just hit
oob in non-current task. So, disable the instrumentation to silence these
warnings.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
torvalds pushed a commit that referenced this pull request Feb 19, 2016
When boot arm64 kernel with KASAN enabled, the below error is reported by
kasan:

BUG: KASAN: out-of-bounds in unwind_frame+0xec/0x260 at addr ffffffc064d57ba0
Read of size 8 by task pidof/499
page:ffffffbdc39355c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 2 PID: 499 Comm: pidof Not tainted 4.5.0-rc1 #119
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffffffc00008d078>] dump_backtrace+0x0/0x290
[<ffffffc00008d32c>] show_stack+0x24/0x30
[<ffffffc0006a981c>] dump_stack+0x8c/0xd8
[<ffffffc0002e4400>] kasan_report_error+0x558/0x588
[<ffffffc0002e4958>] kasan_report+0x60/0x70
[<ffffffc0002e3188>] __asan_load8+0x60/0x78
[<ffffffc00008c92c>] unwind_frame+0xec/0x260
[<ffffffc000087e60>] get_wchan+0x110/0x160
[<ffffffc0003b647c>] do_task_stat+0xb44/0xb68
[<ffffffc0003b7730>] proc_tgid_stat+0x40/0x50
[<ffffffc0003ac840>] proc_single_show+0x88/0xd8
[<ffffffc000345be8>] seq_read+0x370/0x770
[<ffffffc00030aba0>] __vfs_read+0xc8/0x1d8
[<ffffffc00030c0ec>] vfs_read+0x94/0x168
[<ffffffc00030d458>] SyS_read+0xb8/0x128
[<ffffffc000086530>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
 ffffffc064d57a80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4
 ffffffc064d57b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc064d57b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                  ^
 ffffffc064d57c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc064d57c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Since the shadow byte pointed by the report is 0, so it may mean it is just hit
oob in non-current task. So, disable the instrumentation to silence these
warnings.

Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CkNoSFeRaTU pushed a commit to CkNoSFeRaTU/linux that referenced this pull request Aug 25, 2016
Compatibility fixes for kernel version 4.4.x
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 11, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 12, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 15, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 16, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 20, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 20, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 21, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Sep 21, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
vinceab pushed a commit to vinceab/linux that referenced this pull request Sep 28, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Sep 28, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Sep 30, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Oct 1, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
vinceab pushed a commit to vinceab/linux that referenced this pull request Oct 3, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Oct 5, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12: 
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Oct 10, 2016
…tch-fixes

ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
torvalds#12:
	commit 7c9cb38

WARNING: line over 80 characters
torvalds#87: FILE: kernel/relay.c:337:
+	struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

WARNING: waitqueue_active without comment
torvalds#119: FILE: kernel/relay.c:772:
+		if (waitqueue_active(&buf->read_wait)) {

total: 1 errors, 2 warnings, 70 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/relay-use-irq_work-instead-of-plain-timer-for-deferred-wakeup.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 10, 2017
Dmitry reported uses after free in qdisc code [1]

The problem here is that ops->init() can return an error.

qdisc_create_dflt() then call ops->destroy(),
while qdisc_create() does _not_ call it.

Four qdisc chose to call their own ops->destroy(), assuming their caller
would not.

This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdisc to avoid double free.

[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV torvalds#119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
 ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
 ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
Call Trace:
 [<ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51
 [<ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [<ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline]
 [<ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
 [<ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline]
 [<ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
 [<ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
 [<ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
 [<ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
 [<ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
 [<ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
 [<ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403
 [<ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
 [<ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
 [<ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
 [<ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
 [<ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879
 [<ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958
 [<ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline]
 [<ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
 [<ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline]
 [<ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
 [<ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 12, 2017
Dmitry reported uses after free in qdisc code [1]

The problem here is that ops->init() can return an error.

qdisc_create_dflt() then call ops->destroy(),
while qdisc_create() does _not_ call it.

Four qdisc chose to call their own ops->destroy(), assuming their caller
would not.

This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdisc to avoid double free.

[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV torvalds#119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
 ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
 ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
Call Trace:
 [<ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51
 [<ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [<ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline]
 [<ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
 [<ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline]
 [<ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
 [<ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
 [<ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
 [<ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
 [<ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
 [<ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
 [<ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403
 [<ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
 [<ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
 [<ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
 [<ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
 [<ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879
 [<ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958
 [<ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline]
 [<ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
 [<ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline]
 [<ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
 [<ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017
lklfuse: fix fuse routines calling sequence.
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 17, 2017
Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

[    2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
[    2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
[    2.597686]
[    2.597717] ===============================
[    2.597717] [ ERR: suspicious RCU usage.  ]
[    2.597717] 4.10.0-rc8-next-20170215+ torvalds#119 Not tainted
[    2.597717] -------------------------------
[    2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
[    2.597717]
[    2.597717] other info that might help us debug this:
[    2.597717]
[    2.597717]
[    2.597717] RCU used illegally from idle CPU!
[    2.597717] rcu_scheduler_active = 2, debug_locks = 0
[    2.597747] RCU used illegally from extended quiescent state!
[    2.597747] 2 locks held by swapper/0/0:
[    2.597747]  #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
[    2.597747]  #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
[    2.597747]
[    2.597747] stack backtrace:
[    2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ torvalds#119
[    2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
[    2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[    2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
[    2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
[    2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
[    2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
[    2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
[    2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
[    2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
[    2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
[    2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
[    2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
[    2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
[    2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[    2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
[    2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
[    2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
[    2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which
want lockdep to be enabled "current->lockdep_recursion == 0".

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: <stable@vger.kernel.org> [3.4+]
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 19, 2017
Use rcuidle console tracepoint because, apparently, it may be issued from
an idle CPU:

[    2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
[    2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
[    2.597686]
[    2.597717] ===============================
[    2.597717] [ ERR: suspicious RCU usage.  ]
[    2.597717] 4.10.0-rc8-next-20170215+ torvalds#119 Not tainted
[    2.597717] -------------------------------
[    2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
[    2.597717]
[    2.597717] other info that might help us debug this:
[    2.597717]
[    2.597717]
[    2.597717] RCU used illegally from idle CPU!
[    2.597717] rcu_scheduler_active = 2, debug_locks = 0
[    2.597747] RCU used illegally from extended quiescent state!
[    2.597747] 2 locks held by swapper/0/0:
[    2.597747]  #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
[    2.597747]  #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
[    2.597747]
[    2.597747] stack backtrace:
[    2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ torvalds#119
[    2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
[    2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[    2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
[    2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
[    2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
[    2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
[    2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
[    2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
[    2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
[    2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
[    2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
[    2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
[    2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
[    2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[    2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
[    2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
[    2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
[    2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)

This RCU warning, however, is suppressed by lockdep_off() in printk(). 
lockdep_off() increments the ->lockdep_recursion counter and thus disables
RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to
be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Cc: <stable@vger.kernel.org> [3.4+]

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 20, 2017
Use rcuidle console tracepoint because, apparently, it may be issued from
an idle CPU:

[    2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
[    2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
[    2.597686]
[    2.597717] ===============================
[    2.597717] [ ERR: suspicious RCU usage.  ]
[    2.597717] 4.10.0-rc8-next-20170215+ torvalds#119 Not tainted
[    2.597717] -------------------------------
[    2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
[    2.597717]
[    2.597717] other info that might help us debug this:
[    2.597717]
[    2.597717]
[    2.597717] RCU used illegally from idle CPU!
[    2.597717] rcu_scheduler_active = 2, debug_locks = 0
[    2.597747] RCU used illegally from extended quiescent state!
[    2.597747] 2 locks held by swapper/0/0:
[    2.597747]  #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
[    2.597747]  #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
[    2.597747]
[    2.597747] stack backtrace:
[    2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ torvalds#119
[    2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
[    2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[    2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
[    2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
[    2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
[    2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
[    2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
[    2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
[    2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
[    2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
[    2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
[    2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
[    2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
[    2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[    2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
[    2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
[    2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
[    2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus disables
RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to
be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Cc: <stable@vger.kernel.org> [3.4+]

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 23, 2017
commit fc98c3c upstream.

Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ torvalds#119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ torvalds#119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 23, 2017
commit fc98c3c upstream.

Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ torvalds#119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ torvalds#119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sean-jc added a commit to sean-jc/linux that referenced this pull request Jan 5, 2024
Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
yetist pushed a commit to loongarchlinux/linux that referenced this pull request Jan 9, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 9, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
sean-jc added a commit to sean-jc/linux that referenced this pull request Jan 9, 2024
Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
sean-jc added a commit to sean-jc/linux that referenced this pull request Jan 9, 2024
Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to do with "wakeup all" work
items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
sean-jc added a commit to sean-jc/linux that referenced this pull request Jan 10, 2024
Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
arinc9 pushed a commit to arinc9/linux that referenced this pull request Jan 10, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Jan 10, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Gelbpunkt pushed a commit to sm8450-mainline/linux that referenced this pull request Jan 11, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 12, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
roxell pushed a commit to roxell/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
logic10492 pushed a commit to logic10492/linux-amd-zen2 that referenced this pull request Jan 18, 2024
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 18, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 28, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Feb 7, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
roxell pushed a commit to roxell/linux that referenced this pull request Feb 7, 2024
Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Feb 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
yetist pushed a commit to loongarchlinux/linux that referenced this pull request Feb 29, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Mar 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Mar 29, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Apr 3, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Apr 3, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337 pushed a commit to CachyOS/linux that referenced this pull request Apr 3, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 10, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 13, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 13, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Apr 13, 2024
[ Upstream commit 3d75b8a ]

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs is being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  <TASK>
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm torvalds#119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  <TASK>
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
  </TASK>

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

        trace_kvm_async_pf_completed(addr, cr2_or_gpa);

        __kvm_vcpu_wake_up(vcpu);

        mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU is
still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in a
workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).

Note, commit 5f6de5c ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b9 ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240110011533.503302-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants