[PATCH] mm: Protect clean file pages under memory pressure to prevent thrashing, avoid high latency and prevent livelock in near-OOM conditions #218

Closed
hakavlad opened this issue Jul 17, 2021 · 10 comments

@damentz
Member

damentz commented Jul 17, 2021

Thanks, this patchset looks safe and straightforward to add. I ended up using the 5.14 release-candidate version for the 5.13 branch, since 5.13's memory subsystem has changed quite a bit since rc2.

Also, with both 5.12 and 5.13, I set defaults of effectively 256 MB soft and 64 MB hard cache protection. That should make it easier for package maintainers to pick up this change without needing to research it too much.
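For reference, a minimal sketch of setting those defaults at runtime, assuming the vm.clean_low_kbytes (soft) and vm.clean_min_kbytes (hard) sysctl names used by hakavlad's le9-patch; check your patched tree's documentation if your revision names them differently:

# Hedged sketch: 256 MB soft / 64 MB hard clean-cache protection, expressed in kbytes.
sudo sysctl -w vm.clean_low_kbytes=262144   # 256 * 1024 KB, soft protection
sudo sysctl -w vm.clean_min_kbytes=65536    # 64 * 1024 KB, hard protection (later dropped to 0, see below)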

@damentz damentz closed this as completed Jul 17, 2021
@hakavlad
Author

64 MB hard cache protection

Note that hard protection may cause these issues with DRM/i915:

https://github.com/hakavlad/le9-patch#warnings

Soft protection should be safe, and 256 MB is OK.

damentz added a commit that referenced this issue Jul 17, 2021
Per @hakavlad [1], enabling hard cache protection breaks i915.  Keep it
at zero so we don't run into the same issues previously reported.

[1] #218 (comment)
@damentz
Member

damentz commented Jul 17, 2021

@hakavlad thanks for the warning; I dropped CLEAN_MIN_KBYTES to zero.

@hakavlad
Author

Also note that le9 doesn't work with mg-LRU [1]. If you are using mg-LRU, le9 will have no effect, because mg-LRU doesn't use get_scan_count().

[1] https://forum.xanmod.org/thread-4102-post-7599.html#pid7599
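A quick, hedged check (not from the thread) for whether mg-LRU is active on a given kernel; the path assumes the lru_gen sysfs interface that ships with the multigenerational LRU patches:

# If this file is absent, mg-LRU is not built in and the le9 knobs apply.
# Accepted values differ between patch versions (0/1, y/n, or a bitmask);
# a non-"off" value means mg-LRU is active and le9's get_scan_count() path is bypassed.
cat /sys/kernel/mm/lru_gen/enabled 2>/dev/null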

@travankor

@yuzhaogoogle "Protecting file pages with mgLRU will require a completely different approach than in le9. I do not yet know how to implement protection of file pages when using mgLRU."

Do you plan to implement this for mgLRU?

@yuzhaogoogle

Yes, sometime next week.

@yuzhaogoogle

Circling back on this: echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms now protects both anon and file memory used within the last 1000 ms. Feel free to adjust this value to fit your use case. @travankor
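A minimal sketch of applying and persisting that value on a systemd-based system; the tmpfiles.d file name is hypothetical and 1000 ms is just the starting point suggested above:

# Apply the suggested starting value at runtime.
echo 1000 | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms

# Persist it across reboots with systemd-tmpfiles, e.g. in /etc/tmpfiles.d/mglru.conf:
#   w /sys/kernel/mm/lru_gen/min_ttl_ms - - - - 1000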

damentz added a commit that referenced this issue Aug 11, 2021
Although not identical to the le9 patches that protect a byte-amount of
cache through tunables, multigenerational LRU now supports protecting
cache accessed in the last X milliseconds.

In #218, Yu recommends starting with 1000 ms and tuning as needed.  This
looks like a safe default, and turning this feature on should help users
who don't know they need it.
@travankor

I will try this out soon, thanks. What are the trade-offs of tuning this value larger or smaller via the sysfs knob?

@yuzhaogoogle

A larger value OOM-kills more aggressively (more processes get killed) but gives a more responsive user experience; a smaller value OOM-kills less aggressively (fewer processes get killed) but gives a less responsive user experience.

Does that make sense?
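To make the trade-off concrete, two illustrative settings (examples only, not recommendations):

# Illustrative only: the two ends of the trade-off described above.
echo 0    | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms   # no working-set guarantee: fewest OOM kills, may thrash under pressure
echo 5000 | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms   # protect the last 5 s of working set: earlier OOM kills, stays responsive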

@travankor

Yes, I understand. So userspace OOM daemons, like earlyoom, are no longer needed with mgLRU?
