mm/vmscan: Add sysctl knobs for protecting the working set [le9uo-1.5]
The kernel does not provide a way to protect the working set under memory
pressure. Userspace needs a certain amount of anonymous and clean file pages
for normal operation; above all, it needs a cache of shared libraries and
executable binaries. If the amount of clean file pages falls below a certain
level, thrashing and even livelock can occur.

The patch provides sysctl knobs for protecting the working set (anonymous
and clean file pages) under memory pressure.
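
As a rough illustration of how userspace might inspect these knobs, here is a
minimal C sketch. It assumes the knobs appear under /proc/sys/vm with the
names listed in the vm.rst hunk of this commit; the helper name
read_ratio_knob is hypothetical and not part of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): read one of the ratio
 * knobs from a procfs path.
 *
 * Returns the value (0..100), or -1 if the file is missing, unreadable,
 * or holds a value outside the documented 0..100 range.
 */
int read_ratio_knob(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;	/* e.g. a kernel without this patch */

	if (fscanf(f, "%d", &value) != 1)
		value = -1;	/* empty or non-numeric content */
	fclose(f);

	if (value < 0 || value > 100)
		return -1;
	return value;
}
```

Calling read_ratio_knob("/proc/sys/vm/clean_min_ratio") on a patched kernel
would be expected to return the current setting; on an unpatched kernel the
file does not exist and the helper returns -1.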

== Multi-Gen LRU compatibility ==

le9uo 1.3 and above come with long-awaited Multi-Gen LRU (MGLRU, or lru_gen)
compatibility, providing the same working set protection features as for the
traditional LRU. Please be aware of an MGLRU-specific limitation. As of the
latest Linux kernel (version 6.7.5 at the time of writing), MGLRU does not
honor the vm.swappiness sysctl knob as it was initially designed to. Almost
regardless of the value of vm.swappiness (as long as it is greater than 0), it
seems to evict whatever it finds first. This behavior comes from MGLRU's
page-scanner design/implementation, and it makes the system start thrashing
much earlier and more easily than with the traditional LRU. MGLRU instead
takes a temporal approach called min_ttl, but this design has another problem:
it is much more difficult to estimate each system's optimal effective value
than with the traditional LRU plus le9's spatial approach, and when the value
is outside the effective range, it easily results in either premature
invocation of the OOM killer or thrashing.

le9uo does not fix this issue, but greatly mitigates it so that these
limitations of MGLRU's design/implementation are no longer a problem.

[1] https://github.com/firelzrd/le9uo/blob/main/le9uo_patches/stable/0001-linux6.6-le9uo-1.5.patch

Signed-off-by: Alexandre Frade <kernel@xanmod.org>
firelzrd authored and xanmod committed Apr 10, 2024
1 parent ea0ae0b commit 5cd56f6
Showing 6 changed files with 329 additions and 7 deletions.
72 changes: 72 additions & 0 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -25,6 +25,9 @@ files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:

- admin_reserve_kbytes
- anon_min_ratio
- clean_low_ratio
- clean_min_ratio
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
@@ -106,6 +109,67 @@ On x86_64 this is about 128MB.
Changing this takes effect whenever an application requests memory.


anon_min_ratio
==============

This knob provides *hard* protection of anonymous pages. The anonymous pages
on the current node won't be reclaimed under any conditions when their amount
is below vm.anon_min_ratio.

This knob may be used to prevent excessive swap thrashing when anonymous
memory is low (for example, when memory is about to be filled up by the
compressed data of the zram module).

Setting this value too high (close to 100) can result in an inability to
swap and can lead to early OOM under memory pressure.

The unit of measurement is the percentage of the total memory of the node.

The default value is 15.


clean_low_ratio
================

This knob provides *best-effort* protection of clean file pages. The file pages
on the current node won't be reclaimed under memory pressure when the amount of
clean file pages is below vm.clean_low_ratio *unless* we threaten to OOM.

Protection of clean file pages using this knob may be used when swapping is
still possible to:

- prevent disk I/O thrashing under memory pressure;
- improve performance in disk cache-bound tasks under memory pressure.

Setting it to a high value may result in an early eviction of anonymous pages
into the swap space by attempting to hold the protected amount of clean file
pages in memory.

The unit of measurement is the percentage of the total memory of the node.

The default value is 0.


clean_min_ratio
================

This knob provides *hard* protection of clean file pages. The file pages on the
current node won't be reclaimed under memory pressure when the amount of clean
file pages is below vm.clean_min_ratio.

Hard protection of clean file pages using this knob may be used to:

- prevent disk I/O thrashing under memory pressure even with no free swap space;
- improve performance in disk cache-bound tasks under memory pressure;
- avoid high latency and prevent livelock in near-OOM conditions.

Setting it to a high value may result in an early out-of-memory condition due
to the inability to reclaim the protected amount of clean file pages when other
types of pages cannot be reclaimed.

The unit of measurement is the percentage of the total memory of the node.

The default value is 15.
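
Because all three knobs are expressed as a percentage of the node's total
memory, it can help to see what a given setting means in absolute terms. The
sketch below is only a worked example of that arithmetic, not the patch's
implementation; the function name ratio_to_kb is hypothetical, and the
rounding used inside the kernel may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): convert a ratio knob,
 * given as a percentage of a node's total memory, into an absolute
 * threshold in kB. Integer division rounds down.
 */
uint64_t ratio_to_kb(uint64_t node_total_kb, unsigned int ratio_percent)
{
	if (ratio_percent > 100)
		ratio_percent = 100;	/* the sysctl range is 0..100 */

	return node_total_kb * ratio_percent / 100;
}
```

For a single node with 8 GiB of memory (8388608 kB), the default value of 15
corresponds to 8388608 * 15 / 100 = 1258291 kB, or about 1.2 GiB, for each of
vm.anon_min_ratio and vm.clean_min_ratio.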


compact_memory
==============

@@ -910,6 +974,14 @@ be 133 (x + 2x = 200, 2x = 133.33).
At 0, the kernel will not initiate swap until the amount of free and
file-backed pages is less than the high watermark in a zone.

This knob has no effect if the amount of clean file pages on the current
node is below vm.clean_low_ratio or vm.clean_min_ratio. In this case,
only anonymous pages can be reclaimed.

If the number of anonymous pages on the current node is below
vm.anon_min_ratio, then only file pages can be reclaimed with
any vm.swappiness value.
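
The interaction between the three knobs described above can be summarized as
a small decision function. This is a simplified userspace model of the
documented behavior, not the code in mm/vmscan.c: the struct and function
names are hypothetical, and the near_oom flag stands in for the "unless we
threaten to OOM" condition, whose exact kernel-side definition is not shown
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these exist in the patch. */
struct node_state {
	uint64_t total_kb;	/* total memory of the node */
	uint64_t anon_kb;	/* anonymous pages */
	uint64_t clean_file_kb;	/* clean file pages */
};

struct ws_knobs {
	unsigned int anon_min_ratio;
	unsigned int clean_low_ratio;
	unsigned int clean_min_ratio;
};

struct reclaim_mask {
	bool anon;	/* anonymous pages may be reclaimed */
	bool file;	/* file pages may be reclaimed */
};

static uint64_t pct_of(uint64_t total_kb, unsigned int percent)
{
	return total_kb * percent / 100;
}

/* Decide which page types may be reclaimed, per the documented rules. */
struct reclaim_mask allowed_reclaim(const struct node_state *n,
				    const struct ws_knobs *k, bool near_oom)
{
	struct reclaim_mask m = { .anon = true, .file = true };

	/* Hard protection: anonymous pages below anon_min_ratio. */
	if (n->anon_kb < pct_of(n->total_kb, k->anon_min_ratio))
		m.anon = false;

	if (n->clean_file_kb < pct_of(n->total_kb, k->clean_min_ratio))
		m.file = false;		/* hard protection */
	else if (n->clean_file_kb < pct_of(n->total_kb, k->clean_low_ratio) &&
		 !near_oom)
		m.file = false;		/* best-effort protection */

	return m;
}
```

In this model, vm.swappiness only matters when both fields of the returned
mask are true; whenever one of them is false, reclaim is restricted to the
other page type regardless of the swappiness value.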


unprivileged_userfaultfd
========================
8 changes: 8 additions & 0 deletions include/linux/mm.h
@@ -195,6 +195,14 @@ static inline void __mm_zero_struct_page(struct page *page)

extern int sysctl_max_map_count;

extern bool sysctl_workingset_protection;
extern u8 sysctl_anon_min_ratio;
extern u8 sysctl_clean_low_ratio;
extern u8 sysctl_clean_min_ratio;
int vm_workingset_protection_update_handler(
	struct ctl_table *table, int write,
	void __user *buffer, size_t *lenp, loff_t *ppos);

extern unsigned long sysctl_user_reserve_kbytes;
extern unsigned long sysctl_admin_reserve_kbytes;

34 changes: 34 additions & 0 deletions kernel/sysctl.c
@@ -2227,6 +2227,40 @@ static struct ctl_table vm_table[] = {
		.extra1 = SYSCTL_ZERO,
	},
#endif
	{
		.procname = "workingset_protection",
		.data = &sysctl_workingset_protection,
		.maxlen = sizeof(bool),
		.mode = 0644,
		.proc_handler = &proc_dobool,
	},
	{
		.procname = "anon_min_ratio",
		.data = &sysctl_anon_min_ratio,
		.maxlen = sizeof(u8),
		.mode = 0644,
		.proc_handler = &vm_workingset_protection_update_handler,
		.extra1 = SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE_HUNDRED,
	},
	{
		.procname = "clean_low_ratio",
		.data = &sysctl_clean_low_ratio,
		.maxlen = sizeof(u8),
		.mode = 0644,
		.proc_handler = &vm_workingset_protection_update_handler,
		.extra1 = SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE_HUNDRED,
	},
	{
		.procname = "clean_min_ratio",
		.data = &sysctl_clean_min_ratio,
		.maxlen = sizeof(u8),
		.mode = 0644,
		.proc_handler = &vm_workingset_protection_update_handler,
		.extra1 = SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE_HUNDRED,
	},
	{
		.procname = "user_reserve_kbytes",
		.data = &sysctl_user_reserve_kbytes,
63 changes: 63 additions & 0 deletions mm/Kconfig
@@ -486,6 +486,69 @@ config ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
config ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
	bool

config ANON_MIN_RATIO
	int "Default value for vm.anon_min_ratio"
	depends on SYSCTL
	range 0 100
	default 15
	help
	  This option sets the default value for vm.anon_min_ratio sysctl knob.

	  The vm.anon_min_ratio sysctl knob provides *hard* protection of
	  anonymous pages. The anonymous pages on the current node won't be
	  reclaimed under any conditions when their amount is below
	  vm.anon_min_ratio. This knob may be used to prevent excessive swap
	  thrashing when anonymous memory is low (for example, when memory is
	  about to be filled up by the compressed data of the zram module).

	  Setting this value too high (close to 100) can result in an
	  inability to swap and can lead to early OOM under memory pressure.

config CLEAN_LOW_RATIO
	int "Default value for vm.clean_low_ratio"
	depends on SYSCTL
	range 0 100
	default 0
	help
	  This option sets the default value for vm.clean_low_ratio sysctl knob.

	  The vm.clean_low_ratio sysctl knob provides *best-effort*
	  protection of clean file pages. The file pages on the current node
	  won't be reclaimed under memory pressure when the amount of clean file
	  pages is below vm.clean_low_ratio *unless* we threaten to OOM.
	  Protection of clean file pages using this knob may be used when
	  swapping is still possible to
	    - prevent disk I/O thrashing under memory pressure;
	    - improve performance in disk cache-bound tasks under memory
	      pressure.

	  Setting it to a high value may result in an early eviction of anonymous
	  pages into the swap space by attempting to hold the protected amount
	  of clean file pages in memory.

config CLEAN_MIN_RATIO
	int "Default value for vm.clean_min_ratio"
	depends on SYSCTL
	range 0 100
	default 15
	help
	  This option sets the default value for vm.clean_min_ratio sysctl knob.

	  The vm.clean_min_ratio sysctl knob provides *hard* protection of
	  clean file pages. The file pages on the current node won't be
	  reclaimed under memory pressure when the amount of clean file pages is
	  below vm.clean_min_ratio. Hard protection of clean file pages using
	  this knob may be used to
	    - prevent disk I/O thrashing under memory pressure even with no free
	      swap space;
	    - improve performance in disk cache-bound tasks under memory
	      pressure;
	    - avoid high latency and prevent livelock in near-OOM conditions.

	  Setting it to a high value may result in an early out-of-memory condition
	  due to the inability to reclaim the protected amount of clean file pages
	  when other types of pages cannot be reclaimed.

config HAVE_MEMBLOCK_PHYS_MAP
	bool

1 change: 1 addition & 0 deletions mm/mm_init.c
@@ -2749,6 +2749,7 @@ static void __init mem_init_print_info(void)
		, K(totalhigh_pages())
#endif
		);
	printk(KERN_INFO "le9 Unofficial (le9uo) working set protection 1.5 by Masahito Suzuki (forked from hakavlad's original le9 patch)");
}

/*