mm/vmscan: add sysctl knobs for protecting clean cache
The patch provides sysctl knobs for protecting the specified amount of
clean file pages (CFP) under memory pressure.

The kernel does not have a mechanism for selectively protecting clean file
pages. A certain amount of CFP is required by userspace for normal
operation: first of all, a cache of shared libraries and executable files.
If the volume of the CFP cache falls below a certain level, thrashing and
even livelock occur.

Protection of CFP may be used to prevent thrashing and reduce I/O under
memory pressure. Hard protection of CFP may be used to avoid high latency
and prevent livelock in near-OOM conditions.

The vm.clean_low_kbytes sysctl knob provides *best-effort* protection of
CFP. The CFP on the current node won't be reclaimed under memory pressure
when their amount is below vm.clean_low_kbytes *unless* we threaten to OOM
or have no free swap space or vm.swappiness=0. Setting it to a high value
may result in an early eviction of anonymous pages into the swap space by
attempting to hold the protected amount of clean file pages in memory. The
default value is defined by CONFIG_CLEAN_LOW_KBYTES (suggested 150000 in
Kconfig).

The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP. The
CFP on the current node won't be reclaimed under memory pressure when their
amount is below vm.clean_min_kbytes. Setting it to a high value may result
in an early out-of-memory condition due to the inability to reclaim the
protected amount of CFP when other types of pages cannot be reclaimed. The
default value is defined by CONFIG_CLEAN_MIN_KBYTES (suggested 0 in
Kconfig).

Added compatibility with Multigenerational LRU Framework patchset v2.

Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
Signed-off-by: Alexandre Frade <kernel@xanmod.org>
Alexey Avramov authored and xanmod committed May 19, 2021
1 parent 4ee49b5 commit d2aeab6
Showing 5 changed files with 158 additions and 0 deletions.
37 changes: 37 additions & 0 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -26,6 +26,8 @@ Currently, these files are in /proc/sys/vm:

- admin_reserve_kbytes
- block_dump
- clean_low_kbytes
- clean_min_kbytes
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
@@ -113,6 +115,41 @@ block_dump enables block I/O debugging when set to a nonzero value. More
information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.


clean_low_kbytes
=====================

This knob provides *best-effort* protection of clean file pages. The clean file
pages on the current node won't be reclaimed under memory pressure when their
amount is below vm.clean_low_kbytes *unless* we threaten to OOM or have no
free swap space or vm.swappiness=0.

Protection of clean file pages may be used to prevent thrashing and
reduce I/O under low-memory conditions.

Setting it to a high value may result in an early eviction of anonymous pages
into the swap space by attempting to hold the protected amount of clean file
pages in memory.

The default value is defined by CONFIG_CLEAN_LOW_KBYTES.


clean_min_kbytes
=====================

This knob provides *hard* protection of clean file pages. The clean file pages
on the current node won't be reclaimed under memory pressure when their amount
is below vm.clean_min_kbytes.

Hard protection of clean file pages may be used to avoid high latency and
prevent livelock in near-OOM conditions.

Setting it to a high value may result in an early out-of-memory condition due to
the inability to reclaim the protected amount of clean file pages when other
types of pages cannot be reclaimed.

The default value is defined by CONFIG_CLEAN_MIN_KBYTES.


compact_memory
==============

3 changes: 3 additions & 0 deletions include/linux/mm.h
@@ -203,6 +203,9 @@ static inline void __mm_zero_struct_page(struct page *page)

extern int sysctl_max_map_count;

extern unsigned long sysctl_clean_low_kbytes;
extern unsigned long sysctl_clean_min_kbytes;

extern unsigned long sysctl_user_reserve_kbytes;
extern unsigned long sysctl_admin_reserve_kbytes;

14 changes: 14 additions & 0 deletions kernel/sysctl.c
@@ -3124,6 +3124,20 @@ static struct ctl_table vm_table[] = {
.extra2 = SYSCTL_ONE,
},
#endif
{
.procname = "clean_low_kbytes",
.data = &sysctl_clean_low_kbytes,
.maxlen = sizeof(sysctl_clean_low_kbytes),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
},
{
.procname = "clean_min_kbytes",
.data = &sysctl_clean_min_kbytes,
.maxlen = sizeof(sysctl_clean_min_kbytes),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
},
{
.procname = "user_reserve_kbytes",
.data = &sysctl_user_reserve_kbytes,
35 changes: 35 additions & 0 deletions mm/Kconfig
@@ -122,6 +122,41 @@ config SPARSEMEM_VMEMMAP
pfn_to_page and page_to_pfn operations. This is the most
efficient option when sufficient kernel resources are available.

config CLEAN_LOW_KBYTES
int "Default value for vm.clean_low_kbytes"
depends on SYSCTL
default "150000"
help
The vm.clean_low_kbytes sysctl knob provides *best-effort*
protection of clean file pages. The clean file pages on the current
node won't be reclaimed under memory pressure when their amount is
below vm.clean_low_kbytes *unless* we threaten to OOM or have
no free swap space or vm.swappiness=0.

Protection of clean file pages may be used to prevent thrashing and
reduce I/O under low-memory conditions.

Setting it to a high value may result in an early eviction of anonymous
pages into the swap space by attempting to hold the protected amount of
clean file pages in memory.

config CLEAN_MIN_KBYTES
int "Default value for vm.clean_min_kbytes"
depends on SYSCTL
default "0"
help
The vm.clean_min_kbytes sysctl knob provides *hard* protection
of clean file pages. The clean file pages on the current node won't be
reclaimed under memory pressure when their amount is below
vm.clean_min_kbytes.

Hard protection of clean file pages may be used to avoid high latency and
prevent livelock in near-OOM conditions.

Setting it to a high value may result in an early out-of-memory condition
due to the inability to reclaim the protected amount of clean file pages
when other types of pages cannot be reclaimed.

config HAVE_MEMBLOCK_PHYS_MAP
bool

69 changes: 69 additions & 0 deletions mm/vmscan.c
@@ -123,6 +123,19 @@ struct scan_control {
/* The file pages on the current node are dangerously low */
unsigned int file_is_tiny:1;

/*
* The clean file pages on the current node won't be reclaimed when
* their amount is below vm.clean_low_kbytes *unless* we threaten
* to OOM or have no free swap space or vm.swappiness=0.
*/
unsigned int clean_below_low:1;

/*
* The clean file pages on the current node won't be reclaimed when
* their amount is below vm.clean_min_kbytes.
*/
unsigned int clean_below_min:1;

/* Allocation order */
s8 order;

@@ -169,6 +182,17 @@ struct scan_control {
#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
#endif

#if CONFIG_CLEAN_LOW_KBYTES < 0
#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
#endif

#if CONFIG_CLEAN_MIN_KBYTES < 0
#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
#endif

unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;

/*
* From 0 .. 200. Higher means more swappy.
*/
@@ -2333,6 +2357,34 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
file + free <= total_high_wmark &&
!(sc->may_deactivate & DEACTIVATE_ANON) &&
anon >> sc->priority;

/*
* Check the number of clean file pages to protect them from
* being reclaimed if their amount is below the specified thresholds.
*/
if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
unsigned long reclaimable_file, dirty, clean;

reclaimable_file =
node_page_state(pgdat, NR_ACTIVE_FILE) +
node_page_state(pgdat, NR_INACTIVE_FILE) +
node_page_state(pgdat, NR_ISOLATED_FILE);
dirty = node_page_state(pgdat, NR_FILE_DIRTY);
/*
* node_page_state() sum can go out of sync since
* all the values are not read at once.
*/
if (likely(reclaimable_file > dirty))
clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
else
clean = 0;

sc->clean_below_low = clean < sysctl_clean_low_kbytes;
sc->clean_below_min = clean < sysctl_clean_min_kbytes;
} else {
sc->clean_below_low = false;
sc->clean_below_min = false;
}
}
}

@@ -2393,6 +2445,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
goto out;
}

/*
* Force-scan anon if the amount of clean file pages is below
* vm.clean_min_kbytes or vm.clean_low_kbytes (unless the swappiness
* setting disagrees with swapping).
*/
if ((sc->clean_below_low || sc->clean_below_min) && swappiness) {
scan_balance = SCAN_ANON;
goto out;
}

/*
* If there is enough inactive page cache, we do not reclaim
* anything from the anonymous working set right now.
@@ -2529,6 +2591,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
BUG();
}

/*
* Don't reclaim clean file pages when their amount is below
* vm.clean_min_kbytes.
*/
if (file && sc->clean_below_min)
scan = 0;

nr[lru] = scan;
}
}
