- Required
Set
CONFIG_LRU_GEN=y
.- Optional
Set
CONFIG_LRU_GEN_ENABLED=y
to turn the feature on by default.
- Required
Write
1
to/sys/kernel/mm/lru_gen/enable
if the feature was not turned on by default.- Optional
Write
N
to/sys/kernel/mm/lru_gen/min_ttl_ms
to protect the working set ofN
milliseconds. The OOM killer is invoked if this working set cannot be kept in memory.- Optional
Read
/sys/kernel/debug/lru_gen
to confirm the feature is turned on. This file has the following output:
memcg memcg_id memcg_path
node node_id
min_gen birth_time anon_size file_size
...
max_gen birth_time anon_size file_size
min_gen
is the oldest generation number and max_gen
is the youngest generation number. birth_time
is in milliseconds. anon_size
and file_size
are in pages.
No additional configurations required.
- To support more generations
Change
CONFIG_NR_LRU_GENS
to a larger number.- To support more tiers
Change
CONFIG_TIERS_PER_GEN
to a larger number.- To support full stats
Set
CONFIG_LRU_GEN_STATS=y
.- Working set estimation
Write
+ memcg_id node_id max_gen [swappiness]
to/sys/kernel/debug/lru_gen
to invoke the aging, which scans PTEs for accessed pages and then creates the next generationmax_gen+1
. A swap file and a non-zeroswappiness
, which overridesvm.swappiness
, are required to scan PTEs mapping anon pages.- Proactive reclaim
Write
- memcg_id node_id min_gen [swappiness] [nr_to_reclaim]
to/sys/kernel/debug/lru_gen
to invoke the eviction, which evicts generations less than or equal tomin_gen
.min_gen
should be less thanmax_gen-1
asmax_gen
andmax_gen-1
are not fully aged and therefore cannot be evicted.nr_to_reclaim
can be used to limit the number of pages to evict. Multiple command lines are supported, so does concatenation with delimiters,
and;
.
For each lruvec
, evictable pages are divided into multiple generations. The youngest generation number is stored in lrugen->max_seq
for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[2]
separately for anon and file types as clean file pages can be evicted regardless of swap and writeback constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(CONFIG_NR_LRU_GENS+1)
bits in order to fit into page->flags
. The sliding window technique is used to prevent truncated generation numbers from overlapping. Each truncated generation number is an index to an array of per-type and per-zone lists lrugen->lists
.
Each generation is then divided into multiple tiers. Tiers represent levels of usage from file descriptors only. Pages accessed N
times via file descriptors belong to tier order_base_2(N)
. Each generation contains at most CONFIG_TIERS_PER_GEN
tiers, and they require additional CONFIG_TIERS_PER_GEN-2
bits in page->flags
. In contrast to moving across generations which requires list operations, moving across tiers only involves operations on page->flags
and therefore has a negligible cost. A feedback loop modeled after the PID controller monitors refault rates of all tiers and decides when to protect pages from which tiers.
The framework comprises two conceptually independent components: the aging and the eviction, which can be invoked separately from user space for the purpose of working set estimation and proactive reclaim.
The aging produces young generations. Given an lruvec
, the aging traverses lruvec_memcg()->mm_list
and calls walk_page_range()
to scan PTEs for accessed pages (a mm_struct
list is maintained for each memcg
). Upon finding one, the aging updates its generation number to max_seq
(modulo CONFIG_NR_LRU_GENS
). After each round of traversal, the aging increments max_seq
. The aging is due when both min_seq[2]
have caught up with max_seq-1
.
The eviction consumes old generations. Given an lruvec
, the eviction scans pages on the per-zone lists indexed by anon and file min_seq[2]
(modulo CONFIG_NR_LRU_GENS
). It first tries to select a type based on the values of min_seq[2]
. If they are equal, it selects the type that has a lower refault rate. The eviction sorts a page according to its updated generation number if the aging has found this page accessed. It also moves a page to the next generation if this page is from an upper tier that has a higher refault rate than the base tier. The eviction increments min_seq[2]
of a selected type when it finds all the per-zone lists indexed by min_seq[2]
of this selected type are empty.
Support shadow page table walk.
Optimize page table walk for NUMA.