Multigenerational LRU

Quick Start

Build Configurations

Required

Set CONFIG_LRU_GEN=y.

Optional

Set CONFIG_LRU_GEN_ENABLED=y to turn the feature on by default.

Runtime Configurations

Required

Write 1 to /sys/kernel/mm/lru_gen/enable if the feature was not turned on by default.

Optional

Write N to /sys/kernel/mm/lru_gen/min_ttl_ms to protect a working set of N milliseconds from eviction. The OOM killer is invoked if this working set cannot be kept in memory.

Optional

Read /sys/kernel/debug/lru_gen to confirm the feature is turned on. This file has the following output:

memcg  memcg_id  memcg_path
  node  node_id
    min_gen  birth_time  anon_size  file_size
    ...
    max_gen  birth_time  anon_size  file_size

min_gen is the oldest generation number and max_gen is the youngest generation number. birth_time is in milliseconds. anon_size and file_size are in pages.
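The layout above can be turned into structured data with a small parser. The following is a minimal sketch, assuming that each memcg line starts with the literal word memcg, each node line with the literal word node, and each generation line with four whitespace-separated integers. The function name parse_lru_gen and the sample text are hypothetical and serve only as illustration.

```python
def parse_lru_gen(text):
    """Parse text in the layout described above into nested dicts."""
    memcgs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "memcg":
            # "memcg  memcg_id  memcg_path"; the path may be absent.
            path = fields[2] if len(fields) > 2 else ""
            memcgs.append({"id": int(fields[1]), "path": path, "nodes": []})
        elif fields[0] == "node":
            memcgs[-1]["nodes"].append({"id": int(fields[1]), "gens": []})
        elif len(fields) >= 4 and all(f.lstrip("-").isdigit() for f in fields[:4]):
            # "gen  birth_time  anon_size  file_size"
            gen, birth_ms, anon, file_ = (int(f) for f in fields[:4])
            memcgs[-1]["nodes"][-1]["gens"].append(
                {"gen": gen, "birth_time_ms": birth_ms,
                 "anon_pages": anon, "file_pages": file_}
            )
        # Any other line (for example, extra stats) is ignored in this sketch.
    return memcgs


# Hypothetical sample, not captured from a real system.
SAMPLE = """\
memcg 1 /
  node 0
    7 12000 100 2000
    8 8000 50 300
    9 4000 10 40
    10 0 0 5
"""

state = parse_lru_gen(SAMPLE)
gens = state[0]["nodes"][0]["gens"]
print("min_gen:", gens[0]["gen"], "max_gen:", gens[-1]["gen"])
# prints "min_gen: 7 max_gen: 10"
```

On a real system the text would come from reading /sys/kernel/debug/lru_gen, which normally requires root and a mounted debugfs.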

Phones/Laptops/Workstations

No additional configurations required.

Servers/Data Centers

To support more generations

Change CONFIG_NR_LRU_GENS to a larger number.

To support more tiers

Change CONFIG_TIERS_PER_GEN to a larger number.

To support full stats

Set CONFIG_LRU_GEN_STATS=y.

Working set estimation

Write + memcg_id node_id max_gen [swappiness] to /sys/kernel/debug/lru_gen to invoke the aging, which scans PTEs for accessed pages and then creates the next generation max_gen+1. A swap file and a non-zero swappiness, which overrides vm.swappiness, are required to scan PTEs mapping anon pages.
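As an illustration, the aging command line can be assembled and written from a script. This is a sketch only: aging_command and run_command are hypothetical helper names, and the numeric arguments in the example are made up. Writing to the debugfs file is assumed to need root.

```python
def aging_command(memcg_id, node_id, max_gen, swappiness=None):
    """Build "+ memcg_id node_id max_gen [swappiness]"."""
    fields = ["+", memcg_id, node_id, max_gen]
    if swappiness is not None:
        # Optional; overrides vm.swappiness for this command.
        fields.append(swappiness)
    return " ".join(str(f) for f in fields)


def run_command(command, path="/sys/kernel/debug/lru_gen"):
    """Write one command line to the debugfs file (not called here)."""
    with open(path, "w") as f:
        f.write(command)


print(aging_command(1, 0, 10, swappiness=60))
# prints "+ 1 0 10 60"
```

The max_gen argument should be the current max_gen read back from /sys/kernel/debug/lru_gen for the same memcg and node.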

Proactive reclaim

Write - memcg_id node_id min_gen [swappiness] [nr_to_reclaim] to /sys/kernel/debug/lru_gen to invoke the eviction, which evicts generations less than or equal to min_gen. min_gen should be less than max_gen-1, as max_gen and max_gen-1 are not fully aged and therefore cannot be evicted. nr_to_reclaim can be used to limit the number of pages to evict. Multiple command lines are supported, as is concatenation with the delimiters , and ;.
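The eviction command and the min_gen constraint can be sketched as follows. The helper names eviction_command and join_commands are hypothetical, as are the example values. The max_gen parameter is used only for the local sanity check and is not part of the command. The check that nr_to_reclaim needs swappiness before it is an inference from the positional syntax, not something the text states.

```python
def eviction_command(memcg_id, node_id, min_gen, max_gen,
                     swappiness=None, nr_to_reclaim=None):
    """Build "- memcg_id node_id min_gen [swappiness] [nr_to_reclaim]"."""
    if min_gen >= max_gen - 1:
        # max_gen and max_gen-1 are not fully aged and cannot be evicted.
        raise ValueError("min_gen must be less than max_gen-1")
    if nr_to_reclaim is not None and swappiness is None:
        # Assumption: the optional fields are positional.
        raise ValueError("nr_to_reclaim requires swappiness to be given")
    fields = ["-", memcg_id, node_id, min_gen]
    if swappiness is not None:
        fields.append(swappiness)
    if nr_to_reclaim is not None:
        fields.append(nr_to_reclaim)
    return " ".join(str(f) for f in fields)


def join_commands(commands, delimiter=";"):
    """Concatenate several commands with one of the supported delimiters."""
    if delimiter not in (",", ";"):
        raise ValueError("delimiter must be ',' or ';'")
    return delimiter.join(commands)


batch = join_commands([
    eviction_command(1, 0, 7, max_gen=10, swappiness=0, nr_to_reclaim=1000),
    eviction_command(2, 0, 3, max_gen=6),
])
print(batch)
# prints "- 1 0 7 0 1000;- 2 0 3"
```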

Framework

For each lruvec, evictable pages are divided into multiple generations. The youngest generation number is stored in lrugen->max_seq for both anon and file types, as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[2] separately for anon and file types, as clean file pages can be evicted regardless of swap and writeback constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(CONFIG_NR_LRU_GENS+1) bits in order to fit into page->flags. The sliding window technique is used to prevent truncated generation numbers from overlapping. Each truncated generation number is an index into lrugen->lists, an array of per-type and per-zone lists.
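The truncation and the sliding window can be illustrated with a small model. This is a sketch, not kernel code: NR_LRU_GENS = 4 is a hypothetical stand-in for CONFIG_NR_LRU_GENS, order_base_2 is reimplemented here as the smallest exponent b with 2**b >= n, and the modulo mapping follows the "(modulo CONFIG_NR_LRU_GENS)" wording used for the aging and the eviction.

```python
NR_LRU_GENS = 4  # hypothetical stand-in for CONFIG_NR_LRU_GENS


def order_base_2(n):
    """Smallest b such that 2**b >= n, for n >= 1."""
    return (n - 1).bit_length()


def gen_bits(nr_gens=NR_LRU_GENS):
    """Bits needed in page->flags for a truncated generation number."""
    return order_base_2(nr_gens + 1)


def truncate(seq, nr_gens=NR_LRU_GENS):
    """Map a monotonically increasing sequence number to a list index."""
    return seq % nr_gens


def live_indices(min_seq, max_seq, nr_gens=NR_LRU_GENS):
    """Indices in use for the window [min_seq, max_seq]."""
    if not 0 <= max_seq - min_seq < nr_gens:
        # A wider window would make two generations share one index.
        raise ValueError("window wider than the number of generations")
    return [truncate(seq, nr_gens) for seq in range(min_seq, max_seq + 1)]


print(gen_bits())          # prints 3
print(live_indices(6, 9))  # prints [2, 3, 0, 1]
```

As long as the window between the oldest and youngest sequence numbers spans fewer than NR_LRU_GENS steps, every live generation maps to a distinct index even though the sequence numbers themselves keep growing.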

Each generation is then divided into multiple tiers. Tiers represent levels of usage from file descriptors only. Pages accessed N times via file descriptors belong to tier order_base_2(N). Each generation contains at most CONFIG_TIERS_PER_GEN tiers, and they require additional CONFIG_TIERS_PER_GEN-2 bits in page->flags. In contrast to moving across generations which requires list operations, moving across tiers only involves operations on page->flags and therefore has a negligible cost. A feedback loop modeled after the PID controller monitors refault rates of all tiers and decides when to protect pages from which tiers.
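The mapping from access count to tier can be sketched as below. TIERS_PER_GEN = 4 is a hypothetical stand-in for CONFIG_TIERS_PER_GEN, and capping at the highest tier is an assumption drawn from "at most CONFIG_TIERS_PER_GEN tiers" rather than an explicit statement in the text.

```python
TIERS_PER_GEN = 4  # hypothetical stand-in for CONFIG_TIERS_PER_GEN


def order_base_2(n):
    """Smallest b such that 2**b >= n, for n >= 1."""
    return (n - 1).bit_length()


def tier_of(accesses, tiers_per_gen=TIERS_PER_GEN):
    """Tier of a page accessed `accesses` times via file descriptors."""
    if accesses < 1:
        raise ValueError("accesses must be at least 1")
    # Assumption: counts beyond the top tier saturate at the top tier.
    return min(order_base_2(accesses), tiers_per_gen - 1)


def extra_flag_bits(tiers_per_gen=TIERS_PER_GEN):
    """Additional bits in page->flags, per the text above."""
    return tiers_per_gen - 2


for n in (1, 2, 3, 4, 5, 8, 9):
    print(n, tier_of(n))
```

With four tiers, one access lands in tier 0 (the base tier), two accesses in tier 1, three or four in tier 2, and five or more in tier 3.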

The framework comprises two conceptually independent components: the aging and the eviction, which can be invoked separately from user space for the purpose of working set estimation and proactive reclaim.

Aging

The aging produces young generations. Given an lruvec, the aging traverses lruvec_memcg()->mm_list and calls walk_page_range() to scan PTEs for accessed pages (a mm_struct list is maintained for each memcg). Upon finding one, the aging updates its generation number to max_seq (modulo CONFIG_NR_LRU_GENS). After each round of traversal, the aging increments max_seq. The aging is due when both min_seq[2] have caught up with max_seq-1.
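A toy model of one aging round may help make the sequence of steps concrete. Everything here is a simplification: ToyPage and ToyLruvec are hypothetical types, a flat page list stands in for walking page tables, full sequence numbers are stored instead of truncated ones, and clearing the accessed flag after a scan is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

ANON, FILE = 0, 1


@dataclass
class ToyPage:
    gen: int
    accessed: bool = False


@dataclass
class ToyLruvec:
    max_seq: int
    min_seq: List[int]  # indexed by ANON and FILE
    pages: List[ToyPage] = field(default_factory=list)


def aging_is_due(vec):
    """True when both min_seq values have caught up with max_seq-1."""
    return all(seq >= vec.max_seq - 1 for seq in vec.min_seq)


def age(vec):
    """One round: promote accessed pages, then create the next generation."""
    for page in vec.pages:
        if page.accessed:
            page.gen = vec.max_seq
            page.accessed = False  # assumption: the scan clears the bit
    vec.max_seq += 1


vec = ToyLruvec(max_seq=5, min_seq=[4, 4],
                pages=[ToyPage(gen=4, accessed=True), ToyPage(gen=4)])
if aging_is_due(vec):
    age(vec)
print(vec.max_seq, [p.gen for p in vec.pages])
# prints "6 [5, 4]"
```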

Eviction

The eviction consumes old generations. Given an lruvec, the eviction scans pages on the per-zone lists indexed by anon and file min_seq[2] (modulo CONFIG_NR_LRU_GENS). It first tries to select a type based on the values of min_seq[2]. If they are equal, it selects the type that has a lower refault rate. The eviction sorts a page according to its updated generation number if the aging has found this page accessed. It also moves a page to the next generation if this page is from an upper tier that has a higher refault rate than the base tier. The eviction increments min_seq[2] of a selected type when it finds all the per-zone lists indexed by min_seq[2] of this selected type are empty.
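The type selection and the min_seq update can be sketched as follows. This is an illustration under stated assumptions: the type with the smaller (older) min_seq is taken to be preferred, which the text implies but does not spell out, and a full tie on refault rate falls back to file pages purely for the sake of the example. The function names and the refault values are hypothetical.

```python
ANON, FILE = 0, 1


def select_type(min_seq, refault_rate):
    """Pick the page type to evict from next."""
    if min_seq[ANON] != min_seq[FILE]:
        # Assumption: the type holding the older generation goes first.
        return ANON if min_seq[ANON] < min_seq[FILE] else FILE
    # Equal min_seq: prefer the type with the lower refault rate.
    # Assumption: a full tie falls back to FILE.
    return ANON if refault_rate[ANON] < refault_rate[FILE] else FILE


def advance_min_seq(min_seq, page_type, lists_empty):
    """Increment min_seq of the selected type once its lists are empty."""
    if lists_empty:
        min_seq[page_type] += 1
    return min_seq


min_seq = [4, 4]
chosen = select_type(min_seq, refault_rate=[0.30, 0.05])
print(chosen)  # prints 1 (FILE has the lower refault rate)
print(advance_min_seq(min_seq, chosen, lists_empty=True))  # prints [4, 5]
```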

To-do List

KVM Optimization

Support shadow page table walk.

NUMA Optimization

Optimize page table walk for NUMA.