Commit d82d3bf

kevin-brodsky-arm authored and akpm00 committed
mm: pass mm down to pagetable_{pte,pmd}_ctor
Patch series "Always call constructor for kernel page tables", v2.

There has been much confusion around exactly when page table constructors/destructors (pagetable_*_[cd]tor) are supposed to be called. They were initially introduced for user PTEs only (to support split page table locks), then at the PMD level for the same purpose. Accounting was added later on, starting at the PTE level and then moving to higher levels (PMD, PUD). Finally, with my earlier series "Account page tables at all levels" [1], the ctor/dtor is run for all levels, all the way to PGD.

I thought this was the end of the story, and it hopefully is for user pgtables, but I was wrong for what concerns kernel pgtables. The current situation there makes very little sense:

* At the PTE level, the ctor/dtor is not called (at least in the generic implementation). Specific helpers are used for kernel pgtables at this level (pte_{alloc,free}_kernel()) and those have never called the ctor/dtor, most likely because they were initially irrelevant in the kernel case.

* At all other levels, the ctor/dtor is normally called. This is potentially wasteful at the PMD level (more on that later).

This series aims to ensure that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. Besides consistency, the main motivation is to guarantee that ctor/dtor hooks are systematically called; this makes it possible to insert hooks to protect page tables [2], for instance. There is however an extra challenge: split locks are not used for kernel pgtables, and it would therefore be wasteful to initialise them (ptlock_init()).

It is worth clarifying exactly when split locks are used. They clearly are for user pgtables, but as illustrated in commit 61444cd ("ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations"), they also are for special page tables like efi_mm. The one case where split locks are definitely unused is pgtables owned by init_mm; this is consistent with the behaviour of apply_to_pte_range().

The approach chosen in this series is therefore to pass the mm associated to the pgtables being constructed to pagetable_{pte,pmd}_ctor() (patch 1), and skip ptlock_init() if mm == &init_mm (patch 3 and 7). This makes it possible to call the PTE ctor/dtor from pte_{alloc,free}_kernel() without unintended consequences (patch 3). As a result the accounting functions are now called at all levels for kernel pgtables, and split locks are never initialised.

In configurations where ptlocks are dynamically allocated (32-bit, PREEMPT_RT, etc.) and ARCH_ENABLE_SPLIT_PMD_PTLOCK is selected, this series results in the removal of a kmem_cache allocation for every kernel PMD. Additionally, for certain architectures that do not use <asm-generic/pgalloc.h> such as s390, the same optimisation occurs at the PTE level.

===

Things get more complicated when it comes to special pgtable allocators (patch 8-12). All architectures need such allocators to create initial kernel pgtables; we are not concerned with those as the ctor cannot be called so early in the boot sequence. However, those allocators may also be used later in the boot sequence or during normal operations. There are two main use-cases:

1. Mapping EFI memory: efi_mm (arm, arm64, riscv)
2. arch_add_memory(): init_mm

The ctor is already explicitly run (at the PTE/PMD level) in the first case, as required for pgtables that are not associated with init_mm. However the same allocators may also be used for the second use-case (or others), and this is where it gets messy. Patch 1 calls the ctor with NULL as mm in those situations, as the actual mm isn't available. Practically this means that ptlocks will be unconditionally initialised. This is fine on arm - create_mapping_late() is only used for the EFI mapping. On arm64, __create_pgd_mapping() is also used by arch_add_memory(); patch 8/9/11 ensure that ctors are called at all levels with the appropriate mm. The situation is similar on riscv, but propagating the mm down to the ctor would require significant refactoring. Since they are already called unconditionally, this series leaves riscv no worse off - patch 10 adds comments to clarify the situation.

From a cursory look at other architectures implementing arch_add_memory(), s390 and x86 may also need a similar treatment to add constructor calls. This is to be taken care of in a future version or as a follow-up.

===

The complications in those special pgtable allocators beg the question: does it really make sense to treat efi_mm and init_mm differently in e.g. apply_to_pte_range()? Maybe what we really need is a way to tell if an mm corresponds to user memory or not, and never use split locks for non-user mm's. Feedback and suggestions welcome!

This patch (of 12):

In preparation for calling constructors for all kernel page tables while eliding unnecessary ptlock initialisation, let's pass down the associated mm to the PTE/PMD level ctors. (These are the two levels where ptlocks are used.)

In most cases the mm is already around at the point of calling the ctor so we simply pass it down. This is however not the case for special page table allocators:

* arch/arm/mm/mmu.c
* arch/arm64/mm/mmu.c
* arch/riscv/mm/init.c

In those cases, the page tables being allocated are either for standard kernel memory (init_mm) or special page directories, which may not be associated to any mm. For now let's pass NULL as mm; this will be refined where possible in future patches.

No functional change in this patch.

Link: https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/ [1]
Link: https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@arm.com/ [2]
Link: https://lkml.kernel.org/r/20250408095222.860601-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20250408095222.860601-2-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: <x86@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
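
The approach described in the cover letter can be sketched in userspace C roughly as follows. This is a minimal mock under stated assumptions, not the kernel implementation: the struct layouts, the ptlock_initialised and accounted fields, and the stub ptlock_init() are all hypothetical stand-ins. Note also that this patch only adds the mm parameter; the mm == &init_mm check shown here is what the cover letter attributes to later patches in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the sketch needs. */
struct mm_struct { int unused; };
struct ptdesc { bool ptlock_initialised; bool accounted; };

struct mm_struct init_mm;	/* mock of the kernel's init_mm */

/* Stub: the real ptlock_init() may allocate a lock and can fail. */
static bool ptlock_init(struct ptdesc *ptdesc)
{
	ptdesc->ptlock_initialised = true;
	return true;
}

/*
 * Sketch of the PTE constructor once the whole series is applied:
 * split-lock setup is skipped for init_mm, accounting always runs.
 * A NULL mm is not init_mm, so the lock is still initialised.
 */
static bool pagetable_pte_ctor(struct mm_struct *mm, struct ptdesc *ptdesc)
{
	assert(ptdesc != NULL);
	if (mm != &init_mm && !ptlock_init(ptdesc))
		return false;
	ptdesc->accounted = true;	/* stands in for the accounting step */
	return true;
}
```

With this shape, callers that pass NULL (the special allocators listed above) keep the pre-series behaviour of unconditionally initialising the ptlock, which matches the cover letter's description.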
1 parent 24c76f3 commit d82d3bf

18 files changed, +30 -28 lines changed

arch/arm/mm/mmu.c

Lines changed: 1 addition & 1 deletion
@@ -735,7 +735,7 @@ static void *__init late_alloc(unsigned long sz)
 	void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM,
 			get_order(sz));
 
-	if (!ptdesc || !pagetable_pte_ctor(ptdesc))
+	if (!ptdesc || !pagetable_pte_ctor(NULL, ptdesc))
 		BUG();
 	return ptdesc_to_virt(ptdesc);
 }

arch/arm64/mm/mmu.c

Lines changed: 2 additions & 2 deletions
@@ -494,9 +494,9 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 	 * folded, and if so pagetable_pte_ctor() becomes nop.
 	 */
 	if (shift == PAGE_SHIFT)
-		BUG_ON(!pagetable_pte_ctor(ptdesc));
+		BUG_ON(!pagetable_pte_ctor(NULL, ptdesc));
 	else if (shift == PMD_SHIFT)
-		BUG_ON(!pagetable_pmd_ctor(ptdesc));
+		BUG_ON(!pagetable_pmd_ctor(NULL, ptdesc));
 
 	return pa;
 }

arch/loongarch/include/asm/pgalloc.h

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (!ptdesc)
 		return NULL;
 
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/m68k/include/asm/mcf_pgalloc.h

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/m68k/include/asm/motorola_pgalloc.h

Lines changed: 5 additions & 5 deletions
@@ -15,7 +15,7 @@ enum m68k_table_types {
 };
 
 extern void init_pointer_table(void *table, int type);
-extern void *get_pointer_table(int type);
+extern void *get_pointer_table(struct mm_struct *mm, int type);
 extern int free_pointer_table(void *table, int type);
 
 /*
@@ -26,7 +26,7 @@ extern int free_pointer_table(void *table, int type);
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PTE);
+	return get_pointer_table(mm, TABLE_PTE);
 }
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -36,7 +36,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PTE);
+	return get_pointer_table(mm, TABLE_PTE);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
@@ -53,7 +53,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	return get_pointer_table(TABLE_PMD);
+	return get_pointer_table(mm, TABLE_PMD);
 }
 
 static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd)
@@ -75,7 +75,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return get_pointer_table(TABLE_PGD);
+	return get_pointer_table(mm, TABLE_PGD);
 }
 
 

arch/m68k/mm/motorola.c

Lines changed: 3 additions & 3 deletions
@@ -139,7 +139,7 @@ void __init init_pointer_table(void *table, int type)
 	return;
 }
 
-void *get_pointer_table(int type)
+void *get_pointer_table(struct mm_struct *mm, int type)
 {
 	ptable_desc *dp = ptable_list[type].next;
 	unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
@@ -164,10 +164,10 @@ void *get_pointer_table(int type)
 			 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
 			 * SMP.
 			 */
-			pagetable_pte_ctor(virt_to_ptdesc(page));
+			pagetable_pte_ctor(mm, virt_to_ptdesc(page));
 			break;
 		case TABLE_PMD:
-			pagetable_pmd_ctor(virt_to_ptdesc(page));
+			pagetable_pmd_ctor(mm, virt_to_ptdesc(page));
 			break;
 		case TABLE_PGD:
 			pagetable_pgd_ctor(virt_to_ptdesc(page));

arch/mips/include/asm/pgalloc.h

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	if (!ptdesc)
 		return NULL;
 
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/parisc/include/asm/pgalloc.h

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	ptdesc = pagetable_alloc(gfp, PMD_TABLE_ORDER);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/powerpc/mm/book3s64/pgtable.c

Lines changed: 1 addition & 1 deletion
@@ -417,7 +417,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 	ptdesc = pagetable_alloc(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/powerpc/mm/pgtable-frag.c

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 	ptdesc = pagetable_alloc(PGALLOC_GFP | __GFP_ACCOUNT, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/riscv/mm/init.c

Lines changed: 2 additions & 2 deletions
@@ -442,7 +442,7 @@ static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-	BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
+	BUG_ON(!ptdesc || !pagetable_pte_ctor(NULL, ptdesc));
 	return __pa((pte_t *)ptdesc_address(ptdesc));
 }
 
@@ -522,7 +522,7 @@ static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
 {
 	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-	BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
+	BUG_ON(!ptdesc || !pagetable_pmd_ctor(NULL, ptdesc));
 	return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
 

arch/s390/include/asm/pgalloc.h

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
 	if (!table)
 		return NULL;
 	crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
-	if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
+	if (!pagetable_pmd_ctor(mm, virt_to_ptdesc(table))) {
 		crst_table_free(mm, table);
 		return NULL;
 	}

arch/s390/mm/pgalloc.c

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	ptdesc = pagetable_alloc(GFP_KERNEL, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/sparc/mm/init_64.c

Lines changed: 1 addition & 1 deletion
@@ -2895,7 +2895,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

arch/sparc/mm/srmmu.c

Lines changed: 1 addition & 1 deletion
@@ -350,7 +350,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 	page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
 	spin_lock(&mm->page_table_lock);
 	if (page_ref_inc_return(page) == 2 &&
-			!pagetable_pte_ctor(page_ptdesc(page))) {
+			!pagetable_pte_ctor(mm, page_ptdesc(page))) {
 		page_ref_dec(page);
 		ptep = NULL;
 	}

arch/x86/mm/pgtable.c

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 
 		if (!ptdesc)
 			failed = true;
-		if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
+		if (ptdesc && !pagetable_pmd_ctor(mm, ptdesc)) {
 			pagetable_free(ptdesc);
 			ptdesc = NULL;
 			failed = true;

include/asm-generic/pgalloc.h

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ static inline pgtable_t __pte_alloc_one_noprof(struct mm_struct *mm, gfp_t gfp)
 	ptdesc = pagetable_alloc_noprof(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pte_ctor(ptdesc)) {
+	if (!pagetable_pte_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}
@@ -137,7 +137,7 @@ static inline pmd_t *pmd_alloc_one_noprof(struct mm_struct *mm, unsigned long ad
 	ptdesc = pagetable_alloc_noprof(gfp, 0);
 	if (!ptdesc)
 		return NULL;
-	if (!pagetable_pmd_ctor(ptdesc)) {
+	if (!pagetable_pmd_ctor(mm, ptdesc)) {
 		pagetable_free(ptdesc);
 		return NULL;
 	}

include/linux/mm.h

Lines changed: 4 additions & 2 deletions
@@ -3147,7 +3147,8 @@ static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
 	pagetable_free(ptdesc);
 }
 
-static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pte_ctor(struct mm_struct *mm,
+				      struct ptdesc *ptdesc)
 {
 	if (!ptlock_init(ptdesc))
 		return false;
@@ -3253,7 +3254,8 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 	return ptl;
 }
 
-static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
+				      struct ptdesc *ptdesc)
 {
 	if (!pmd_ptlock_init(ptdesc))
 		return false;
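
The caller-side pattern that recurs in most of the allocator hunks in this diff (allocate, run the constructor with the owning mm, free on failure) can be sketched in userspace C as below. This is only an illustration under assumptions: the struct layouts, the calloc-backed pagetable_alloc(), the fail_ctor knob, the live_tables counter, and the name pte_alloc_one_sketch() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types. */
struct mm_struct { int unused; };
struct ptdesc { struct mm_struct *ctor_mm; };

static bool fail_ctor;		/* hypothetical knob to simulate ctor failure */
static int live_tables;		/* mock tables allocated and not yet freed */

/* Mock allocator: the kernel version takes gfp flags and an order. */
static struct ptdesc *pagetable_alloc(void)
{
	struct ptdesc *ptdesc = calloc(1, sizeof(*ptdesc));

	if (ptdesc)
		live_tables++;
	return ptdesc;
}

static void pagetable_free(struct ptdesc *ptdesc)
{
	assert(ptdesc != NULL);
	free(ptdesc);
	live_tables--;
}

/* Mock ctor with the new two-argument signature; records the mm it saw. */
static bool pagetable_pte_ctor(struct mm_struct *mm, struct ptdesc *ptdesc)
{
	if (fail_ctor)
		return false;
	ptdesc->ctor_mm = mm;
	return true;
}

/* The allocate / construct / free-on-failure shape used by the callers. */
static struct ptdesc *pte_alloc_one_sketch(struct mm_struct *mm)
{
	struct ptdesc *ptdesc = pagetable_alloc();

	if (!ptdesc)
		return NULL;
	if (!pagetable_pte_ctor(mm, ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}
	return ptdesc;
}
```

Because the mm is already a parameter of these allocators, forwarding it to the constructor needs no other change at the call site, which is consistent with the commit's "no functional change" statement.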
