mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER is currently defined as the number of orders the page allocator
supports: a user can ask the buddy allocator for a page order between 0 and
MAX_ORDER-1.

This definition is counter-intuitive and has led to a number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders a
user can request from the buddy allocator is now 0..MAX_ORDER.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
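
The practical effect of the inclusive definition can be illustrated with a small userspace sketch. This is not kernel code: `toy_free_area`, `toy_order_is_valid`, `toy_add_free_block`, and `toy_count_free_pages` are hypothetical names, and the value 10 for MAX_ORDER is only an example. It shows the two patterns the diff applies throughout the tree: arrays indexed by order need MAX_ORDER + 1 entries, and loops over orders use `<=` rather than `<`.

```c
#include <assert.h>

/* Illustrative value only; the real MAX_ORDER depends on the architecture. */
#define MAX_ORDER 10

/* Hypothetical stand-in for the kernel's per-order free-list bookkeeping. */
struct toy_free_area {
	unsigned long nr_free;	/* number of free blocks of this order */
};

/* Orders 0..MAX_ORDER are all valid, so the array needs MAX_ORDER + 1 slots. */
static struct toy_free_area toy_free_area[MAX_ORDER + 1];

/* Nonzero if `order` may be requested under the inclusive definition. */
static int toy_order_is_valid(unsigned int order)
{
	return order <= MAX_ORDER;	/* old definition: order < MAX_ORDER */
}

/* Records one free block of the given order. */
static void toy_add_free_block(unsigned int order)
{
	assert(toy_order_is_valid(order));
	toy_free_area[order].nr_free++;
}

/* Sums free pages over every order; note the inclusive loop bound. */
static unsigned long toy_count_free_pages(void)
{
	unsigned long pages = 0;
	unsigned int order;

	for (order = 0; order <= MAX_ORDER; order++)
		pages += toy_free_area[order].nr_free << order;
	return pages;
}
```

With the old exclusive definition, the same loop written with `<=` would have read one element past the end of a `[MAX_ORDER]` array, which is the class of bug the commit message refers to.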
kiryl authored and akpm00 committed Apr 6, 2023
1 parent 61883d3 commit 23baf83
Showing 84 changed files with 223 additions and 253 deletions.
6 changes: 3 additions & 3 deletions Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -172,7 +172,7 @@ variables.
Offset of the free_list's member. This value is used to compute the number
of free pages.

Each zone has a free_area structure array called free_area[MAX_ORDER].
Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
The free_list represents a linked list of free page blocks.

(list_head, next|prev)
@@ -189,8 +189,8 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
information. Makedumpfile gets the start address of the vmalloc region
from this.

(zone.free_area, MAX_ORDER)
---------------------------
(zone.free_area, MAX_ORDER + 1)
-------------------------------

Free areas descriptor. User-space tools use this value to iterate the
free_area ranges. MAX_ORDER is used by the zone buddy allocator.
2 changes: 1 addition & 1 deletion Documentation/admin-guide/kernel-parameters.txt
@@ -3969,7 +3969,7 @@
[KNL] Minimal page reporting order
Format: <integer>
Adjust the minimal page reporting order. The page
reporting is disabled when it exceeds (MAX_ORDER-1).
reporting is disabled when it exceeds MAX_ORDER.

panic= [KNL] Kernel behaviour on panic: delay <timeout>
timeout > 0: seconds before rebooting
4 changes: 2 additions & 2 deletions arch/arc/Kconfig
@@ -556,7 +556,7 @@ endmenu # "ARC Architecture Configuration"

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if ARC_HUGEPAGE_16M
default "11"
default "11" if ARC_HUGEPAGE_16M
default "10"

source "kernel/power/Kconfig"
9 changes: 3 additions & 6 deletions arch/arm/Kconfig
@@ -1355,9 +1355,9 @@ config ARM_MODULE_PLTS

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if SOC_AM33XX
default "9" if SA1111
default "11"
default "11" if SOC_AM33XX
default "8" if SA1111
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
@@ -1366,9 +1366,6 @@ config ARCH_FORCE_MAX_ORDER
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

config ALIGNMENT_TRAP
def_bool CPU_CP15_MMU
select HAVE_PROC_CPU if PROC_FS
2 changes: 1 addition & 1 deletion arch/arm/configs/imx_v6_v7_defconfig
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
CONFIG_SMP=y
CONFIG_ARM_PSCI=y
CONFIG_HIGHMEM=y
CONFIG_ARCH_FORCE_MAX_ORDER=14
CONFIG_ARCH_FORCE_MAX_ORDER=13
CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
2 changes: 1 addition & 1 deletion arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
# CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
# CONFIG_ARM_PATCH_IDIV is not set
CONFIG_HIGHMEM=y
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=11
CONFIG_SECCOMP=y
CONFIG_KEXEC=y
CONFIG_EFI=y
2 changes: 1 addition & 1 deletion arch/arm/configs/oxnas_v6_defconfig
@@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
CONFIG_MACH_OX820=y
CONFIG_SMP=y
CONFIG_NR_CPUS=16
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=11
CONFIG_SECCOMP=y
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
2 changes: 1 addition & 1 deletion arch/arm/configs/pxa_defconfig
@@ -20,7 +20,7 @@ CONFIG_PXA_SHARPSL=y
CONFIG_MACH_AKITA=y
CONFIG_MACH_BORZOI=y
CONFIG_AEABI=y
CONFIG_ARCH_FORCE_MAX_ORDER=9
CONFIG_ARCH_FORCE_MAX_ORDER=8
CONFIG_CMDLINE="root=/dev/ram0 ro"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
2 changes: 1 addition & 1 deletion arch/arm/configs/sama7_defconfig
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
# CONFIG_CACHE_L2X0 is not set
# CONFIG_ARM_PATCH_IDIV is not set
# CONFIG_CPU_SW_DOMAIN_PAN is not set
CONFIG_ARCH_FORCE_MAX_ORDER=15
CONFIG_ARCH_FORCE_MAX_ORDER=14
CONFIG_UACCESS_WITH_MEMCPY=y
# CONFIG_ATAGS is not set
CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
2 changes: 1 addition & 1 deletion arch/arm/configs/sp7021_defconfig
@@ -17,7 +17,7 @@ CONFIG_ARCH_SUNPLUS=y
# CONFIG_VDSO is not set
CONFIG_SMP=y
CONFIG_THUMB2_KERNEL=y
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=11
CONFIG_VFP=y
CONFIG_NEON=y
CONFIG_MODULES=y
27 changes: 12 additions & 15 deletions arch/arm64/Kconfig
@@ -1476,22 +1476,22 @@ config XEN

# include/linux/mmzone.h requires the following to be true:
#
# MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS
# MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
#
# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS + 1 - PAGE_SHIFT:
# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS - PAGE_SHIFT:
#
# | SECTION_SIZE_BITS | PAGE_SHIFT | max MAX_ORDER | default MAX_ORDER |
# ----+-------------------+--------------+-----------------+--------------------+
# 4K | 27 | 12 | 16 | 11 |
# 16K | 27 | 14 | 14 | 12 |
# 64K | 29 | 16 | 14 | 14 |
# 4K | 27 | 12 | 15 | 10 |
# 16K | 27 | 14 | 13 | 11 |
# 64K | 29 | 16 | 13 | 13 |
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order" if ARM64_4K_PAGES || ARM64_16K_PAGES
default "14" if ARM64_64K_PAGES
range 12 14 if ARM64_16K_PAGES
default "12" if ARM64_16K_PAGES
range 11 16 if ARM64_4K_PAGES
default "11"
default "13" if ARM64_64K_PAGES
range 11 13 if ARM64_16K_PAGES
default "11" if ARM64_16K_PAGES
range 10 15 if ARM64_4K_PAGES
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
@@ -1500,14 +1500,11 @@ config ARCH_FORCE_MAX_ORDER
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

We make sure that we can allocate up to a HugePage size for each configuration.
Hence we have :
MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
MAX_ORDER = PMD_SHIFT - PAGE_SHIFT => PAGE_SHIFT - 3

However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
4M allocations matching the default size used by generic code.

config UNMAP_KERNEL_AT_EL0
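
The arithmetic behind the updated arm64 table can be checked with a short sketch. The helper names below (`max_order_limit`, `hugepage_order`, `old_to_new_max_order`) are hypothetical and exist only for illustration; the formulas are the ones stated in the Kconfig comment and help text above, namely the constraint MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS and the hugepage order PMD_SHIFT - PAGE_SHIFT = PAGE_SHIFT - 3.

```c
#include <assert.h>

/*
 * Largest MAX_ORDER satisfying MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
 * under the new inclusive definition.
 */
static int max_order_limit(int section_size_bits, int page_shift)
{
	assert(section_size_bits >= page_shift);
	return section_size_bits - page_shift;
}

/*
 * Order of a PMD-sized huge page, per the help text:
 * PMD_SHIFT - PAGE_SHIFT => PAGE_SHIFT - 3.
 */
static int hugepage_order(int page_shift)
{
	return page_shift - 3;
}

/* Converts a pre-change (exclusive) config value to the inclusive one. */
static int old_to_new_max_order(int old_value)
{
	return old_value - 1;
}
```

For 4K pages this gives a limit of 27 - 12 = 15 and a hugepage order of 9, which is why the help text notes that the 4K default of 10 is deliberately one higher than the hugepage order.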
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/sparsemem.h
@@ -10,7 +10,7 @@
/*
* Section size must be at least 512MB for 64K base
* page size config. Otherwise it will be less than
* (MAX_ORDER - 1) and the build process will fail.
* MAX_ORDER and the build process will fail.
*/
#ifdef CONFIG_ARM64_64K_PAGES
#define SECTION_SIZE_BITS 29
2 changes: 1 addition & 1 deletion arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -16,7 +16,7 @@ struct hyp_pool {
* API at EL2.
*/
hyp_spinlock_t lock;
struct list_head free_area[MAX_ORDER];
struct list_head free_area[MAX_ORDER + 1];
phys_addr_t range_start;
phys_addr_t range_end;
unsigned short max_order;
10 changes: 5 additions & 5 deletions arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -110,7 +110,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
* after coalescing, so make sure to mark it HYP_NO_ORDER proactively.
*/
p->order = HYP_NO_ORDER;
for (; (order + 1) < pool->max_order; order++) {
for (; (order + 1) <= pool->max_order; order++) {
buddy = __find_buddy_avail(pool, p, order);
if (!buddy)
break;
@@ -203,9 +203,9 @@ void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
hyp_spin_lock(&pool->lock);

/* Look for a high-enough-order page */
while (i < pool->max_order && list_empty(&pool->free_area[i]))
while (i <= pool->max_order && list_empty(&pool->free_area[i]))
i++;
if (i >= pool->max_order) {
if (i > pool->max_order) {
hyp_spin_unlock(&pool->lock);
return NULL;
}
@@ -228,8 +228,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
int i;

hyp_spin_lock_init(&pool->lock);
pool->max_order = min(MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
for (i = 0; i < pool->max_order; i++)
pool->max_order = min(MAX_ORDER, get_order(nr_pages << PAGE_SHIFT));
for (i = 0; i <= pool->max_order; i++)
INIT_LIST_HEAD(&pool->free_area[i]);
pool->range_start = phys;
pool->range_end = phys + (nr_pages << PAGE_SHIFT);
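
The search loop in hyp_alloc_pages() changes from an exclusive to an inclusive bound. A simplified userspace model of that loop is sketched below; `struct toy_pool` and `toy_find_order` are hypothetical, the list heads are replaced by plain counters, and the MAX_ORDER value is illustrative. Only the bound handling mirrors the hunk above.

```c
#include <assert.h>

#define MAX_ORDER 10	/* illustrative value, not taken from any config */

/* Hypothetical model of a pool: a free-block count per order. */
struct toy_pool {
	unsigned int nr_free[MAX_ORDER + 1];	/* one slot per order 0..MAX_ORDER */
	unsigned short max_order;		/* highest usable order, inclusive */
};

/*
 * Returns the lowest order >= `order` that has a free block,
 * or -1 if no such order exists in the pool.
 */
static int toy_find_order(const struct toy_pool *pool, unsigned short order)
{
	unsigned short i = order;

	assert(pool->max_order <= MAX_ORDER);

	/* Look for a high-enough-order block; max_order itself is searched. */
	while (i <= pool->max_order && pool->nr_free[i] == 0)
		i++;
	if (i > pool->max_order)
		return -1;
	return i;
}
```

Because `max_order` is now the last valid index rather than one past it, a block sitting at exactly `max_order` is found by the search instead of being skipped.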
2 changes: 1 addition & 1 deletion arch/csky/Kconfig
@@ -334,7 +334,7 @@ config HIGHMEM

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "11"
default "10"

config DRAM_BASE
hex "DRAM start addr (the same with memory-section in dts)"
8 changes: 4 additions & 4 deletions arch/ia64/Kconfig
@@ -202,10 +202,10 @@ config IA64_CYCLONE
If you're unsure, answer N.

config ARCH_FORCE_MAX_ORDER
int "MAX_ORDER (11 - 17)" if !HUGETLB_PAGE
range 11 17 if !HUGETLB_PAGE
default "17" if HUGETLB_PAGE
default "11"
int "MAX_ORDER (10 - 16)" if !HUGETLB_PAGE
range 10 16 if !HUGETLB_PAGE
default "16" if HUGETLB_PAGE
default "10"

config SMP
bool "Symmetric multi-processing support"
4 changes: 2 additions & 2 deletions arch/ia64/include/asm/sparsemem.h
@@ -12,9 +12,9 @@
#define SECTION_SIZE_BITS (30)
#define MAX_PHYSMEM_BITS (50)
#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
#if (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT > SECTION_SIZE_BITS)
#undef SECTION_SIZE_BITS
#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
#endif
#endif
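
The preprocessor logic above amounts to taking the larger of the default section size and the size of a maximal buddy block. A sketch of the same rule as an ordinary function follows; `toy_section_size_bits` and `TOY_DEFAULT_SECTION_SIZE_BITS` are hypothetical names, and the default of 30 is the value from the ia64 header shown in this hunk.

```c
#include <assert.h>

#define TOY_DEFAULT_SECTION_SIZE_BITS 30	/* default from the header above */

/*
 * A section must be able to hold a block of order max_order, i.e.
 * 2^(max_order + page_shift) bytes under the inclusive definition.
 * The old rule used (max_order - 1 + page_shift).
 */
static int toy_section_size_bits(int max_order, int page_shift)
{
	assert(max_order >= 0 && page_shift > 0);

	if (max_order + page_shift > TOY_DEFAULT_SECTION_SIZE_BITS)
		return max_order + page_shift;
	return TOY_DEFAULT_SECTION_SIZE_BITS;
}
```

Since the Kconfig values were all lowered by one in the same commit, the resulting section size is unchanged for any given configuration.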

2 changes: 1 addition & 1 deletion arch/ia64/mm/hugetlbpage.c
@@ -170,7 +170,7 @@ static int __init hugetlb_setup_sz(char *str)
size = memparse(str, &str);
if (*str || !is_power_of_2(size) || !(tr_pages & size) ||
size <= PAGE_SIZE ||
size >= (1UL << PAGE_SHIFT << MAX_ORDER)) {
size > (1UL << PAGE_SHIFT << MAX_ORDER)) {
printk(KERN_WARNING "Invalid huge page size specified\n");
return 1;
}
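
The comparison changes from `>=` to `>` because a block of order MAX_ORDER is now allocatable, so a huge page of exactly that size is valid. A reduced sketch of the size check is given below. It is an assumption-laden model: the `TOY_` constants are example values, `toy_huge_size_ok` is a hypothetical name, and the `tr_pages` hardware-support test from the real function is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT	14	/* assumed 16K base pages, for illustration */
#define TOY_MAX_ORDER	16	/* example value within the new 10..16 range */
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)

static bool toy_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* True if `size` is an acceptable huge page size in this reduced model. */
static bool toy_huge_size_ok(unsigned long size)
{
	/* Guard the shift below against overflowing unsigned long. */
	assert(TOY_PAGE_SHIFT + TOY_MAX_ORDER < 8 * (int)sizeof(unsigned long));

	if (!toy_is_power_of_2(size))
		return false;
	if (size <= TOY_PAGE_SIZE)
		return false;
	/* Inclusive limit: exactly 2^(PAGE_SHIFT + MAX_ORDER) is allowed. */
	if (size > (1UL << TOY_PAGE_SHIFT << TOY_MAX_ORDER))
		return false;
	return true;
}
```

With these example constants the largest accepted size is 1 GiB; under the old definition the same limit was expressed with MAX_ORDER one higher and a `>=` comparison.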
15 changes: 6 additions & 9 deletions arch/loongarch/Kconfig
@@ -420,12 +420,12 @@ config NODES_SHIFT

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 14 64 if PAGE_SIZE_64KB
default "14" if PAGE_SIZE_64KB
range 12 64 if PAGE_SIZE_16KB
default "12" if PAGE_SIZE_16KB
range 11 64
default "11"
range 13 63 if PAGE_SIZE_64KB
default "13" if PAGE_SIZE_64KB
range 11 63 if PAGE_SIZE_16KB
default "11" if PAGE_SIZE_16KB
range 10 63
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
@@ -434,9 +434,6 @@ config ARCH_FORCE_MAX_ORDER
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

The page size is not necessarily 4KB. Keep this in mind
when choosing a value for this option.

5 changes: 1 addition & 4 deletions arch/m68k/Kconfig.cpu
@@ -400,7 +400,7 @@ config SINGLE_MEMORY_CHUNK
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order" if ADVANCED
depends on !SINGLE_MEMORY_CHUNK
default "11"
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
@@ -413,9 +413,6 @@ config ARCH_FORCE_MAX_ORDER
value also defines the minimal size of the hole that allows
freeing unused memory map.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

config 060_WRITETHROUGH
bool "Use write-through caching for 68060 supervisor accesses"
depends on ADVANCED && M68060
19 changes: 8 additions & 11 deletions arch/mips/Kconfig
@@ -2137,14 +2137,14 @@ endchoice

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
range 13 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
range 12 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
range 0 64
default "11"
range 13 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
range 12 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
range 11 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
default "11" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
range 0 63
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
Expand All @@ -2153,9 +2153,6 @@ config ARCH_FORCE_MAX_ORDER
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

The page size is not necessarily 4KB. Keep this in mind
when choosing a value for this option.

7 changes: 2 additions & 5 deletions arch/nios2/Kconfig
@@ -46,8 +46,8 @@ source "kernel/Kconfig.hz"

config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 9 20
default "11"
range 8 19
default "10"
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
@@ -56,9 +56,6 @@ config ARCH_FORCE_MAX_ORDER
blocks of physically contiguous memory, then you may need to
increase this value.

This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.

endmenu

source "arch/nios2/platform/Kconfig.platform"