Commit 6471384

ramosian-glider authored and torvalds committed
mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
Patch series "add init_on_alloc/init_on_free boot options", v10.

Provide init_on_alloc and init_on_free boot options. These are aimed at preventing possible information leaks and making the control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the page allocator and SL[AU]B is initialized with zeroes. The SLOB allocator isn't supported at the moment, as its emulation of kmem caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly.

Enabling init_on_free also guarantees that pages and heap objects are initialized right after they're freed, so it won't be possible to access stale data by using a dangling pointer.

As suggested by Michal Hocko, right now we don't let heap users disable initialization for certain allocations. There's not enough evidence that doing so can speed up real-life cases, and introducing ways to opt out may result in things going out of control.

This patch (of 2):

The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic.

This is expected to be on-by-default on Android and Chrome OS. And it gives the opportunity for anyone else to use it under distros too via the boot args. (The init_on_free feature is regularly requested by folks where memory forensics is included in their threat models.)

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. The two exceptions are slab caches with constructors and caches with the SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics.

Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON.

If either SLUB poisoning or page poisoning is enabled, those options take precedence over init_on_alloc and init_on_free: initialization is only applied to unpoisoned allocations.

Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error.

The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization.

Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructure for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series.
[glider@google.com: v8]
  Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
  Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
  Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz>		[page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent ba5c5e4 commit 6471384

File tree

10 files changed: +199 −18 lines

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 9 additions & 0 deletions
@@ -1668,6 +1668,15 @@
 
 	initrd=		[BOOT]	Specify the location of the initial ramdisk
 
+	init_on_alloc=	[MM]	Fill newly allocated pages and heap objects with
+			zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
+
+	init_on_free=	[MM]	Fill freed pages and heap objects with zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
+
 	init_pkru=	[x86]	Specify the default memory protection keys rights
 			register contents for all processes.  0x55555554 by
 			default (disallow access to all but pkey 0).  Can

drivers/infiniband/core/uverbs_ioctl.c

Lines changed: 1 addition & 1 deletion
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_on_alloc(flags))
 		memset(res, 0, size);
 	return res;
 }

include/linux/mm.h

Lines changed: 24 additions & 0 deletions
@@ -2700,6 +2700,30 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+static inline bool want_init_on_alloc(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_on_alloc) &&
+	    !page_poisoning_enabled())
+		return true;
+	return flags & __GFP_ZERO;
+}
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_free);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_free);
+#endif
+static inline bool want_init_on_free(void)
+{
+	return static_branch_unlikely(&init_on_free) &&
+	       !page_poisoning_enabled();
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
 DECLARE_STATIC_KEY_TRUE(_debug_pagealloc_enabled);
 #else
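The want_init_on_alloc()/want_init_on_free() helpers above encode a simple precedence: page poisoning wins over auto-init, and an explicit __GFP_ZERO keeps working regardless of the static keys. Below is a minimal userspace sketch of that decision logic, with plain booleans standing in for the static keys and for page_poisoning_enabled(), and a hypothetical flag bit in place of __GFP_ZERO:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_ZERO flag bit. */
#define GFP_ZERO_FLAG 0x100u

/* Plain booleans stand in for the static keys and the poisoning check. */
static bool key_init_on_alloc;
static bool key_init_on_free;
static bool poisoning_enabled;

/* Mirrors want_init_on_alloc(): if the key is on and poisoning is off,
 * zero unconditionally; otherwise fall back to honoring __GFP_ZERO. */
static bool want_init_on_alloc(unsigned int gfp_flags)
{
	if (key_init_on_alloc && !poisoning_enabled)
		return true;
	return (gfp_flags & GFP_ZERO_FLAG) != 0;
}

/* Mirrors want_init_on_free(): poisoning takes precedence here too. */
static bool want_init_on_free(void)
{
	return key_init_on_free && !poisoning_enabled;
}
```

Note how __GFP_ZERO still forces zeroing even when poisoning suppresses the auto-init keys, which preserves the pre-patch semantics for existing callers.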

mm/dmapool.c

Lines changed: 3 additions & 1 deletion
@@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_on_alloc(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
@@ -428,6 +428,8 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 	}
 
 	offset = vaddr - page->vaddr;
+	if (want_init_on_free())
+		memset(vaddr, 0, pool->size);
 #ifdef DMAPOOL_DEBUG
 	if ((dma - page->dma) != offset) {
 		spin_unlock_irqrestore(&pool->lock, flags);
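The dma_pool_free() change above wipes a block before it returns to the pool, so neither a later allocation nor a dangling pointer can observe the previous user's data. A tiny userspace sketch of the same wipe-on-free pattern, with a hypothetical fixed block size and a boolean in place of want_init_on_free():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define POOL_BLOCK_SIZE 64	/* hypothetical fixed block size */

static bool wipe_on_free;	/* stands in for want_init_on_free() */

/* Mirrors the dma_pool_free() addition: zero the block before it goes
 * back on the pool's free list, so stale data can't be recovered from
 * recycled memory. */
static void pool_free(unsigned char *block)
{
	if (wipe_on_free)
		memset(block, 0, POOL_BLOCK_SIZE);
	/* ...the block would now be linked back onto the free list... */
}
```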

mm/page_alloc.c

Lines changed: 64 additions & 7 deletions
@@ -135,6 +135,55 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+EXPORT_SYMBOL(init_on_alloc);
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_free);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_free);
+#endif
+EXPORT_SYMBOL(init_on_free);
+
+static int __init early_init_on_alloc(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result && page_poisoning_enabled())
+		pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_alloc\n");
+	if (bool_result)
+		static_branch_enable(&init_on_alloc);
+	else
+		static_branch_disable(&init_on_alloc);
+	return ret;
+}
+early_param("init_on_alloc", early_init_on_alloc);
+
+static int __init early_init_on_free(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result && page_poisoning_enabled())
+		pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_free\n");
+	if (bool_result)
+		static_branch_enable(&init_on_free);
+	else
+		static_branch_disable(&init_on_free);
+	return ret;
+}
+early_param("init_on_free", early_init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -1067,6 +1116,14 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
+static void kernel_init_free_pages(struct page *page, int numpages)
+{
+	int i;
+
+	for (i = 0; i < numpages; i++)
+		clear_highpage(page + i);
+}
+
 static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
@@ -1118,6 +1175,9 @@ static __always_inline bool free_pages_prepare(struct page *page,
 					   PAGE_SIZE << order);
 	}
 	arch_free_page(page, order);
+	if (want_init_on_free())
+		kernel_init_free_pages(page, 1 << order);
+
 	kernel_poison_pages(page, 1 << order, 0);
 	if (debug_pagealloc_enabled())
 		kernel_map_pages(page, 1 << order, 0);
@@ -2019,8 +2079,8 @@ static inline int check_new_page(struct page *page)
 
 static inline bool free_pages_prezeroed(void)
 {
-	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
-		page_poisoning_enabled();
+	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
+		page_poisoning_enabled()) || want_init_on_free();
 }
 
 #ifdef CONFIG_DEBUG_VM
@@ -2090,13 +2150,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	int i;
-
	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
-		for (i = 0; i < (1 << order); i++)
-			clear_highpage(page + i);
+	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
+		kernel_init_free_pages(page, 1 << order);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
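The early_param handlers above parse the boot argument with kstrtobool() and flip the static key accordingly. The sketch below models that parsing in userspace; parse_bool() is a hypothetical, simplified stand-in for kstrtobool() (first-character dispatch only), and the error handling is simplified to reject unparseable input outright:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool init_on_alloc_key;	/* stands in for the static key */

/* Simplified model of kstrtobool(): "1/y/t" => true, "0/n/f" => false. */
static int parse_bool(const char *s, bool *res)
{
	if (!s || !s[0])
		return -1;
	switch (s[0]) {
	case '1': case 'y': case 'Y': case 't': case 'T':
		*res = true;
		return 0;
	case '0': case 'n': case 'N': case 'f': case 'F':
		*res = false;
		return 0;
	default:
		return -1;
	}
}

/* Mirrors the shape of early_init_on_alloc(): parse the boot argument
 * and flip the key; a missing or bogus value is rejected. */
static int early_init_on_alloc(const char *buf)
{
	bool val;

	if (!buf)
		return -1;
	if (parse_bool(buf, &val) != 0)
		return -1;
	init_on_alloc_key = val;
	return 0;
}
```

In the kernel the same handler also prints a pr_info() notice when page poisoning is enabled, since poisoning takes precedence over the requested auto-init.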

mm/slab.c

Lines changed: 13 additions & 3 deletions
@@ -1811,6 +1811,14 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
+	/*
+	 * If slab auto-initialization on free is enabled, store the freelist
+	 * off-slab, so that its contents don't end up in one of the allocated
+	 * objects.
+	 */
+	if (unlikely(slab_want_init_on_free(cachep)))
+		return false;
+
 	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
 		return false;
 
@@ -3248,7 +3256,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
@@ -3305,7 +3313,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
@@ -3426,6 +3434,8 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	if (unlikely(slab_want_init_on_free(cachep)))
+		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
@@ -3513,7 +3523,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(slab_want_init_on_alloc(flags, s)))
 		for (i = 0; i < size; i++)
 			memset(p[i], 0, s->object_size);
 
mm/slab.h

Lines changed: 20 additions & 0 deletions
@@ -607,4 +607,24 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc)) {
+		if (c->ctor)
+			return false;
+		if (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON))
+			return flags & __GFP_ZERO;
+		return true;
+	}
+	return flags & __GFP_ZERO;
+}
+
+static inline bool slab_want_init_on_free(struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_free))
+		return !(c->ctor ||
+			 (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)));
+	return false;
+}
+
 #endif /* MM_SLAB_H */
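Unlike the page-allocator helpers, slab_want_init_on_alloc() is cache-aware: constructed caches are never auto-zeroed (zeroing would destroy the ctor's invariants), and RCU or poisoned caches only honor an explicit __GFP_ZERO. A userspace sketch of that decision table, with hypothetical flag bits and a stripped-down cache struct in place of kmem_cache:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flag bits. */
#define GFP_ZERO_FLAG		0x100u
#define FLAG_TYPESAFE_BY_RCU	0x001u
#define FLAG_POISON		0x002u

static bool key_init_on_alloc;	/* stands in for the static key */

/* Minimal model of the kmem_cache fields the helper looks at. */
struct cache {
	bool has_ctor;
	unsigned int flags;
};

/* Mirrors slab_want_init_on_alloc(): ctor caches => never auto-zero;
 * RCU/poisoned caches => only on explicit __GFP_ZERO; plain caches =>
 * always when the key is on. Key off => pre-patch __GFP_ZERO behavior. */
static bool slab_want_init_on_alloc(unsigned int gfp, const struct cache *c)
{
	if (key_init_on_alloc) {
		if (c->has_ctor)
			return false;
		if (c->flags & (FLAG_TYPESAFE_BY_RCU | FLAG_POISON))
			return (gfp & GFP_ZERO_FLAG) != 0;
		return true;
	}
	return (gfp & GFP_ZERO_FLAG) != 0;
}
```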

mm/slub.c

Lines changed: 35 additions & 5 deletions
@@ -1279,6 +1279,10 @@ static int __init setup_slub_debug(char *str)
 	if (*str == ',')
 		slub_debug_slabs = str + 1;
 out:
+	if ((static_branch_unlikely(&init_on_alloc) ||
+	     static_branch_unlikely(&init_on_free)) &&
+	    (slub_debug & SLAB_POISON))
+		pr_info("mem auto-init: SLAB_POISON will take precedence over init_on_alloc/init_on_free\n");
 	return 1;
 }
 
@@ -1422,6 +1426,28 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s, void *x)
 static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail)
 {
+
+	void *object;
+	void *next = *head;
+	void *old_tail = *tail ? *tail : *head;
+	int rsize;
+
+	if (slab_want_init_on_free(s))
+		do {
+			object = next;
+			next = get_freepointer(s, object);
+			/*
+			 * Clear the object and the metadata, but don't touch
+			 * the redzone.
+			 */
+			memset(object, 0, s->object_size);
+			rsize = (s->flags & SLAB_RED_ZONE) ? s->red_left_pad
+							   : 0;
+			memset((char *)object + s->inuse, 0,
+			       s->size - s->inuse - rsize);
+			set_freepointer(s, object, next);
+		} while (object != old_tail);
+
 	/*
 	 * Compiler cannot detect this function can be removed if slab_free_hook()
 	 * evaluates to nothing.  Thus, catch all relevant config debug options here.
@@ -1431,9 +1457,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
     defined(CONFIG_DEBUG_OBJECTS_FREE) || \
     defined(CONFIG_KASAN)
 
-	void *object;
-	void *next = *head;
-	void *old_tail = *tail ? *tail : *head;
+	next = *head;
 
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
@@ -2729,8 +2753,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		prefetch_freepointer(s, next_object);
 		stat(s, ALLOC_FASTPATH);
 	}
+	/*
+	 * If the object has been wiped upon free, make sure it's fully
+	 * initialized by zeroing out freelist pointer.
+	 */
+	if (unlikely(slab_want_init_on_free(s)) && object)
+		memset(object + s->offset, 0, sizeof(void *));
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
+	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
@@ -3151,7 +3181,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(slab_want_init_on_alloc(flags, s))) {
 		int j;
 
 		for (j = 0; j < i; j++)
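The trickiest part of the SLUB hunk is the wipe in slab_free_freelist_hook(): it zeroes the object payload and the per-object metadata but deliberately skips the redzone bytes so redzone corruption checks keep working. The sketch below models just that byte-range arithmetic in userspace; the struct fields and flag bit are hypothetical stand-ins for the kmem_cache layout fields the real hook reads:

```c
#include <assert.h>
#include <string.h>

#define FLAG_RED_ZONE 0x400u	/* hypothetical stand-in for SLAB_RED_ZONE */

/* Minimal model of the SLUB layout fields the free hook uses. */
struct cache {
	unsigned int flags;
	size_t object_size;	/* payload bytes visible to the caller */
	size_t inuse;		/* offset where per-object metadata starts */
	size_t size;		/* full per-object footprint in the slab */
	size_t red_left_pad;	/* width of the redzone to preserve */
};

/* Mirrors the wipe in slab_free_freelist_hook(): zero the payload
 * [0, object_size) and the metadata [inuse, size - rsize), leaving the
 * trailing redzone bytes untouched. */
static void wipe_object(const struct cache *s, unsigned char *obj)
{
	size_t rsize = (s->flags & FLAG_RED_ZONE) ? s->red_left_pad : 0;

	memset(obj, 0, s->object_size);
	memset(obj + s->inuse, 0, s->size - s->inuse - rsize);
}
```

In the real hook this runs inside the freelist walk, and set_freepointer() then re-links the object, which is why slab_alloc_node() later re-zeroes the freelist pointer on allocation.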

net/core/sock.c

Lines changed: 1 addition & 1 deletion
@@ -1597,7 +1597,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_on_alloc(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);

security/Kconfig.hardening

Lines changed: 29 additions & 0 deletions
@@ -160,6 +160,35 @@ config STACKLEAK_RUNTIME_DISABLE
 	  runtime to control kernel stack erasing for kernels built with
 	  CONFIG_GCC_PLUGIN_STACKLEAK.
 
+config INIT_ON_ALLOC_DEFAULT_ON
+	bool "Enable heap memory zeroing on allocation by default"
+	help
+	  This has the effect of setting "init_on_alloc=1" on the kernel
+	  command line. This can be disabled with "init_on_alloc=0".
+	  When "init_on_alloc" is enabled, all page allocator and slab
+	  allocator memory will be zeroed when allocated, eliminating
+	  many kinds of "uninitialized heap memory" flaws, especially
+	  heap content exposures. The performance impact varies by
+	  workload, but most cases see <1% impact. Some synthetic
+	  workloads have measured as high as 7%.
+
+config INIT_ON_FREE_DEFAULT_ON
+	bool "Enable heap memory zeroing on free by default"
+	help
+	  This has the effect of setting "init_on_free=1" on the kernel
+	  command line. This can be disabled with "init_on_free=0".
+	  Similar to "init_on_alloc", when "init_on_free" is enabled,
+	  all page allocator and slab allocator memory will be zeroed
+	  when freed, eliminating many kinds of "uninitialized heap memory"
+	  flaws, especially heap content exposures. The primary difference
+	  with "init_on_free" is that data lifetime in memory is reduced,
+	  as anything freed is wiped immediately, making live forensics or
+	  cold boot memory attacks unable to recover freed memory contents.
+	  The performance impact varies by workload, but is more expensive
+	  than "init_on_alloc" due to the negative cache effects of
+	  touching "cold" memory areas. Most cases see 3-5% impact. Some
+	  synthetic workloads have measured as high as 8%.
+
 endmenu
 
 endmenu
