
mm/memcontrol: optimize __memcg_slab_post_alloc_hook batch charging#3

Draft
Copilot wants to merge 2 commits into master from copilot/fix-batch-memcg-charging

Conversation


Copilot AI commented Apr 16, 2026

The per-object memcg charging loop in __memcg_slab_post_alloc_hook() re-acquired stock_lock once per object via mod_objcg_state(), and it charged all objects upfront, requiring an individual obj_cgroup_uncharge() call for every object whose alloc_slab_obj_exts() call failed.

Changes

  • Per-pgdat run batching: scan-ahead groups consecutive objects sharing a pgdat node into a run; obj_cgroup_charge() and mod_objcg_state() are called once per run instead of once per object — reducing local_lock_irqsave/irqrestore from N to ~1 for typical same-node bulk allocations

  • Charge-on-assign: upfront bulk charge (size * obj_full_size(s)) replaced with per-run charging; objects that fail alloc_slab_obj_exts() are skipped entirely, eliminating the per-failure obj_cgroup_uncharge() call

  • Hoist loop-invariant max_run: (INT_MAX - PAGE_SIZE) / obj_size computed once before the loop; also caps run length to keep batch_bytes within int range for mod_objcg_state()

  • Reuse first object's slab pointer: slab from the outer loop head is used directly for the first object's obj_ext assignment, saving a virt_to_slab() call per run

  • run_end absolute indexing: replaces run_len + relative j + skip_next flag; while (++i < run_end) advances i to the correct position automatically — failed alloc_slab_obj_exts() in scan-ahead simply breaks, and the next outer iteration retries naturally

/* Before: N lock acquisitions for N objects */
for (i = 0; i < size; i++) {
    ...
    mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s), obj_full_size(s));
}

/* After: ~1 lock acquisition per pgdat run */
while (run_end < size && (run_end - i) < max_run) { /* scan ahead */ }
batch_bytes = (int)(obj_size * (run_end - i));
obj_cgroup_charge(objcg, flags, batch_bytes);
/* assign obj_ext per object ... */
mod_objcg_state(objcg, pgdat, cache_vmstat_idx(s), batch_bytes);
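
To make the run-batching effect concrete, here is a self-contained toy model (not kernel code) of the scan-ahead grouping: objects are represented by their node id, and the function counts how many charge/vmstat-update pairs would execute, one per maximal same-node run, with run length capped at max_run. For N same-node objects this is 1 instead of N.

```c
#include <stddef.h>

/* Toy model of per-pgdat run batching: count how many times the
 * obj_cgroup_charge() + mod_objcg_state() pair would run when
 * consecutive objects sharing a node id are grouped into runs,
 * with run length capped at max_run. */
static size_t count_charge_calls(const int *node, size_t size, size_t max_run)
{
	size_t i = 0, calls = 0;

	while (i < size) {
		size_t run_end = i + 1;

		/* scan ahead: extend the run while the node matches
		 * and the cap is not exceeded */
		while (run_end < size && (run_end - i) < max_run &&
		       node[run_end] == node[i])
			run_end++;

		calls++;	/* one batched charge + vmstat update */
		i = run_end;	/* advance past the whole run */
	}
	return calls;
}
```

For a typical same-node bulk allocation the whole array is one run, so lock acquisitions drop from N to 1; only node changes (or the overflow cap) start a new run.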
Original prompt

Problem

Commit 7ef269f ("mm/memcontrol: batch memcg charging in __memcg_slab_post_alloc_hook") introduced batch memcg charging, but the implementation has several performance issues that cause overhead even when the optimization helps:

Issues in the current code (mm/memcontrol.c, function __memcg_slab_post_alloc_hook, around line 3280-3364):

1. max_size recomputed every outer iteration (line 3307-3308)

max_size = min((size_t)((INT_MAX - PAGE_SIZE) / obj_size), size);

This value is constant across all iterations — obj_size and size don't change. It should be hoisted before the loop.
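
The hoisted expression is the overflow cap. As a standalone sketch (not kernel code; PAGE_SIZE is assumed to be 4096 here), any run length up to (INT_MAX - PAGE_SIZE) / obj_size keeps the batched byte count obj_size * run_len representable as an int, which is what the later mod_objcg_state() accumulation requires:

```c
#include <limits.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* typical value; an assumption for this sketch */

/* Sketch of the overflow reasoning behind the cap: floor division
 * guarantees obj_size * max_run <= INT_MAX - PAGE_SIZE, so any
 * run_len <= max_run gives a batch byte count that fits in an int. */
static int batch_fits_int(size_t obj_size, size_t run_len)
{
	size_t max_run = ((size_t)INT_MAX - PAGE_SIZE) / obj_size;

	return run_len <= max_run &&
	       obj_size * run_len <= (size_t)INT_MAX - PAGE_SIZE;
}
```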

2. skip_next flag adds unnecessary complexity (lines 3289, 3318-3319, 3360-3363)
When alloc_slab_obj_exts() fails for an object in the scan-ahead loop, the code sets skip_next = true, breaks, then at the end does:

if (skip_next)
    i = i + run_len + 1;
else
    i += run_len;

This is unnecessary. Without skip_next, setting i += run_len makes the outer loop land on the failed object. The next iteration calls virt_to_slab() + alloc_slab_obj_exts(), which fails again and hits the i++; continue; path — the same net result. Since this is the rare error path, the one extra check is negligible.
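
A toy model (not kernel code) of the flag-free control flow: a run ends when obj-ext "allocation" fails, the outer loop lands on the failed object, the retry fails again, and i++ skips it. The set of charged objects is exactly the set of non-failing ones, so dropping skip_next changes nothing observable:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the outer loop without a skip_next flag. alloc_fails[i]
 * stands in for alloc_slab_obj_exts() failing for object i. Returns
 * how many objects end up charged/assigned. */
static size_t charged_without_skip_next(const bool *alloc_fails, size_t size)
{
	size_t i = 0, charged = 0;

	while (i < size) {
		size_t run_end;

		if (alloc_fails[i]) {	/* retry at loop head fails again */
			i++;		/* skip the object, no flag needed */
			continue;
		}
		run_end = i + 1;	/* scan ahead while allocation succeeds */
		while (run_end < size && !alloc_fails[run_end])
			run_end++;
		charged += run_end - i;	/* whole run charged in one batch */
		i = run_end;	/* lands on the failed object, if any */
	}
	return charged;
}
```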

3. Redundant virt_to_slab() calls
Each object in a run gets virt_to_slab() called twice:

  • Once in the scan-ahead loop (line 3311): struct slab *slab_j = virt_to_slab(p[j]);
  • Once again in the obj_ext assignment loop (line 3350): slab = virt_to_slab(p[i + j]);

For the first object of each run, it is called three times in total: once at line 3291, once in the previous run's scan-ahead check, and once in the assignment loop.

The first object's slab pointer (slab from line 3291) is already available but thrown away in the assignment loop which re-computes it at j=0.

4. Confusing index arithmetic
The scan-ahead loop uses absolute index j from i+1, but the assignment loop uses relative j from 0 with p[i + j]. This is error-prone and harder to read.

5. run_len initialized to 0 then immediately set to 1 (lines 3286, 3300)
Minor but sloppy: size_t run_len = 0; followed by run_len = 1; a few lines later.

Proposed Fix

Optimize the loop in __memcg_slab_post_alloc_hook() with these changes:

  1. Hoist max_run computation before the loop — it's a loop-invariant constant.

  2. Remove the skip_next flag entirely — simplify the control flow. When alloc_slab_obj_exts() fails in scan-ahead, just break. The next outer iteration handles it naturally.

  3. Avoid redundant virt_to_slab() for the first object — handle it separately using the slab pointer already computed at the top of the outer loop, then loop over remaining objects.

  4. Use run_end absolute index — replace run_len + j relative indexing with run_end absolute index. The assignment loop uses do { ... } while (++i < run_end) which advances i directly, eliminating separate index tracking.

Here is the optimized replacement for the loop body in __memcg_slab_post_alloc_hook() (mm/memcontrol.c). Replace lines 3280-3364 (the for (i = 0; i < size; ) loop and its body) with:

	/*
	 * Cap run length to prevent integer overflow in the final
	 * accumulation performed by __account_obj_stock() where
	 * batch_bytes is added to an int field.
	 */
	const size_t max_run = min_t(size_t,
				     (INT_MAX - PAGE_SIZE) / obj_size, size);

	for (i = 0; i < size; ) {
		unsigned long obj_exts;
		struct slabobj_ext *obj_ext;
		struct obj_stock_pcp *stock;
		struct pglist_data *pgdat;
		int batch_bytes;
		size_t run_end;

		slab = virt_to_slab(p[i]);

		if (!slab_obj_exts(slab) &&
		    alloc_slab_obj_exts(slab, s, flags, false)) {
			i++;
			continue;
		}

		pgdat = slab_pgdat(slab);
		run_end = i + 1;

		/*
		 * Scan ahead for objects on the same pgdat node so we
		 * can batch the memcg charge into a single call.
		 */
		while (run_end < size && (run_end - i) < max_run) {
			struct slab *slab_j = virt_to_slab(p[run_end]);

			if (slab_pgdat(slab_j) != pgdat)
				break;

			if (!slab_obj_exts(slab_j) &&
			    alloc_slab_obj_exts(slab_j, s, flags, false))
				break;

			run_end++;
		}

		/*
		 * If we fail and size is 1, memcg_alloc_abort_single() will
		 * just free the object, which is ok as we have not assigned
		 * objcg to its obj_ext yet.
		 *
		 * For larger sizes, kmem_cache_free_bulk() will uncharge
		 * any objects that were already charged and obj_ext assigned.
		 */
		batch_bytes = obj_size * (run_end - i);
		stock = trylock_stock();
		if (!stock || !__consume_obj_stock(objcg, stock, batch_bytes)) {
			size_t remainder;

			unlock_stock(stock);
			if (__obj_cgroup_charge(objcg, flags, batch_bytes, &remainder))
				return false;
			stock = trylock_stock();
			if (remainder)
				__refill_obj_stock(objcg, stock, remainder, false);
		}
		__account_obj_stock(objcg, stock, batch_bytes,
				    pgdat, cache_vmstat_idx(s));
		unlock_stock(stock);

		/*
		 * Assign obj_ext for each object in this run.
		 * Re...




*This pull request was created from Copilot chat.*

…oc_hook

Rewrite the per-object slab memcg charging loop with several improvements:

1. Hoist the loop-invariant max_run computation before the loop.
   The cap (INT_MAX - PAGE_SIZE) / obj_size is constant across all
   iterations.

2. Replace the single upfront bulk charge + per-object vmstat updates
   with per-pgdat-run batching. The scan-ahead loop groups consecutive
   objects on the same pgdat node, then:
   - obj_cgroup_charge() is called once per run (not once for all
     objects upfront)
   - mod_objcg_state() is called once per run instead of once per
     object, reducing repeated local_lock_irqsave/irqrestore from
     N acquisitions to approximately one per pgdat run
   - Objects that fail alloc_slab_obj_exts() are simply skipped
     rather than being charged upfront and then individually uncharged

3. Avoid redundant virt_to_slab() for the first object of each run
   by reusing the slab pointer already computed at the top of the
   outer loop.

4. Use run_end absolute index with while (++i < run_end) to advance i
   directly, eliminating a separate run_len counter and relative
   indexing. The skip_next flag pattern is not needed: when
   alloc_slab_obj_exts() fails in the scan-ahead, we simply break
   and the next outer iteration retries that object naturally.

Agent-Logs-Url: https://github.com/teawater/linux/sessions/cd4a349b-c91d-46eb-889a-fa3f75c83e73

Co-authored-by: teawater <432382+teawater@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix performance issues in batch memcg charging implementation" to "mm/memcontrol: optimize __memcg_slab_post_alloc_hook batch charging" on Apr 16, 2026
Copilot AI requested a review from teawater April 16, 2026 09:17