Skip to content

Commit

Permalink
mm: memcg: fix test for child groups
Browse files Browse the repository at this point in the history
When memcg code needs to know whether any given memcg has children, it
uses the cgroup child iteration primitives and returns true/false
depending on whether the iteration loop is executed at least once or
not.

Because a cgroup's list of children is RCU protected, these primitives
require the RCU read-lock to be held, which is not the case for all
memcg callers.  This results in the following splat when e.g.  enabling
hierarchy mode:

  WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
  CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9-dirty #18
  Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
  Call Trace:
    dump_stack+0x54/0x74
    warn_slowpath_common+0x78/0xa0
    warn_slowpath_null+0x1a/0x20
    css_next_child+0xa3/0x160
    mem_cgroup_hierarchy_write+0x5b/0xa0
    cgroup_file_write+0x108/0x2a0
    vfs_write+0xbd/0x1e0
    SyS_write+0x4c/0xa0
    system_call_fastpath+0x16/0x1b

In the memcg case, we only care about children when we are attempting to
modify inheritable attributes interactively.  Racing with deletion could
mean a spurious -EBUSY, no problem.  Racing with addition is handled
just fine as well through the memcg_create_mutex: if the child group is
not on the list after the mutex is acquired, it won't be initialized
from the parent's attributes until after the unlock.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  • Loading branch information
hnaz authored and torvalds committed Oct 31, 2013
1 parent 0056f4e commit 696ac17
Showing 1 changed file with 11 additions and 24 deletions.
35 changes: 11 additions & 24 deletions mm/memcontrol.c
Original file line number Diff line number Diff line change
Expand Up @@ -4959,31 +4959,18 @@ static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
} while (usage > 0);
}

/*
* This mainly exists for tests during the setting of set of use_hierarchy.
* Since this is the very setting we are changing, the current hierarchy value
* is meaningless
*/
static inline bool __memcg_has_children(struct mem_cgroup *memcg)
{
struct cgroup_subsys_state *pos;

/* bounce at first found */
css_for_each_child(pos, &memcg->css)
return true;
return false;
}

/*
* Must be called with memcg_create_mutex held, unless the cgroup is guaranteed
* to be already dead (as in mem_cgroup_force_empty, for instance). This is
* from mem_cgroup_count_children(), in the sense that we don't really care how
* many children we have; we only need to know if we have any. It also counts
* any memcg without hierarchy as infertile.
*/
static inline bool memcg_has_children(struct mem_cgroup *memcg)
{
return memcg->use_hierarchy && __memcg_has_children(memcg);
lockdep_assert_held(&memcg_create_mutex);
/*
* The lock does not prevent addition or deletion to the list
* of children, but it prevents a new child from being
* initialized based on this parent in css_online(), so it's
* enough to decide whether hierarchically inherited
* attributes can still be changed or not.
*/
return memcg->use_hierarchy &&
!list_empty(&memcg->css.cgroup->children);
}

/*
Expand Down Expand Up @@ -5063,7 +5050,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css,
*/
if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
(val == 1 || val == 0)) {
if (!__memcg_has_children(memcg))
if (list_empty(&memcg->css.cgroup->children))
memcg->use_hierarchy = val;
else
retval = -EBUSY;
Expand Down

0 comments on commit 696ac17

Please sign in to comment.