Commits on May 29, 2012
  1. @fweisbec

    rescounters: add res_counter_uncharge_until()

    fweisbec authored committed
    When killing a res_counter that is a child of another counter, we need
    to do
    
    	res_counter_uncharge(child, xxx)
    	res_counter_charge(parent, xxx)
    
    This is not atomic and wastes CPU.  This patch adds
    res_counter_uncharge_until().  This function's uncharge propagates to
    ancestors up to the specified res_counter.
    
    	res_counter_uncharge_until(child, parent, xxx)
    
    Now the operation is atomic and efficient.
    
    Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com>
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Ying Han <yinghan@google.com>
    Cc: Glauber Costa <glommer@parallels.com>
    Reviewed-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
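The helper described above can be sketched in userspace C. This is a hypothetical simplification for illustration (no locking, and the real kernel function takes a member selector and parent pointers inside a larger struct), not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a hierarchical counter: just a usage value and a
 * link to the parent counter. */
struct res_counter {
    unsigned long long usage;
    struct res_counter *parent;
};

/* Uncharge `counter` and every ancestor strictly below `until` in one
 * pass, instead of a separate uncharge(child) + charge(parent). */
void res_counter_uncharge_until(struct res_counter *counter,
                                struct res_counter *until,
                                unsigned long long val)
{
    struct res_counter *c;
    for (c = counter; c != until; c = c->parent)
        c->usage -= val;
}
```

Stopping the walk at `until` (exclusive) leaves the ancestor's usage intact, which is exactly the net effect of the uncharge/charge pair in the message above, but done atomically under one traversal.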
Commits on Apr 27, 2012
  1. @fweisbec

    res_counter: Account max_usage when calling res_counter_charge_nofail()

    fweisbec authored Tejun Heo committed
    Updating max_usage is something one would expect when we reach a new
    maximum usage value, even when we get there by forcing through the
    limit with res_counter_charge_nofail().
    
    (Whether we want to account failcnt when we force through the limit
    is another debate.)
    
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Acked-by: Glauber Costa <glommer@parallels.com>
    Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: Li Zefan <lizefan@huawei.com>
  2. @fweisbec

    res_counter: Merge res_counter_charge and res_counter_charge_nofail

    fweisbec authored Tejun Heo committed
    These two functions do almost the same thing and duplicate some code.
    Merge their implementations into a single common function.
    res_counter_charge_locked() takes one more parameter, but it doesn't
    seem to be used outside res_counter.c yet anyway.
    
    There is no (intended) change in the behaviour.
    
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Acked-by: Glauber Costa <glommer@parallels.com>
    Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: Li Zefan <lizefan@huawei.com>
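The merged helper from the two commits above might look roughly like this userspace sketch. The shape (a `force` flag selecting nofail semantics, with max_usage updated even on a forced charge) follows the commit messages; the kernel returns -ENOMEM where -1 stands in here, and locking is elided:

```c
#include <assert.h>
#include <stdbool.h>

struct res_counter {
    unsigned long long usage;     /* current consumption */
    unsigned long long max_usage; /* high-water mark */
    unsigned long long limit;     /* hard limit */
    unsigned long long failcnt;   /* failed (or forced-over) charges */
};

/* One common charge path: without `force` an over-limit charge is
 * refused and usage is untouched; with `force` the charge proceeds past
 * the limit but the over-limit condition is still reported. */
int res_counter_charge_locked(struct res_counter *c,
                              unsigned long long val, bool force)
{
    int ret = 0;

    if (c->usage + val > c->limit) {
        c->failcnt++;
        ret = -1;            /* -ENOMEM in the kernel */
        if (!force)
            return ret;      /* normal charge: usage left untouched */
    }
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;  /* tracked even when forced over */
    return ret;
}
```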
Commits on Jan 22, 2012
  1. @davem330

    net: introduce res_counter_charge_nofail() for socket allocations

    Glauber Costa authored davem330 committed
    There is a case in __sk_mem_schedule() where an allocation is beyond
    the maximum, but we are still allowed to proceed.  It happens under
    the following condition:
    
    	sk->sk_wmem_queued + size >= sk->sk_sndbuf
    
    The network code won't revert the allocation in this case, meaning
    that at some point later it'll try to do it.  Since this is never
    communicated to the underlying res_counter code, there is an
    imbalance in the res_counter uncharge operation.
    
    I see two ways of fixing this:
    
    1) storing the information about those allocations somewhere
       in memcg, and then deducting from that first, before
       we start draining the res_counter,
    2) providing a slightly different allocation function for
       the res_counter, that matches the original behavior of
       the network code more closely.
    
    I decided to go for #2 here, believing it to be more elegant,
    since #1 would require us to do basically that, but in a more
    obscure way.
    
    Signed-off-by: Glauber Costa <glommer@parallels.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    CC: Tejun Heo <tj@kernel.org>
    CC: Li Zefan <lizf@cn.fujitsu.com>
    CC: Laurent Chavey <chavey@google.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Dec 13, 2011
  1. resource cgroups: remove bogus cast

    Davidlohr Bueso authored Tejun Heo committed
    The memparse() function already accepts const char * as the parsing string.
    
    Signed-off-by: Davidlohr Bueso <dave@gnu.org>
    Acked-by: Pavel Emelyanov <xemul@parallels.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
Commits on Mar 24, 2011
  1. @hkamezawa

    memcg: res_counter_read_u64(): fix potential races on 32-bit machines

    hkamezawa authored committed
    res_counter_read_u64() reads a u64 value without a lock.  That is
    dangerous in a 32-bit environment.  Add locking.
    
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
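The race is that on a 32-bit machine a 64-bit load is two separate word accesses, so a reader can observe one half of an old value and one half of a new one. A userspace sketch of the locked read, with a pthread mutex standing in for the kernel's spinlock (the real function additionally selects a counter member by offset):

```c
#include <assert.h>
#include <pthread.h>

struct res_counter {
    pthread_mutex_t lock;     /* stands in for the kernel spinlock */
    unsigned long long usage; /* 64-bit: two words on a 32-bit CPU */
};

/* Take the lock so both halves of the 64-bit value are observed
 * consistently, even against a concurrent charge/uncharge. */
unsigned long long res_counter_read_u64(struct res_counter *c)
{
    unsigned long long v;
    pthread_mutex_lock(&c->lock);
    v = c->usage;
    pthread_mutex_unlock(&c->lock);
    return v;
}
```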
Commits on Mar 30, 2010
  1. include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

    Tejun Heo authored
    
    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files.  percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.
    
    The percpu.h -> slab.h dependency is about to be removed.  Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability.  As this
    conversion needs to touch a large number of source files, the
    following script was used as the basis of the conversion.
    
      http://userweb.kernel.org/~tj/misc/slabh-sweep.py
    
    The script does the following.
    
    * Scan files for gfp and slab usages and update includes such that
      only the necessary includes are there; i.e. if only gfp is used,
      include gfp.h; if slab is used, slab.h.
    
    * When the script inserts a new include, it looks at the include
      blocks and tries to place the new include so that its order conforms
      to its surroundings.  It's put in the include block which contains
      core kernel includes, in the same order that the rest are ordered -
      alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
      doesn't seem to be any matching order.
    
    * If the script can't find a place to put a new include (mostly
      because the file doesn't have a fitting include block), it prints
      an error message indicating which .h file needs to be added to the
      file.
    
    The conversion was done in the following steps.
    
    1. The initial automatic conversion of all .c files updated slightly
       over 4000 files, deleting around 700 includes and adding ~480 gfp.h
       and ~3000 slab.h inclusions.  The script emitted errors for ~400
       files.
    
    2. Each error was manually checked.  Some didn't need the inclusion;
       some needed manual addition, while for others adding it to an
       implementation .h or embedding .c file was more appropriate.  This
       step added inclusions to around 150 files.
    
    3. The script was run again and the output was compared to the edits
       from #2 to make sure no file was left behind.
    
    4. Several build tests were done and a couple of problems were fixed.
       e.g. lib/decompress_*.c used malloc/free() wrappers around slab
       APIs requiring slab.h to be added manually.
    
    5. The script was run on all .h files but without automatically
       editing them, as sprinkling gfp.h and slab.h inclusions around .h
       files could easily lead to inclusion dependency hell.  Most gfp.h
       inclusion directives were ignored as stuff from gfp.h was usually
       widely available and often used in preprocessor macros.  Each
       slab.h inclusion directive was examined and added manually as
       necessary.
    
    6. percpu.h was updated not to include slab.h.
    
    7. Build tests were done on the following configurations and failures
       were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
       distributed build env didn't work with gcov compiles) and a few
       more options had to be turned off depending on archs to make things
       build (like ipr on powerpc/64 which failed due to missing writeq).
    
       * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
       * powerpc and powerpc64 SMP allmodconfig
       * sparc and sparc64 SMP allmodconfig
       * ia64 SMP allmodconfig
       * s390 SMP allmodconfig
       * alpha SMP allmodconfig
       * um on x86_64 SMP allmodconfig
    
    8. percpu.h modifications were reverted so that it could be applied as
       a separate patch and serve as bisection point.
    
    Given that I had only a couple of failures from the tests in step 6,
    I'm fairly confident about the coverage of this conversion patch.  If
    there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of the
    specific arch.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Commits on Oct 1, 2009
  1. @hkamezawa

    memcg: some modification to softlimit under hierarchical memory reclaim.

    hkamezawa authored committed
    This patch contains cleanups/fixes for memcg's uncharge soft limit path.
    
    Problems:
      Now, res_counter_charge()/uncharge() handles softlimit information at
      charge/uncharge, and the softlimit check is done when the event counter
      per memcg goes over the limit.  Currently, the event counter per memcg
      is updated only when memory usage is over the soft limit.  Considering
      hierarchical memcg management, ancestors should be taken care of.
    
      Now, ancestors (the hierarchy) are handled in charge() but not in
      uncharge().  This is not good.
    
      Problems:
      1. memcg's event counter is incremented only when the softlimit is hit.
         That's bad: it makes the event counter hard to reuse for other
         purposes.
    
      2. At uncharge, only the lowest-level res_counter is handled.  This is
         a bug: because an ancestor's event counter is not incremented,
         children have to take care of them.
    
      3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
         Operations under res_counter->lock should be small; no "if"
         statement is better.
    
    Fixes:
      * Removed the soft_limit_xx pointer and checks in charge and uncharge.
        The check-only-when-necessary scheme works well enough without them.
    
      * Make the event counter of memcg incremented at every charge/uncharge.
        (The per-cpu area will be accessed soon anyway.)
    
      * All ancestors are checked at the soft-limit check.  This is necessary
        because an ancestor's event counter may never be modified, so they
        should be checked at the same time.
    
    Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Paul Menage <menage@google.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits on Sep 24, 2009
  1. memory controller: soft limit organize cgroups

    Balbir Singh authored committed
    Organize cgroups over soft limit in a RB-Tree
    
    Introduce an RB-Tree for storing memory cgroups that are over their soft
    limit.  The overall goal is to
    
    1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
       We are careful about updates: they take place only after a particular
       time interval has passed.
    2. We remove the node from the RB-Tree when the usage goes below the
       soft limit.
    
    The next set of patches will exploit the RB-Tree to get the group that is
    over its soft limit by the largest amount and reclaim from it, when we
    face memory contention.
    
    [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot]
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Cc: Jiri Slaby <jirislaby@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
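The insert-when-over / remove-when-under logic described above can be sketched in userspace C. The kernel keys an RB-tree on the excess (usage minus soft limit); here a sorted singly linked list stands in for the tree purely to keep the sketch short, so only the update policy, not the data structure, matches the patch:

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup {
    unsigned long long usage, soft_limit;
    struct mem_cgroup *next;  /* sorted-list stand-in for the RB-tree */
};

/* How far over its soft limit a group is (the RB-tree key). */
static unsigned long long excess(const struct mem_cgroup *g)
{
    return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/* Remove `g` from the structure, then re-insert it (sorted by excess,
 * largest first) only while it is over its soft limit. */
void soft_limit_tree_update(struct mem_cgroup **head, struct mem_cgroup *g)
{
    struct mem_cgroup **p = head;
    while (*p && *p != g)
        p = &(*p)->next;
    if (*p)
        *p = g->next;          /* unlink if present */
    if (excess(g) == 0)
        return;                /* under the soft limit: stays out */
    p = head;
    while (*p && excess(*p) > excess(g))
        p = &(*p)->next;
    g->next = *p;              /* worst offender ends up at the front */
    *p = g;
}
```

Keeping the structure ordered by excess is what lets later patches pick the group that is over its soft limit by the largest amount in O(log n) (O(1) at the front here).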
  2. memory controller: soft limit interface

    Balbir Singh authored committed
    Add an interface to allow get/set of soft limits.  Soft limits for the
    memory plus swap controller (memsw) are currently not supported.
    Resource counters have been enhanced to support soft limits, and a new
    type RES_SOFT_LIMIT has been added.  Unlike hard limits, soft limits
    can be directly set and do not need any reclaim or checks before
    setting them to a new value.
    
    Kamezawa-San raised a question as to whether soft limit should belong to
    res_counter.  Since all resources understand the basic concepts of hard
    and soft limits, it is justified to add soft limits here.  Soft limits are
    a generic resource usage feature, even file system quotas support soft
    limits.
    
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
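The contrast the message draws between hard and soft limit setters can be shown in a small userspace sketch. The hard-limit check here is a hypothetical simplification (the kernel setter interacts with reclaim rather than simply refusing); the point is that the soft limit is just stored, with no validation at all:

```c
#include <assert.h>

struct res_counter {
    unsigned long long usage, limit, soft_limit;
};

/* Hard limit: cannot be set below current usage without reclaiming
 * first (simplified: we just refuse). */
int res_counter_set_limit(struct res_counter *c, unsigned long long v)
{
    if (c->usage > v)
        return -1;   /* caller must reclaim down to v first */
    c->limit = v;
    return 0;
}

/* Soft limit: directly stored, may legitimately be below usage. */
void res_counter_set_soft_limit(struct res_counter *c, unsigned long long v)
{
    c->soft_limit = v;
}
```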
Commits on Jun 18, 2009
  1. memcg: add interface to reset limits

    Daisuke Nishimura authored committed
    We don't have an interface to reset mem.limit or memsw.limit now.
    
    This patch allows mem.limit or memsw.limit to be reset when they are
    set to -1.
    
    Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits on Jan 8, 2009
  1. memcg: memory cgroup resource counters for hierarchy

    Balbir Singh authored committed
    Add support for building hierarchies in resource counters.  Cgroups
    allows us to build a deep hierarchy, but we currently don't link the
    resource counters belonging to the memory controller control groups in
    the same fashion as the corresponding cgroup entries in the cgroup
    hierarchy.  This patch provides the infrastructure for resource
    counters that have the same hierarchy as their cgroup counterparts.
    
    This set of patches is based on the resource counter hierarchy patches
    posted by Pavel Emelianov.
    
    NOTE: Building hierarchies is expensive; deeper hierarchies imply
    charging all the way up to the root.  It is known that hierarchies are
    expensive, so the user needs to be careful and aware of the trade-offs
    before creating very deep ones.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
    Cc: Paul Menage <menage@google.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
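The cost the NOTE warns about comes from every charge having to succeed at every level up to the root, rolling back on failure. A hypothetical userspace sketch of that walk (no locking; the kernel version returns -ENOMEM and reports the counter that failed):

```c
#include <assert.h>
#include <stddef.h>

struct res_counter {
    unsigned long long usage, limit;
    struct res_counter *parent;  /* mirrors the cgroup hierarchy */
};

/* Charge propagates to every ancestor up to the root; if any level is
 * over its limit, the levels already charged are rolled back. */
int res_counter_charge(struct res_counter *counter, unsigned long long val)
{
    struct res_counter *c, *u;

    for (c = counter; c != NULL; c = c->parent) {
        if (c->usage + val > c->limit)
            goto undo;
        c->usage += val;
    }
    return 0;

undo:
    for (u = counter; u != c; u = u->parent)
        u->usage -= val;  /* undo the partial charges below the failure */
    return -1;            /* -ENOMEM in the kernel */
}
```

The loop makes the "deeper hierarchies imply charging all the way up to the root" trade-off concrete: each charge is O(depth).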
Commits on Jul 25, 2008
  1. cgroup files: convert res_counter_write() to be a cgroups write_string() handler

    Paul Menage authored committed
    
    Currently res_counter_write() is a raw file handler, even though it's
    ultimately taking a number, since in some cases it wants to
    pre-process the string when converting it to a number.
    
    This patch converts res_counter_write() from a raw file handler to a
    write_string() handler; this allows some of the boilerplate
    copying/locking/checking to be removed and simplifies the cleanup
    path, since these functions are now performed by the cgroups
    framework.
    
    [lizf@cn.fujitsu.com: build fix]
    Signed-off-by: Paul Menage <menage@google.com>
    Cc: Paul Jackson <pj@sgi.com>
    Cc: Pavel Emelyanov <xemul@openvz.org>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Serge Hallyn <serue@us.ibm.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits on Apr 29, 2008
  1. @xemul

    memcgroup: add the max_usage member on the res_counter

    xemul authored committed
    This field is the maximal value of the usage since the counter's
    creation (or since the latest reset).
    
    To reset it to the current usage value, simply write anything to the
    appropriate cgroup file.
    
    Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
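A userspace sketch of the max_usage semantics described above: the high-water mark rises on every charge, and a reset snaps it back to the current usage rather than to zero. This is an illustrative simplification, not the kernel code:

```c
#include <assert.h>

struct res_counter {
    unsigned long long usage;
    unsigned long long max_usage;  /* high-water mark since last reset */
};

/* Every charge may push the high-water mark up. */
void res_counter_charge(struct res_counter *c, unsigned long long val)
{
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;
}

/* Writing anything to the cgroup's max_usage file triggers this:
 * the mark is reset to the *current* usage, not to zero. */
void res_counter_reset_max(struct res_counter *c)
{
    c->max_usage = c->usage;
}
```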
  2. CGroup API files: add res_counter_read_u64()

    Paul Menage authored committed
    Adds a function for returning the value of a resource counter member, in a
    form suitable for use in a cgroup read_u64 control file method.
    
    Signed-off-by: Paul Menage <menage@google.com>
    Cc: "Li Zefan" <lizf@cn.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Paul Jackson <pj@sgi.com>
    Cc: Pavel Emelyanov <xemul@openvz.org>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. @rpjday

    kernel: explicitly include required header files under kernel/

    rpjday authored committed
    Following an experimental deletion of the unnecessary directive
    
     #include <linux/slab.h>
    
    from the header file <linux/percpu.h>, these files under kernel/ were exposed
    as needing to include one of <linux/slab.h> or <linux/gfp.h>, so explicit
    includes were added where necessary.
    
    Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits on Mar 5, 2008
  1. Memory Resource Controller use strstrip while parsing arguments

    Balbir Singh authored Linus Torvalds committed
    The memory controller currently requires that values be written with
    echo -n.  This patch fixes the problem by stripping whitespace with
    strstrip() and makes the UI more consistent.
    
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: Paul Menage <menage@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits on Feb 7, 2008
  1. Memory controller improve user interface

    Balbir Singh authored Linus Torvalds committed
    Change the interface to use bytes instead of pages.  Page sizes can vary
    across platforms and configurations.  A new strategy routine has been added
    to the resource counters infrastructure to format the data as desired.
    
    Suggested by David Rientjes, Andrew Morton and Herbert Poetzl
    
    Tested on a UML setup with the config for memory control enabled.
    
    [kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter]
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Signed-off-by: Pavel Emelianov <xemul@openvz.org>
    Cc: Paul Menage <menage@google.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
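The bytes-based interface above relies on memparse()-style input parsing so users can write human-readable sizes. A minimal userspace stand-in for that parsing (hypothetical: the kernel's memparse() also handles more suffixes and returns the end pointer to the caller):

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix into bytes, roughly as
 * the kernel's memparse() does for the new bytes-based interface. */
unsigned long long parse_bytes(const char *s)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g': v <<= 30; break;
    case 'M': case 'm': v <<= 20; break;
    case 'K': case 'k': v <<= 10; break;
    }
    return v;
}
```

Accepting bytes (with suffixes) instead of page counts is what makes the interface portable across platforms with different page sizes.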
  2. @xemul

    Memory controller: resource counters

    xemul authored Linus Torvalds committed
    With fixes from David Rientjes <rientjes@google.com>
    
    Introduce generic structures and routines for resource accounting.
    
    Each resource-accounting cgroup is supposed to aggregate this
    structure, along with its cgroup_subsystem_state and its
    resource-specific members, within itself.
    
    Signed-off-by: Pavel Emelianov <xemul@openvz.org>
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: Paul Menage <menage@google.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
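The generic structure this founding commit introduces can be sketched in userspace C. This is a hypothetical simplification of the original res_counter (the kernel version also carries a spinlock and returns -ENOMEM), shown here to make the accounting model concrete:

```c
#include <assert.h>

/* The core of the original res_counter: a usage value, a hard limit,
 * and a count of failed charge attempts. */
struct res_counter {
    unsigned long long usage;   /* current consumption */
    unsigned long long limit;   /* hard limit */
    unsigned long long failcnt; /* number of refused charges */
};

/* Try to account `val` units; refuse with no side effects (other than
 * bumping failcnt) if the limit would be exceeded. */
int res_counter_charge(struct res_counter *c, unsigned long long val)
{
    if (c->usage + val > c->limit) {
        c->failcnt++;
        return -1;  /* -ENOMEM in the kernel */
    }
    c->usage += val;
    return 0;
}

/* Release previously charged units, clamping at zero. */
void res_counter_uncharge(struct res_counter *c, unsigned long long val)
{
    c->usage = val < c->usage ? c->usage - val : 0;
}
```

Every later commit in this log (max_usage, soft limits, hierarchies, the nofail variant) extends this basic charge/uncharge pair.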