
base fork: torvalds/linux (base: master) ... head fork: fweisbec/linux-dynticks
  • 7 commits
  • 13 files changed
  • 0 commit comments
  • 1 contributor
Commits on Jun 13, 2012
Frederic Weisbecker fweisbec nohz: Add more comment about CONFIG_NO_HZ
In order to prepare for adding a new config to implement
adaptive tickless, clarify that CONFIG_NO_HZ alone only
stops the tick on idle.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
99d7b5a
Frederic Weisbecker fweisbec nohz: Introduce adaptive nohz config
Prepare a config option for the full adaptive nohz feature.
This way we can start to put the related code under appropriate
ifdefs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
c9f5391
Frederic Weisbecker fweisbec nohz: Generalize tickless cpu time accounting
When the CPU enters idle, it saves the jiffies stamp into
ts->idle_jiffies, increments this value by one every time
there is a timer interrupt, and accounts "jiffies - ts->idle_jiffies"
idle ticks when it exits idle. This way we still account the
idle CPU time even when the tick is stopped.

This patch lays the groundwork to generalize this for user
and system accounting. ts->idle_jiffies becomes ts->saved_jiffies and
a new member ts->saved_jiffies_whence indicates from which domain
we saved the jiffies: user, system or idle.

This is one more step toward making the tickless infrastructure usable
beyond idle contexts.

For now this is only used by idle, but further patches will make use of
it for user and system time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
7ad34f4
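
The commit above replaces the idle-only jiffies stamp with a stamp plus a tag recording which domain it belongs to, so that a single flush routine can account the elapsed ticks to idle, user or system time. Below is a minimal standalone sketch of that dispatch; the struct, the account_*() stubs and the main() driver are simplified illustrations, not the kernel's tick_sched or accounting API.

#include <stdio.h>

/* Mirrors the idea of enum tick_saved_jiffies: where the stamp was taken. */
enum saved_whence { SAVED_NONE, SAVED_IDLE, SAVED_USER, SAVED_SYS };

struct tick_state {
	unsigned long saved_jiffies;   /* jiffies snapshot taken when the tick stopped */
	enum saved_whence whence;      /* domain the snapshot belongs to */
};

/* Illustrative stand-ins for the kernel's account_*_ticks() helpers. */
static void account_idle(unsigned long t)   { printf("idle   += %lu ticks\n", t); }
static void account_user(unsigned long t)   { printf("user   += %lu ticks\n", t); }
static void account_system(unsigned long t) { printf("system += %lu ticks\n", t); }

/* Flush the ticks elapsed since the snapshot into the right bucket. */
static void flush_ticks(struct tick_state *ts, unsigned long now_jiffies)
{
	unsigned long ticks = now_jiffies - ts->saved_jiffies;

	if (!ticks)
		return;

	switch (ts->whence) {
	case SAVED_IDLE: account_idle(ticks);   break;
	case SAVED_USER: account_user(ticks);   break;
	case SAVED_SYS:  account_system(ticks); break;
	case SAVED_NONE: break;
	}
	ts->whence = SAVED_NONE;
}

int main(void)
{
	struct tick_state ts = { .saved_jiffies = 1000, .whence = SAVED_IDLE };

	flush_ticks(&ts, 1042);   /* 42 ticks spent with the tick stopped in idle */
	return 0;
}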
Frederic Weisbecker fweisbec nohz: Account user and system times in adaptive nohz mode
When we run in adaptive tickless mode, the tick won't be
there anymore to maintain the user/system cputime on every jiffy.

To solve this, save a snapshot of the jiffies at the boundaries of
the kernel and keep track of where we saved it: user or system entry.
On top of this, we account the cputime elapsed when we cross
back over the kernel boundary and when we deschedule the task.

We do this only when requested through the TIF_NOHZ thread flag.
This will later be used by the timer engine when the tick gets
stopped.

This only settles system and user cputime accounting at kernel
boundaries. Further patches will complete the handling of adaptive
tickless cputime by saving and flushing the time at well-defined
points: tick stop, tick restart, cputime report to userspace, etc.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
82e2630
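
The boundary accounting described in the commit above can be pictured with a small standalone sketch: on kernel entry the ticks elapsed since the last stamp are flushed as user time and the stamp is re-armed, and on kernel exit the reverse happens. The jiffies variable and the account_*_ticks() printouts below are simplified stand-ins, not the kernel implementation; the real hooks also record the "whence" tag and honour TIF_NOHZ, as the tick-sched.c hunks further down show.

#include <stdio.h>

static unsigned long jiffies;         /* simplified stand-in for the global tick counter */
static unsigned long saved_jiffies;   /* stamp taken at the last user/kernel boundary */

/* Illustrative stand-ins for account_user_ticks()/account_system_ticks(). */
static void account_user_ticks(unsigned long t)   { printf("user   += %lu\n", t); }
static void account_system_ticks(unsigned long t) { printf("system += %lu\n", t); }

/* Crossing from userspace into the kernel: everything since the last
 * stamp was user time; re-arm the stamp for system time. */
static void nohz_enter_kernel(void)
{
	account_user_ticks(jiffies - saved_jiffies);
	saved_jiffies = jiffies;
}

/* Returning from the kernel to userspace: the elapsed ticks were system
 * time; re-arm the stamp for user time. */
static void nohz_exit_kernel(void)
{
	account_system_ticks(jiffies - saved_jiffies);
	saved_jiffies = jiffies;
}

int main(void)
{
	jiffies = 100;
	saved_jiffies = 100;               /* task is running in userspace */

	jiffies += 5; nohz_enter_kernel(); /* 5 ticks accounted as user time   */
	jiffies += 3; nohz_exit_kernel();  /* 3 ticks accounted as system time */
	return 0;
}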
Frederic Weisbecker fweisbec x86: Syscall hooks for adaptive nohz mode
Add syscall hooks to notify syscall entry and exit on
CPUs running in full adaptive nohz mode. This way we
can account the cputime elapsed within kernel boundaries.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
672776a
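
These hooks piggyback on the syscall slow path: because _TIF_NOHZ is added to the entry/exit work masks (see the thread_info.h hunk below), a task carrying that flag skips the fast path and goes through syscall_trace_enter()/syscall_trace_leave(), where the tick_nohz hooks run. Here is a hedged standalone sketch of that flag-mask test; the flag values and the dispatch function are made up for illustration.

#include <stdio.h>

/* Illustrative flag bits; the real TIF_* values live in asm/thread_info.h. */
#define TIF_SYSCALL_TRACE  0
#define TIF_NOHZ           1

#define _TIF_SYSCALL_TRACE (1u << TIF_SYSCALL_TRACE)
#define _TIF_NOHZ          (1u << TIF_NOHZ)

/* Work bits that force the syscall slow path on entry. */
#define _TIF_WORK_SYSCALL_ENTRY (_TIF_SYSCALL_TRACE | _TIF_NOHZ)

static void syscall_fast_path(void) { puts("fast path: no hooks run"); }
static void syscall_slow_path(void) { puts("slow path: nohz entry/exit hooks run"); }

static void dispatch_syscall(unsigned int thread_flags)
{
	if (thread_flags & _TIF_WORK_SYSCALL_ENTRY)
		syscall_slow_path();   /* tick_nohz_enter_kernel() would run here */
	else
		syscall_fast_path();
}

int main(void)
{
	dispatch_syscall(0);           /* ordinary task: fast path              */
	dispatch_syscall(_TIF_NOHZ);   /* adaptive-nohz task: slow path + hooks */
	return 0;
}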
Frederic Weisbecker fweisbec x86: Add adaptive tickless hooks on do_notify_resume()
Before resuming to userspace, we may fall into do_notify_resume()
to handle signals or other things. Because we may be coming
from a syscall/exception or interrupt exit, the current cputime is
considered as happening in userspace.

However, we want do_notify_resume() cputime to be accounted as
system time. Put the kernel boundary hooks in this function
to ensure that.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
495aaa7
Frederic Weisbecker fweisbec x86: Exception hooks for adaptive tickless
Add the necessary hooks to x86 exceptions for adaptive nohz
support so that the time spent on exception handling is
accounted as system cputime.

This includes traps, page faults, debug exceptions, etc.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
37c54b0
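
Exceptions can fire in kernel mode as well, so the hooks only treat them as a user/kernel boundary when the interrupted context was userspace; that is what the user_mode(regs) guard in the hunks below does. Here is a standalone sketch of the pattern; pt_regs, user_mode() and the tick_nohz_* bodies are simplified stand-ins for the real x86 code.

#include <stdbool.h>
#include <stdio.h>

struct pt_regs { bool from_user; };   /* simplified: only records the interrupted mode */

static bool user_mode(const struct pt_regs *regs) { return regs->from_user; }

static void tick_nohz_enter_kernel(void) { puts("flush user time, stamp for system time"); }
static void tick_nohz_exit_kernel(void)  { puts("flush system time, stamp for user time"); }

/* Only account a boundary crossing if the exception interrupted userspace. */
static void tick_nohz_enter_exception(const struct pt_regs *regs)
{
	if (user_mode(regs))
		tick_nohz_enter_kernel();
}

static void tick_nohz_exit_exception(const struct pt_regs *regs)
{
	if (user_mode(regs))
		tick_nohz_exit_kernel();
}

/* A page-fault handler wrapped by the hooks, as in the fault.c hunk below. */
static void do_page_fault(struct pt_regs *regs)
{
	tick_nohz_enter_exception(regs);
	puts("handle the fault");
	tick_nohz_exit_exception(regs);
}

int main(void)
{
	struct pt_regs user_fault   = { .from_user = true  };
	struct pt_regs kernel_fault = { .from_user = false };

	do_page_fault(&user_fault);     /* boundary accounted                  */
	do_page_fault(&kernel_fault);   /* no boundary: time is already system */
	return 0;
}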
8 arch/Kconfig
@@ -251,6 +251,14 @@ config HAVE_CMPXCHG_DOUBLE
config ARCH_WANT_OLD_COMPAT_IPC
bool
+config HAVE_NO_HZ_FULL
+ bool
+ help
+ An arch should select this symbol if it provides
+ the kernel entry/exit hooks necessary to implement
+ full tickless support. This includes syscall entry/exit,
+ exceptions entry/exit and do_notify_resume() hooks.
+
config HAVE_ARCH_SECCOMP_FILTER
bool
help
1  arch/x86/Kconfig
@@ -95,6 +95,7 @@ config X86
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
+ select HAVE_NO_HZ_FULL
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS || UPROBES)
10 arch/x86/include/asm/thread_info.h
@@ -89,6 +89,7 @@ struct thread_info {
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
#define TIF_FORK 18 /* ret_from_fork */
+#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_DEBUG 21 /* uses debug registers */
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
@@ -114,6 +115,7 @@ struct thread_info {
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
#define _TIF_FORK (1 << TIF_FORK)
+#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_DEBUG (1 << TIF_DEBUG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
@@ -126,12 +128,13 @@ struct thread_info {
/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \
- _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
+ _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
+ _TIF_NOHZ)
/* work to do in syscall_trace_leave() */
#define _TIF_WORK_SYSCALL_EXIT \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \
- _TIF_SYSCALL_TRACEPOINT)
+ _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
@@ -141,7 +144,8 @@ struct thread_info {
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK \
- ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
+ ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \
+ _TIF_NOHZ)
/* Only used for 64 bit */
#define _TIF_DO_NOTIFY_MASK \
5 arch/x86/kernel/ptrace.c
@@ -21,6 +21,7 @@
#include <linux/signal.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
+#include <linux/tick.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -1463,6 +1464,8 @@ long syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
+ tick_nohz_enter_kernel();
+
/*
* If we stepped into a sysenter/syscall insn, it trapped in
* kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
@@ -1526,4 +1529,6 @@ void syscall_trace_leave(struct pt_regs *regs)
!test_thread_flag(TIF_SYSCALL_EMU);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+ tick_nohz_exit_kernel();
}
3  arch/x86/kernel/signal.c
@@ -19,6 +19,7 @@
#include <linux/uaccess.h>
#include <linux/user-return-notifier.h>
#include <linux/uprobes.h>
+#include <linux/tick.h>
#include <asm/processor.h>
#include <asm/ucontext.h>
@@ -776,6 +777,7 @@ static void do_signal(struct pt_regs *regs)
void
do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
{
+ tick_nohz_enter_kernel();
#ifdef CONFIG_X86_MCE
/* notify userspace of pending MCEs */
if (thread_info_flags & _TIF_MCE_NOTIFY)
@@ -801,6 +803,7 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
#ifdef CONFIG_X86_32
clear_thread_flag(TIF_IRET);
#endif /* CONFIG_X86_32 */
+ tick_nohz_exit_kernel();
}
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
14 arch/x86/kernel/traps.c
@@ -26,6 +26,7 @@
#include <linux/sched.h>
#include <linux/timer.h>
#include <linux/init.h>
+#include <linux/tick.h>
#include <linux/bug.h>
#include <linux/nmi.h>
#include <linux/mm.h>
@@ -311,6 +312,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ tick_nohz_enter_exception(regs);
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
@@ -330,6 +332,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, NULL);
preempt_conditional_cli(regs);
debug_stack_usage_dec();
+ tick_nohz_exit_exception(regs);
}
#ifdef CONFIG_X86_64
@@ -390,6 +393,8 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
unsigned long dr6;
int si_code;
+ tick_nohz_enter_exception(regs);
+
get_debugreg(dr6, 6);
/* Filter out all the reserved bits which are preset to 1 */
@@ -405,7 +410,7 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
/* Catch kmemcheck conditions first of all! */
if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
- return;
+ goto exit;
/* DR6 may or may not be cleared by the CPU */
set_debugreg(0, 6);
@@ -420,7 +425,7 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
if (notify_die(DIE_DEBUG, "debug", regs, PTR_ERR(&dr6), error_code,
SIGTRAP) == NOTIFY_STOP)
- return;
+ goto exit;
/*
* Let others (NMI) know that the debug stack is in use
@@ -436,7 +441,7 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
X86_TRAP_DB);
preempt_conditional_cli(regs);
debug_stack_usage_dec();
- return;
+ goto exit;
}
/*
@@ -457,7 +462,8 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
preempt_conditional_cli(regs);
debug_stack_usage_dec();
- return;
+exit:
+ tick_nohz_exit_exception(regs);
}
/*
13 arch/x86/mm/fault.c
@@ -13,6 +13,7 @@
#include <linux/perf_event.h> /* perf_sw_event */
#include <linux/hugetlb.h> /* hstate_index_to_shift */
#include <linux/prefetch.h> /* prefetchw */
+#include <linux/tick.h>
#include <asm/traps.h> /* dotraplinkage, ... */
#include <asm/pgalloc.h> /* pgd_*(), ... */
@@ -1000,8 +1001,8 @@ static int fault_in_kernel_space(unsigned long address)
* and the problem, and then passes it off to one of the appropriate
* routines.
*/
-dotraplinkage void __kprobes
-do_page_fault(struct pt_regs *regs, unsigned long error_code)
+static void __kprobes
+__do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
struct vm_area_struct *vma;
struct task_struct *tsk;
@@ -1209,3 +1210,11 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
up_read(&mm->mmap_sem);
}
+
+dotraplinkage void __kprobes
+do_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+ tick_nohz_enter_exception(regs);
+ __do_page_fault(regs, error_code);
+ tick_nohz_exit_exception(regs);
+}
2  include/linux/kernel_stat.h
@@ -122,7 +122,9 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
extern unsigned long long task_delta_exec(struct task_struct *);
extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
+extern void account_user_ticks(struct task_struct *, unsigned long);
extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
+extern void account_system_ticks(struct task_struct *, unsigned long);
extern void account_steal_time(cputime_t);
extern void account_idle_time(cputime_t);
59 include/linux/tick.h
@@ -27,25 +27,33 @@ enum tick_nohz_mode {
NOHZ_MODE_HIGHRES,
};
+enum tick_saved_jiffies {
+ JIFFIES_SAVED_NONE,
+ JIFFIES_SAVED_IDLE,
+ JIFFIES_SAVED_USER,
+ JIFFIES_SAVED_SYS,
+};
+
/**
* struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer: hrtimer to schedule the periodic tick in high
- * resolution mode
- * @last_tick: Store the last tick expiry time when the tick
- * timer is modified for nohz sleeps. This is necessary
- * to resume the tick timer operation in the timeline
- * when the CPU returns from nohz sleep.
- * @tick_stopped: Indicator that the idle tick has been stopped
- * @idle_jiffies: jiffies at the entry to idle for idle time accounting
- * @idle_calls: Total number of idle calls
- * @idle_sleeps: Number of idle calls, where the sched tick was stopped
- * @idle_entrytime: Time when the idle call was entered
- * @idle_waketime: Time when the idle was interrupted
- * @idle_exittime: Time when the idle state was left
- * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
- * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length: Duration of the current idle sleep
- * @do_timer_lst: CPU was the last one doing do_timer before going idle
+ * @sched_timer: hrtimer to schedule the periodic tick in high
+ * resolution mode
+ * @last_tick: Store the last tick expiry time when the tick
+ * timer is modified for nohz sleeps. This is necessary
+ * to resume the tick timer operation in the timeline
+ * when the CPU returns from nohz sleep.
+ * @tick_stopped: Indicator that the idle tick has been stopped
+ * @idle_calls: Total number of idle calls
+ * @idle_sleeps: Number of idle calls, where the sched tick was stopped
+ * @idle_entrytime: Time when the idle call was entered
+ * @idle_waketime: Time when the idle was interrupted
+ * @idle_exittime: Time when the idle state was left
+ * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
+ * @saved_jiffies: Jiffies snapshot on tick stop for cpu time accounting
+ * @saved_jiffies_whence: Area where we saved @saved_jiffies
+ * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
+ * @sleep_length: Duration of the current idle sleep
+ * @do_timer_lst: CPU was the last one doing do_timer before going idle
*/
struct tick_sched {
struct hrtimer sched_timer;
@@ -54,7 +62,6 @@ struct tick_sched {
ktime_t last_tick;
int inidle;
int tick_stopped;
- unsigned long idle_jiffies;
unsigned long idle_calls;
unsigned long idle_sleeps;
int idle_active;
@@ -62,6 +69,8 @@ struct tick_sched {
ktime_t idle_waketime;
ktime_t idle_exittime;
ktime_t idle_sleeptime;
+ enum tick_saved_jiffies saved_jiffies_whence;
+ unsigned long saved_jiffies;
ktime_t iowait_sleeptime;
ktime_t sleep_length;
unsigned long last_jiffies;
@@ -142,4 +151,18 @@ static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
# endif /* !NO_HZ */
+#ifdef CONFIG_NO_HZ_FULL
+extern void tick_nohz_enter_kernel(void);
+extern void tick_nohz_exit_kernel(void);
+extern void tick_nohz_enter_exception(struct pt_regs *regs);
+extern void tick_nohz_exit_exception(struct pt_regs *regs);
+extern void tick_nohz_pre_schedule(void);
+#else
+static inline void tick_nohz_enter_kernel(void) { }
+static inline void tick_nohz_exit_kernel(void) { }
+static inline void tick_nohz_enter_exception(struct pt_regs *regs) { }
+static inline void tick_nohz_exit_exception(struct pt_regs *regs) { }
+static inline void tick_nohz_pre_schedule(void) { }
+#endif /* !NO_HZ_FULL */
+
#endif
27 kernel/sched/core.c
@@ -1910,6 +1910,7 @@ static inline void
prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
{
+ tick_nohz_pre_schedule();
sched_info_switch(prev, next);
perf_event_task_sched_out(prev, next);
fire_sched_out_preempt_notifiers(prev, next);
@@ -2740,6 +2741,19 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
acct_update_integrals(p);
}
+#ifdef CONFIG_NO_HZ_FULL
+void account_user_ticks(struct task_struct *p, unsigned long ticks)
+{
+ cputime_t delta_cputime, delta_scaled;
+
+ if (ticks) {
+ delta_cputime = jiffies_to_cputime(ticks);
+ delta_scaled = cputime_to_scaled(ticks);
+ account_user_time(p, delta_cputime, delta_scaled);
+ }
+}
+#endif
+
/*
* Account guest cpu time to a process.
* @p: the process that the cpu time gets accounted to
@@ -2817,6 +2831,19 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
__account_system_time(p, cputime, cputime_scaled, index);
}
+#ifdef CONFIG_NO_HZ_FULL
+void account_system_ticks(struct task_struct *p, unsigned long ticks)
+{
+ cputime_t delta_cputime, delta_scaled;
+
+ if (ticks) {
+ delta_cputime = jiffies_to_cputime(ticks);
+ delta_scaled = cputime_to_scaled(ticks);
+ account_system_time(p, 0, delta_cputime, delta_scaled);
+ }
+}
+#endif
+
/*
* Account for involuntary wait time.
* @cputime: the cpu time spent in involuntary wait
14 kernel/time/Kconfig
@@ -58,13 +58,19 @@ config TICK_ONESHOT
bool
config NO_HZ
- bool "Tickless System (Dynamic Ticks)"
+ bool "Tickless idle system (Dynamic idle Ticks)"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
select TICK_ONESHOT
help
- This option enables a tickless system: timer interrupts will
- only trigger on an as-needed basis both when the system is
- busy and when the system is idle.
+ This option enables a tickless idle system: timer interrupts will
+ only trigger on an as-needed basis when the system is idle.
+
+config NO_HZ_FULL
+ bool "Full tickless system (Dynamic Ticks)"
+ depends on NO_HZ && HAVE_NO_HZ_FULL
+ help
+ This option enables a full adaptive tickless system: timer
+ interrupts will globally only trigger on an as-needed basis.
config HIGH_RES_TIMERS
bool "High Resolution Timer Support"
139 kernel/time/tick-sched.c
@@ -460,8 +460,10 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
ts->idle_expires = expires;
}
- if (!was_stopped && ts->tick_stopped)
- ts->idle_jiffies = ts->last_jiffies;
+ if (!was_stopped && ts->tick_stopped) {
+ ts->saved_jiffies = ts->last_jiffies;
+ ts->saved_jiffies_whence = JIFFIES_SAVED_IDLE;
+ }
}
}
@@ -578,22 +580,38 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
tick_nohz_restart(ts, now);
}
-static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
+static void tick_nohz_account_ticks(struct tick_sched *ts)
{
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
unsigned long ticks;
/*
- * We stopped the tick in idle. Update process times would miss the
- * time we slept as update_process_times does only a 1 tick
- * accounting. Enforce that this is accounted to idle !
+ * We stopped the tick. Update process times would miss the
+ * time we ran tickless as update_process_times does only a 1 tick
+ * accounting. Enforce that this is accounted to nohz timeslices.
*/
- ticks = jiffies - ts->idle_jiffies;
+ ticks = jiffies - ts->saved_jiffies;
/*
* We might be one off. Do not randomly account a huge number of ticks!
*/
- if (ticks && ticks < LONG_MAX)
- account_idle_ticks(ticks);
+ if (ticks && ticks < LONG_MAX) {
+ switch (ts->saved_jiffies_whence) {
+ case JIFFIES_SAVED_IDLE:
+ account_idle_ticks(ticks);
+ break;
+#ifdef CONFIG_NO_HZ_FULL
+ case JIFFIES_SAVED_USER:
+ account_user_ticks(current, ticks);
+ break;
+ case JIFFIES_SAVED_SYS:
+ account_system_ticks(current, ticks);
+ break;
+ case JIFFIES_SAVED_NONE:
+ break;
#endif
+ default:
+ WARN_ON_ONCE(1);
+ }
+ }
+ ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
}
/**
@@ -623,7 +641,9 @@ void tick_nohz_idle_exit(void)
if (ts->tick_stopped) {
tick_nohz_restart_sched_tick(ts, now);
- tick_nohz_account_idle_ticks(ts);
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+ tick_nohz_account_ticks(ts);
+#endif
}
local_irq_enable();
@@ -671,7 +691,7 @@ static void tick_nohz_handler(struct clock_event_device *dev)
*/
if (ts->tick_stopped) {
touch_softlockup_watchdog();
- ts->idle_jiffies++;
+ ts->saved_jiffies++;
}
update_process_times(user_mode(regs));
@@ -766,6 +786,85 @@ static inline void tick_check_nohz(int cpu)
}
}
+#ifdef CONFIG_NO_HZ_FULL
+void tick_nohz_exit_kernel(void)
+{
+ unsigned long flags;
+ struct tick_sched *ts;
+ unsigned long delta_jiffies;
+
+ if (!test_thread_flag(TIF_NOHZ))
+ return;
+
+ local_irq_save(flags);
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+
+ WARN_ON_ONCE(!ts->tick_stopped);
+ WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_SYS);
+
+ delta_jiffies = jiffies - ts->saved_jiffies;
+ account_system_ticks(current, delta_jiffies);
+
+ ts->saved_jiffies = jiffies;
+ ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+
+ local_irq_restore(flags);
+}
+
+void tick_nohz_enter_kernel(void)
+{
+ unsigned long flags;
+ struct tick_sched *ts;
+ unsigned long delta_jiffies;
+
+ if (!test_thread_flag(TIF_NOHZ))
+ return;
+
+ local_irq_save(flags);
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+
+ WARN_ON_ONCE(!ts->tick_stopped);
+ WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_USER);
+
+ delta_jiffies = jiffies - ts->saved_jiffies;
+ account_user_ticks(current, delta_jiffies);
+
+ ts->saved_jiffies = jiffies;
+ ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+ local_irq_restore(flags);
+}
+
+void tick_nohz_enter_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs))
+ tick_nohz_enter_kernel();
+}
+
+void tick_nohz_exit_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs))
+ tick_nohz_exit_kernel();
+}
+
+/*
+ * Flush cputime and clear hooks before context switch so that
+ * we account the time spent tickless.
+ */
+void tick_nohz_pre_schedule(void)
+{
+ struct tick_sched *ts;
+
+ if (test_thread_flag(TIF_NOHZ)) {
+ ts = &__get_cpu_var(tick_cpu_sched);
+ tick_nohz_account_ticks(ts);
+ clear_thread_flag(TIF_NOHZ);
+ }
+}
+#endif /* CONFIG_NO_HZ_FULL */
+
#else
static inline void tick_nohz_switch_to_nohz(void) { }
@@ -820,17 +919,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
*/
if (regs) {
/*
- * When we are idle and the tick is stopped, we have to touch
- * the watchdog as we might not schedule for a really long
- * time. This happens on complete idle SMP systems while
- * waiting on the login prompt. We also increment the "start of
- * idle" jiffy stamp so the idle accounting adjustment we do
- * when we go busy again does not account too much ticks.
+ * When the tick is stopped, we have to touch the watchdog
+ * as we might not schedule for a really long time. This
+ * happens on complete idle SMP systems while waiting on
+ * the login prompt. We also increment the last jiffy stamp
+ * recorded when we stopped the tick so the cpu time accounting
+ * adjustment does not account too much ticks when we flush them.
*/
if (ts->tick_stopped) {
+ /* CHECKME: may be this is only needed in idle */
touch_softlockup_watchdog();
- if (idle_cpu(cpu))
- ts->idle_jiffies++;
+ ts->saved_jiffies++;
}
update_process_times(user_mode(regs));
profile_tick(CPU_PROFILING);
3  kernel/time/timer_list.c
@@ -169,7 +169,8 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
P(nohz_mode);
P_ns(last_tick);
P(tick_stopped);
- P(idle_jiffies);
+ /* CHECKME: Do we want saved_jiffies_whence as well? */
+ P(saved_jiffies);
P(idle_calls);
P(idle_sleeps);
P_ns(idle_entrytime);
