Skip to content

Commit

Permalink
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/li…
Browse files Browse the repository at this point in the history
…nux/kernel/git/tip/tip

Pull perf changes from Ingo Molnar:
 "There are lots of improvements, the biggest changes are:

  Main kernel side changes:

   - Improve uprobes performance by adding 'pre-filtering' support, by
     Oleg Nesterov.

   - Make some POWER7 events available in sysfs, equivalent to what was
     done on x86, from Sukadev Bhattiprolu.

   - tracing updates by Steve Rostedt - mostly misc fixes and smaller
     improvements.

   - Use perf/event tracing to report PCI Express advanced errors, by
     Tony Luck.

   - Enable northbridge performance counters on AMD family 15h, by Jacob
     Shin.

   - This tracing commit:

        tracing: Remove the extra 4 bytes of padding in events

     changes the ABI.  All involved parties (PowerTop in particular)
     seem to agree that it's safe to do now with the introduction of
     libtraceevent, but the devil is in the details ...

  Main tooling side changes:

   - Add 'event group view', from Namyung Kim:

     To use it, 'perf record' should group events when recording.  And
     then perf report parses the saved group relation from file header
     and prints them together if --group option is provided.  You can
     use the 'perf evlist' command to see event group information:

        $ perf record -e '{ref-cycles,cycles}' noploop 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

        $ perf evlist --group
        {ref-cycles,cycles}

     With this example, default perf report will show you each event
     separately.

     You can use --group option to enable event group view:

        $ perf report --group
        ...
        # group: {ref-cycles,cycles}
        # ========
        # Samples: 7K of event 'anon group { ref-cycles, cycles }'
        # Event count (approx.): 6876107743
        #
        #         Overhead  Command      Shared Object                      Symbol
        # ................  .......  .................  ..........................
            99.84%  99.76%  noploop  noploop            [.] main
             0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
             0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
             0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
             0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
             0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
             0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
             0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
             0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

     As you can see the Overhead column now contains both of ref-cycles
     and cycles and header line shows group information also - 'anon
     group { ref-cycles, cycles }'.  The output is sorted by period of
     group leader first.

   - Initial GTK+ annotate browser, from Namhyung Kim.

   - Add option for runtime switching perf data file in perf report,
     just press 's' and a menu with the valid files found in the current
     directory will be presented, from Feng Tang.

   - Add support to display whole group data for raw columns, from Jiri
     Olsa.

   - Add per processor socket count aggregation in perf stat, from
     Stephane Eranian.

   - Add interval printing in 'perf stat', from Stephane Eranian.

   - 'perf test' improvements

   - Add support for wildcards in tracepoint system name, from Jiri
     Olsa.

   - Add anonymous huge page recognition, from Joshua Zhu.

   - perf build-id cache now can show DSOs present in a perf.data file
     that are not in the cache, to integrate with build-id servers being
     put in place by organizations such as Fedora.

   - perf top now shares more of the evsel config/creation routines with
     'record', paving the way for further integration like 'top'
     snapshots, etc.

   - perf top now supports DWARF callchains.

   - Fix mmap limitations on 32-bit, fix from David Miller.

   - 'perf bench numa mem' NUMA performance measurement suite

   - ... and lots of fixes, performance improvements, cleanups and other
     improvements I failed to list - see the shortlog and git log for
     details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
  perf/x86/amd: Enable northbridge performance counters on AMD family 15h
  perf/hwbp: Fix cleanup in case of kzalloc failure
  perf tools: Fix build with bison 2.3 and older.
  perf tools: Limit unwind support to x86 archs
  perf annotate: Make it to be able to skip unannotatable symbols
  perf gtk/annotate: Fail early if it can't annotate
  perf gtk/annotate: Show source lines with gray color
  perf gtk/annotate: Support multiple event annotation
  perf ui/gtk: Implement basic GTK2 annotation browser
  perf annotate: Fix warning message on a missing vmlinux
  perf buildid-cache: Add --update option
  uprobes/perf: Avoid uprobe_apply() whenever possible
  uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
  uprobes/perf: Teach trace_uprobe/perf code to pre-filter
  uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
  uprobes: Introduce uprobe_apply()
  perf: Introduce hw_perf_event->tp_target and ->tp_list
  uprobes/perf: Always increment trace_uprobe->nhit
  uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
  uprobes/tracing: Introduce is_trace_uprobe_enabled()
  ...
  • Loading branch information
torvalds committed Feb 20, 2013
2 parents b7133a9 + e259514 commit 8f55cea
Show file tree
Hide file tree
Showing 195 changed files with 8,995 additions and 4,097 deletions.
62 changes: 62 additions & 0 deletions Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -0,0 +1,62 @@
What: /sys/devices/cpu/events/
/sys/devices/cpu/events/branch-misses
/sys/devices/cpu/events/cache-references
/sys/devices/cpu/events/cache-misses
/sys/devices/cpu/events/stalled-cycles-frontend
/sys/devices/cpu/events/branch-instructions
/sys/devices/cpu/events/stalled-cycles-backend
/sys/devices/cpu/events/instructions
/sys/devices/cpu/events/cpu-cycles

Date: 2013/01/08

Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>

Description: Generic performance monitoring events

A collection of performance monitoring events that may be
supported by many/most CPUs. These events can be monitored
using the 'perf(1)' tool.

The contents of each file would look like:

event=0xNNNN

where 'N' is a hex digit and the number '0xNNNN' shows the
"raw code" for the perf event identified by the file's
"basename".


What: /sys/devices/cpu/events/PM_LD_MISS_L1
/sys/devices/cpu/events/PM_LD_REF_L1
/sys/devices/cpu/events/PM_CYC
/sys/devices/cpu/events/PM_BRU_FIN
/sys/devices/cpu/events/PM_GCT_NOSLOT_CYC
/sys/devices/cpu/events/PM_BRU_MPRED
/sys/devices/cpu/events/PM_INST_CMPL
/sys/devices/cpu/events/PM_CMPLU_STALL

Date: 2013/01/08

Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Linux Powerpc mailing list <linuxppc-dev@ozlabs.org>

Description: POWER-systems specific performance monitoring events

A collection of performance monitoring events that may be
supported by the POWER CPU. These events can be monitored
using the 'perf(1)' tool.

These events may not be supported by other CPUs.

The contents of each file would look like:

event=0xNNNN

where 'N' is a hex digit and the number '0xNNNN' shows the
"raw code" for the perf event identified by the file's
"basename".

Further, multiple terms like 'event=0xNNNN' can be specified
and separated with comma. All available terms are defined in
the /sys/bus/event_source/devices/<dev>/format file.
83 changes: 83 additions & 0 deletions Documentation/trace/ftrace.txt
Expand Up @@ -1842,6 +1842,89 @@ an error.
# cat buffer_size_kb
85

Snapshot
--------
CONFIG_TRACER_SNAPSHOT makes a generic snapshot feature
available to all non latency tracers. (Latency tracers which
record max latency, such as "irqsoff" or "wakeup", can't use
this feature, since those are already using the snapshot
mechanism internally.)

Snapshot preserves a current trace buffer at a particular point
in time without stopping tracing. Ftrace swaps the current
buffer with a spare buffer, and tracing continues in the new
current (=previous spare) buffer.

The following debugfs files in "tracing" are related to this
feature:

snapshot:

This is used to take a snapshot and to read the output
of the snapshot. Echo 1 into this file to allocate a
spare buffer and to take a snapshot (swap), then read
the snapshot from this file in the same format as
"trace" (described above in the section "The File
System"). Both reads snapshot and tracing are executable
in parallel. When the spare buffer is allocated, echoing
0 frees it, and echoing else (positive) values clear the
snapshot contents.
More details are shown in the table below.

status\input | 0 | 1 | else |
--------------+------------+------------+------------+
not allocated |(do nothing)| alloc+swap | EINVAL |
--------------+------------+------------+------------+
allocated | free | swap | clear |
--------------+------------+------------+------------+

Here is an example of using the snapshot feature.

# echo 1 > events/sched/enable
# echo 1 > snapshot
# cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 71/71 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [005] d... 2440.603828: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2242 next_prio=120
sleep-2242 [005] d... 2440.603846: sched_switch: prev_comm=snapshot-test-2 prev_pid=2242 prev_prio=120 prev_state=R ==> next_comm=kworker/5:1 next_pid=60 next_prio=120
[...]
<idle>-0 [002] d... 2440.707230: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2229 next_prio=120

# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 77/77 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [007] d... 2440.707395: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2243 next_prio=120
snapshot-test-2-2229 [002] d... 2440.707438: sched_switch: prev_comm=snapshot-test-2 prev_pid=2229 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
[...]


If you try to use this snapshot feature when current tracer is
one of the latency tracers, you will get the following results.

# echo wakeup > current_tracer
# echo 1 > snapshot
bash: echo: write error: Device or resource busy
# cat snapshot
cat: snapshot: Device or resource busy

-----------

More details can be found in the source code, in the
Expand Down
12 changes: 12 additions & 0 deletions arch/Kconfig
Expand Up @@ -76,6 +76,15 @@ config OPTPROBES
depends on KPROBES && HAVE_OPTPROBES
depends on !PREEMPT

config KPROBES_ON_FTRACE
def_bool y
depends on KPROBES && HAVE_KPROBES_ON_FTRACE
depends on DYNAMIC_FTRACE_WITH_REGS
help
If function tracer is enabled and the arch supports full
passing of pt_regs to function tracing, then kprobes can
optimize on top of function tracing.

config UPROBES
bool "Transparent user-space probes (EXPERIMENTAL)"
depends on UPROBE_EVENT && PERF_EVENTS
Expand Down Expand Up @@ -158,6 +167,9 @@ config HAVE_KRETPROBES
config HAVE_OPTPROBES
bool

config HAVE_KPROBES_ON_FTRACE
bool

config HAVE_NMI_WATCHDOG
bool
#
Expand Down
26 changes: 26 additions & 0 deletions arch/powerpc/include/asm/perf_event_server.h
Expand Up @@ -11,6 +11,7 @@

#include <linux/types.h>
#include <asm/hw_irq.h>
#include <linux/device.h>

#define MAX_HWEVENTS 8
#define MAX_EVENT_ALTERNATIVES 8
Expand All @@ -35,6 +36,7 @@ struct power_pmu {
void (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
u32 flags;
const struct attribute_group **attr_groups;
int n_generic;
int *generic_events;
int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
Expand Down Expand Up @@ -109,3 +111,27 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
* If an event_id is not subject to the constraint expressed by a particular
* field, then it will have 0 in both the mask and value for that field.
*/

extern ssize_t power_events_sysfs_show(struct device *dev,
struct device_attribute *attr, char *page);

/*
* EVENT_VAR() is same as PMU_EVENT_VAR with a suffix.
*
* Having a suffix allows us to have aliases in sysfs - eg: the generic
* event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and
* 'PM_CYC' where the latter is the name by which the event is known in
* POWER CPU specification.
*/
#define EVENT_VAR(_id, _suffix) event_attr_##_id##_suffix
#define EVENT_PTR(_id, _suffix) &EVENT_VAR(_id, _suffix).attr.attr

#define EVENT_ATTR(_name, _id, _suffix) \
PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id, \
power_events_sysfs_show)

#define GENERIC_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _g)
#define GENERIC_EVENT_PTR(_id) EVENT_PTR(_id, _g)

#define POWER_EVENT_ATTR(_name, _id) EVENT_ATTR(PM_##_name, _id, _p)
#define POWER_EVENT_PTR(_id) EVENT_PTR(_id, _p)
12 changes: 12 additions & 0 deletions arch/powerpc/perf/core-book3s.c
Expand Up @@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event)
return event->hw.idx;
}

ssize_t power_events_sysfs_show(struct device *dev,
struct device_attribute *attr, char *page)
{
struct perf_pmu_events_attr *pmu_attr;

pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);

return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}

struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable = power_pmu_disable,
Expand Down Expand Up @@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu)
pr_info("%s performance monitor hardware support registered\n",
pmu->name);

power_pmu.attr_groups = ppmu->attr_groups;

#ifdef MSR_HV
/*
* Use FCHV to ignore kernel events if MSR.HV is set.
Expand Down
80 changes: 72 additions & 8 deletions arch/powerpc/perf/power7-pmu.c
Expand Up @@ -50,6 +50,18 @@
#define MMCR1_PMCSEL_SH(n) (MMCR1_PMC1SEL_SH - (n) * 8)
#define MMCR1_PMCSEL_MSK 0xff

/*
* Power7 event codes.
*/
#define PME_PM_CYC 0x1e
#define PME_PM_GCT_NOSLOT_CYC 0x100f8
#define PME_PM_CMPLU_STALL 0x4000a
#define PME_PM_INST_CMPL 0x2
#define PME_PM_LD_REF_L1 0xc880
#define PME_PM_LD_MISS_L1 0x400f0
#define PME_PM_BRU_FIN 0x10068
#define PME_PM_BRU_MPRED 0x400f6

/*
* Layout of constraint bits:
* 6666555555555544444444443333333333222222222211111111110000000000
Expand Down Expand Up @@ -307,14 +319,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
}

static int power7_generic_events[] = {
[PERF_COUNT_HW_CPU_CYCLES] = 0x1e,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */
[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a, /* CMPLU_STALL */
[PERF_COUNT_HW_INSTRUCTIONS] = 2,
[PERF_COUNT_HW_CACHE_REFERENCES] = 0xc880, /* LD_REF_L1_LSU*/
[PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1 */
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068, /* BRU_FIN */
[PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6, /* BR_MPRED */
[PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PME_PM_CMPLU_STALL,
[PERF_COUNT_HW_INSTRUCTIONS] = PME_PM_INST_CMPL,
[PERF_COUNT_HW_CACHE_REFERENCES] = PME_PM_LD_REF_L1,
[PERF_COUNT_HW_CACHE_MISSES] = PME_PM_LD_MISS_L1,
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PME_PM_BRU_FIN,
[PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BRU_MPRED,
};

#define C(x) PERF_COUNT_HW_CACHE_##x
Expand Down Expand Up @@ -362,6 +374,57 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
},
};


GENERIC_EVENT_ATTR(cpu-cycles, CYC);
GENERIC_EVENT_ATTR(stalled-cycles-frontend, GCT_NOSLOT_CYC);
GENERIC_EVENT_ATTR(stalled-cycles-backend, CMPLU_STALL);
GENERIC_EVENT_ATTR(instructions, INST_CMPL);
GENERIC_EVENT_ATTR(cache-references, LD_REF_L1);
GENERIC_EVENT_ATTR(cache-misses, LD_MISS_L1);
GENERIC_EVENT_ATTR(branch-instructions, BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, BRU_MPRED);

POWER_EVENT_ATTR(CYC, CYC);
POWER_EVENT_ATTR(GCT_NOSLOT_CYC, GCT_NOSLOT_CYC);
POWER_EVENT_ATTR(CMPLU_STALL, CMPLU_STALL);
POWER_EVENT_ATTR(INST_CMPL, INST_CMPL);
POWER_EVENT_ATTR(LD_REF_L1, LD_REF_L1);
POWER_EVENT_ATTR(LD_MISS_L1, LD_MISS_L1);
POWER_EVENT_ATTR(BRU_FIN, BRU_FIN)
POWER_EVENT_ATTR(BRU_MPRED, BRU_MPRED);

static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(CYC),
GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
GENERIC_EVENT_PTR(CMPLU_STALL),
GENERIC_EVENT_PTR(INST_CMPL),
GENERIC_EVENT_PTR(LD_REF_L1),
GENERIC_EVENT_PTR(LD_MISS_L1),
GENERIC_EVENT_PTR(BRU_FIN),
GENERIC_EVENT_PTR(BRU_MPRED),

POWER_EVENT_PTR(CYC),
POWER_EVENT_PTR(GCT_NOSLOT_CYC),
POWER_EVENT_PTR(CMPLU_STALL),
POWER_EVENT_PTR(INST_CMPL),
POWER_EVENT_PTR(LD_REF_L1),
POWER_EVENT_PTR(LD_MISS_L1),
POWER_EVENT_PTR(BRU_FIN),
POWER_EVENT_PTR(BRU_MPRED),
NULL
};


static struct attribute_group power7_pmu_events_group = {
.name = "events",
.attrs = power7_events_attr,
};

static const struct attribute_group *power7_pmu_attr_groups[] = {
&power7_pmu_events_group,
NULL,
};

static struct power_pmu power7_pmu = {
.name = "POWER7",
.n_counter = 6,
Expand All @@ -373,6 +436,7 @@ static struct power_pmu power7_pmu = {
.get_alternatives = power7_get_alternatives,
.disable_pmc = power7_disable_pmc,
.flags = PPMU_ALT_SIPR,
.attr_groups = power7_pmu_attr_groups,
.n_generic = ARRAY_SIZE(power7_generic_events),
.generic_events = power7_generic_events,
.cache_events = &power7_cache_events,
Expand Down
2 changes: 2 additions & 0 deletions arch/x86/Kconfig
Expand Up @@ -39,10 +39,12 @@ config X86
select HAVE_DMA_CONTIGUOUS if !SWIOTLB
select HAVE_KRETPROBES
select HAVE_OPTPROBES
select HAVE_KPROBES_ON_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FENTRY if X86_64
select HAVE_C_RECORDMCOUNT
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_FP_TEST
Expand Down
2 changes: 2 additions & 0 deletions arch/x86/include/asm/cpufeature.h
Expand Up @@ -167,6 +167,7 @@
#define X86_FEATURE_TBM (6*32+21) /* trailing bit manipulations */
#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
#define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
#define X86_FEATURE_PERFCTR_NB (6*32+24) /* NB performance counter extensions */

/*
* Auxiliary flags: Linux defined - For features scattered in various
Expand Down Expand Up @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
#define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
#define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ)
#define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
#define cpu_has_perfctr_nb boot_cpu_has(X86_FEATURE_PERFCTR_NB)
#define cpu_has_cx8 boot_cpu_has(X86_FEATURE_CX8)
#define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16)
#define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU)
Expand Down
1 change: 0 additions & 1 deletion arch/x86/include/asm/ftrace.h
Expand Up @@ -44,7 +44,6 @@

#ifdef CONFIG_DYNAMIC_FTRACE
#define ARCH_SUPPORTS_FTRACE_OPS 1
#define ARCH_SUPPORTS_FTRACE_SAVE_REGS
#endif

#ifndef __ASSEMBLY__
Expand Down

0 comments on commit 8f55cea

Please sign in to comment.