Skip to content

Commit

Permalink
perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf …
Browse files Browse the repository at this point in the history
…NMI and throttling

[ Upstream commit baa014b ]

amd_pmu_enable_all() does:

      if (!test_bit(idx, cpuc->active_mask))
              continue;

      amd_pmu_enable_event(cpuc->events[idx]);

A perf NMI of another event can come between these two steps. Perf NMI
handler internally disables and enables _all_ events, including the one
which nmi-intercepted amd_pmu_enable_all() was in process of enabling.
If that unintentionally enabled event has very low sampling period and
causes immediate successive NMI, causing the event to be throttled,
cpuc->events[idx] and cpuc->active_mask gets cleared by x86_pmu_stop().
This will result in amd_pmu_enable_event() getting called with event=NULL
when amd_pmu_enable_all() resumes after handling the NMIs. This causes a
kernel crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  [...]
  Call Trace:
   <TASK>
   amd_pmu_enable_all+0x68/0xb0
   ctx_resched+0xd9/0x150
   event_function+0xb8/0x130
   ? hrtimer_start_range_ns+0x141/0x4a0
   ? perf_duration_warn+0x30/0x30
   remote_function+0x4d/0x60
   __flush_smp_call_function_queue+0xc4/0x500
   flush_smp_call_function_queue+0x11d/0x1b0
   do_idle+0x18f/0x2d0
   cpu_startup_entry+0x19/0x20
   start_secondary+0x121/0x160
   secondary_startup_64_no_verify+0xe5/0xeb
   </TASK>

amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler
were recently added as part of BRS enablement but I'm not sure whether
we really need them. We can just disable BRS in the beginning and enable
it back while returning from NMI. This will solve the issue by not
enabling those events whose active_masks are set but are not yet enabled
in hw pmu.

Fixes: ada5434 ("perf/x86/amd: Add AMD Fam19h Branch Sampling support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
  • Loading branch information
Ravi Bangoria authored and gregkh committed Nov 26, 2022
1 parent b434476 commit fd5e454
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions arch/x86/events/amd/core.c
Expand Up @@ -896,8 +896,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
pmu_enabled = cpuc->enabled;
cpuc->enabled = 0;

/* stop everything (includes BRS) */
amd_pmu_disable_all();
amd_brs_disable_all();

/* Drain BRS is in use (could be inactive) */
if (cpuc->lbr_users)
Expand All @@ -908,7 +907,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)

cpuc->enabled = pmu_enabled;
if (pmu_enabled)
amd_pmu_enable_all(0);
amd_brs_enable_all();

return amd_pmu_adjust_nmi_window(handled);
}
Expand Down

0 comments on commit fd5e454

Please sign in to comment.