Skip to content

Commit

Permalink
x86/thermal: Fix LVT thermal setup for SMI delivery mode
Browse files Browse the repository at this point in the history
commit 9a90ed0 upstream.

There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  • Loading branch information
Borislav Petkov authored and gregkh committed Jun 10, 2021
1 parent 25823b4 commit 3cbf4d6
Show file tree
Hide file tree
Showing 3 changed files with 23 additions and 5 deletions.
4 changes: 3 additions & 1 deletion arch/x86/include/asm/thermal.h
Expand Up @@ -3,11 +3,13 @@
#define _ASM_X86_THERMAL_H

#ifdef CONFIG_X86_THERMAL_VECTOR
void therm_lvt_init(void);
void intel_init_thermal(struct cpuinfo_x86 *c);
bool x86_thermal_enabled(void);
void intel_thermal_interrupt(void);
#else
static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
static inline void therm_lvt_init(void) { }
static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
#endif

#endif /* _ASM_X86_THERMAL_H */
9 changes: 9 additions & 0 deletions arch/x86/kernel/setup.c
Expand Up @@ -44,6 +44,7 @@
#include <asm/pci-direct.h>
#include <asm/prom.h>
#include <asm/proto.h>
#include <asm/thermal.h>
#include <asm/unwind.h>
#include <asm/vsyscall.h>
#include <linux/vmalloc.h>
Expand Down Expand Up @@ -1220,6 +1221,14 @@ void __init setup_arch(char **cmdline_p)

x86_init.timers.wallclock_init();

/*
* This needs to run before setup_local_APIC() which soft-disables the
* local APIC temporarily and that masks the thermal LVT interrupt,
* leading to softlockups on machines which have configured SMI
* interrupt delivery.
*/
therm_lvt_init();

mcheck_init();

register_refined_jiffies(CLOCK_TICK_RATE);
Expand Down
15 changes: 11 additions & 4 deletions drivers/thermal/intel/therm_throt.c
Expand Up @@ -621,6 +621,17 @@ bool x86_thermal_enabled(void)
return atomic_read(&therm_throt_en);
}

void __init therm_lvt_init(void)
{
/*
* This function is only called on boot CPU. Save the init thermal
* LVT value on BSP and use that value to restore APs' thermal LVT
* entry BIOS programmed later
*/
if (intel_thermal_supported(&boot_cpu_data))
lvtthmr_init = apic_read(APIC_LVTTHMR);
}

void intel_init_thermal(struct cpuinfo_x86 *c)
{
unsigned int cpu = smp_processor_id();
Expand All @@ -630,10 +641,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
if (!intel_thermal_supported(c))
return;

/* On the BSP? */
if (c == &boot_cpu_data)
lvtthmr_init = apic_read(APIC_LVTTHMR);

/*
* First check if its enabled already, in which case there might
* be some SMM goo which handles it, so we can't even put a handler
Expand Down

0 comments on commit 3cbf4d6

Please sign in to comment.