x86/kconfig: Enable additional cpu optimizations for gcc v10.1+ kernel v5.8

WARNING
This patch works with gcc versions 10.1+ and with kernel version 5.8+ and should
NOT be applied when compiling with older versions of gcc due to key name changes
of the march flags introduced with the version 4.9 release of gcc.[1]

Use the older version of this patch, hosted in the same GitHub repository, for
older versions of gcc.

FEATURES
This patch adds additional CPU options to the Linux kernel accessible under:
 Processor type and features  --->
  Processor family --->

The expanded microarchitectures include:
* AMD Improved K8-family
* AMD K10-family
* AMD Family 10h (Barcelona)
* AMD Family 14h (Bobcat)
* AMD Family 16h (Jaguar)
* AMD Family 15h (Bulldozer)
* AMD Family 15h (Piledriver)
* AMD Family 15h (Steamroller)
* AMD Family 15h (Excavator)
* AMD Family 17h (Zen)
* AMD Family 17h (Zen 2)
* Intel Silvermont low-power processors
* Intel Goldmont low-power processors (Apollo Lake and Denverton)
* Intel Goldmont Plus low-power processors (Gemini Lake)
* Intel 1st Gen Core i3/i5/i7 (Nehalem)
* Intel 1.5 Gen Core i3/i5/i7 (Westmere)
* Intel 2nd Gen Core i3/i5/i7 (Sandybridge)
* Intel 3rd Gen Core i3/i5/i7 (Ivybridge)
* Intel 4th Gen Core i3/i5/i7 (Haswell)
* Intel 5th Gen Core i3/i5/i7 (Broadwell)
* Intel 6th Gen Core i3/i5/i7 (Skylake)
* Intel 6th Gen Core i7/i9 (Skylake X)
* Intel 8th Gen Core i3/i5/i7 (Cannon Lake)
* Intel 10th Gen Core i7/i9 (Ice Lake)
* Intel Xeon (Cascade Lake)
* Intel Xeon (Cooper Lake)
* Intel 3rd Gen 10nm++  i3/i5/i7/i9-family (Tiger Lake)
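
For quick reference, the Kconfig help texts in this patch pair each new option
with the GCC -march value it enables. A minimal Python sketch of that pairing
follows. The dictionary name and the march_flag helper are hypothetical and are
not part of the patch; the symbols and values are copied from the
"Enables -march=..." lines in arch/x86/Kconfig.cpu.

```python
# Illustrative lookup only, not kernel code. Symbols and -march values are
# copied from the "Enables -march=..." help lines added by this patch.
MARCH_BY_OPTION = {
    "MBARCELONA": "barcelona",
    "MBOBCAT": "btver1",
    "MJAGUAR": "btver2",
    "MBULLDOZER": "bdver1",
    "MPILEDRIVER": "bdver2",
    "MSTEAMROLLER": "bdver3",
    "MEXCAVATOR": "bdver4",
    "MZEN": "znver1",
    "MZEN2": "znver2",
    "MCORE2": "core2",
    "MNEHALEM": "nehalem",
    "MWESTMERE": "westmere",
    "MSILVERMONT": "silvermont",
    "MGOLDMONT": "goldmont",
    "MGOLDMONTPLUS": "goldmont-plus",
    "MSANDYBRIDGE": "sandybridge",
    "MIVYBRIDGE": "ivybridge",
    "MHASWELL": "haswell",
    "MBROADWELL": "broadwell",
    "MSKYLAKE": "skylake",
    "MSKYLAKEX": "skylake-avx512",
    "MCANNONLAKE": "cannonlake",
    "MICELAKE": "icelake-client",
    "MCASCADELAKE": "cascadelake",
    "MCOOPERLAKE": "cooperlake",
    "MTIGERLAKE": "tigerlake",
    "MNATIVE": "native",
}


def march_flag(option: str) -> str:
    """Return the GCC flag for a Kconfig symbol (hypothetical helper).

    Accepts the symbol with or without the CONFIG_ prefix used in .config.
    Raises KeyError for symbols whose help text names no -march value.
    """
    if option.startswith("CONFIG_"):
        option = option[len("CONFIG_"):]
    return "-march=" + MARCH_BY_OPTION[option]
```

For example, march_flag("CONFIG_MZEN2") yields "-march=znver2". Options such as
MK8SSE3 and MK10 are left out because their help texts do not name a flag.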

It also offers to compile with the 'native' option, which "selects the CPU
to generate code for at compilation time by determining the processor type of
the compiling machine. Using -march=native enables all instruction subsets
supported by the local machine and will produce code optimized for the local
machine under the constraints of the selected instruction set."[2]

Do NOT try using the 'native' option on AMD Piledriver, Steamroller, or
Excavator CPUs (-march=bdver{2,3,4} flag). The build will error out due to the
kernel's objtool issue with these.[3a,b]
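
One way to see what 'native' would resolve to before building is to inspect the
output of `gcc -march=native -Q --help=target`. The Python sketch below parses
such output and flags the three targets named in the warning above. The
function names are hypothetical, the sample text is hand-written rather than
captured from a real run, and the exact output layout may differ between GCC
versions, so treat the regular expression as an assumption.

```python
import re

# -march values named in the warning above.
RISKY_NATIVE_TARGETS = {"bdver2", "bdver3", "bdver4"}


def resolved_march(help_target_output: str):
    """Return the value on the '-march=' line of the given text, or None.

    The text is expected to come from `gcc -march=native -Q --help=target`.
    """
    match = re.search(r"^\s*-march=\s+(\S+)", help_target_output, re.MULTILINE)
    return match.group(1) if match else None


def native_is_risky(help_target_output: str) -> bool:
    """True if 'native' resolves to Piledriver, Steamroller, or Excavator."""
    return resolved_march(help_target_output) in RISKY_NATIVE_TARGETS


# Hand-written sample in the general shape GCC prints (an assumption).
SAMPLE = (
    "The following options are target specific:\n"
    "  -march=                     \t\tbdver2\n"
    "  -mtune=                     \t\tbdver2\n"
)
```

On an affected machine, choosing the matching named option (for example
Piledriver) instead of 'native' avoids the combination this warning describes.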

MINOR NOTES
This patch also changes 'atom' to 'bonnell' in accordance with the gcc v4.9
changes. Note that upstream is using the deprecated 'march=atom' flag when I
believe it should use the newer 'march=bonnell' flag for atom processors.[4]

It is not recommended to compile on Atom-CPUs with the 'native' option.[5] The
recommendation is to use the 'atom' option instead.

BENEFITS
Small but real speed increases are measurable using a make-based endpoint when
comparing a generic kernel to one built for one of the respective microarchs.

See the following experimental evidence supporting this statement:
https://github.com/graysky2/kernel_gcc_patch

REQUIREMENTS
linux version >=5.8
gcc version >=10.1
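
A minimal Python sketch of checking these two minimums is shown below. The
function names are hypothetical, and the version strings are assumed to look
like "5.8.14" or "10.2.0", optionally with a suffix such as "-xanmod1".

```python
import re

MIN_KERNEL = (5, 8)   # linux version >=5.8
MIN_GCC = (10, 1)     # gcc version >=10.1


def parse_version(text: str) -> tuple:
    """Turn '10.2.0' or '5.8.14-xanmod1' into a tuple of leading integers."""
    parts = []
    for piece in text.strip().split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
        if digits.group() != piece:
            # A suffix such as '-xanmod1' ends the numeric part.
            break
    return tuple(parts)


def meets_requirements(kernel_version: str, gcc_version: str) -> bool:
    """True if both versions satisfy the minimums listed above."""
    return (parse_version(kernel_version) >= MIN_KERNEL
            and parse_version(gcc_version) >= MIN_GCC)
```

The gcc version string can be obtained with `gcc -dumpfullversion`, and the
kernel version with `make kernelversion` in the source tree.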

ACKNOWLEDGMENTS
This patch builds on the seminal work by Jeroen.[6]

REFERENCES
1.  https://gcc.gnu.org/gcc-4.9/changes.html
2.  https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
3a. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95671#c11
3b. https://github.com/graysky2/kernel_gcc_patch/issues/55
4.  https://bugzilla.kernel.org/show_bug.cgi?id=77461
5.  https://github.com/graysky2/kernel_gcc_patch/issues/15
6.  http://www.linuxforge.net/docs/linux/linux-gcc.php
graysky2 authored and xanmod committed Oct 15, 2020
1 parent 324aac9 commit 9562989
Showing 4 changed files with 407 additions and 35 deletions.
301 changes: 271 additions & 30 deletions arch/x86/Kconfig.cpu
@@ -123,6 +123,7 @@ config MPENTIUMM
config MPENTIUM4
bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
depends on X86_32
select X86_P6_NOP
help
Select this for Intel Pentium 4 chips. This includes the
Pentium 4, Pentium D, P4-based Celeron and Xeon, and
@@ -155,30 +156,107 @@ config MPENTIUM4
-Paxville
-Dempsey


config MK6
bool "K6/K6-II/K6-III"
bool "AMD K6/K6-II/K6-III"
depends on X86_32
help
Select this for an AMD K6-family processor. Enables use of
some extended instructions, and passes appropriate optimization
flags to GCC.

config MK7
bool "Athlon/Duron/K7"
bool "AMD Athlon/Duron/K7"
depends on X86_32
help
Select this for an AMD Athlon K7-family processor. Enables use of
some extended instructions, and passes appropriate optimization
flags to GCC.

config MK8
bool "Opteron/Athlon64/Hammer/K8"
bool "AMD Opteron/Athlon64/Hammer/K8"
help
Select this for an AMD Opteron or Athlon64 Hammer-family processor.
Enables use of some extended instructions, and passes appropriate
optimization flags to GCC.

config MK8SSE3
bool "AMD Opteron/Athlon64/Hammer/K8 with SSE3"
help
Select this for improved AMD Opteron or Athlon64 Hammer-family processors.
Enables use of some extended instructions, and passes appropriate
optimization flags to GCC.

config MK10
bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
help
Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
Enables use of some extended instructions, and passes appropriate
optimization flags to GCC.

config MBARCELONA
bool "AMD Barcelona"
help
Select this for AMD Family 10h Barcelona processors.

Enables -march=barcelona

config MBOBCAT
bool "AMD Bobcat"
help
Select this for AMD Family 14h Bobcat processors.

Enables -march=btver1

config MJAGUAR
bool "AMD Jaguar"
help
Select this for AMD Family 16h Jaguar processors.

Enables -march=btver2

config MBULLDOZER
bool "AMD Bulldozer"
help
Select this for AMD Family 15h Bulldozer processors.

Enables -march=bdver1

config MPILEDRIVER
bool "AMD Piledriver"
help
Select this for AMD Family 15h Piledriver processors.

Enables -march=bdver2

config MSTEAMROLLER
bool "AMD Steamroller"
help
Select this for AMD Family 15h Steamroller processors.

Enables -march=bdver3

config MEXCAVATOR
bool "AMD Excavator"
help
Select this for AMD Family 15h Excavator processors.

Enables -march=bdver4

config MZEN
bool "AMD Zen"
help
Select this for AMD Family 17h Zen processors.

Enables -march=znver1

config MZEN2
bool "AMD Zen 2"
help
Select this for AMD Family 17h Zen 2 processors.

Enables -march=znver2

config MCRUSOE
bool "Crusoe"
depends on X86_32
@@ -260,6 +338,7 @@ config MVIAC7

config MPSC
bool "Intel P4 / older Netburst based Xeon"
select X86_P6_NOP
depends on X86_64
help
Optimize for Intel Pentium 4, Pentium D and older Nocona/Dempsey
@@ -269,23 +348,171 @@ config MPSC
using the cpu family field
in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.

config MATOM
bool "Intel Atom"
select X86_P6_NOP
help

Select this for the Intel Atom platform. Intel Atom CPUs have an
in-order pipelining architecture and thus can benefit from
accordingly optimized code. Use a recent GCC with specific Atom
support in order to fully benefit from selecting this option.

config MCORE2
bool "Core 2/newer Xeon"
bool "Intel Core 2"
select X86_P6_NOP
help

Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
53xx) CPUs. You can distinguish newer from older Xeons by the CPU
family in /proc/cpuinfo. Newer ones have 6 and older ones 15
(not a typo)

config MATOM
bool "Intel Atom"
Enables -march=core2

config MNEHALEM
bool "Intel Nehalem"
select X86_P6_NOP
help

Select this for the Intel Atom platform. Intel Atom CPUs have an
in-order pipelining architecture and thus can benefit from
accordingly optimized code. Use a recent GCC with specific Atom
support in order to fully benefit from selecting this option.
Select this for 1st Gen Core processors in the Nehalem family.

Enables -march=nehalem

config MWESTMERE
bool "Intel Westmere"
select X86_P6_NOP
help

Select this for the Intel Westmere formerly Nehalem-C family.

Enables -march=westmere

config MSILVERMONT
bool "Intel Silvermont"
select X86_P6_NOP
help

Select this for the Intel Silvermont platform.

Enables -march=silvermont

config MGOLDMONT
bool "Intel Goldmont"
select X86_P6_NOP
help

Select this for the Intel Goldmont platform including Apollo Lake and Denverton.

Enables -march=goldmont

config MGOLDMONTPLUS
bool "Intel Goldmont Plus"
select X86_P6_NOP
help

Select this for the Intel Goldmont Plus platform including Gemini Lake.

Enables -march=goldmont-plus

config MSANDYBRIDGE
bool "Intel Sandy Bridge"
select X86_P6_NOP
help

Select this for 2nd Gen Core processors in the Sandy Bridge family.

Enables -march=sandybridge

config MIVYBRIDGE
bool "Intel Ivy Bridge"
select X86_P6_NOP
help

Select this for 3rd Gen Core processors in the Ivy Bridge family.

Enables -march=ivybridge

config MHASWELL
bool "Intel Haswell"
select X86_P6_NOP
help

Select this for 4th Gen Core processors in the Haswell family.

Enables -march=haswell

config MBROADWELL
bool "Intel Broadwell"
select X86_P6_NOP
help

Select this for 5th Gen Core processors in the Broadwell family.

Enables -march=broadwell

config MSKYLAKE
bool "Intel Skylake"
select X86_P6_NOP
help

Select this for 6th Gen Core processors in the Skylake family.

Enables -march=skylake

config MSKYLAKEX
bool "Intel Skylake X"
select X86_P6_NOP
help

Select this for 6th Gen Core processors in the Skylake X family.

Enables -march=skylake-avx512

config MCANNONLAKE
bool "Intel Cannon Lake"
select X86_P6_NOP
help

Select this for 8th Gen Core processors

Enables -march=cannonlake

config MICELAKE
bool "Intel Ice Lake"
select X86_P6_NOP
help

Select this for 10th Gen Core processors in the Ice Lake family.

Enables -march=icelake-client

config MCASCADELAKE
bool "Intel Cascade Lake"
select X86_P6_NOP
help

Select this for Xeon processors in the Cascade Lake family.

Enables -march=cascadelake

config MCOOPERLAKE
bool "Intel Cooper Lake"
select X86_P6_NOP
help

Select this for Xeon processors in the Cooper Lake family.

Enables -march=cooperlake

config MTIGERLAKE
bool "Intel Tiger Lake"
select X86_P6_NOP
help

Select this for third-generation 10 nm process processors in the Tiger Lake family.

Enables -march=tigerlake

config GENERIC_CPU
bool "Generic-x86-64"
@@ -294,6 +521,19 @@ config GENERIC_CPU
Generic x86-64 CPU.
Run equally well on all x86-64 CPUs.

config MNATIVE
bool "Native optimizations autodetected by GCC"
help

GCC 4.2 and above support -march=native, which automatically detects
the optimum settings to use based on your processor. -march=native
also detects and applies additional settings beyond -march specific
to your CPU, (eg. -msse4). Unless you have a specific reason not to
(e.g. distcc cross-compiling), you should probably be using
-march=native rather than anything listed below.

Enables -march=native

endchoice

config X86_GENERIC
@@ -318,7 +558,7 @@ config X86_INTERNODE_CACHE_SHIFT
config X86_L1_CACHE_SHIFT
int
default "7" if MPENTIUM4 || MPSC
default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
default "6" if MK7 || MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MSTEAMROLLER || MEXCAVATOR || MZEN || MZEN2 || MJAGUAR || MPENTIUMM || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MNATIVE || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
default "4" if MELAN || M486SX || M486 || MGEODEGX1
default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX

@@ -336,35 +576,36 @@ config X86_ALIGNMENT_16

config X86_INTEL_USERCOPY
def_bool y
depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK8SSE3 || MK7 || MEFFICEON || MCORE2 || MK10 || MBARCELONA || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MNATIVE

config X86_USE_PPRO_CHECKSUM
def_bool y
depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MATOM || MNATIVE

config X86_USE_3DNOW
def_bool y
depends on (MCYRIXIII || MK7 || MGEODE_LX) && !UML

#
# P6_NOPs are a relatively minor optimization that require a family >=
# 6 processor, except that it is broken on certain VIA chips.
# Furthermore, AMD chips prefer a totally different sequence of NOPs
# (which work on all CPUs). In addition, it looks like Virtual PC
# does not understand them.
#
# As a result, disallow these if we're not compiling for X86_64 (these
# NOPs do work on all x86-64 capable chips); the list of processors in
# the right-hand clause are the cores that benefit from this optimization.
#
config X86_P6_NOP
def_bool y
depends on X86_64
depends on (MCORE2 || MPENTIUM4 || MPSC)
default n
bool "Support for P6_NOPs on Intel chips"
depends on (MCORE2 || MPENTIUM4 || MPSC || MATOM || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MNATIVE)
help
P6_NOPs are a relatively minor optimization that require a family >=
6 processor, except that it is broken on certain VIA chips.
Furthermore, AMD chips prefer a totally different sequence of NOPs
(which work on all CPUs). In addition, it looks like Virtual PC
does not understand them.

As a result, disallow these if we're not compiling for X86_64 (these
NOPs do work on all x86-64 capable chips); the list of processors in
the right-hand clause are the cores that benefit from this optimization.

Say Y if you have Intel CPU newer than Pentium Pro, N otherwise.

config X86_TSC
def_bool y
depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) || X86_64
depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MNATIVE || MATOM) || X86_64

config X86_CMPXCHG64
def_bool y
@@ -374,7 +615,7 @@ config X86_CMPXCHG64
# generates cmov.
config X86_CMOV
def_bool y
depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
depends on (MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MSTEAMROLLER || MEXCAVATOR || MZEN || MZEN2 || MJAGUAR || MK7 || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MGOLDMONT || MGOLDMONTPLUS || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MSKYLAKE || MSKYLAKEX || MCANNONLAKE || MICELAKE || MCASCADELAKE || MCOOPERLAKE || MTIGERLAKE || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)

config X86_MINIMUM_CPU_FAMILY
int