Enable additional architectural optimizations in release #1492

Open
mborgerson opened this issue Jul 21, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@mborgerson
Member

mborgerson commented Jul 21, 2023

Feature Request

  • We should optimize releases to run well on the majority of target machines. Thankfully, the Steam Hardware Survey can tell us which architectural features xemu users are likely to have.
    • Consider creating an xemu hardware survey to get a more accurate picture of what machines xemu is actually running on.
  • xemu already opportunistically generates guest emulation code using hardware SIMD features, but is otherwise compiled without additional architecture flags, with the exception of macOS x86-64 builds, which currently target IvyBridge.
  • Default architecture features can be listed with gcc -Q --help=target. It looks like SSE2 is enabled by default for x86-64. We probably want to enable SSE4.2/AVX2/etc.
  • We should be careful not to force these compiler optimizations when building from source, e.g. build.sh should not hardcode IvyBridge target for macOS x86-64 builds #1485.
  • Consider providing legacy hardware releases for older machines.
    • Consider a macOS Universal style multiversion binary.
  • Add a startup feature check that provides a helpful error message instead of crashing on an unsupported instruction.
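
The startup feature check in the last bullet could look roughly like the sketch below. It is written in Python only to show the decision logic. The `REQUIRED_FLAGS` baseline, the function names, and the wording of the message are all assumptions, and the flag names follow the `flags` line of Linux `/proc/cpuinfo`. In a native build the equivalent check would have to live in code compiled for the baseline ISA, so that the check itself cannot hit an unsupported instruction.

```python
# Hypothetical baseline for an optimized release build. The names follow the
# "flags" line of Linux /proc/cpuinfo (an assumption; other OSes differ).
REQUIRED_FLAGS = ("sse4_2", "popcnt", "avx2")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def startup_check(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return None if every required flag is present, else an error message."""
    available = parse_cpu_flags(cpuinfo_text)
    missing = [flag for flag in required if flag not in available]
    if not missing:
        return None
    return (
        "This build requires CPU features your processor does not report: "
        + ", ".join(missing)
        + ". Try a legacy release or build from source."
    )
```

On Linux, a caller could pass the contents of `/proc/cpuinfo` to `startup_check` and exit with the returned message if it is not `None`.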

Alternatives

No response

Additional Context

No response

@mborgerson mborgerson added the enhancement New feature or request label Jul 21, 2023
@insaneninja117

insaneninja117 commented Jul 22, 2023

When it comes to supported instruction sets:

  • The very first iterations of AMD Athlon 64 have SSE2 as their maximum
  • Later AMD Athlon 64 and all Intel 64-capable CPUs support up to SSE3
  • The first iteration of Core 2 Duo and Core 2 Quad introduced SSSE3
  • The second iteration of Core 2 Duo and Core 2 Quad introduced SSE4.1
  • All Intel Core ix and AMD FX and newer CPUs support SSE4.2
  • Intel Sandy Bridge (Core ix 2xxx series) and AMD FX (Bulldozer) introduced AVX
  • AVX2 was introduced with Intel Haswell (Core ix 4xxx series) and AMD Excavator; the Piledriver-based AMD FX chips support AVX but not AVX2

Bumping the minimum instruction set from SSE2 to SSE3 would lose virtually zero users (though the gains would likely be small). SSE4.2 has been around for almost 15 years now and AVX2 for 10, so introducing these optimisations is a good idea. But if it is little trouble, keeping a lower-end .exe would be best.

@Spidy123222

Spidy123222 commented Jul 23, 2023

In this case, compiling for x86-64-v3 and adding avx512f and avx2 optimizations would cover everything we want, since CPUs that don't have AVX2 won't be able to run xemu that well anyway. If you want to go slightly lower, you can target AVX instead.
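
For reference, x86-64-v3 is one of the microarchitecture levels defined in the x86-64 psABI. The sketch below lists what each level adds and works out the highest level a given feature set satisfies. The feature names are spelled as in the psABI rather than as any OS reports them, and `LEVEL_FEATURES` and `highest_level` are hypothetical names for illustration. Note that AVX-512 only appears at x86-64-v4, so avx512f is an extra requirement on top of x86-64-v3.

```python
# Features each x86-64 microarchitecture level adds on top of the previous
# one, per the x86-64 psABI. Spelling follows the psABI, not OS flag names.
LEVEL_FEATURES = [
    ("x86-64-v2", {"CMPXCHG16B", "LAHF-SAHF", "POPCNT", "SSE3",
                   "SSE4.1", "SSE4.2", "SSSE3"}),
    ("x86-64-v3", {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA",
                   "LZCNT", "MOVBE", "OSXSAVE"}),
    ("x86-64-v4", {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ",
                   "AVX512VL"}),
]


def highest_level(features):
    """Return the highest x86-64 level whose features are all present."""
    level = "x86-64"  # baseline level, assumed for any 64-bit x86 CPU
    for name, needed in LEVEL_FEATURES:
        if not needed <= features:
            break  # levels are cumulative, so stop at the first gap
        level = name
    return level
```

A CPU with AVX but not AVX2 therefore stays at x86-64-v2 under this scheme, which is why a separate AVX-only target would have to be spelled out with individual flags.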

@insaneninja117

One important caveat is that while AVX was introduced with Sandy Bridge (Core ix 2xxx) and AVX2 with Haswell (Core ix 4xxx) processors, the lower-end Celeron and Pentium models had no AVX/AVX2 support until Alder Lake (Core ix 12xxx). However, xemu would likely remain playable on those CPUs even without AVX, thanks to SSE4.2 support and all the other improvements successive generations brought. Locking them out entirely would not be a user-friendly move, in my opinion.

@Spidy123222

I did my own testing: a build compiled with avx512f and avx2 fails with an illegal instruction error on the Steam Deck, which doesn't have AVX-512. Other emulators like RPCS3 can toggle this on and off in the app, so I wonder if something similar could be done here, with the build itself just targeting x86-64-v3, which goes up to AVX2. But as the person above says, lower-end CPUs didn't have AVX2 until Alder Lake. Is there some way to have all these options? From testing xemu compiled with these, I do get about a 4 fps increase on most stuff, even with optimization enabled for avx/avx2.

@Calinou

Calinou commented Nov 5, 2023

SSE4.2 should be a pretty good baseline as it's also available on older Celerons/Pentiums, which don't have AVX let alone AVX2 (these were disabled for market segmentation reasons).

If your CPU doesn't support SSE4.2, it's unlikely to be able to run any game in xemu at full speed. (Even on CPUs that support SSE4.2 but not AVX, this is a tall order already – but I can see some 2D games running acceptably in that case.)

Is there some way to have All these options

Dynamic dispatch makes it possible to use modern instruction sets on demand, but it can't be used for compiler autovectorization (which is what this issue is about). You can only pick one set of instructions per binary [1], and it's a hard requirement.

Therefore, if you wanted to optionally support newer instruction sets, you'd have to distribute 2 binaries (e.g. SSE4.2 and AVX2). PCSX2 did this a while ago but recently stopped, likely due to the maintenance cost and longer build times.
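
If two binaries were shipped, something still has to choose between them on the user's machine. A minimal sketch of that choice, assuming a small launcher that already knows the CPU's feature set, is shown below. The build names and their feature lists are illustrative, not actual xemu artifacts.

```python
# Hypothetical release variants, most demanding first. The names and
# feature lists are illustrative only.
BUILDS = [
    ("xemu-avx2", {"sse4_2", "avx", "avx2"}),
    ("xemu-sse42", {"sse4_2"}),
]


def pick_build(cpu_features, builds=BUILDS):
    """Return the first build whose requirements the CPU meets, else None."""
    for name, required in builds:
        if required <= cpu_features:
            return name
    return None  # no shipped build can run on this CPU
```

With this ordering, a CPU that has AVX but not AVX2 falls back to the SSE4.2 build, and a CPU without SSE4.2 gets `None`, which the launcher could turn into an error message.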

Footnotes

  1. You can set it on a per-file basis when compiling, but the final binary will have the requirements of the most demanding instruction set used during compilation.

4 participants