Implement (some/all of) the things Rosetta 2 does to achieve high x86_64 performance on Apple Silicon #5460

Open
adriancable opened this issue Jul 13, 2023 · 8 comments
Labels
enhancement New feature or request qemu QEMU related

Comments

@adriancable

Hi all,
As is well known, Rosetta 2 typically runs x86_64 code on Apple Silicon 4-5x faster than QEMU/UTM. These are big gains. The speed-up is due primarily to three things that QEMU/UTM does not do on macOS:

  1. Use of TSO mode (removes the need for memory fences)
  2. Return address prediction (rewriting CALL/RET with BL/RET with some extra housekeeping)
  3. Use of the ARM flag-manipulation instructions available on Apple Silicon to handle x86_64 flag setting directly (SETF8, SETF16, AXFLAG, XAFLAG)
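
To make item 3 concrete, here is a minimal sketch of what an emulator has to compute in software after a single 8-bit x86 ADD when it cannot lean on hardware flag support. The function name `add8_flags` is made up for illustration; it is not QEMU or Rosetta 2 code.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and derive the x86 arithmetic flags in software.

    Illustrative only: a hypothetical helper, not taken from any emulator.
    """
    a &= 0xFF
    b &= 0xFF
    full = a + b              # up to 9 significant bits
    result = full & 0xFF      # what ends up in the 8-bit destination
    flags = {
        "CF": int(full > 0xFF),                      # carry out of bit 7
        "ZF": int(result == 0),                      # result is zero
        "SF": (result >> 7) & 1,                     # sign bit of the result
        "OF": ((~(a ^ b) & (a ^ result)) >> 7) & 1,  # signed overflow
        "AF": ((a ^ b ^ result) >> 4) & 1,           # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),  # even parity of the low byte
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

SF, ZF, CF and OF have direct counterparts in the ARM NZCV flags, which is where instructions like SETF8/SETF16 can help. PF and AF have no NZCV counterpart, so a translator still needs some other strategy for those.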

#1 probably gives the largest gains and can be enabled from userland with the com.apple.private.oahd entitlement. Admittedly, this couldn't be done in the App Store version of UTM, but it shouldn't cause any difficulties for the direct-download version.
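
As a rough illustration of why TSO mode matters, the sketch below lowers a sequence of guest loads and stores to ARM64-style pseudo-assembly, with and without a TSO host. The barrier placement (a `dmb ishld` after each load, a `dmb ish` before each store) is one conservative mapping assumed for illustration; it is not claimed to be what QEMU's TCG or Rosetta 2 actually emit, and `lower_memory_ops` is a hypothetical name.

```python
def lower_memory_ops(guest_ops, host_tso):
    """Lower (kind, operand) guest memory ops to ARM64-like pseudo-assembly.

    Assumption: on a weakly ordered host, x86 ordering is preserved by
    fencing every access; on a TSO host no fences are needed.
    """
    host = []
    for kind, operand in guest_ops:
        if kind == "load":
            host.append(f"ldr {operand}")
            if not host_tso:
                host.append("dmb ishld")  # later accesses must not pass this load
        elif kind == "store":
            if not host_tso:
                host.append("dmb ish")    # earlier accesses must complete first
            host.append(f"str {operand}")
        else:
            raise ValueError(f"unknown memory op: {kind}")
    return host


ops = [("load", "x0, [x1]"), ("store", "x0, [x2]"), ("load", "x3, [x4]")]
print(len(lower_memory_ops(ops, host_tso=False)), "instructions without TSO")
print(len(lower_memory_ops(ops, host_tso=True)), "instructions with TSO")
```

Under this assumed mapping, every guest memory access costs an extra barrier on a non-TSO host, which is the overhead that TSO mode would remove.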

#2 and #3 can be achieved without any special entitlements.
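
For #2, the idea can be modelled in software as a shadow return stack. As described above, Rosetta 2 is understood to get this effect from the hardware return predictor by emitting BL/RET; the sketch below only models the bookkeeping. `ReturnPredictor` and `lookup_translation` are hypothetical names, and the address values are arbitrary.

```python
class ReturnPredictor:
    """Software model of return-address prediction for a binary translator."""

    def __init__(self, lookup_translation):
        # Slow path: maps a guest address to its translated host address.
        self._lookup = lookup_translation
        self._shadow = []  # stack of (guest_return_addr, host_return_addr)
        self.hits = 0
        self.misses = 0

    def on_call(self, guest_return_addr):
        # A real translator would know the host address at translation time;
        # here it is simply looked up when the call executes.
        self._shadow.append((guest_return_addr, self._lookup(guest_return_addr)))

    def on_ret(self, guest_return_addr):
        if self._shadow:
            predicted_guest, host_addr = self._shadow.pop()
            if predicted_guest == guest_return_addr:
                self.hits += 1
                return host_addr  # fast path: prediction confirmed
        # Guest code changed its return address (or the stack is empty):
        # fall back to the generic lookup.
        self.misses += 1
        return self._lookup(guest_return_addr)


table = {0x401005: 0x9000}
predictor = ReturnPredictor(table.__getitem__)
predictor.on_call(0x401005)
print(hex(predictor.on_ret(0x401005)), predictor.hits, predictor.misses)
```

The check against the guest return address is the "extra housekeeping": the fast path is only safe when the guest really returns to where the matching CALL said it would.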

QEMU may or may not have interest in implementing these since they are Apple Silicon specific. Has UTM considered these for its QEMU fork? One or more of these changes could lead to large gains.

@adriancable adriancable added the enhancement New feature or request label Jul 13, 2023
@osy osy added the qemu QEMU related label Jul 13, 2023
@osy
Contributor

osy commented Jul 13, 2023

See also #2366

@osy
Contributor

osy commented Jul 13, 2023

We currently have no plans to support hardware acceleration of the system emulator. We recommend installing an ARM64 operating system in virtualization mode and using an x86_64 emulator there. It will give you much greater performance than anything that can be achieved at the system level.

Also, FWIW, I don’t think these hardware optimizations will make drastic improvements: from past discussions, the real bottleneck in QEMU system emulation is the FPU and MMU emulation.

@adriancable
Author

adriancable commented Jul 13, 2023

@osy - if the bottleneck is in FPU/MMU emulation, what is it that Rosetta 2 does differently to QEMU in this regard? (And what are the principal challenges to doing something similar on QEMU/UTM?)

Yes, using an ARM64 OS plus an x86_64 user-mode emulator covers some of the use cases (e.g. running x86_64 Linux binaries using Rosetta 2 on ARM64 Linux). But if you have an ARM64 OS then QEMU/UTM isn't really needed at all: you can use any number of the other virtualization solutions out there (free and commercial). My expectation is that, like me, many or most people using QEMU/UTM on Apple Silicon are using it for x86_64 system emulation, for which there are no (free or commercial) alternatives. So maybe the answer here is that, since there are no alternatives and hence no market pressure, QEMU/UTM doesn't really need to improve x86_64 emulation performance to remain competitive. I don't know, of course, whether this is the thinking of the QEMU and UTM teams, but it would make economic sense if it were.

In my own case: I regularly make use of an x86 build of QNX, along with a custom build of Linux using an older kernel/libraries (it's for embedded system dev which, like most embedded systems, is stuck where it is in terms of OS versioning). In neither case is there a usable ARM64 + x86_64 user mode equivalent.

@osy
Contributor

osy commented Jul 13, 2023

Rosetta 2 does not do system-level emulation. It makes no attempt to emulate system instructions at all. Additionally, Rosetta has one source and one target architecture, whereas QEMU has multiple source architectures and multiple target architectures. That means the IR QEMU generates does not provide as many optimization opportunities. Just improving instruction emulation would be an insane amount of work, and I’m saying it’s not obvious we’d even see much of an improvement, given that the softmmu is pretty slow. That’s why I think something like FEX or box64 is a better approach, as it’s similar to Rosetta 2 in that it’s userland only.
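
To illustrate the softmmu cost mentioned above, here is a deliberately simplified model of a software TLB sitting in front of every guest memory access. It is not QEMU's actual softmmu code; the class name, the flat `page_table` dictionary, and the sizes are assumptions made for the sketch.

```python
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1
TLB_SIZE = 256


class SoftMMU:
    """Toy software MMU: a direct-mapped TLB backed by a flat page table."""

    def __init__(self, page_table):
        # Hypothetical flat mapping: guest virtual page number -> host page base.
        self.page_table = page_table
        self.tlb = [None] * TLB_SIZE
        self.slow_path_walks = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        index = vpn % TLB_SIZE
        entry = self.tlb[index]
        # Even on a hit, every access pays for the index, load and compare.
        if entry is None or entry[0] != vpn:
            self.slow_path_walks += 1
            if vpn not in self.page_table:
                raise LookupError(f"guest page fault at {vaddr:#x}")
            entry = (vpn, self.page_table[vpn])
            self.tlb[index] = entry
        return entry[1] + (vaddr & PAGE_MASK)


mmu = SoftMMU({0x10: 0x70000000})
print(hex(mmu.translate(0x10123)), mmu.slow_path_walks)
```

A userland-only translator maps guest memory straight into the host process and lets the hardware MMU do this work, so it skips the per-access lookup entirely. That is the structural difference being pointed at here.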

@adriancable
Author

@osy - thanks for the time you took to write a detailed explanation. I appreciate it! Unfortunately userland approaches won't work in my use case, but I am still very appreciative of everything QEMU/UTM can do, even with the performance limitations.

@ylluminate

ylluminate commented Aug 27, 2023

While probably not entirely relevant here since it's only for x86_64 on arm64 Linux distros, it is interesting that Parallels 19 leverages Rosetta 2 to execute x86_64 binaries there, and advertises this especially for Docker use. I would still like to see a box64 evaluation and/or integration in some fashion if feasible. Hmm.

@gmerlino

gmerlino commented Sep 7, 2023

> While probably not entirely relevant here since it's only for x86_64 on arm64 Linux distros, it is interesting that Parallels 19 leverages Rosetta 2 to execute x86_64 thereon, and especially advertised for Docker use. I would still like to see a box64 evaluation and/or integration in some fashion if feasible. Hmm.

Unrelated to the OP, what would be needed to support Rosetta 2 under QEMU virtualization @osy?

The latest Parallels supports Rosetta 2, so Rosetta usage is evidently not tied to AVF (Apple's Virtualization framework).

@msmuenchen

I filed a ticket upstream at QEMU: https://gitlab.com/qemu-project/qemu/-/issues/2295

5 participants