[edk2-devel] [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch -- push #843

Merged 1 commit into tianocore:master on Jul 31, 2020

Conversation

@lersek (Member) commented on Jul 31, 2020

http://mid.mail-archive.com/20200729185217.10084-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/63454

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
APHandler(), and SmiRendezvous(). However, the "main wait" within
APHandler():

>     //
>     // Wait for something to happen
>     //
>     WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);

doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
without pausing.
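
For context, the pre-patch busy wait boils down to a bare compare-exchange
retry loop. The following is an illustrative sketch of that shape (not
necessarily the exact MpService.c source), using
InterlockedCompareExchange32() from MdePkg's SynchronizationLib:

  //
  // Illustrative sketch of the pause-free loop described above. Uses
  // InterlockedCompareExchange32() from <Library/SynchronizationLib.h>.
  //
  UINT32
  WaitForSemaphore (
    IN OUT  volatile UINT32  *Sem
    )
  {
    UINT32  Value;

    do {
      //
      // Re-fetch the semaphore and retry immediately on failure;
      // no CpuPause() anywhere in the loop body.
      //
      Value = *Sem;
    } while (Value == 0 ||
             InterlockedCompareExchange32 (
               (UINT32 *)Sem,
               Value,
               Value - 1
               ) != Value);
    return Value - 1;
  }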

The performance impact is especially notable in QEMU/KVM + OVMF
virtualization with CPU overcommit (that is, when the guest has
significantly more VCPUs than the host has physical CPUs). The guest BSP
is working heavily in:

  BSPHandler()                  [MpService.c]
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]

while the many guest APs are spinning in the "Wait for something to
happen" semaphore acquisition, in APHandler(). The guest APs are
generating useless memory traffic and saturating host CPUs, hindering the
guest BSP's progress in SetUefiMemMapAttributes().

Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
after the first check fails. Due to Pause Loop Exiting (known as Pause
Filter on AMD), the host scheduler can favor the guest BSP over the guest
APs.
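
Concretely, the reworked loop could take the following shape (an
illustrative sketch under the same assumptions as above, adding CpuPause()
from <Library/BaseLib.h>; the merged code may differ in detail). The key
point is that the pause happens only after a failed acquisition attempt,
so the uncontended fast path is unchanged:

  //
  // Illustrative sketch of the reworked loop: attempt the acquisition
  // first, and call CpuPause() only after the attempt fails.
  //
  UINT32
  WaitForSemaphore (
    IN OUT  volatile UINT32  *Sem
    )
  {
    UINT32  Value;

    for (;;) {
      Value = *Sem;
      if ((Value != 0) &&
          (InterlockedCompareExchange32 (
             (UINT32 *)Sem,
             Value,
             Value - 1
             ) == Value)) {
        break;
      }
      //
      // On a failed attempt, hint the CPU (and, via Pause Loop Exiting /
      // Pause Filter, the hypervisor) that this is a spin-wait iteration.
      //
      CpuPause ();
    }
    return Value - 1;
  }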

Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
less than 4 minutes.

The patch should benefit physical machines as well -- according to the
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
PAUSE to the generic WaitForSemaphore() function is considered a general
improvement.

Cc: Eric Dong <eric.dong@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200729185217.10084-1-lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>

@lersek added the "push" label (Auto push patch series in PR if all checks pass) on Jul 31, 2020
mergify bot merged commit 9001b75 into tianocore:master on Jul 31, 2020
@lersek deleted the sem_wait_pause_rhbz1861718_push branch on July 31, 2020, 13:29