
QEMULauncher uses 200% CPU #5833

Open
pernatiy opened this issue Oct 26, 2023 · 9 comments

Comments

@pernatiy

Describe the issue
After host suspend/resume, without stopping the VM, QEMULauncher sometimes uses 200% CPU and the network in the VM doesn't work. The issue feels less reproducible after upgrading to Sonoma and UTM 4.3/4.4, but it is still there.

I believe it is closely related to #4803

I use the VM headless but have a serial terminal connected. To work around this issue I log in to the guest over serial; the guest works perfectly fine and reports no CPU usage, but the network is completely dead. Simply cycling the network interface off and on (using something like ip link set enp0s1 down && ip link set enp0s1 up) reinitialises the interface and fixes the QEMU network stack without a restart.
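
For reference, a sketch of that workaround as a script to run as root inside the guest (the interface name enp0s1 and the ping-based liveness check are assumptions for illustration):

    #!/bin/sh
    # Bounce the guest NIC when the default gateway stops responding.
    IF=enp0s1                                    # adjust to your guest interface
    GW=$(ip route show default | awk '{print $3; exit}')
    if ! ping -c 1 -W 2 "$GW" >/dev/null 2>&1; then
        # Reinitialise the interface; in my case this also revives
        # the QEMU network stack on the host side.
        ip link set "$IF" down && ip link set "$IF" up
    fi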

Configuration

  • UTM Version: 4.4.3
  • macOS Version: 14.1
  • Mac Chip (Intel, M1, ...): M1

Crash log
N/A

Debug log
Process sample taken during the issue: QEMULauncher-4.4.3.txt

Upload VM
I was unable to reproduce this issue on a generic, fresh VM. But maybe I just need more patience.

@pernatiy
Author

Hit the issue again today. In the sampling report the call stack goes through vmnet, and getsockopt is responsible for 100% of the CPU, but the thread is marked as "org.qemu.vmnet.if_queue (serial)". What is that "serial" about? Could that be a problem because I use a serial connection to the VM instead of a "normal" console?
QEMULauncher-4.4.3-1.txt
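
For anyone who wants to capture the same data, a report like the one attached can be taken with macOS's built-in sample tool; roughly (duration and file name are just examples):

    % sample QEMULauncher 30 -file QEMULauncher-sample.txt
    # samples all threads of QEMULauncher for 30 seconds and writes
    # the call-stack report to the given file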

@odysseusjak

This is happening to me, too. I'm running 4.4.4, though, on an MBP with an M2 chip. I have four different VMs, and it doesn't matter which one I use: the launcher peaks at over 200% CPU. Then, after a moment, it calms back down to "normal" readings. However, the VM is unstable; the mouse and keyboard are jittery and slow to respond. This happens even at the login screen.

@pernatiy
Author

Still an issue on macOS 14.2 / UTM 4.4.4
QEMULauncher-4.4.4.txt

@pernatiy
Author

It turns out the issue isn't tied to the host suspend/resume cycle: it happened during a normal workload. Interestingly, QEMULauncher kept working as usual, but the network was already dead at that point. Only after some time did CPU usage hit 100% with getsockopt queries, and after some more time it rose to 200% (I believe that has something to do with the dispatch queue, but I'm not sure).
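
A rough way to log the onset over time, run on the host (a sketch, assuming a single QEMULauncher process):

    % while sleep 60; do date +%T; ps -o %cpu= -p "$(pgrep -x QEMULauncher | head -1)"; done >> qemu-cpu.log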

@justin-petermann

justin-petermann commented Dec 23, 2023

[SOLVED, maybe]
I don't know if it's exactly the same problem, but I found the following with the same symptoms.

The problem appears when one of the network interfaces is not configured.
Guest OS: Debian 12, with qemu-guest-agent installed.

Two interfaces configured for the VM, but only one up in the guest (/etc/network/interfaces configures only one interface); the other is down:
% utmctl ip-address 95F7BD43-0C02-4F19-BD8C-33C7AXXXXXX
10.1.0.156
fe80::1c00:7aff:fe8e:c809
==>> CPU 200%

Two interfaces configured for the VM, both up in the guest (/etc/network/interfaces configures both interfaces):
% utmctl ip-address 95F7BD43-0C02-4F19-BD8C-33C7AXXXXXX
10.1.0.156
192.168.128.3
fe80::1c00:7aff:fe8e:c809
fe80::3c39:1dff:fef4:dfc9
==>> CPU avg 6%

Hope this will help somebody.
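
For completeness, bringing the second interface up at boot on the Debian guest looks roughly like this (the interface name enp0s2 is an assumption; check ip link for the real name):

    # /etc/network/interfaces — declare the second NIC so it is configured at boot
    auto enp0s2
    iface enp0s2 inet dhcp

    # then apply it without a reboot:
    % sudo ifup enp0s2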

pernatiy changed the title from "QEMULauncher uses 200% CPU after host resume" to "QEMULauncher uses 200% CPU" on Dec 27, 2023
@ballo

ballo commented Jan 30, 2024

I'm having the same issue with one Linux install: 200% CPU while the Linux guest is idling. Another distro shows 20% when idle. Both are too high!

Since someone mentioned the network: the VM at 200% is using host-only networking, if that matters.

@jul-m

jul-m commented Feb 21, 2024

I have the same problem (QEMULauncher at 200% on top of the VM's own activity) with a Windows 11 ARM64 VM on a MacBook Air M2, either when I start the VM with the network adapter disabled (at the OS level) or after a suspend/resume, again with the network adapter disabled. When I disable the network adapter without suspending or shutting down, the problem occurs after a random delay. Simply re-enabling the network adapter fixes the problem.

I hadn't connected the problem with the network until I came across this issue (I noticed it during offline testing). The problem therefore seems to occur when the network side is no longer functional.

  • Version 4.4.5 (via brew)
  • MacBook Air M2 2022
  • macOS 14.3.1 (23D60)

@pernatiy
Author

pernatiy commented Feb 23, 2024

I've tried some workarounds, in particular changing the emulated network card, in the hope that the problem is specific to some card-dependent emulation code. First I tried different flavours of virtio (like virtio-net-pci-non-transitional), without success. Then I tried e1000e; the reproduction rate seems lower, but given the generally low reproducibility I'm not sure. Also, once, I believe the issue happened and the system was able to recover, but the recovery happened in a slightly unusual way:

[472698.942050] ------------[ cut here ]------------
[472698.945489] NETDEV WATCHDOG: enp0s1 (e1000e): transmit queue 0 timed out 7768 ms
[472698.948158] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x29c/0x2b0
[472698.950987] Modules linked in: 9p fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat 9pnet_virtio e1000e 9pnet virtio_balloon joydev fuse loop zram xfs uas usb_storage crct10dif_ce polyval_ce polyval_generic ghash_ce sha3_ce sha512_ce sha512_arm64 virtio_console virtio_blk virtio_mmio scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath qemu_fw_cfg
[472698.961802] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.13-100.fc38.aarch64 #1
[472698.962458] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[472698.962898] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[472698.963343] pc : dev_watchdog+0x29c/0x2b0
[472698.963606] lr : dev_watchdog+0x29c/0x2b0
[472698.963868] sp : ffff800080003db0
[472698.964307] x29: ffff800080003db0 x28: ffffdbb20c82d4f0 x27: ffff800080003e80
[472698.965268] x26: ffffdbb20dc01008 x25: 0000000000001e58 x24: ffffdbb20e3d7000
[472698.965702] x23: 0000000000000000 x22: ffff43867646841c x21: ffff438676468000
[472698.966126] x20: ffff438677c58400 x19: ffff4386764684c8 x18: 0000000000000006
[472698.966565] x17: 3637372074756f20 x16: 64656d6974203020 x15: 6575657571207469
[472698.966996] x14: 6d736e617274203a x13: 205d393834353439 x12: 2e3839363237345b
[472698.967429] x11: 00000000ffffdfff x10: ffffdbb20e4ce560 x9 : ffffdbb20bb02cdc
[472698.967937] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
[472698.968469] x5 : ffff43867eb4d988 x4 : 0000000000000000 x3 : 0000000000000000
[472698.968911] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffdbb20e3f3f40
[472698.969351] Call trace:
[472698.969506]  dev_watchdog+0x29c/0x2b0
[472698.969731]  call_timer_fn+0x3c/0x1c8
[472698.970349]  __run_timers+0x264/0x338
[472698.970565]  run_timer_softirq+0x28/0x50
[472698.970814]  __do_softirq+0x120/0x394
[472698.971065]  ____do_softirq+0x18/0x30
[472698.971308]  call_on_irq_stack+0x24/0x30
[472698.971567]  do_softirq_own_stack+0x24/0x38
[472698.971903]  __irq_exit_rcu+0x110/0x120
[472698.972590]  irq_exit_rcu+0x18/0x30
[472698.973365]  el1_interrupt+0x38/0x88
[472698.975231]  el1h_64_irq_handler+0x18/0x28
[472698.975623]  el1h_64_irq+0x68/0x70
[472698.976144]  cpuidle_idle_call+0xb0/0x1b0
[472698.976942]  do_idle+0xa8/0xf8
[472698.977272]  cpu_startup_entry+0x3c/0x50
[472698.977747]  rest_init+0x100/0x108
[472698.978084]  arch_call_rest_init+0x18/0x20
[472698.978973]  start_kernel+0x334/0x410
[472698.979355]  __primary_switched+0xbc/0xd0
[472698.979879] ---[ end trace 0000000000000000 ]---
[472698.980653] e1000e 0000:00:01.0 enp0s1: Reset adapter unexpectedly
[472701.823312] e1000e 0000:00:01.0 enp0s1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Essentially, the driver automatically performed an ip link down / ip link up cycle on its own.

Now I've switched to the e100 driver (i82801 in the network card list).

If someone possesses knowledge of the codebase and can suggest a better candidate to test, I'd be glad to hear it.
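
For anyone testing other card models outside UTM's UI, the equivalent plain-QEMU invocation looks roughly like this (a sketch; the netdev id and the vmnet-shared backend are placeholders for whatever UTM actually configures):

    % qemu-system-aarch64 -nic model=help      # list NIC models this binary supports
    % qemu-system-aarch64 ... \
          -netdev vmnet-shared,id=net0 \
          -device e1000e,netdev=net0           # swap e1000e for virtio-net-pci, i82801, etc.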

@xcl706

xcl706 commented Mar 18, 2024

I also hit this issue after host suspend/resume, on macOS 14.4.

6 participants