Memory leak in QEMULauncher causes macOS to suspend UTM #4958

Open
legsak1mbo opened this issue Jan 20, 2023 · 63 comments

@legsak1mbo

legsak1mbo commented Jan 20, 2023

Describe the issue
QEMULauncher appears to leak memory. I'm running a fairly simple Debian VM, and about once a week I find that the host has suspended UTM after running out of memory, so I have to force quit it.

Configuration
UTM Version: 4.1.5 (74)
macOS Version: 13.0.1 (22A400)
Mac Chip: M1
Memory: 16GB
VM: Debian 5.10.0-19-arm64

Crash log
No crash log, UTM has to be force quit.

Debug log
As above, I can't get the debug log as UTM is unresponsive and has to be force quit.

I've updated to UTM 4.1.5 but the issue remains.

Note my symptoms are similar to #4449 which is marked as closed. I'm running a fairly simple Debian VM which is itself running Home Assistant. About once a week I find that QEMULauncher has used up all available memory and UTM has to be Force Quit.

This happened earlier, and despite rebooting the host and relaunching everything, I've watched QEMULauncher's utilisation keep creeping up. It's currently sitting at ~4.7GB (the VM has 4GB assigned to it).

Running top on the VM shows 3293.7 MiB total with 1703.8 MiB free and zero swap utilisation.
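Instead of checking Activity Monitor by hand, you can log the host-side numbers at a fixed interval. The sketch below makes two assumptions: it relies on macOS `ps`, and it expects the processes' executable names to be `UTM` and `QEMULauncher`, as shown in Activity Monitor. The function names are illustrative. `ps` reports resident set size (RSS) in KiB. That is a different metric from Activity Monitor's "Memory" column, so the absolute values will differ, and the trend is what matters.

```python
import os
import subprocess
import time
from typing import Dict

# Executable names as shown in Activity Monitor (assumed to match ps output).
WATCHED = {"UTM", "QEMULauncher"}


def parse_ps(output: str) -> Dict[str, int]:
    """Sum resident memory (KiB) per watched name from `ps -axo rss=,comm=` output."""
    totals: Dict[str, int] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # First field is RSS in KiB; the rest is the executable path.
        rss_kib, _, command = line.partition(" ")
        name = os.path.basename(command.strip())
        if name in WATCHED:
            totals[name] = totals.get(name, 0) + int(rss_kib)
    return totals


def sample() -> Dict[str, int]:
    """Run ps once and return {name: KiB} for the watched processes."""
    result = subprocess.run(
        ["ps", "-axo", "rss=,comm="], capture_output=True, text=True, check=True
    )
    return parse_ps(result.stdout)


def log_forever(interval_s: int = 60) -> None:
    """Print one timestamped line per interval; redirect stdout to keep a log."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        usage = {name: f"{kib / 1024:.0f} MiB" for name, kib in sample().items()}
        print(stamp, usage, flush=True)
        time.sleep(interval_s)
```

Call `log_forever()` on the host and redirect its output to a file. Meanwhile, run `free -m` periodically inside the guest, then compare how much each side grows.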

@osy
Contributor

osy commented Jan 20, 2023

> Running top on the VM shows 3293.7 MiB total with 1703.8 MiB free and zero swap utilisation.

The question is when you see QEMULauncher’s memory use grow by X bytes, do you see a similar growth inside the VM?

@legsak1mbo
Author

It would appear not. On this machine, Activity Monitor is showing that UTM is using 4.11GB and QEMULauncher has now crept up to 4.8GB. Utilisation has increased slightly on the VM, but not by the same amount.

I'll keep monitoring and let you know.

@osy
Contributor

osy commented Jan 20, 2023

Another test is to log out (not shut down) in the VM and log back in. Repeat this after a long time: if the memory usage is higher the second time, then there’s some leak in QEMU. If it returns to the same value each time, there is a leak in your guest OS.
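You can write the log-out/log-in test as a simple rule. This sketch assumes that each host reading (in MiB) is taken at the same point in every cycle, for example right after logging back in. The 50 MiB tolerance is a hypothetical noise margin, and `classify_relogin` is an illustrative name.

```python
from typing import Sequence


def classify_relogin(host_mib: Sequence[float], tolerance_mib: float = 50.0) -> str:
    """Apply the log-out/log-in heuristic to QEMULauncher host readings.

    Each reading is taken right after a fresh guest log-in. If the last reading
    exceeds the first by more than the tolerance, the memory is being held on
    the host side (QEMU); if it returns to the same value, the guest OS is the
    more likely culprit.
    """
    if len(host_mib) < 2:
        raise ValueError("need at least two log-in cycles to compare")
    growth = host_mib[-1] - host_mib[0]
    return "host-side (QEMU)" if growth > tolerance_mib else "guest OS"
```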

@legsak1mbo
Author

When you say "if the memory usage is more the second time" do you mean usage inside the VM or the host?

Perhaps I'm missing something about how QEMU/UTM should work but my expectation would be that even if there's a leak in the guest, it should never use more than the VM has configured (in this case 4GB). Is that not the case here?

@osy
Contributor

osy commented Jan 21, 2023

> do you mean usage inside the VM or the host?

On the host

> Is that not the case here?

The configured RAM is only what is visible to the guest. It does not include any internal resources held by QEMU, including its caches and any graphical rendering resources.

@legsak1mbo
Author

OK, thanks. So, just to be clear, you're suggesting that I log out of the VM and then log back in (not the host)?

With regards to the configured RAM, that's what I thought. In that case, I can't see how a leak within the guest OS would cause the host to suspend UTM for running out of memory. I would instead expect the guest to hit its 4GB limit and go through its own rounds of the OOM killer internally.

I've just signed back in to the host again and UTM and QEMULauncher are both creeping up. In fact, UTM itself is now using more than QEMULauncher: UTM is up to 5.5GB and QEMULauncher is now using almost 5GB. On the other hand, utilisation within the guest has increased by about 80MB. So perhaps it's UTM itself which is leaking?

@osy
Contributor

osy commented Jan 21, 2023

Log out and into the guest (Linux or whatever)

@legsak1mbo
Author

OK, so it appears that the leak is actually in UTM and not QEMULauncher after all.

I've checked in on things this morning and, whilst the guest's utilisation is fairly steady, QEMULauncher is up to 5.5GB and UTM itself is now using almost 13GB.

With regards to signing in and out of the guest, that's how I've been testing all along. Note that it's a console only VM with no X server running.

Let me know what else you need from me to help diagnose.

@osy
Contributor

osy commented Jan 21, 2023

Please provide your VM’s config.plist. Also try using a display card without GL support and see if that helps.

Also, what OS are you running?

@legsak1mbo
Author

Sure, no problem. Checking in again, UTM had hit about 15.5GB. I've just powered off the VM and UTM has dropped to 7.18GB. That's an improvement, but it's still pretty high considering it's not running any VMs.

The VM's plist is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Backend</key>
	<string>QEMU</string>
	<key>ConfigurationVersion</key>
	<integer>4</integer>
	<key>Display</key>
	<array>
		<dict>
			<key>DownscalingFilter</key>
			<string>Linear</string>
			<key>DynamicResolution</key>
			<true/>
			<key>Hardware</key>
			<string>virtio-ramfb</string>
			<key>NativeResolution</key>
			<false/>
			<key>UpscalingFilter</key>
			<string>Nearest</string>
		</dict>
	</array>
	<key>Drive</key>
	<array>
		<dict>
			<key>Identifier</key>
			<string>76DEEFB4-470B-45BE-8098-6B6460DEEFEB</string>
			<key>ImageName</key>
			<string>76DEEFB4-470B-45BE-8098-6B6460DEEFEB.qcow2</string>
			<key>ImageType</key>
			<string>Disk</string>
			<key>Interface</key>
			<string>VirtIO</string>
		</dict>
	</array>
	<key>Information</key>
	<dict>
		<key>Icon</key>
		<string>linux</string>
		<key>IconCustom</key>
		<false/>
		<key>Name</key>
		<string>Home Assistant</string>
		<key>UUID</key>
		<string>F759508A-9BA4-4EC6-9AB4-2E4C447137A2</string>
	</dict>
	<key>Input</key>
	<dict>
		<key>MaximumUsbShare</key>
		<integer>5</integer>
		<key>UsbBusSupport</key>
		<string>3.0</string>
		<key>UsbSharing</key>
		<true/>
	</dict>
	<key>Network</key>
	<array>
		<dict>
			<key>Hardware</key>
			<string>virtio-net-pci</string>
			<key>IsolateFromHost</key>
			<false/>
			<key>MacAddress</key>
			<string>3E:BB:C1:13:CC:23</string>
			<key>Mode</key>
			<string>Bridged</string>
			<key>PortForward</key>
			<array/>
		</dict>
	</array>
	<key>QEMU</key>
	<dict>
		<key>AdditionalArguments</key>
		<array/>
		<key>BalloonDevice</key>
		<false/>
		<key>DebugLog</key>
		<false/>
		<key>Hypervisor</key>
		<true/>
		<key>PS2Controller</key>
		<false/>
		<key>RNGDevice</key>
		<true/>
		<key>RTCLocalTime</key>
		<false/>
		<key>TPMDevice</key>
		<false/>
		<key>UEFIBoot</key>
		<true/>
	</dict>
	<key>Serial</key>
	<array/>
	<key>Sharing</key>
	<dict>
		<key>ClipboardSharing</key>
		<true/>
		<key>DirectoryShareMode</key>
		<string>VirtFS</string>
		<key>DirectoryShareReadOnly</key>
		<false/>
	</dict>
	<key>Sound</key>
	<array>
		<dict>
			<key>Hardware</key>
			<string>intel-hda</string>
		</dict>
	</array>
	<key>System</key>
	<dict>
		<key>Architecture</key>
		<string>aarch64</string>
		<key>CPU</key>
		<string>default</string>
		<key>CPUCount</key>
		<integer>2</integer>
		<key>CPUFlagsAdd</key>
		<array/>
		<key>CPUFlagsRemove</key>
		<array/>
		<key>ForceMulticore</key>
		<false/>
		<key>JITCacheSize</key>
		<integer>0</integer>
		<key>MemorySize</key>
		<integer>4096</integer>
		<key>Target</key>
		<string>virt</string>
	</dict>
</dict>
</plist>

I'm just using the default display card with no GL support.
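When sharing a config.plist for a report like this, a small helper can pull out just the memory-related keys. This sketch uses Python's standard `plistlib` and assumes the QEMU-backend layout posted above: `System`/`MemorySize` in MiB, `Display`/`Hardware`, and `QEMU`/`BalloonDevice` and `QEMU`/`Hypervisor`. Other configuration versions may use different keys, and `memory_settings` is an illustrative name.

```python
import plistlib


def memory_settings(plist_bytes: bytes) -> dict:
    """Extract the memory-relevant settings from a UTM QEMU config.plist."""
    cfg = plistlib.loads(plist_bytes)
    qemu = cfg.get("QEMU", {})
    return {
        "memory_mib": cfg["System"]["MemorySize"],  # guest-visible RAM only
        "displays": [d["Hardware"] for d in cfg.get("Display", [])],
        "balloon": qemu.get("BalloonDevice", False),
        "hypervisor": qemu.get("Hypervisor", False),
    }
```

Usage, assuming the `config.plist` file is taken from inside the VM's `.utm` bundle: `with open("config.plist", "rb") as f: print(memory_settings(f.read()))`.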

@osy
Contributor

osy commented Jan 21, 2023

What OS are you running?

@legsak1mbo
Author

As above:

macOS Version: 13.0.1 (22A400)
...
VM: Debian 5.10.0-19-arm64

Let me know if you need more information about either.

@osy
Contributor

osy commented Jan 21, 2023

I’m not sure what Debian 5.10.0-19 means. That looks like a kernel version? I’m curious what version of Debian (bookworm? sid?) so I can try to install it.

@legsak1mbo
Author

Ah, OK my apologies. lsb_release -c reports Codename: bullseye

I used debian-11.5.0-arm64-netinst.iso for the install. Looks like 11.6 is the latest available now.

@legsak1mbo
Author

Just a quick update on this. My HA devices (attached to the VM in question) were running slowly so I checked in on the host. As expected the host OS had suspended UTM. Screenshot attached.

Screenshot 2023-02-04 at 19 19 07

@osy osy added this to the v4.2 milestone Feb 26, 2023
@osy
Contributor

osy commented Feb 26, 2023

@legsak1mbo I used debian-11.5.0-arm64-netinst.iso to create a brand new VM install (GNOME). I set my display card to virtio-ramfb. Leaving the VM idle for 20 minutes and I see UTM's memory usage hover around 250-275MB. Can you give more detailed steps to reproduce? Can you reproduce it using a brand new Debian install?

@legsak1mbo
Author

Sure thing. I deployed the base OS (basic install, no window manager) and then ran the Home Assistant supervised installation exactly as described here: https://github.com/home-assistant/supervised-installer

Nothing more than that really.

I've just checked in again and, according to Activity Monitor, UTM is using 46.02GB. Top inside the VM itself reports ~1.5GB in use (with a total of 4GB assigned to the machine).

Strangely, despite Activity Monitor reporting that UTM is using that much memory, the bottom pane says that only 13.76GB is being used in total. I've attached screenshots of both of these.

Screenshot 2023-02-26 at 20 52 12

Screenshot 2023-02-26 at 20 52 01

Top on the host machine also reports 46G+ for UTM...

Screenshot 2023-02-26 at 20 58 16

@osy
Contributor

osy commented Feb 26, 2023

How long does it take until you see the effects? It seems like it takes a day for it to run out of memory but what does it look like 10 minutes after launch? One hour? Trying to figure out if it’s steadily increasing or if there’s some trigger that blows it up.

@legsak1mbo
Author

OK, so I stopped, quit and started UTM up about two hours ago. Started at around 125MB but it's already up to 1.75GB and climbing. I'll continue to monitor...
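osy's question (is the growth a steady creep, or does some trigger blow it up?) can be checked from a series of timestamped readings. This sketch computes two things: a least-squares slope, and the largest jump between consecutive readings. The rule of thumb is a heuristic and is not established anywhere in this thread. A steady creep should show a slope with small, even jumps, while a trigger should show one jump dominating. `growth_profile` is an illustrative name.

```python
from typing import List, Tuple


def growth_profile(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (slope in MB/hour, largest jump between consecutive samples in MB).

    `samples` is a time-ordered list of (hours since launch, memory in MB).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Ordinary least-squares slope of memory against time.
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    slope = num / den
    max_jump = max(b[1] - a[1] for a, b in zip(samples, samples[1:]))
    return slope, max_jump
```

With only the two readings reported here (about 125MB at launch and 1.75GB two hours later), `growth_profile([(0, 125), (2, 1750)])` gives a slope of 812.5 MB/hour. With two points, though, the slope and the jump describe the same interval. Telling a creep from a trigger needs many more samples.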

@osy
Contributor

osy commented Feb 27, 2023

I set up another fresh Debian install. No window manager, followed the HA guide and installed HA. Kept it running overnight and it’s still at 120MB.

I think I need more info about your setup. Can you provide as much info about your host setup as possible? Include anything that could differ from a stock setup (any external devices plugged in, for example). Did you change any UTM global settings? Any OS modifications? etc.

@legsak1mbo
Author

No OS modifications or global setting changes.

The VM does have a Conbee 2 - https://phoscon.de/en/conbee2 - attached. I could try disconnecting this and running for a while but it might have to be in the morning as it'll kill all my home "smarts" and might confuse the family somewhat.

UTM is up to 5.2GB and climbing at this point.

@osy
Contributor

osy commented Feb 27, 2023

Oh, maybe I will test with a USB attached and have it overnight. Also: maybe you can clone the VM and start a second instance without the USB and see if that one also experiences the issue?

@legsak1mbo
Author

Well, I've run the VM for most of today without the USB device attached and UTM is already up to 6.3GB used. I'm going to have to reboot and reattach the device but happy to try anything else.

@osy
Contributor

osy commented Feb 28, 2023

I’ve also not been able to reproduce it with a USB camera attached overnight. I don’t think it’s related to USB. Are you still on 13.0.1 or have you updated macOS?

@legsak1mbo
Author

I've just updated macOS and will observe again. I'll report back shortly.

@legsak1mbo
Author

Already up to 3GB and growing. I'll check back on it in the morning.

@osy
Contributor

osy commented Mar 1, 2023

Does this happen with every VM? Does it happen only for virtualized or for emulated VMs? Does it happen for Apple Virtualization? Can you use one of the VMs from the gallery?

For comparison, my Debian VM running home assistant has been up for 48 hours and it’s still at 78MB

@osy
Contributor

osy commented Mar 4, 2023

@legsak1mbo can you answer my comment above? I still need help root-causing this, as I'm not able to reproduce it on multiple machines with Intel, M1, and M2 chips.

@legsak1mbo
Author

Sure. I'm just trying to find some time to set up another VM to test.

@hilmi-ica

I did, and it didn't work; it still eats up 12 GB of memory.
config 2.txt

@osy
Contributor

osy commented Apr 19, 2023

Is it possible for you to share the full VM? If there’s sensitive info, can you create a fresh VM and reproduce the issue? (You can email the link to dev at getutm.app) I’m having a really hard time getting this to reproduce on any of my machines.

@hilmi-ica

@osy I sent it!

@osy
Contributor

osy commented Apr 21, 2023

@Hilmi-KP if I understand correctly, your issue is not the same as what's being described here. When I booted up your VM, it hit about 11GB of usage in the log-in screen (I didn't log in) and stabilized there after a few minutes. Although it is worth investigating why it uses 11GB, I don't think this is the same issue. @legsak1mbo found the memory usage slowly creeps to 100GB+ after a day of use. Are you able to try the same thing on an older version of UTM? When I ran your VM on v4.0.9, I get the same memory usage idling at the log-in screen.

@hilmi-ica

@osy yes, I experienced the same memory usage on UTM v4.0.8 without even logging in. It used 10.46GB of memory. I also tried v3.2.4, but I guess it can't boot Windows 11.

@legsak1mbo
Author

So I upped a second VM, attached the same Debian ISO and the same USB device (a Conbee stick) to it and left it running this morning. There didn't seem to be any appreciable memory use.

I've now re-upped the actual VM. I guess the only other thing to try would be the HA VM without the Conbee stick attached. I'll have to try to find some time to do that.

@osy osy modified the milestones: v4.2, Future Apr 23, 2023
@mrbitcoiner

I have a similar issue here, no GPU acceleration enabled (Virtio Framebuffer), using latest aarch64 Debian in the latest version of UTM.

I only use the VM over SSH, with the UTM windows minimized. I've noticed that when I allocate and release memory, the QEMULauncher process's memory usage keeps growing, but the memory inside the VM gets released as expected.

Screenshot 2023-05-28 at 14 19 44 Screenshot 2023-05-28 at 14 20 02
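You can reproduce the allocate-and-release pattern described above with a small script run inside the guest. This sketch uses an anonymous `mmap` and writes one byte to each page, so the guest kernel actually backs the whole region with RAM. It then closes the mapping so the guest frees it. Watch QEMULauncher on the host across several runs. This shows whether host usage plateaus (QEMU not returning freed guest pages to the host, as osy explains later in this thread) or keeps climbing past that point. `touch_and_release` is an illustrative name. The size you pass should stay below the guest's free memory.

```python
import mmap

PAGE = mmap.PAGESIZE


def touch_and_release(mib: int) -> int:
    """Allocate `mib` MiB of anonymous memory, dirty every page, then free it.

    Returns the number of pages touched.
    """
    size = mib * 1024 * 1024
    buf = mmap.mmap(-1, size)  # fileno -1: anonymous mapping, not file-backed
    try:
        touched = 0
        for offset in range(0, size, PAGE):
            buf[offset] = 1  # one write per page forces the guest to back it
            touched += 1
    finally:
        buf.close()  # unmaps the region so the guest kernel can reclaim it
    return touched
```

For example, inside the guest run `for _ in range(5): touch_and_release(1024)`, and note QEMULauncher's memory on the host after each call.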

@mrbitcoiner

Apparently, the memory usage will grow forever, until it fills all the storage with swap files. As a result:
Screenshot 2023-05-28 at 14 31 27

@mrbitcoiner

These leaks result in the macOS kernel doing a lot of swapping, writing a lot of garbage to the SSD and reducing its lifespan.

@osy
Contributor

osy commented May 28, 2023

@mrbitcoiner as you can see above, I have a very hard time reproducing this issue. Can you help by reading the conversation above and letting us know if any of the things discussed applies to you or not? Also is it possible to create a new VM and see if the issue still happens?

@mrbitcoiner

mrbitcoiner commented May 28, 2023

Steps to reproduce the memory leak:

  • Create VM with 4GB ram
  • Install debian aarch64 inside it (without UI, only SSH)
  • Install git, docker and docker compose
  • usermod -aG docker <unprivileged_user>
  • switch to the unprivileged user
  • run cd to go to user home
  • git clone https://github.com/mrbitcoiner/docker-bitcoind
  • Start compilation of bitcoind with ./control.sh up regtest disabled

You will see QEMULauncher's memory usage grow well above the 4GB and never release the memory.

Observation: memory usage inside the VM will be insignificant.

The leak happens even without swap in both the VM and macOS (vm.compressor_mode: 2).

Another thing to note is that the memory used by QEMULauncher only grows; it's as if allocated memory is never freed. We need to stop and restart the VM to release the memory (rebooting the VM does not release the memory used by the process).

In this case, I've disabled macOS swap to reduce the degradation of my SSD (because there's no way to replace it).
Screenshot 2023-05-28 at 16 35 43

@mrbitcoiner

mrbitcoiner commented May 28, 2023


Leaving the VM idle will not leak memory, since there are almost no memory allocations. QEMU is not giving the allocated memory back to the host system (the macOS kernel), so the unused memory (already freed inside the VM) goes to swap. That is my guess.

@mrbitcoiner

If it helps, I'm using the MBP M1 (2020) 8GB RAM.

@osy
Contributor

osy commented May 29, 2023

@mrbitcoiner I've done some testing with your configuration and I found that (with 4GB guest memory configured) the host memory usage tops out at ~7.8GB. Is this what you experienced? If so, this is "normal", as in this is how QEMU works. You can try running QEMU directly (brew install qemu) and you will see the same usage. The reason for this is that 4GB of guest memory doesn't mean 4GB of total QEMU usage; there are a lot of QEMU internals that would add up to ~3GB of usage. As for why the memory isn't released when guest memory pressure eases: QEMU doesn't support dynamic memory ballooning. I too was surprised when I discovered this, as other commercial VM software has had this feature for decades. QEMU supports some form of manual memory ballooning (as in, you can tell the monitor to reclaim memory) but nothing automatic. This isn't a "leak"; the VM simply will not release host memory up to the configured limit (4GB).
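The manual ballooning mentioned above is driven through the QEMU monitor. HMP has `balloon <MiB>`. QMP has `balloon`, whose `value` argument is the target guest size in bytes, plus `query-balloon` to read the current size back. Both require a virtio-balloon device in the guest. In UTM's config.plist this corresponds to the `BalloonDevice` key, which is `false` in the plist posted earlier in this thread. The sketch below only builds the QMP messages. How to reach a QEMU monitor socket under UTM is not covered in this thread and is left out. The thread also doesn't establish whether macOS actually reclaims ballooned pages. `qmp_balloon_session` is an illustrative name.

```python
import json
from typing import List

MIB = 1024 * 1024


def qmp_balloon_session(target_mib: int) -> List[str]:
    """Build newline-terminated QMP commands that ask the guest to shrink to `target_mib`."""
    if target_mib <= 0:
        raise ValueError("target must be positive")
    commands = [
        {"execute": "qmp_capabilities"},  # must precede any other QMP command
        {"execute": "balloon", "arguments": {"value": target_mib * MIB}},  # bytes
        {"execute": "query-balloon"},  # reply carries "actual" guest size in bytes
    ]
    return [json.dumps(c) + "\n" for c in commands]
```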

The issue mentioned in this thread is different. People have observed usage in excess of 100GB where the memory usage slowly creeps up through days and days.

@mrbitcoiner


Yep, I was doing some more stress testing and with 4GB ram I achieved a point where the resource usage stopped at these 7.8 GB that you've mentioned.

Thanks for your time!

@DUOLabs333

I have a similar issue, but not quite the same: while the "Real Memory" column (I'm guessing that's the column that actually matters) is not super high, the "Memory" column is much higher than expected (~13 GB). There are also memory spikes that coincide with CPU spikes (I'm not sure which one causes the other). I've also had the same problem with akihikodaki's fork, so if it is a QEMU bug, it's not unique to UTM.

@arif-desu

Just chipping in. The memory consumption keeps on growing on idle VMs. I tried with and without balloon, no difference. I can understand a few hundred MBs used by QEMULauncher beyond what is assigned to the VM itself, but it goes beyond that and increases with the passing of time.

@osy
Contributor

osy commented Jul 17, 2023

@parasaito as you can gather from this thread we’ve been trying to reproduce this issue but couldn’t. Please be more specific and even better follow the instructions from the conversation and post your results.

@arif-desu

@osy Here's how I could get it to reproduce. Allocate 1GB of RAM (or less) to a Linux distro; in my case I used Fedora 38 Server aarch64. Leave it idle for an hour or two and check Activity Monitor. In my case QEMULauncher was using 2.24GB of RAM, and trying to ssh into the VM just fails; the VM seems frozen.

@osy
Contributor

osy commented Aug 19, 2023

@parasaito does it stop at 2.24GB? Because the issue reported here is that it goes up to like 100GB. 2.24GB of memory usage is "normal" for QEMU and isn't something we can control.

@MichaelKofler

This issue has gone quiet, but I don't think it has been fixed.

I currently have five VMs running (debian, ubuntu, fedora, arch, opensuse), which have been assigned 2, 4, 3, 2 and 3 GiB of RAM respectively (sum: 14 GiB). However, UTM plus the five QEMULauncher processes together need more than 25 GiB of RAM (UTM itself only 400 MiB, but each VM much more than its assigned RAM). All VMs are virtualized, i.e. they run aarch64 natively. Speed is good, but the growing memory consumption is bad.

I have used QEMU quite a lot before on Linux and never had memory issues. Any ideas?

@osy
Contributor

osy commented Mar 12, 2024

@MichaelKofler there have been various diagnostics/tests mentioned in this thread. As of today, I am still not able to reproduce this issue, so I don't know how to begin to fix it. However, if you can provide any feedback based on what's been mentioned so far, maybe it can help with reproduction.

@MichaelKofler

Thanks for your swift reply. I tried to observe this for a while. Basically, I don't believe there is a real leak in UTM or QEMU. The VMs are not consuming more and more memory the longer they run. Instead, I believe the VMs simply use twice the reserved RAM if left running for a while (and doing something inside the VMs, such as starting Firefox and opening a few sites). I.e., a VM with 4 GiB of RAM uses 4 GiB inside the VM, but needs 8 GiB in macOS.

2x seems to be the upper limit, at least for me with five different VMs, some of them running for days without reboot. (Stability is great, by the way.)
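Here is a quick check of the five-VM numbers from the previous comment against this 2x ceiling. The assigned sizes come from that comment, and the 25 GiB total is its reported lower bound ("more than 25 GiB", including UTM's ~400 MiB). Treating 2x as a limit is this commenter's empirical observation, not a documented QEMU rule.

```python
# Guest RAM assigned per VM, in GiB (from the previous comment).
assigned_gib = {"debian": 2, "ubuntu": 4, "fedora": 3, "arch": 2, "opensuse": 3}

total_assigned = sum(assigned_gib.values())  # 14 GiB
observed_lower_bound = 25  # "more than 25 GiB" for UTM + five QEMULaunchers
ceiling = 2 * total_assigned  # 28 GiB if every VM tops out at 2x its RAM

ratio = observed_lower_bound / total_assigned
print(f"assigned={total_assigned} GiB, observed>={observed_lower_bound} GiB, "
      f"ratio>={ratio:.2f}, 2x ceiling={ceiling} GiB")
# prints: assigned=14 GiB, observed>=25 GiB, ratio>=1.79, 2x ceiling=28 GiB
```

The observed total sits between 1x and 2x of the assigned RAM. That fits the "up to twice" description but does not by itself rule out further growth.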

This does not happen on Linux with x86. There, QEMU needs somewhat more memory than is reserved for the VM (one example: ca. 4.5 GiB for a VM with 4 GiB of virtual RAM; part of the extra memory is probably for graphics). But this amount stays constant.

Also, this does not happen with VMs using Apple Virtualization (Docker Desktop, Liviable). Again, the VM gets the memory reserved, and Apple's Activity Monitor shows almost exactly this amount.

I don't fully understand macOS memory management. macOS certainly is pretty good with compressing and swapping out unused memory. So UTM is usable for me, even if VMs seem to take twice the memory they should. Still, I am almost sure something is amiss. Not necessarily UTM, could also be QEMU for aarch64 in general or even QEMU for macOS for aarch64 specifically.

If there is anything I can do to help, I am willing to provide more details. My hardware is a MacBook Pro M3 Pro with 36 GiB RAM, latest macOS.

@billxc

billxc commented Jun 26, 2024

I also noticed 16GB of memory consumed when running a macOS VM with 8GB of memory. Stopping the VM and starting it again seems to ease the problem.

@linkdata

Seeing the same. M3 Pro, with QEMULauncher using twice the RAM allotted to the VM.

@MichaelKofler

I am now using Apple Virtualization for UTM exclusively. Fewer features than QEMU, but it runs better (for my purpose) and does not need so much memory.

@WinkelCode

WinkelCode commented Sep 3, 2024

I noticed the same issue today with a VM running Windows 11 24H2 ARM: 8GiB allotted RAM but QEMULauncher ended up using around 14 GiB. Only noteworthy non-default option is that I have a shared folder with the VM, but there wasn't much activity on that.

@elyulka

elyulka commented Oct 5, 2024

> I am now using Apple Virtualization for UTM exclusively. Fewer features than QEMU, but it runs better (for my purpose) and does not need so much memory.

@MichaelKofler I'm not sure Apple Virtualization is free of leaks, because I see an increase in real memory usage over the week while running a Lima VM (maybe enabling Rosetta contributes to the leak, but I didn't run any x86 process):

Screenshot 2024-10-05 at 22 24 38
