Linux virtual machines (guests) running under the Apple Virtualization Framework experience data corruption on disk writes when virtio storage is used. This usually happens under heavier write load, such as system upgrades, code compilation, `stress-ng --iomix 2`, or parallel runs of `cp`.

I was able to reproduce this problem on Linux 5.10, 5.15, 6.5.x, 6.6.1 and 6.7.0-rc1. I tested ext4, btrfs and bcachefs; all are affected. Because ext4 does not checksum data, the problem is harder to notice there. The problem occurs on various versions of Ubuntu and NixOS.

How to reproduce?
- Download GUILinuxVirtualMachineSampleApp:
  https://developer.apple.com/documentation/virtualization/running_gui_linux_in_a_virtual_machine_on_a_mac
  https://docs-assets.developer.apple.com/published/ea15178d84/RunningGUILinuxInAVirtualMachineOnAMac.zip
- Download the Ubuntu 22.04 aarch64 server ISO:
  https://cdimage.ubuntu.com/releases/jammy/release/
  https://cdimage.ubuntu.com/releases/jammy/release/ubuntu-22.04.3-live-server-arm64.iso
  https://cdimage.ubuntu.com/releases/22.04.3/release/SHA256SUMS
- Verify the SHA256 hash
- Open and run GUILinuxVirtualMachineSampleApp in Xcode
- In the file picker, choose ubuntu-22.04.3-live-server-arm64.iso
- Perform a default installation. The only change I made was to enable the OpenSSH server. Note your username and password.
- Press "Reboot now" and log in
- `ip addr | grep 192.168` and note the VM's IP address
- Open your terminal and run `ssh user@ip`
- In the VM, download a big file, for example: `wget https://releases.ubuntu.com/mantic/ubuntu-23.10.1-desktop-amd64.iso` or use scp to copy over another big file. Files close to 5 GB seem to do the trick. Note that `scp` sometimes results in corrupted files as well.
- `FILE=ubuntu-23.10.1-desktop-amd64.iso`
- `cp --reflink=never $FILE "$FILE"1 & cp --reflink=never $FILE "$FILE"2 & cp --reflink=never $FILE "$FILE"3 && fg`
- `sha256sum *.iso*`

Expected: all SHA256 hashes are equal.
Actual: some of the copies have hashes that differ from the original file's hash, so file corruption has occurred.

Example:

$ wget https://releases.ubuntu.com/mantic/ubuntu-23.10.1-desktop-amd64.iso
$ FILE=ubuntu-23.10.1-desktop-amd64.iso
$ cp --reflink=never $FILE "$FILE"1 & cp --reflink=never $FILE "$FILE"2 & cp --reflink=never $FILE "$FILE"3 && fg
$ sha256sum *.iso*
d95b87463c8d7879187a59f8b29601a8cb09c7c67734cdee7b2a60d03f9369e5  ubuntu-23.10.1-desktop-amd64.iso
45032dcbbbdc45b894ca21c065f3c275e7daae1858acd3050f338e0529054171  ubuntu-23.10.1-desktop-amd64.iso1
c3e0b95f652d66db7fe85c3a25324745aaa44cb6554d5b3ad66c656e17849523  ubuntu-23.10.1-desktop-amd64.iso2
fa04d00ddf79d5a30a8dbd5a530833c93bccd14b6acbfedb30b9faa2594d16b6  ubuntu-23.10.1-desktop-amd64.iso3
$ dmesg # ext4 seldom reports any errors
[ 4328.201793] EXT4-fs error (device dm-0): ext4_validate_block_bitmap:420: comm kworker/u18:2: bg 132: bad block bitmap checksum
[ 4328.202496] EXT4-fs (dm-0): Delayed block allocation failed for inode 1062998 at logical offset 110592 with max blocks 2048 with error 74
[ 4328.202952] EXT4-fs (dm-0): This should not happen!! Data will be lost

Other ways to reproduce:
- With a data-checksumming filesystem (btrfs, bcachefs), `stress-ng --iomix 8` causes I/O errors after a couple of minutes. The errors are also visible in dmesg.
- Sometimes `scp` of a big file from host to guest results in corruption.

Related issues online:
https://github.com/utmapp/UTM/issues/4840
https://github.com/lima-vm/lima/issues/1957

Xcode: 15.0.1

Workarounds:
1. Switch to NVMe storage with VZNVMExpressControllerDeviceConfiguration(). In this case corruption doesn’t occur.
2. Use VZVirtioBlockDeviceConfiguration with the caching mode set to `.cached`:
   VZDiskImageStorageDeviceAttachment(url: URL(fileURLWithPath: path), readOnly: false, cachingMode: .cached, synchronizationMode: .full)

Notes:
* The synchronization mode doesn’t have any effect.
* NVMe works with all caching modes.
* The `.automatic` caching mode defaults to `.uncached` on an M1 Pro with 32 GB RAM and 10 GB RAM used by apps. It would be great if the docs explained the conditions under which `.automatic` picks each mode.
* There’s a small performance penalty when using NVMe.
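
For repeated testing, the parallel-copy check from the reproduction steps can be wrapped in a small script to run inside the Linux guest. This is only a sketch under a few assumptions: `check_copies` is a hypothetical helper name, the default of 3 copies mirrors the commands above, and `wait` is used instead of `fg` so that it also works non-interactively. It relies on GNU coreutils (`cp --reflink=never`, `sha256sum`).

```shell
#!/usr/bin/env bash
# Sketch: copy FILE several times in parallel, then compare SHA-256 hashes.
# Meant to run in the Linux guest; needs GNU cp and sha256sum.

# check_copies FILE [COPIES]
# Hypothetical helper (not part of any tool). Writes FILE1..FILEn next to FILE
# and returns non-zero if any copy's hash differs from the original's.
check_copies() {
    local file=$1 copies=${2:-3}
    local want got i bad=0

    want=$(sha256sum < "$file" | cut -d' ' -f1)

    # Start all copies at once to generate concurrent write load.
    for i in $(seq 1 "$copies"); do
        cp --reflink=never "$file" "$file$i" &
    done
    wait    # block until every background cp has finished

    for i in $(seq 1 "$copies"); do
        got=$(sha256sum < "$file$i" | cut -d' ' -f1)
        if [ "$got" = "$want" ]; then
            echo "OK       $file$i"
        else
            echo "MISMATCH $file$i"
            bad=1
        fi
    done
    return "$bad"
}
```

Example use, after sourcing the file: `check_copies ubuntu-23.10.1-desktop-amd64.iso 3`. Any `MISMATCH` line corresponds to a copy whose hash differs from the original's, as in the `sha256sum` output shown earlier.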