
Possible memory leak on ZFS ZVOL as source image format and qemu-img convert. #7235

Closed
chrone81 opened this issue Feb 26, 2018 · 13 comments

@chrone81

System information

Type | Version/Name
Distribution Name | Proxmox VE
Distribution Version | 5.1-41
Linux Kernel | 4.13.13-6-pve
Architecture | amd64
ZFS Version | 0.7.6-1
SPL Version | 0.7.6-1

Describe the problem you're observing

"qemu-img convert" will use all available memory as buffered pages if the source image is bigger than total memory and is in ZFS ZVOL image format. This causes high swapping activity and overall system slow down.

This issue only happens with "qemu-img convert" when the source image is a ZFS ZVOL. Is this some kind of memory leak, or a bug in qemu-img or in ZFS?

Describe how to reproduce the problem

On a host with 32GB of RAM and a ZFS zpool mirror on 2 spinning drives, the problem is reproducible every time (the underlying commands are sketched after the list):

  • Full clone of a 100GB VM from a ZFS ZVOL source image to a ZFS ZVOL target image uses all available memory as buffered pages.
  • Full clone of a 100GB VM from a ZFS ZVOL source image to a qcow2 target image on ZFS uses all available memory as buffered pages.
  • Full clone of a 100GB VM from a qcow2-on-ZFS source image to a ZFS ZVOL target image uses roughly 20% of memory as buffered pages.
  • Full clone of a 100GB VM from a qcow2-on-ZFS source image to a qcow2 target image does not use buffered pages or increase memory usage at all.
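A minimal sketch of the conversions behind those full clones, assuming hypothetical zvol and file names (vm-100-disk-1 as source, vm-101-disk-1 as target) and qemu-img's default cache mode:

# zvol -> zvol full clone (reproduces the buffered-page growth)
qemu-img convert -p -O raw \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1

# zvol -> qcow2 on a ZFS dataset (also reproduces it)
qemu-img convert -p -O qcow2 \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /rpool/images/101/vm-101-disk-1.qcow2

# watch buffers/cache grow while a convert is running
watch -n 5 free -m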

Include any warning/errors/backtraces from the system logs

I didn't notice any errors in syslog while the overall system slowed down and heavy swap usage occurred.

(screenshot: zfs zvol qemu-img convert issue)

@gmelikov
Member

Could you try to reproduce it with e9a7729?

@bunder2015
Contributor

bunder2015 commented Feb 26, 2018

I wonder if 03b60ee would also help

edit: my theory is that qemu is using the Linux buffer/cache and not returning it when finished. This would normally be okay on non-ZFS systems, but on ZFS it forces an ARC eviction, and the cache then sits around forever doing nothing.
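One quick way to check that theory while a convert is running is to compare the kernel's page cache counters with the ARC size; a rough sketch using the standard procfs paths:

# Linux buffers/cache as seen by the kernel
grep -E '^(Buffers|Cached)' /proc/meminfo

# current ARC size and ceiling reported by ZFS
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

If the theory holds, Buffers/Cached should grow during the convert while the ARC size shrinks.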

@Fabian-Gruenbichler
Contributor

Fabian-Gruenbichler commented Feb 26, 2018 via email

@chrone81
Author

@gmelikov @bunder2015 Thanks for the tips; unfortunately, I'm not good at compiling ZFS from scratch, especially against the PVE kernel.

@Fabian-Gruenbichler Whoa, this is great. Sure, I could run a test on the PVE kernel. Are you Fabian from the Proxmox staff? If so, I could leave a message on the Proxmox forum.

@bunder2015
Contributor

bunder2015 commented Feb 26, 2018

@chrone81 I believe you should already have 03b60ee as part of 0.7.x, however it's disabled by default. Setting zfs_arc_pc_percent to something like 100-500 may help block a mass ARC eviction due to the Linux buffer/cache. (That is, unless I'm misinterpreting how that option works.)

edit:

echo 3 > /proc/sys/vm/drop_caches
echo 100 > /sys/module/zfs/parameters/zfs_arc_pc_percent
# re-prime the ARC
# test again to see if buffers cause ARC eviction and/or massive buffer leftovers
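(If the tunable turns out to help, it can also be set persistently via the usual module option mechanism; just a sketch of the standard approach:)

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_pc_percent=100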

@chrone81
Author

Hi @bunder2015, thanks for pointing that out. The buffered pages are freed when 'qemu-img convert' finishes or is cancelled, though.

I just tested with both 100 and 500 and it didn't help limit the buffered pages used by qemu-img convert. The buffered pages keep growing and spill over into swap over time.

@bunder2015
Contributor

My apologies, it must only work on cache then (rather than both cache and buffers). ☹️

@Fabian-Gruenbichler
Contributor

PVE 5 kernel with #7170 backported:
http://download.proxmox.com/temp/pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb

md5/sha256sum:

e6b0f499110093121a7d9a84922010b0  pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb
984786973c94b4583252c40f73cc1d35ae5f9e482bc10e117703498e16169838  pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb

@Fabian-Gruenbichler
Contributor

I think this is only tangentially ZFS related, as I can still reproduce it on a test system with #7170 included.

@chrone81: do you see an improvement if you manually retry the qemu-img command (you might have to create the target zvol first if it does not exist already), adding "-t none -T none" before the zvol paths? I think qemu-img just has a very bad choice of default caching mode, which should be fixable on the calling side...
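For reference, a sketch of the suggested invocation with hypothetical zvol paths; -T none selects the source cache mode and -t none the target cache mode, both of which correspond to cache=none (O_DIRECT):

qemu-img convert -p -O raw -T none -t none \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1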

@fcicq

fcicq commented Feb 27, 2018

Try again with memcg (the memory controller) enabled, maybe like this:
cgcreate -g memory:/sandbox
cgset -r memory.limit_in_bytes=100M sandbox
cgexec -g memory:/sandbox (your actual work)
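For this issue that would mean wrapping the convert itself, for example (paths hypothetical):

cgexec -g memory:/sandbox qemu-img convert -p -O raw \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1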

@chrone81
Author

@Fabian-Gruenbichler

I just tested without the patched PVE kernel but with the qemu-img convert -t none and -T none options. RAM usage is normal, as shown in the attached screenshot.

(screenshot: zfs zvol qemu-img convert issue fix)

I'll test with the patched kernel and report back later.

@fcicq Unfortunately, I haven't tested this with Linux containers yet.

@chrone81
Author

@Fabian-Gruenbichler Using the ZFS-patched PVE kernel didn't help. Only the qemu-img convert -t none and -T none options fixed this issue.

Blub pushed a commit to Blub/pve-qemu-server that referenced this issue Sep 26, 2018
this fixes an issue with zvols, which require cache=none and eat up all
free memory as buffered pages otherwise

openzfs/zfs#7235

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>