
Possible memory leak on ZFS ZVOL as source image format and qemu-img convert. #7235

Closed
chrone81 opened this issue Feb 26, 2018 · 13 comments

@chrone81

System information

Type | Version/Name
Distribution Name | Proxmox VE
Distribution Version | 5.1-41
Linux Kernel | 4.13.13-6-pve
Architecture | amd64
ZFS Version | 0.7.6-1
SPL Version | 0.7.6-1

Describe the problem you're observing

"qemu-img convert" will use all available memory as buffered pages if the source image is bigger than total memory and is in ZFS ZVOL image format. This causes high swapping activity and overall system slow down.

This issue only happens with "qemu-img convert" when the source image is a ZFS ZVOL. Is this some kind of memory leak, or a bug in qemu-img or in ZFS?

Describe how to reproduce the problem

On a host with 32GB of RAM and a ZFS zpool mirror on 2 spinning drives, the problem is reproducible every time (the underlying commands are sketched after the list):

  • Full clone of a 100GB VM from a ZFS ZVOL source image to a ZFS ZVOL target image uses all available memory as buffered pages.
  • Full clone of a 100GB VM from a ZFS ZVOL source image to a qcow2 target image on ZFS uses all available memory as buffered pages.
  • Full clone of a 100GB VM from a qcow2-on-ZFS source image to a ZFS ZVOL target image uses roughly 20% of memory as buffered pages.
  • Full clone of a 100GB VM from a qcow2-on-ZFS source image to a qcow2 target image does not use buffered pages or increase memory usage at all.
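A minimal sketch of the conversions behind those full clones, assuming hypothetical zvol and file names (vm-100-disk-1 as source, vm-101-disk-1 as target) and qemu-img's default cache mode:

# zvol -> zvol full clone (reproduces the buffered-page growth)
qemu-img convert -p -O raw \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1

# zvol -> qcow2 on a ZFS dataset (also reproduces it)
qemu-img convert -p -O qcow2 \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /rpool/images/101/vm-101-disk-1.qcow2

# watch buffers/cache grow while a convert is running
watch -n 5 free -m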

Include any warning/errors/backtraces from the system logs

I didn't notice any errors in syslog while the overall system slowed down and heavy swap usage occurred.

(screenshot: zfs zvol qemu-img convert issue)

@gmelikov
Member

Could you try to reproduce it with e9a7729?

@bunder2015
Contributor

bunder2015 commented Feb 26, 2018

I wonder if 03b60ee would also help

edit: my theory is that qemu is using the Linux buffer/cache and not returning it when finished. This would normally be okay on non-ZFS systems, but on ZFS it forces an ARC eviction, and the cache then sits around forever doing nothing.
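One quick way to check that theory while a convert is running is to compare the kernel's page cache counters with the ARC size; a rough sketch using the standard procfs paths:

# Linux buffers/cache as seen by the kernel
grep -E '^(Buffers|Cached)' /proc/meminfo

# current ARC size and ceiling reported by ZFS
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

If the theory holds, Buffers/Cached should grow during the convert while the ARC size shrinks.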

@Fabian-Gruenbichler
Contributor

Fabian-Gruenbichler commented Feb 26, 2018 via email

@chrone81
Author

@gmelikov @bunder2015 Thanks for the tips; unfortunately, I'm not good at compiling ZFS from scratch, especially against the PVE kernel.

@Fabian-Gruenbichler Whoa, this is great. Sure, I could run a test on the PVE kernel. Are you Fabian from the Proxmox staff? If so, I could leave a message on the Proxmox forum.

@bunder2015
Contributor

bunder2015 commented Feb 26, 2018

@chrone81 I believe you should already have 03b60ee as part of 0.7.x, however it's disabled by default. Setting zfs_arc_pc_percent to something like 100-500 may help block a mass ARC eviction due to the Linux buffer/cache. (That is, unless I'm misinterpreting how that option works.)

edit:

echo 3 > /proc/sys/vm/drop_caches
echo 100 > /sys/module/zfs/parameters/zfs_arc_pc_percent
# re-prime the ARC
# test again to see if buffers cause ARC eviction and/or massive buffer leftovers
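(If the tunable turns out to help, it can also be set persistently via the usual module option mechanism; just a sketch of the standard approach:)

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_pc_percent=100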

@chrone81
Author

Hi @bunder2015, thanks for pointing that out. The buffered pages are freed when 'qemu-img convert' finishes or is cancelled, though.

I just tested with both 100 and 500 and it didn't help limit the buffered pages used by qemu-img convert. The buffered pages keep growing and spill over into swap over time.

@bunder2015
Contributor

My apologies, it must only work on cache then (rather than both cache and buffers). ☹️

@Fabian-Gruenbichler
Contributor

PVE 5 kernel with #7170 backported:
http://download.proxmox.com/temp/pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb

md5/sha256sum:

e6b0f499110093121a7d9a84922010b0  pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb
984786973c94b4583252c40f73cc1d35ae5f9e482bc10e117703498e16169838  pve-kernel-4.13.13-6-pve_4.13.13-42~test1_amd64.deb

@Fabian-Gruenbichler
Contributor

I think this is only tangentially ZFS related, as I can still reproduce it on a test system with #7170 included.

@chrone81: do you see an improvement if you manually retry the qemu-img command (you might have to create the target zvol first if it does not exist already), adding "-t none -T none" before the zvol paths? I think qemu-img just has a very bad choice of default caching mode, which should be fixable on the calling side...
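For reference, a sketch of the suggested invocation with hypothetical zvol paths; -T none selects the source cache mode and -t none the target cache mode, both of which correspond to cache=none (O_DIRECT):

qemu-img convert -p -O raw -T none -t none \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1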

@fcicq

fcicq commented Feb 27, 2018

Try again with memcg (the memory controller) enabled, maybe like this:
cgcreate -g memory:/sandbox
cgset -r memory.limit_in_bytes=100M sandbox
cgexec -g memory:/sandbox (your actual work)
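For this issue that would mean wrapping the convert itself, for example (paths hypothetical):

cgexec -g memory:/sandbox qemu-img convert -p -O raw \
    /dev/zvol/rpool/data/vm-100-disk-1 \
    /dev/zvol/rpool/data/vm-101-disk-1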

@chrone81
Author

@Fabian-Gruenbichler

I just tested without the patched PVE kernel but with the qemu-img convert -t none and -T none options. RAM usage is normal, as shown in the attached screenshot.

(screenshot: zfs zvol qemu-img convert issue fix)

I'll test with the patched kernel and report back later.

@fcicq Unfortunately, I haven't tested this with Linux containers yet.

@chrone81
Author

@Fabian-Gruenbichler Using the ZFS-patched PVE kernel didn't help. Only the qemu-img convert -t none and -T none options fixed this issue.

Blub pushed a commit to Blub/pve-qemu-server that referenced this issue Sep 26, 2018
this fixes an issue with zvols, which require cache=none and eat up all
free memory as buffered pages otherwise

openzfs/zfs#7235

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>