ARC throughput is halved by init_on_alloc #9910
Comments
Might help to have a bit more information, like the contents of
Or at least arc_max and the output from arcstat.py
Isn't this a known issue?
The Linux 5.3 kernel adds a new feature that allows pages to be zeroed when they are allocated or freed: init_on_alloc and init_on_free. init_on_alloc is enabled by default on the Ubuntu 18.04 HWE kernel. ZFS allocates and frees pages frequently (via the ABD structure), e.g. for every disk access, so the additional overhead of zeroing these pages is significant: I measured a ~40% regression in the performance of an uncached "zfs send ... >/dev/null". This new "feature" can be disabled by setting init_on_alloc=0 in the GRUB kernel boot parameters, which undoes the performance regression. Linux kernel commit: torvalds/linux@6471384
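A sketch of checking and applying the workaround described above; treat this as a config fragment, since exact file paths and GRUB tooling vary by distro (the `/etc/default/grub` path assumes a Debian/Ubuntu-style setup):

```shell
# Check whether the running kernel has the mitigation enabled. If the
# parameter is absent from the cmdline, the kernel's compiled-in default
# (CONFIG_INIT_ON_ALLOC_DEFAULT_ON) applies.
cat /proc/cmdline
dmesg | grep 'mem auto-init'

# Disable it at boot: append init_on_alloc=0 to the kernel command line.
# On a Debian/Ubuntu-style setup, edit /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="... init_on_alloc=0"
sudo update-grub    # or grub2-mkconfig -o /boot/grub2/grub.cfg elsewhere
# then reboot for the change to take effect
```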
Would it be insane to recycle these pages rather than continually allocating and freeing them, at least while ABD requests are 'hot'? I could look into this if the idea isn't struck down immediately.
We could. The challenge would be to not reintroduce the problems that ABD solved (locking down excess memory). I think it would be possible, but obviously much easier to just turn off this new kernel feature - which I will suggest to the folks at Canonical.
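The recycling idea above could be sketched as a bounded freelist. This is illustrative only: the names (abd_page_alloc/abd_page_free) are hypothetical, not OpenZFS APIs, and a real kernel implementation would need locking (or per-CPU lists) plus a shrinker hook so the cache cannot lock down excess memory, which is exactly the problem ABD was designed to avoid.

```c
#include <stdlib.h>

#define ABD_PAGE_SIZE   4096
#define ABD_CACHE_DEPTH 64      /* cap on retained pages */

static void *abd_freelist[ABD_CACHE_DEPTH];
static int abd_freelist_len;

/* Allocate a page-sized buffer, preferring a recycled one. A recycled
 * page never goes back through the allocator, so (conceptually) it also
 * skips the init_on_alloc zeroing cost; malloc stands in for the kernel
 * page allocator here. */
void *abd_page_alloc(void)
{
	if (abd_freelist_len > 0)
		return abd_freelist[--abd_freelist_len]; /* reuse a hot page */
	return malloc(ABD_PAGE_SIZE); /* cold path: fresh allocation */
}

/* Retain the buffer for reuse while under the cap; otherwise return it
 * to the system so the cache stays bounded. */
void abd_page_free(void *p)
{
	if (abd_freelist_len < ABD_CACHE_DEPTH)
		abd_freelist[abd_freelist_len++] = p;
	else
		free(p);
}
```

The cap is the interesting design choice: it bounds how much memory the cache can pin while still short-circuiting the common hot-path alloc/free cycle.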
Sincerely sorry for stating the obvious, but there are quite substantial security benefits to this feature. IMHO, it would be helpful to devise a workaround (or perhaps guideline ARC configuration options, pending a code fix?) that would mitigate the performance impact for those who value both ZFS and security. Can anyone chip in? Also, I don't think the issue is Canonical-specific; recommendations (such as this) exist that suggest turning this feature on by default, for good reasons IMHO. I happen to use Ubuntu, but it would be good to hear from people using other distros about the performance impact on ZFS.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Is there still no workaround other than disabling it?
@behlendorf @amotin Replicating the OP's test on Linux 6.5 with current OpenZFS master branch, it appears that the difference for init_on_alloc=[0|1] is negligible. Suggest this be closed, provided there are no objections. script:
init_on_alloc=0:
init_on_alloc=1:
@ednadolski-ix It becomes ANYTHING BUT negligible as soon as memory bandwidth reaches saturation, i.e. when application traffic reaches 10-20% of it. The test you are using is just inadequate: a single write stream from /dev/urandom, or even an ARC read into /dev/null, is unable to saturate anything; both are limited by single-core speed at best. We've disabled it in TrueNAS SCALE (truenas/linux@d165d39), and we did notice performance improvements. IMO it is a paranoid patch against brain-dead programmers unable to properly initialize memory. The only question is whether ZFS can control it somehow for its own allocations, or whether it has to be done by sane distributions.
System information
Describe the problem you're observing
Bandwidth for reads from the ARC is approximately half of the bandwidth of reads from the native Linux cache when ABD scatter is enabled. ARC read speed is fairly comparable to the native Linux cache when ABD scatter is disabled (i.e. 'legacy' linear mode).
Describe how to reproduce the problem
On a 16GB system with ~12GB free:
$ head --bytes=8G < /dev/urandom > testfile
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time cp testfile /dev/null
$ time cp testfile /dev/null
$ time cp testfile /dev/null
...
Repeat for the scatter-on, scatter-off, and non-ZFS cases for comparison.
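The repeated `time cp` passes above can be wrapped in a small helper; the function name and defaults here are my own, not from the report, and `date +%s%N` assumes GNU date (i.e. Linux):

```shell
# time_passes FILE [REPEATS]: prints per-pass 'cp FILE /dev/null' times in ms.
# Drop caches first (sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches') so pass 1
# measures cold reads and later passes measure the warm cache/ARC.
time_passes() {
    file=$1
    repeats=${2:-3}
    i=1
    while [ "$i" -le "$repeats" ]; do
        start=$(date +%s%N)                 # nanoseconds since epoch
        cp "$file" /dev/null
        end=$(date +%s%N)
        echo "pass $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

Usage: `time_passes testfile 3`, once per configuration (scatter on, scatter off, non-ZFS).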
On my system, 8GB from a linear-ABD warm ARC takes 1.1 seconds, scatter-ABD takes 2.1 seconds, and 'native' takes 0.9 seconds.
(ARC also sometimes takes an arbitrarily high number of 'cp' repeats to get warm again in the presence of a lot of [even rather cold] native cache memory in use, hence the drop_caches above. I can file that as a separate issue if desirable.)
Apologies in advance if I cause blood to boil with the slipshod nature of this benchmark; I know it's only measuring one dimension of ARC performance (no writes, no latency measurements, no ARC<->L2ARC interactions) but it's a metric I was interested in at the time.