
ARC throughput is halved by init_on_alloc #9910

Open
adamdmoss opened this issue Jan 29, 2020 · 10 comments
Labels: Bot: Not Stale (Override for the stale bot), Type: Performance (Performance improvement or performance problem)

Comments

@adamdmoss (Contributor)

System information

Distribution Name: Ubuntu
Distribution Version: 18.04.3 LTS
Linux Kernel: 5.3.0-26-generic (Ubuntu HWE)
Architecture: x86_64
ZFS Version: ZoL git master 25df8fb
SPL Version: ZoL git master 25df8fb

Describe the problem you're observing

Bandwidth for reads from the ARC is approximately half of the bandwidth of reads from the native Linux cache when ABD scatter is enabled. ARC read speed is fairly comparable to the native Linux cache when ABD scatter is disabled (i.e. 'legacy' linear mode).

Describe how to reproduce the problem

On a 16GB system with ~12GB free:
$ head --bytes=8G < /dev/urandom > testfile
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time cp testfile /dev/null
$ time cp testfile /dev/null
$ time cp testfile /dev/null
...
Repeat for the scatter-on, scatter-off, and non-ZFS cases for comparison.
On my system, 8GB from a linear-ABD warm ARC takes 1.1 seconds, scatter-ABD takes 2.1 seconds, and 'native' takes 0.9 seconds.
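
For the scatter-on/off comparison, scatter vs. linear ABD can be toggled via the zfs_abd_scatter_enabled module parameter (a sketch, assuming your build exposes it; the setting only affects newly allocated buffers):

cat /sys/module/zfs/parameters/zfs_abd_scatter_enabled    # 1 = scatter, 0 = linear
echo 0 | sudo tee /sys/module/zfs/parameters/zfs_abd_scatter_enabled    # switch new allocations to linear ABD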

(ARC also sometimes takes an arbitrarily high number of 'cp' repeats to get warm again in the presence of a lot of [even rather cold] native cache memory in use, hence the drop_caches above. I can file that as a separate issue if desirable.)

Apologies in advance if I cause blood to boil with the slipshod nature of this benchmark; I know it's only measuring one dimension of ARC performance (no writes, no latency measurements, no ARC<->L2ARC interactions) but it's a metric I was interested in at the time.

@h1z1 commented Jan 29, 2020

It might help to have a bit more information, such as the contents of

grep . /sys/module/zfs/parameters/*

or at least arc_max and the output from arcstat.py.
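
For example (a sketch; assumes the usual ZoL sysfs/procfs paths):

cat /sys/module/zfs/parameters/zfs_arc_max
awk '$1 == "size" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats
arcstat.py 1 10    # sample ARC hit/miss statistics once a second, ten times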

@JulietDeltaGolf

Isn't this a known issue?

#7896 (comment)

@behlendorf added the Type: Performance label on Feb 6, 2020
@ahrens (Member) commented Feb 6, 2020

The 5.3 Linux kernel adds a new feature that allows pages to be zeroed when they are allocated or freed: init_on_alloc and init_on_free. init_on_alloc is enabled by default in the Ubuntu 18.04 HWE kernel. ZFS allocates and frees pages frequently (via the ABD structure), e.g. for every disk access, and the additional overhead of zeroing these pages is significant. I measured a ~40% performance regression for an uncached "zfs send ... >/dev/null".

This new "feature" can be disabled by setting init_on_alloc=0 in the GRUB kernel boot parameters, which undoes the performance regression.
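
For example, on an Ubuntu-style system (a sketch; assumes GRUB and the stock /etc/default/grub):

# add init_on_alloc=0 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash init_on_alloc=0"
sudo update-grub
sudo reboot
cat /proc/cmdline    # confirm init_on_alloc=0 is present after the reboot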

Linux kernel commit: torvalds/linux@6471384

@ahrens changed the title from "ARC throughput is halved by ABD scatter" to "ARC throughput is halved by init_on_alloc" on Feb 6, 2020
@adamdmoss (Contributor, Author) commented Feb 10, 2020

"ZFS allocates and frees pages frequently"

Would it be insane to recycle these pages rather than continually allocating and freeing them, at least while ABD requests are 'hot'?

I could look into this if the idea isn't struck down immediately.

@ahrens (Member) commented Feb 11, 2020

We could. The challenge would be to not reintroduce the problems that ABD solved (locking down excess memory). I think it would be possible, but obviously much easier to just turn off this new kernel feature - which I will suggest to the folks at Canonical.

@sxc731 commented Dec 29, 2020

Sincerely sorry for stating the obvious, but there are quite substantial security benefits to init_on_alloc=1, as outlined here.

IMHO it would be helpful to devise a workaround (or perhaps guidance on ARC configuration options, pending a code fix?) that would mitigate the perf impact for those who value both ZFS and security. Can anyone chip in?

Also, I don't think the issue is Canonical-specific; recommendations (such as this) exist that suggest turning this feature on by default - for good reasons IMHO. I happen to use Ubuntu, but it would be good to hear from people using other distros about the perf impact on ZFS.
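
To check what a given kernel ships with (a sketch; the config file path varies by distro):

grep CONFIG_INIT_ON_ALLOC_DEFAULT_ON /boot/config-$(uname -r)    # 'y' means zero-on-alloc is the compile-time default
grep -o 'init_on_alloc=[01]' /proc/cmdline    # shows whether that default is overridden on the command line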

The stale bot commented Mar 17, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

The stale bot added the Status: Stale label on Mar 17, 2022
@behlendorf added the Bot: Not Stale label and removed the Status: Stale label on Mar 17, 2022
@Ramalama2
Is there still no workaround other than disabling it?

@ednadolski-ix (Contributor)
@behlendorf @amotin Replicating the OP's test on Linux 6.5 with the current OpenZFS master branch, the difference for init_on_alloc=[0|1] appears to be negligible. I suggest this be closed, provided there are no objections.

script:

#!/bin/bash
export TESTFILE="testfile"
export TESTSIZE="64G"
# generate the test file, then drop the Linux page cache before the reads
time head --bytes=${TESTSIZE} < /dev/urandom > ${TESTFILE}
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
# repeated reads; the later runs measure cached (ARC) throughput
time cp ${TESTFILE} /dev/null
time cp ${TESTFILE} /dev/null
time cp ${TESTFILE} /dev/null

init_on_alloc=0:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0 root=UUID=18a1c019-31e3-433b-92bf-a5809af9cdc1 ro init_on_alloc=0
[    0.197222] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0 root=UUID=18a1c019-31e3-433b-92bf-a5809af9cdc1 ro init_on_alloc=0

root@walong-test2:/mypool2/test_init_on_alloc# ./test0

real    4m10.609s
user    0m2.549s
sys     4m7.485s

real    0m13.298s
user    0m0.128s
sys     0m13.167s

real    0m10.984s
user    0m0.112s
sys     0m10.872s

real    0m10.993s
user    0m0.136s
sys     0m10.857s

init_on_alloc=1:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0 root=UUID=18a1c019-31e3-433b-92bf-a5809af9cdc1 ro init_on_alloc=1
[    0.197075] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0 root=UUID=18a1c019-31e3-433b-92bf-a5809af9cdc1 ro init_on_alloc=1

root@walong-test2:/mypool2/test_init_on_alloc# ./test0

real    4m12.338s
user    0m2.520s
sys     4m8.554s

real    0m13.899s
user    0m0.148s
sys     0m13.740s

real    0m10.783s
user    0m0.120s
sys     0m10.662s

real    0m11.333s
user    0m0.176s
sys     0m11.157s

@amotin (Member) commented Nov 3, 2023

@ednadolski-ix It becomes ANYTHING BUT negligible as soon as memory bandwidth approaches saturation, i.e. once application traffic reaches 10-20% of it. The test you are using is simply inadequate: a single write stream from /dev/urandom, or even an ARC read into /dev/null, is unable to saturate anything; both are limited by single-core speed at best. We've disabled it in TrueNAS SCALE (truenas/linux@d165d39), and we did notice performance improvements. IMO it is a paranoid patch against brain-dead programmers unable to properly initialize memory. The only question is whether ZFS can control it somehow for its own allocations, or whether it has to be done by sane distributions.
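
For illustration, a rough multi-stream variant of the cached-read test (a sketch; hypothetical file names, assumes the files are already warm in the ARC) that has a better chance of approaching memory-bandwidth saturation than a single cp:

for i in $(seq 1 8); do
    cp testfile.${i} /dev/null &    # eight concurrent cached reads
done
wait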
