ARC data cache gets evicted when hitting metadata limit #5537

Open
AndCycle opened this Issue Dec 29, 2016 · 12 comments


@AndCycle
AndCycle commented Dec 29, 2016 edited

This is something I observed recently on my system;
I noticed the behavior while working on my Munin plugin to visualize ARC usage.

Gentoo
kernel ver 4.4.39
spl-zfs-0.6.5.8
zfs-0.6.5.8

with patch #4850

As you can see, the data size drops to almost zero when metadata usage hits the limit.

[graph: ARC size breakdown, daily]

[graph: ARC size, daily]

[graph: physical memory usage, daily]

@perfinion
Contributor

Related to #5418 ?

@AndCycle
AndCycle commented Dec 29, 2016 edited

@perfinion maybe; it's hard to tell what happened, because currently there is no tool to monitor arcstats continuously.

here is my current work in progress for Munin to monitor it:

[screenshot: Munin ZFS stats plugins]
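Aside from Munin, a bare-bones continuous sampler can be sketched in a few lines of shell. This assumes the stock ZFS-on-Linux kstat file at /proc/spl/kstat/zfs/arcstats, whose rows are "name type data" with the value in the third column; the field names used below (size, arc_meta_used, arc_meta_limit) are the standard arcstats counters:

```shell
#!/bin/sh
# Minimal arcstats sampler: print selected ARC counters every 5 seconds.
# Assumes /proc/spl/kstat/zfs/arcstats (columns: name, type, data).
while sleep 5; do
    printf '%s ' "$(date +%T)"
    awk '$1 == "size" || $1 == "arc_meta_used" || $1 == "arc_meta_limit" \
             { printf "%s=%s ", $1, $3 }
         END { print "" }' /proc/spl/kstat/zfs/arcstats
done
```

Piping the output to a file gives a crude time series that can be plotted later, at whatever interval you like.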

@dweeezil
Member

@AndCycle I have a hunch this is due to the balanced-mode adjuster. Try setting zfs_arc_meta_strategy=0.
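For anyone else trying this: zfs_arc_meta_strategy is an ordinary ZFS module parameter, so it can be flipped at runtime through sysfs or persisted via modprobe.d (a standard-mechanism sketch, nothing specific to this issue):

```shell
# Switch from the balanced adjuster (1) to the traditional
# metadata-only adjuster (0) at runtime:
echo 0 > /sys/module/zfs/parameters/zfs_arc_meta_strategy

# To persist across reboots, add to /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_meta_strategy=0
```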

@richardelling

FYI, both the telegraf and collectd open-source aggregators have agents that collect ARC stats. In the commercial world, collectors have been available for a very long time; I'd recommend Circonus.

@kpande
Member
kpande commented Dec 29, 2016

I think by 'continuously' they mean more frequent intervals than those collectors allow for.

@AndCycle

@kpande right, that's what I meant. I haven't tried out that many monitoring tools, since many of them lack a built-in visualization tool, and as this is a personal server I only looked for free solutions.

Munin is one that is easy enough to use, although its base is pretty rough and full of bugs, and many contributed plugins do the calculations incorrectly, which forced me to write my own.

@AndCycle
AndCycle commented Dec 29, 2016 edited

@dweeezil you got it.

[graph: ARC size, daily]
[graph: ARC size breakdown, daily]
[graph: cache efficiency percentage, daily]

@kernelOfTruth
Contributor

referencing #5128 (comment) (Poor cache performance) and #5418 (ARC efficiency regression) again

@kernelOfTruth
Contributor
kernelOfTruth commented Dec 29, 2016 edited

@dweeezil it could also be memcg

here are some notes that I collected while investigating the matter, which landed in my /etc/modprobe.d/zfs.conf some time ago:

# Your system is having trouble keeping the metadata under the limit and its not showing much evictable memory. 
# Try setting the tunable zfs_arc_meta_strategy to zero and see if the traditional metadata-only adjuster doesn't work better.
#
# The problem appears to be the continuing evolution of memory cgroups (memcg). 
# If you boot with cgroup_disable=memory the reclaiming should start working again. I've not worked up a patch yet.
#
# options zfs zfs_arc_meta_strategy=0

I've seen people over at Ubuntu running into that kind of issue, and appending

cgroup_disable=memory

to the boot command line seemed to have helped.
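For illustration, on a GRUB-based system that kernel parameter would typically be added like this (file paths and the regeneration command vary by distro; treat this as a sketch, not a recipe):

```shell
# /etc/default/grub -- append the flag to the kernel command line:
GRUB_CMDLINE_LINUX="cgroup_disable=memory"

# then regenerate the GRUB config, e.g. on Gentoo:
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Note that cgroup_disable=memory turns off the memory cgroup controller system-wide, so it is a diagnostic workaround rather than a fix.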

@AndCycle

For reference:
I do have the Memory Resource Controller for Control Groups and lots of cgroup-related options enabled in my kernel.

@dweeezil
Member

This was added to help deal with the memcg issue. There didn't seem to be any way to coax the normal SB shrinker into doing the Right Thing.

@kernelOfTruth
Contributor
kernelOfTruth commented Dec 29, 2016 edited

referencing #3303 (comment), "arc_adapt left spinning after rsync with lots of small files"

it might also be worth taking a look at

/sys/module/zfs/parameters/zfs_arc_meta_adjust_restarts

and

/sys/module/zfs/parameters/zfs_arc_meta_prune

In that issue NUMA is also mentioned, which should already be addressed.
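For reference, both tunables above can be inspected (and adjusted) through sysfs like any other zfs module parameter; a quick sketch, with the example value being an assumption to adapt to your own workload:

```shell
# Show current values of the metadata-pruning tunables.
grep . /sys/module/zfs/parameters/zfs_arc_meta_adjust_restarts \
       /sys/module/zfs/parameters/zfs_arc_meta_prune

# Example: prune more dentry/inode objects per adjustment pass
# (20000 is an arbitrary illustrative value, not a recommendation).
echo 20000 > /sys/module/zfs/parameters/zfs_arc_meta_prune
```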
