This is something I observed recently on my system.
I only noticed the behavior while working on a munin plugin to visualize ARC usage.
Kernel version 4.4.39,
with patch #4850.
As you can see, the data size drops to almost zero when metadata usage hits the limit.
Related to #5418 ?
@perfinion Maybe. It's hard to tell what happened because there is currently no tool to monitor arcstats continuously.
Here is my current work in progress for munin to monitor it.
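Separately from the munin plugin, here is a minimal sketch of how arcstats could be polled at short intervals. It assumes the ZFS on Linux kstat file at `/proc/spl/kstat/zfs/arcstats` with rows of the form `name type value`. The field names in `DEFAULT_FIELDS` are assumptions that differ between releases, so missing ones come back as `None`. The helper names are made up for illustration.

```python
import time

ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"
# Assumed field names; check them against your own arcstats file.
DEFAULT_FIELDS = ("size", "data_size", "arc_meta_used", "arc_meta_limit")


def parse_arcstats(text):
    """Turn kstat text into a {name: int} dict, skipping the two header lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns and a numeric value.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def pick_fields(stats, fields=DEFAULT_FIELDS):
    """Select the fields of interest; absent fields map to None."""
    return {name: stats.get(name) for name in fields}


def watch(interval=1.0, count=10, path=ARCSTATS_PATH):
    """Print `count` timestamped samples, `interval` seconds apart."""
    for _ in range(count):
        with open(path) as f:
            print(int(time.time()), pick_fields(parse_arcstats(f.read())))
        time.sleep(interval)
```

Calling `watch(interval=1.0, count=60)` on a machine with the zfs module loaded would print one sample per second for a minute, which should be enough to catch the moment the data size collapses.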
@AndCycle I have a hunch this is due to the balanced-mode adjuster. Try setting zfs_arc_meta_strategy=0.
FYI, the open source aggregators telegraf and collectd both have agents that collect ARC stats. In the commercial world, collectors have been available for a very long time; I'd recommend Circonus.
I think by 'continuously' they mean sampling at more frequent intervals than those collectors allow.
@kpande I think he got me. I haven't tried many monitoring tools, since a lot of them lack a built-in visualization tool, and as this is a personal server I only looked at free solutions.
munin is easy enough to work with, although its base is pretty bad and full of bugs, and many contributed plugins do the calculation incorrectly, which forced me to write my own.
@dweeezil you got it.
referencing #5128 (comment) (Poor cache performance) and #5418 (ARC efficiency regression) again
@dweeezil could also be memcg
Here are some notes that I collected while investigating the matter, which landed in /etc/modprobe.d/zfs.conf some time ago:
# Your system is having trouble keeping the metadata under the limit and it's not showing much evictable memory.
# Try setting the tunable zfs_arc_meta_strategy to zero and see if the traditional metadata-only adjuster doesn't work better.
# The problem appears to be the continuing evolution of memory cgroups (memcg).
# If you boot with cgroup_disable=memory the reclaiming should start working again. I've not worked up a patch yet.
# options zfs zfs_arc_meta_strategy=0
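To see which adjuster is actually in effect before or after changing that option, the current value can be read back from sysfs. This is a rough sketch that assumes the usual Linux module parameter directory `/sys/module/zfs/parameters`. The helper names are hypothetical. The 0 and 1 labels follow the module parameter documentation (0 is the metadata-only strategy, 1 is balanced).

```python
import os

PARAM_DIR = "/sys/module/zfs/parameters"
# Labels per the zfs module parameter documentation.
STRATEGY_NAMES = {0: "metadata-only", 1: "balanced"}


def read_zfs_param(name, base=PARAM_DIR):
    """Return a module parameter as int, or None if unreadable or not numeric."""
    try:
        with open(os.path.join(base, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_meta_strategy(value):
    """Map the numeric strategy to a readable label."""
    return STRATEGY_NAMES.get(value, "unknown")
```

For example, `describe_meta_strategy(read_zfs_param("zfs_arc_meta_strategy"))` gives a label for the running module, or "unknown" if the module is not loaded or the parameter does not exist in that version.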
I've seen people over at Ubuntu running into that kind of issue and
appending to boot seemed to have helped
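Whether the memory controller is active on a running kernel can be checked without digging through the kernel config. This small sketch assumes the `/proc/cgroups` table layout (`subsys_name hierarchy num_cgroups enabled`). The function name is made up for illustration.

```python
def memcg_enabled(cgroups_text):
    """Return True/False for the 'memory' controller, or None if it is not listed."""
    for line in cgroups_text.splitlines():
        if line.startswith("#"):
            continue  # header row
        parts = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(parts) >= 4 and parts[0] == "memory":
            return parts[3] == "1"
    return None


def memcg_enabled_on_host(path="/proc/cgroups"):
    """Convenience wrapper that reads the live table."""
    with open(path) as f:
        return memcg_enabled(f.read())
```

A `None` result would suggest the kernel was built without the memory controller at all, while `False` means it is compiled in but disabled.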
I do have Memory Resource Controller for Control Groups and lots of other cgroup-related options selected in my kernel.
Memory Resource Controller for Control Groups
This was added to help deal with the memcg issue. There didn't seem to be any way to coax the normal SB shrinker into doing the Right Thing.
referencing #3303 (comment) arc_adapt left spinning after rsync with lots of small files
it might be also worth it to take a look at
NUMA is also mentioned in that issue, which should already be addressed.