Update ARC memory limits to account for SLUB internal fragmentation #660

Closed
wants to merge 1 commit

Conversation

ryao
Contributor

@ryao ryao commented Apr 12, 2012

23bdb07 updated the ARC memory limits
to be 1/2 of memory or all but 4GB. Unfortunately, these values assume
zero internal fragmentation in the SLUB allocator, when in reality, the
internal fragmentation could be as high as 50%, effectively doubling
memory usage. This poses clear safety issues, because it permits the
size of the ARC to exceed system memory.

This patch changes this so that the default value of arc_c_max is always
1/2 of system memory. This effectively limits the ARC to the memory that
the system has physically installed.
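
For illustration, the old and new defaults amount to something like the following. This is a hypothetical helper, not the actual arc_init() code; `physmem` stands for installed physical memory in bytes.

```c
/*
 * Rough sketch of the default arc_c_max policy before and after this
 * patch. Hypothetical helper, not the actual arc_init() code.
 */
#include <stdint.h>

uint64_t
old_default_arc_c_max(uint64_t physmem)
{
	/* 1/2 of memory or all but 4GB, whichever is larger */
	uint64_t four_gb = 4ULL << 30;
	uint64_t all_but_4gb = (physmem > four_gb) ? physmem - four_gb : 0;

	return (physmem / 2 > all_but_4gb ? physmem / 2 : all_but_4gb);
}

uint64_t
new_default_arc_c_max(uint64_t physmem)
{
	/* always 1/2 of installed memory, leaving headroom for SLUB waste */
	return (physmem / 2);
}
```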

@ryao
Contributor Author

ryao commented Apr 12, 2012

This patch seems to have had the side effect of increasing the arc hit rate on my desktop. Memory pressure had led to the ARC being purged quite frequently, which reduced its effective hit rate to about 95%. With this patch, the effective hit rate is 99%.

I calculated the hit rate as hits / (hits + misses), using the values from /proc/spl/kstat/zfs/arcstats.
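
For reference, that calculation can be scripted against arcstats directly. The following is a minimal sketch in C; it assumes the usual three-column (name, type, value) kstat layout and keeps error handling to a minimum.

```c
/*
 * Minimal sketch: compute the ARC hit rate as hits / (hits + misses)
 * from the "hits" and "misses" counters in arcstats.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/spl/kstat/zfs/arcstats", "r");
	char line[256], name[64];
	int type;
	unsigned long long value, hits = 0, misses = 0;

	if (f == NULL) {
		perror("arcstats");
		return (1);
	}

	while (fgets(line, sizeof (line), f) != NULL) {
		/* skip the kstat header lines, which do not match this format */
		if (sscanf(line, "%63s %d %llu", name, &type, &value) != 3)
			continue;
		if (strcmp(name, "hits") == 0)
			hits = value;
		else if (strcmp(name, "misses") == 0)
			misses = value;
	}
	fclose(f);

	if (hits + misses == 0) {
		fprintf(stderr, "no ARC activity recorded\n");
		return (1);
	}

	printf("ARC hit rate: %.2f%%\n",
	    100.0 * (double)hits / (double)(hits + misses));
	return (0);
}
```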

@behlendorf
Contributor

Interesting, and downright counterintuitive. Decreasing the cache size increases the odds of a cache hit, at least for desktop workloads. I think this might be reasonable for desktops but probably not OK for large-memory servers.

I think we'd have better luck addressing these issues in the short term by decreasing SPL_KMEM_CACHE_OBJ_PER_SLAB and SPL_KMEM_CACHE_OBJ_PER_SLAB_MIN. By decreasing these values we put fewer objects on a slab by default which increases the odds of being able to free them despite fragmentation. I picked these values originally as a best guess long before I had a working ZPL, so there's a good chance they are not optimal. It would be interesting to decrease them by half and see the effect.
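
Concretely, that experiment would amount to something like the following. The "before" values in the comments are assumptions rather than confirmed defaults; check the SPL headers for the real numbers.

```c
/*
 * Sketch of the suggested experiment: halve the per-slab object counts
 * so fewer objects sit on each slab and partially used slabs are easier
 * to free under fragmentation.
 */
#define	SPL_KMEM_CACHE_OBJ_PER_SLAB	16	/* assumed previous default: 32 */
#define	SPL_KMEM_CACHE_OBJ_PER_SLAB_MIN	4	/* assumed previous default: 8  */
```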

Additionally, making the changes described here would be a good short-term improvement:

#618 (comment)

The long-term solution, of course, is to move these buffers off the slab and into the Linux page cache. That, however, will need to wait for 0.7.0 at the earliest since it's a significant change.

@ryao
Contributor Author

ryao commented Apr 13, 2012

The commit message for 23bdb07 suggests that no accounting is done for internal fragmentation. If I recall correctly, the SLUB allocator's internal fragmentation characteristics match those of the HOARD allocator, for which a paper was published. Here is a link from Google:

https://parasol.tamu.edu/~rwerger//Courses/689/spring2002/day-3-ParMemAlloc/papers/berger00hoard.pdf

The theoretical limit on fragmentation for HOARD is 50%, which I believe is also true for SLUB. With the current limit of all but 4GB on a 16GB system (a nominal 12GB), the maximum ARC footprint under worst-case fragmentation would be 24GB, with the unweighted average being 18GB. Both figures exceed what the system can actually store, and the problem only gets worse on larger-memory systems.
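
Spelling the arithmetic out with a throwaway example, using the 16GB system and 50% worst-case figure from above:

```c
/* Worked example for a 16GB system under the old "all but 4GB" limit,
 * assuming internal fragmentation can waste up to half of every slab. */
#include <stdio.h>

int main(void)
{
	double installed_gb = 16.0;
	double nominal_limit_gb = installed_gb - 4.0;                 /* 12GB */
	double worst_case_gb = nominal_limit_gb * 2.0;                /* 24GB */
	double average_gb = (nominal_limit_gb + worst_case_gb) / 2.0; /* 18GB */

	printf("nominal %.0fGB, worst case %.0fGB, average %.0fGB on a %.0fGB system\n",
	    nominal_limit_gb, worst_case_gb, average_gb, installed_gb);
	return (0);
}
```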

My home server has 16GB of RAM with a 6-disk raidz2 vdev for its pool. This patch appears to have addressed instability that I observed when doing simultaneous rsyncs of the mirrors for Gentoo, FreeBSD, OpenBSD, NetBSD and DragonFly BSD with a few virtual machines running. Without it (and even with the patch from issue #618), the system would crash within 12 hours.

My desktop has 8GB of RAM. This effect was not nearly as pronounced as it was on my server. I believe that is because the probability of internal fragmentation causing the ARC to exceed a sane limit was remarkably low. Despite that, there was still significant memory pressure. I believe that caused excessive reclaims, which had a negative impact on the ARC hit rate.

Making the zfs_arc_max default 1/2 of system memory might be safe, although 1/3 of system memory would seem to be a better figure. With that said, I am fairly certain that the current "all but 4GB" default causes issues on large-memory systems.

@behlendorf
Contributor

It's true that the ARC doesn't account for internal fragmentation in either the SPL or SLUB allocators. It's also true that we do overcommit memory on the system. However, most of that memory should be easily reclaimable by the shrinker callback, since we only allow 1/4 of the ARC to be dirty.

That said, I'm not one to argue with empirical evidence that it helps. So I'm certainly willing to consider pulling in this change once I understand why it helps. When you mention instability on these systems, what exactly happens? Does the system panic, go non-responsive, deadlock in some way?

Do these systems happen to have an L2ARC device? We recently identified a deadlock in that code which is made worse by the VM patch, although it's been a long-standing but rare issue. It may be possible this is in fact what you're hitting, but it's hard to say without a stack from the system. See commit behlendorf/zfs@85ab09b

@ryao
Contributor Author

ryao commented Apr 13, 2012

My systems lack L2ARC devices. I described what happened on my server in the following comment:

#618 (comment)

@ryao
Contributor Author

ryao commented Apr 16, 2012

This patch closes issue #642.

@behlendorf
Contributor

I'm looking at pulling this patch into the master tree and wondering if you've done any testing with it set to 1/2 of memory instead of 1/3.

@ryao
Contributor Author

ryao commented Apr 17, 2012

My desktop has 8GB of RAM and I had no stability issues with it before applying this patch. However, it has a small SSD, so I am not able to test the kinds of rsyncs on it that I tested on my server. For what it is worth, I now think 1/2 would be safe because allocations seem to be biased toward powers of 2.

I will not have time to do testing on my server until next week. I have asked @tstudios to test zfs_arc_max set to 1/2 of his RAM in issue #642. If he has time to volunteer, we should be able to get feedback before I can look into this myself.

@ryao
Contributor Author

ryao commented Apr 23, 2012

I have changed this patch to set arc_c_max to 1/2 instead of 1/3.

behlendorf added a commit to behlendorf/zfs that referenced this pull request May 21, 2018
Add missing parenthesis around btop and ptob macros to ensure
operation ordering is preserved after expansion.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#660
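
To illustrate the class of bug that commit guards against, here is a simplified example; the macro bodies and PAGE_SHIFT below are stand-ins, not the actual ZFS/SPL header definitions.

```c
/*
 * Simplified stand-ins showing how a missing parenthesis lets operator
 * precedence reorder the expansion of a bytes/pages conversion macro.
 */
#define	PAGE_SHIFT		12

#define	PTOB_UNSAFE(pages)	(pages) << PAGE_SHIFT	/* no outer parentheses */
#define	PTOB_SAFE(pages)	((pages) << PAGE_SHIFT)

/*
 * PTOB_UNSAFE(n) + 1 expands to (n) << PAGE_SHIFT + 1.  Because '+' binds
 * tighter than '<<', that is (n) << (PAGE_SHIFT + 1), shifting the page
 * count one bit too far.  PTOB_SAFE(n) + 1 expands to
 * ((n) << PAGE_SHIFT) + 1, preserving the intended ordering.
 */
```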