Update ARC memory limits to account for SLUB internal fragmentation #660

Closed
wants to merge 1 commit

Conversation

ryao
Contributor

@ryao ryao commented Apr 12, 2012

23bdb07 updated the ARC memory limits
to be 1/2 of memory or all but 4GB. Unfortunately, these values assume
zero internal fragmentation in the SLUB allocator, when in reality, the
internal fragmentation could be as high as 50%, effectively doubling
memory usage. This poses clear safety issues, because it permits the
size of the ARC to exceed system memory.

This patch changes this so that the default value of arc_c_max is always
1/2 of system memory. This effectively limits the ARC to the memory that
the system has physically installed.
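
For illustration, the old and new defaults amount to something like the following. This is a hypothetical helper, not the actual arc_init() code; `physmem` stands for installed physical memory in bytes.

```c
/*
 * Rough sketch of the default arc_c_max policy before and after this
 * patch. Hypothetical helper, not the actual arc_init() code.
 */
#include <stdint.h>

uint64_t
old_default_arc_c_max(uint64_t physmem)
{
	/* 1/2 of memory or all but 4GB, whichever is larger */
	uint64_t four_gb = 4ULL << 30;
	uint64_t all_but_4gb = (physmem > four_gb) ? physmem - four_gb : 0;

	return (physmem / 2 > all_but_4gb ? physmem / 2 : all_but_4gb);
}

uint64_t
new_default_arc_c_max(uint64_t physmem)
{
	/* always 1/2 of installed memory, leaving headroom for SLUB waste */
	return (physmem / 2);
}
```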

@ryao
Contributor Author

ryao commented Apr 12, 2012

This patch seems to have had the side effect of increasing the arc hit rate on my desktop. Memory pressure had led to the ARC being purged quite frequently, which reduced its effective hit rate to about 95%. With this patch, the effective hit rate is 99%.

I calculated the hit rate as hits / (hits + misses), using the values from /proc/spl/kstat/zfs/arcstats.
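
For reference, that calculation can be scripted against arcstats directly. The following is a minimal sketch in C; it assumes the usual three-column (name, type, value) kstat layout and keeps error handling to a minimum.

```c
/*
 * Minimal sketch: compute the ARC hit rate as hits / (hits + misses)
 * from the "hits" and "misses" counters in arcstats.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/spl/kstat/zfs/arcstats", "r");
	char line[256], name[64];
	int type;
	unsigned long long value, hits = 0, misses = 0;

	if (f == NULL) {
		perror("arcstats");
		return (1);
	}

	while (fgets(line, sizeof (line), f) != NULL) {
		/* skip the kstat header lines, which do not match this format */
		if (sscanf(line, "%63s %d %llu", name, &type, &value) != 3)
			continue;
		if (strcmp(name, "hits") == 0)
			hits = value;
		else if (strcmp(name, "misses") == 0)
			misses = value;
	}
	fclose(f);

	if (hits + misses == 0) {
		fprintf(stderr, "no ARC activity recorded\n");
		return (1);
	}

	printf("ARC hit rate: %.2f%%\n",
	    100.0 * (double)hits / (double)(hits + misses));
	return (0);
}
```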

@behlendorf
Contributor

Interesting, and downright counterintuitive. Decreasing the cache size increases the odds of a cache hit, at least for desktop workloads. I think this might be reasonable for desktops but probably not OK for large-memory servers.

I think we'd have better luck addressing these issues in the short term by decreasing SPL_KMEM_CACHE_OBJ_PER_SLAB and SPL_KMEM_CACHE_OBJ_PER_SLAB_MIN. By decreasing these values we put fewer objects on a slab by default which increases the odds of being able to free them despite fragmentation. I picked these values originally as a best guess long before I had a working ZPL, so there's a good chance they are not optimal. It would be interesting to decrease them by half and see the effect.
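
Concretely, that experiment would amount to something like the following. The "before" values in the comments are assumptions rather than confirmed defaults; check the SPL headers for the real numbers.

```c
/*
 * Sketch of the suggested experiment: halve the per-slab object counts
 * so fewer objects sit on each slab and partially used slabs are easier
 * to free under fragmentation.
 */
#define	SPL_KMEM_CACHE_OBJ_PER_SLAB	16	/* assumed previous default: 32 */
#define	SPL_KMEM_CACHE_OBJ_PER_SLAB_MIN	4	/* assumed previous default: 8  */
```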

Additionally, making the changes described here would be a good short-term improvement:

#618 (comment)

The long-term solution, of course, is to move these buffers off the slab and into the Linux page cache. That, however, will need to wait for 0.7.0 at the earliest since it's a significant change.

@ryao
Contributor Author

ryao commented Apr 13, 2012

The commit message for 23bdb07 suggests that no accounting is done for internal fragmentation. If I recall correctly, the SLUB allocator's internal fragmentation characteristics match those of the HOARD allocator, for which a paper was published. Here is a link from Google:

https://parasol.tamu.edu/~rwerger//Courses/689/spring2002/day-3-ParMemAlloc/papers/berger00hoard.pdf

The theoretical limit on fragmentation for HOARD is 50%, which I believe is also true for SLUB. With the current limit of all but 4GB on a 16GB system (a nominal 12GB), the maximum ARC footprint under worst-case fragmentation would be 24GB, with the unweighted average being 18GB. Both figures exceed what the system can actually store, and the problem only gets worse on larger-memory systems.
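
Spelling the arithmetic out with a throwaway example, using the 16GB system and 50% worst-case figure from above:

```c
/* Worked example for a 16GB system under the old "all but 4GB" limit,
 * assuming internal fragmentation can waste up to half of every slab. */
#include <stdio.h>

int main(void)
{
	double installed_gb = 16.0;
	double nominal_limit_gb = installed_gb - 4.0;                 /* 12GB */
	double worst_case_gb = nominal_limit_gb * 2.0;                /* 24GB */
	double average_gb = (nominal_limit_gb + worst_case_gb) / 2.0; /* 18GB */

	printf("nominal %.0fGB, worst case %.0fGB, average %.0fGB on a %.0fGB system\n",
	    nominal_limit_gb, worst_case_gb, average_gb, installed_gb);
	return (0);
}
```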

My home server has 16GB of RAM with a 6-disk raidz2 vdev for its pool. This patch appears to have addressed instability that I observed when doing simultaneous rsyncs of the mirrors for Gentoo, FreeBSD, OpenBSD, NetBSD and DragonFly BSD with a few virtual machines running. Without it (and even with the patch from issue #618), the system would crash within 12 hours.

My desktop has 8GB of RAM. This effect was not nearly as pronounced as it was on my server. I believe that is because the probability of internal fragmentation causing the ARC to exceed a sane limit was remarkably low. Despite that, there was still significant memory pressure. I believe that caused excessive reclaims, which had a negative impact on the ARC hit rate.

Making the zfs_arc_max default 1/2 of system memory might be safe, although 1/3 of system memory would seem to be a better figure. With that said, I am fairly certain that the current "all but 4GB" default causes issues on large-memory systems.

@behlendorf
Contributor

It's true that the ARC doesn't account for internal fragmentation in either the SPL or SLUB allocators. It's also true that we do overcommit memory on the system. However, most of that memory should be easily reclaimable by the shrinker callback, since we only allow 1/4 of the ARC to be dirty.

That said, I'm not one to argue with empirical evidence that it helps. So I'm certainly willing to consider pulling in this change once I understand why it helps. When you mention instability on these systems, what exactly happens? Does the system panic, go non-responsive, deadlock in some way?

Do these systems happen to have an L2ARC device? We recently identified a deadlock in that code which is made worse by the VM patch, although it's been a long-standing but rare issue. It may be possible this is in fact what you're hitting, but it's hard to say without a stack from the system. See commit behlendorf/zfs@85ab09b

@ryao
Contributor Author

ryao commented Apr 13, 2012

My systems lack L2ARC devices. I described what happened on my server in the following comment:

#618 (comment)

@ryao
Contributor Author

ryao commented Apr 16, 2012

This patch closes issue #642.

@behlendorf
Contributor

I'm looking at pulling this patch into the master tree and wondering if you've done any testing with it set to 1/2 of memory instead of 1/3.

@ryao
Contributor Author

ryao commented Apr 17, 2012

My desktop has 8GB of RAM and I had no stability issues with it before applying this patch. However, it has a small SSD, so I am not able to test the kinds of rsyncs on it that I tested on my server. For what it is worth, I now think 1/2 would be safe because allocations seem to be biased toward powers of 2.

I will not have time to do testing on my server until next week. I have asked @tstudios to test zfs_arc_max set to 1/2 of his RAM in issue #642. If he has time to volunteer, we should be able to get feedback before I can look into this myself.

@ryao
Contributor Author

ryao commented Apr 23, 2012

I have changed this patch to set arc_c_max to 1/2 instead of 1/3.

behlendorf added a commit to behlendorf/zfs that referenced this pull request May 21, 2018
Add missing parenthesis around btop and ptob macros to ensure
operation ordering is preserved after expansion.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#660
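
To illustrate the class of bug that commit guards against, here is a simplified example; the macro bodies and PAGE_SHIFT below are stand-ins, not the actual ZFS/SPL header definitions.

```c
/*
 * Simplified stand-ins showing how a missing parenthesis lets operator
 * precedence reorder the expansion of a bytes/pages conversion macro.
 */
#define	PAGE_SHIFT		12

#define	PTOB_UNSAFE(pages)	(pages) << PAGE_SHIFT	/* no outer parentheses */
#define	PTOB_SAFE(pages)	((pages) << PAGE_SHIFT)

/*
 * PTOB_UNSAFE(n) + 1 expands to (n) << PAGE_SHIFT + 1.  Because '+' binds
 * tighter than '<<', that is (n) << (PAGE_SHIFT + 1), shifting the page
 * count one bit too far.  PTOB_SAFE(n) + 1 expands to
 * ((n) << PAGE_SHIFT) + 1, preserving the intended ordering.
 */
```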