ARC target size (arc_c) unnecessarily low #4869

Closed
snajpa opened this issue Jul 22, 2016 · 55 comments
Labels
Type: Performance Performance improvement or performance problem

@snajpa
Contributor

snajpa commented Jul 22, 2016

After a system has been running for a while and the ARC has rotated through quite a bit of data, it happens from time to time that the system keeps reporting lots (>60%) of free memory, yet arc_c keeps targeting values way below arc_c_max.

It has been happening across all the workloads I put on ZFS: our production nodes with OpenVZ containers and 256 GB RAM, where there can be over 70 GB of free memory according to /proc/meminfo while arc_c bounces around in the lower tens of GB, rarely getting above 30 GB. Meanwhile, arc_c_max is set to 128 GB.
I also have a node acting as a network block device server (a userspace zvol consumer), which exhibits the same behavior.

The common denominator is that we're using RHEL6 kernel (with and without OpenVZ containers patchset).

I've tested 0.6.5-stable, with and without @tuxoko's ABD patches and both with and without recent @dweeezil changes regarding ARC dnode metadata limit (you can see what we're currently running in production at https://github.com/vpsfreecz/zfs/commits/master).

I'll attach example meminfo and arcstats from the NBD server, which has by far the cleanest workload (only about 15 clients for the nbd server and nothing else).

[root@node4.base48.cz]
 /proc/spl/kstat/zfs # cat /proc/meminfo 
MemTotal:       24527548 kB
MemFree:         3830496 kB
Buffers:        12981668 kB
Cached:             7484 kB
SwapCached:            0 kB
MemCommitted:    1048576 kB
VirtualSwap:           0 kB
Active:          2563892 kB
Inactive:       10436420 kB
Active(anon):       2568 kB
Inactive(anon):     9456 kB
Active(file):    2561324 kB
Inactive(file): 10426964 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:            327328 kB
Writeback:             0 kB
AnonPages:         11740 kB
Mapped:             2440 kB
Shmem:               284 kB
Slab:            1663372 kB
SReclaimable:     418692 kB
SUnreclaim:      1244680 kB
KernelStack:       10944 kB
PageTables:         2164 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12263772 kB
Committed_AS:      78476 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      405660 kB
VmallocChunk:   34359241856 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7680 kB
DirectMap2M:    25149440 kB
[root@node4.base48.cz]
 /proc/spl/kstat/zfs # cat arcstats 
6 1 0x01 94 4512 6146206719 269527224866007
name                            type data
hits                            4    303773997
misses                          4    115969204
demand_data_hits                4    197146302
demand_data_misses              4    38765205
demand_metadata_hits            4    68901307
demand_metadata_misses          4    9865768
prefetch_data_hits              4    37567591
prefetch_data_misses            4    67171298
prefetch_metadata_hits          4    158797
prefetch_metadata_misses        4    166933
mru_hits                        4    122996016
mru_ghost_hits                  4    5057401
mfu_hits                        4    150057636
mfu_ghost_hits                  4    624491
deleted                         4    946030744
mutex_miss                      4    67004
evict_skip                      4    95510569
evict_not_enough                4    560548
evict_l2_cached                 4    0
evict_l2_eligible               4    3836462893568
evict_l2_ineligible             4    179837358080
evict_l2_skip                   4    0
hash_elements                   4    1262924
hash_elements_max               4    2724817
hash_collisions                 4    236008565
hash_chains                     4    155721
hash_chain_max                  4    9
p                               4    2539293112
c                               4    5272808320
c_min                           4    33554432
c_max                           4    12558104576
size                            4    5272565160
hdr_size                        4    496561736
data_size                       4    4150026240
metadata_size                   4    341274624
dbuf_size                       4    284191200
dnode_size                      4    410880
bonus_size                      4    100480
anon_size                       4    20541440
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    3792111616
mru_evictable_data              4    3602395136
mru_evictable_metadata          4    2121728
mru_ghost_size                  4    1336942592
mru_ghost_evictable_data        4    452907008
mru_ghost_evictable_metadata    4    884035584
mfu_size                        4    678647808
mfu_evictable_data              4    529334272
mfu_evictable_metadata          4    70795264
mfu_ghost_size                  4    390600192
mfu_ghost_evictable_data        4    242814976
mfu_ghost_evictable_metadata    4    147785216
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    1529
memory_direct_count             4    0
memory_indirect_count           4    7769736
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    1122538920
arc_meta_limit                  4    9418578432
arc_dnode_limit                 4    941857843
arc_meta_max                    4    2216990600
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    392437760

@dweeezil
Contributor

@snajpa It looks like something broke with your application of one of the dnode limit patches. The arcstats are completely missing "p" through "data_size". I figured I'd better mention this before looking at the repo link you posted.

@snajpa
Contributor Author

snajpa commented Jul 22, 2016

@dweeezil, it looks like something went wrong during copy/paste of the output :) I've updated the paste.

@snajpa
Contributor Author

snajpa commented Jul 22, 2016

One thing that stands out is that 'Buffers' in /proc/meminfo always jumps to insane levels that I've never seen on a healthy system. I have no idea which buffers that indicates; if it's neither SLAB nor vmalloc'd memory, then what kind of kernel memory is it?

@tuxoko
Contributor

tuxoko commented Jul 22, 2016

@snajpa Can you tell the NBD server to use O_DIRECT on the zvol device?

@snajpa
Contributor Author

snajpa commented Jul 22, 2016

@tuxoko I think (and Google seems to agree) that the NBD server doesn't support O_DIRECT. Btw, this is happening even when using ZPL datasets, not just ZVOLs.

@tuxoko
Contributor

tuxoko commented Jul 22, 2016

@snajpa
Well, I asked that because you're wondering about the buffers in /proc/meminfo. That's what you get when you don't use O_DIRECT on a block device. But if the ARC size issue has nothing to do with it, then you don't need to worry about the buffers.
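
For illustration, one quick way to see the difference is to read from the zvol's block device with and without direct I/O; /dev/zd0 below is only a placeholder for the actual device node. A buffered read lands in the page cache and inflates 'Buffers' in /proc/meminfo, a direct read does not:

dd if=/dev/zd0 of=/dev/null bs=1M count=256                 # buffered read, shows up under Buffers
dd if=/dev/zd0 of=/dev/null bs=1M count=256 iflag=direct    # direct read, bypasses the page cache
grep Buffers /proc/meminfo                                  # compare before/after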

@tuxoko
Contributor

tuxoko commented Jul 22, 2016

c_max 4 12558104576
Your ARC only has 12.5 GB; can you tune it up?
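
For reference, a hedged sketch of raising the cap at runtime; the 20 GiB value matches what snajpa tries in the next comment, and the modprobe.d path is the usual location but an assumption here:

echo 21474836480 > /sys/module/zfs/parameters/zfs_arc_max   # 20 GiB, takes effect immediately
grep c_max /proc/spl/kstat/zfs/arcstats                     # confirm the new ceiling
# to persist across reboots (assumed path), add to /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_max=21474836480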

@snajpa
Contributor Author

snajpa commented Jul 22, 2016

arc_c_max = 20GB:

[root@node4.base48.cz]
 /sys/module/zfs/parameters # cat /proc/spl/kstat/zfs/arcstats 
6 1 0x01 94 4512 6146206719 277601477176117
name                            type data
hits                            4    317718922
misses                          4    121598717
demand_data_hits                4    206101937
demand_data_misses              4    40767053
demand_metadata_hits            4    72080574
demand_metadata_misses          4    10399479
prefetch_data_hits              4    39368566
prefetch_data_misses            4    70255781
prefetch_metadata_hits          4    167845
prefetch_metadata_misses        4    176404
mru_hits                        4    128634979
mru_ghost_hits                  4    5279619
mfu_hits                        4    156933331
mfu_ghost_hits                  4    681828
deleted                         4    993556380
mutex_miss                      4    69347
evict_skip                      4    97195396
evict_not_enough                4    588513
evict_l2_cached                 4    0
evict_l2_eligible               4    4029363145216
evict_l2_ineligible             4    188845801472
evict_l2_skip                   4    0
hash_elements                   4    540371
hash_elements_max               4    2724817
hash_collisions                 4    247177269
hash_chains                     4    31939
hash_chain_max                  4    9
p                               4    17668274
c                               4    2234051808
c_min                           4    33554432
c_max                           4    21474836480
size                            4    2227068280
hdr_size                        4    222652312
data_size                       4    1662328832
metadata_size                   4    225165312
dbuf_size                       4    116410464
dnode_size                      4    410880
bonus_size                      4    100480
anon_size                       4    177047552
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1282109440
mru_evictable_data              4    1192435712
mru_evictable_metadata          4    307200
mru_ghost_size                  4    964448256
mru_ghost_evictable_data        4    299401216
mru_ghost_evictable_metadata    4    665047040
mfu_size                        4    428337152
mfu_evictable_data              4    309563392
mfu_evictable_metadata          4    51412992
mfu_ghost_size                  4    276713472
mfu_ghost_evictable_data        4    169824256
mfu_ghost_evictable_metadata    4    106889216
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    1529
memory_direct_count             4    0
memory_indirect_count           4    8163940
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    564739448
arc_meta_limit                  4    9418578432
arc_dnode_limit                 4    941857843
arc_meta_max                    4    2216990600
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    392437760
[root@node4.base48.cz]
 /sys/module/zfs/parameters # cat /proc/meminfo 
MemTotal:       24527548 kB
MemFree:        12624340 kB
Buffers:         8648076 kB
Cached:             3376 kB
SwapCached:            0 kB
MemCommitted:    1048576 kB
VirtualSwap:           0 kB
Active:          1632100 kB
Inactive:        7032596 kB
Active(anon):       4720 kB
Inactive(anon):     9464 kB
Active(file):    1627380 kB
Inactive(file):  7023132 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:            255164 kB
Writeback:             0 kB
AnonPages:         12752 kB
Mapped:             2832 kB
Shmem:               284 kB
Slab:            1138056 kB
SReclaimable:     382036 kB
SUnreclaim:       756020 kB
KernelStack:       10992 kB
PageTables:         2732 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12263772 kB
Committed_AS:      80004 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      405660 kB
VmallocChunk:   34359241856 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7680 kB
DirectMap2M:    25149440 kB

@dweeezil
Contributor

@snajpa Is your kernel the 2.6.32 version typically used on EL6 systems? ZFS probably thinks the system is low on memory and, therefore, is not increasing the ARC target size. Could you please show /proc/vmstat (it has more detail than /proc/meminfo). It should be able to tell us a bit more about the memory situation. The system definitely has been under some memory pressure as shown by the high memory_indirect_count in both cases you've shown. In fact, it would be interesting to know whether it's increasing when the system is in this situation in which it won't expand the ARC.

As to nbd-server, if you're using it to serve zvols, you ought to be able to bypass the buffer hogging by using raw(8).
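
A minimal sketch of the raw(8) approach, assuming the zvol's device node is /dev/zd0 (a placeholder) and that the raw driver is built as a module:

modprobe raw                    # load the raw character-device driver if it isn't built in
raw /dev/raw/raw1 /dev/zd0      # bind raw1 to the zvol's block device
raw -qa                         # verify the binding
# then point nbd-server at /dev/raw/raw1 so its I/O bypasses the page cache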

@snajpa
Contributor Author

snajpa commented Jul 24, 2016

Yes, it's EL6 kernel.

Here's a current example from a system with OpenVZ (node use case: general-purpose containers; web, mail, ...)

[root@node10.prg.vpsfree.cz]
 ~ # cat /proc/spl/kstat/zfs/arcstats 
6 1 0x01 94 4512 8143005555 74338340547107
name                            type data
hits                            4    8041365039
misses                          4    16955272
demand_data_hits                4    7871813931
demand_data_misses              4    5485647
demand_metadata_hits            4    161372870
demand_metadata_misses          4    5034543
prefetch_data_hits              4    4029357
prefetch_data_misses            4    2978100
prefetch_metadata_hits          4    4148881
prefetch_metadata_misses        4    3456982
mru_hits                        4    27453972
mru_ghost_hits                  4    12193451
mfu_hits                        4    8005759251
mfu_ghost_hits                  4    79972
deleted                         4    2627990
mutex_miss                      4    4063
evict_skip                      4    64119896
evict_not_enough                4    428170
evict_l2_cached                 4    428227957248
evict_l2_eligible               4    309535373824
evict_l2_ineligible             4    217760807936
evict_l2_skip                   4    0
hash_elements                   4    5506597
hash_elements_max               4    5525231
hash_collisions                 4    5705407
hash_chains                     4    404103
hash_chain_max                  4    5
p                               4    3611612011
c                               4    54559713456
c_min                           4    33554432
c_max                           4    135144644608
size                            4    52206280104
hdr_size                        4    1289043360
data_size                       4    38245069312
metadata_size                   4    7297491456
dbuf_size                       4    1210645440
dnode_size                      4    3024507368
bonus_size                      4    950021440
anon_size                       4    74607616
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    3862830080
mru_evictable_data              4    214134272
mru_evictable_metadata          4    6647296
mru_ghost_size                  4    50135908864
mru_ghost_evictable_data        4    32160917504
mru_ghost_evictable_metadata    4    17974991360
mfu_size                        4    41605123072
mfu_evictable_data              4    37994991616
mfu_evictable_metadata          4    1613108224
mfu_ghost_size                  4    3799307264
mfu_ghost_evictable_data        4    2696947712
mfu_ghost_evictable_metadata    4    1102359552
l2_hits                         4    2548448
l2_misses                       4    14406782
l2_feeds                        4    78321
l2_rw_clash                     4    2
l2_read_bytes                   4    49781817856
l2_write_bytes                  4    181896283136
l2_writes_sent                  4    75635
l2_writes_done                  4    75635
l2_writes_error                 4    0
l2_writes_lock_retry            4    250
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    11128
l2_cdata_free_on_write          4    4105
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    193018721280
l2_asize                        4    82216541696
l2_hdr_size                     4    189501728
l2_compress_successes           4    6416113
l2_compress_zeros               4    0
l2_compress_failures            4    769235
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    1789250
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    13961210792
arc_meta_limit                  4    101358483456
arc_dnode_limit                 4    10135848345
arc_meta_max                    4    18094462128
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    4223266816
[root@node10.prg.vpsfree.cz]
 ~ # cat /proc/vmstat 
nr_free_pages 31807359
nr_inactive_anon 7504528
nr_active_anon 8316548
nr_inactive_file 523060
nr_active_file 545156
nr_unevictable 354234
nr_mlock 354234
nr_anon_pages 14388942
nr_mapped 1448491
nr_file_pages 2933178
nr_dirty 743
nr_writeback 127
nr_slab_reclaimable 773155
nr_slab_unreclaimable 4551536
nr_page_table_pages 304866
nr_kernel_stack 14158
nr_unstable 0
nr_bounce 0
nr_vmscan_write 4895779
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 838821
nr_vswap 0
numa_hit 8557637759
numa_miss 8840080
numa_foreign 8840080
numa_interleave 117581
numa_local 8552048676
numa_other 14429163
nr_anon_transparent_hugepages 0
pgpgin 376065096
pgpgout 2457110017
pswpin 1224208
pswpout 4895779
vswpin 0
vswpout 0
pgalloc_dma 0
pgalloc_dma32 8
pgalloc_normal 9938620996
pgalloc_movable 0
pgfree 9970435495
pgactivate 47194929
pgdeactivate 31650755
pgfault 11680230925
pgmajfault 11053985
pgrefill_dma 99554
pgrefill_dma32 99554
pgrefill_normal 29252129
pgrefill_movable 0
pgsteal_dma 0
pgsteal_dma32 0
pgsteal_normal 5850448
pgsteal_movable 0
pgscan_kswapd_dma 99554
pgscan_kswapd_dma32 99554
pgscan_kswapd_normal 15552896
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 0
pgscan_direct_normal 0
pgscan_direct_movable 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 239637760
kswapd_steal 4990294
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 6758
kswapd_high_wmark_hit_quickly 4778
kswapd_skip_congestion_wait 3
pageoutrun 100395
allocstall 0
pgrotated 5001698
compact_blocks_moved 14498
compact_pages_moved 462475
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 223755
unevictable_pgs_scanned 0
unevictable_pgs_rescued 555279
unevictable_pgs_mlocked 912022
unevictable_pgs_munlocked 542919
unevictable_pgs_cleared 13934
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0

behlendorf added this to the 0.7.0 milestone Jul 26, 2016
behlendorf added the Type: Performance (Performance improvement or performance problem) label Jul 26, 2016
@dweeezil
Contributor

@snajpa Thanks for the fresh stats. The thing that still sticks out is the high value of memory_indirect_count. Is that counter increasing pretty much continuously? Have you made any VM tunings on the system, possibly adjusting swappiness? Also, nr_vmscan_write 4895779 is a pretty clear indication the system has been trying to swap/page stuff out. Is the memory evenly distributed between the NUMA nodes? If memory_indirect_count is increasing regularly, you could probably find the cause pretty easily with perf top -ag.
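
One simple way to check whether that counter is climbing (a sketch; adjust the interval to taste):

watch -n 5 "grep -E 'memory_(direct|indirect)_count' /proc/spl/kstat/zfs/arcstats"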

@snajpa
Contributor Author

snajpa commented Jul 27, 2016

@dweeezil Some of the systems where I see this aren't even NUMA, so I would guess that's not relevant, but I do have swappiness set to 0, and we don't use any swap.

I can see kswapd hitting the CPU, but I have no idea how to trace why it is running.

@snajpa
Contributor Author

snajpa commented Jul 27, 2016

We've increased zfs_arc_min to 70 GB on that one node, because it was unusable with just a few GB cached when the workload's dataset is clearly larger.

That seems to help, though kswapd0/1 are now chewing on something heavily (each one is using ~70-80% of a core).
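
For completeness, a sketch of how that floor can be raised at runtime; the value is 70 GiB expressed in bytes, and the modprobe.d path is an assumption:

echo 75161927680 > /sys/module/zfs/parameters/zfs_arc_min   # 70 GiB
# to persist across reboots (assumed path), add to /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_min=75161927680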

@kernelOfTruth
Contributor

@snajpa

Swapping pages to a swap partition isn't the only job of kswapd. It also reaps the kernel caches (inode cache, dentry cache, etc.), which does not involve any swapping. So disabling kswapd isn't the right thing to do, even on a swapless system.

source: http://kernelnewbies.kernelnewbies.narkive.com/RMhKyKxj/ok-but-what-about-kswapd-in-a-swapless-system

@snajpa
Contributor Author

snajpa commented Jul 29, 2016

@kernelOfTruth I know more or less why kswapd exists in the system; what I don't know is how to tell why the system thinks it's low on memory when, according to the kernel's own accounting, there's a lot of memory free.

Also, is there any way to make perf skip searching for symbols? I've got ~5k processes on the system, all of them running in containers, but perf keeps searching for symbols locally, so I'd like to skip that but haven't figured out how yet. It also looks like it's impossible to launch perf even if I try to wait for the symbol search to fail for all of the different executables.

Any ideas on how to debug this in a way that actually works would be extremely appreciated...

Thanks!

@dweeezil
Contributor

@snajpa A good start might be perf top -ag -e vmscan:\*. Also check out /proc/zoneinfo which breaks down the vmstats per-node/per-zone and see if any of the zones look to be low on free memory relative to their watermarks.
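
As a rough filter for the watermark check (a sketch; the pattern just pulls out the per-zone free page counts and the min/low/high watermarks):

egrep '^Node|pages free|^ +(min|low|high) +[0-9]' /proc/zoneinfo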

@snajpa
Contributor Author

snajpa commented Aug 1, 2016

@dweeezil invalid or unsupported event: 'vmscan:*'

https://gist.github.com/snajpa/c07187597bb9eb13779c6544861b43d5

@dweeezil
Contributor

dweeezil commented Aug 1, 2016

@snajpa I was kind of afraid that the 2.6.32 kernel wouldn't have those trace points. In fact, after looking, I see its mm/vmscan.c file doesn't have any trace points. Did you find anything interesting in /proc/zoneinfo?

@snajpa
Contributor Author

snajpa commented Aug 1, 2016

@dweeezil nothing which would have caught my eye, take a look: https://gist.github.com/snajpa/a8a02c207bec3b9b62c8b32b7118a99f

@ireverent

Hey, any luck so far? I am seeing a similar issue in our environment in which our metadata size drops to an unacceptable level.

@dweeezil
Contributor

dweeezil commented Aug 8, 2016

@ireverent Are you also using a 2.6.32 EL kernel?

@ireverent

No, we are using 4.4

@dswartz
Contributor

dswartz commented Sep 2, 2016

I believe I am also seeing this. 128GB RAM, I have set zfs_arc_max to 113G. At some point, the system seems to be refusing to grow ARC at all. Snipped from arc_summary.pl:

ARC Size: 70.81% 82.44 GiB
Target Size: (Adaptive) 70.84% 82.47 GiB
Min Size (Hard Limit): 0.03% 32.00 MiB
Max Size (High Water): 3725:1 116.42 GiB

I'm pretty sure this is wrong, as running arcstat.pl 600 shows lines like this:

11:23:54 68 21 30 21 30 0 0 3 48 82G 82G

i.e. an ARC hit rate of about 70%, so one would think it would be filling the ARC on the misses, no? There is no swapping going on at all:

[root@nas1 ~]# free -m
total used free shared buff/cache available
Mem: 128831 122230 4723 47 1877 4657
Swap: 4095 0 4095

Exporting and re-importing the pool seems to 'fix' this. This is v0.6.5.7-1 on CentOS 7, kernel 3.10.0-327.28.2. Is there any useful info I can provide?

@dswartz
Contributor

dswartz commented Sep 2, 2016

Interesting. I tried setting zfs_arc_max again, but nothing happened. I then tried setting it to a value 1 GB bigger, and voila:

11:50:45 66 20 31 20 31 0 0 4 52 82G 117G

if I then reset it to the original max:

11:51:40 4 2 50 2 50 0 0 0 0 83G 116G

that works as expected (i.e. it is 1 GB smaller).

I suppose I can create a cron job that runs every hour to keep diddling it, although that could be tricky, as it doesn't seem to do anything if the size hasn't changed...
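
A minimal sketch of that workaround, assuming zfs_arc_max should end up at 125000000000 bytes (the c_max reported later in this thread); the script path is hypothetical and the temporary +1 GiB bump is only there to force the module to re-evaluate the target:

#!/bin/sh
# hypothetical /etc/cron.hourly/bump-arc-max
TARGET=125000000000                                                       # intended zfs_arc_max in bytes
echo $((TARGET + 1073741824)) > /sys/module/zfs/parameters/zfs_arc_max    # temporary +1 GiB bump
sleep 5
echo $TARGET > /sys/module/zfs/parameters/zfs_arc_max                     # restore the intended cap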

@dweeezil
Contributor

dweeezil commented Sep 2, 2016

@dswartz Could you please provide a couple of samples of your arcstats taken a few seconds apart?

@dswartz
Contributor

dswartz commented Sep 2, 2016

I have it running with 60 second polling. I will tee the output into a file so I can see when it changes...

@dswartz
Contributor

dswartz commented Sep 2, 2016

I restarted with a 10 second interval. I have seen instances where the target ARC size was reduced (but not drastically). In all 3 cases, it seemed to be triggered when the current ARC size hit the target size. I will post an update when I have another event...

@dswartz
Contributor

dswartz commented Sep 2, 2016

How do I see arc_no_grow and arc_need_free? Neither is printed by arc_summary.pl...

@dweeezil
Contributor

dweeezil commented Sep 2, 2016

@dswartz cat /proc/spl/kstat/zfs/arcstats is what we usually need to see to troubleshoot memory-related issues. The summarization scripts are exactly that: a summary.
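
One way to capture the requested samples a few seconds apart (a sketch; adjust the count and interval):

for i in 1 2 3; do
    date
    cat /proc/spl/kstat/zfs/arcstats
    sleep 10
done > arcstats-samples.txt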

@dswartz
Contributor

dswartz commented Sep 2, 2016

Cool, thanks!

@dswartz
Contributor

dswartz commented Sep 5, 2016

Happened again. 79GB target ARC size now. Info you requested:

6 1 0x01 91 4368 1609989890 154836914265905
name type data
hits 4 12917314
misses 4 6688351
demand_data_hits 4 10422089
demand_data_misses 4 5281203
demand_metadata_hits 4 1406593
demand_metadata_misses 4 704798
prefetch_data_hits 4 894332
prefetch_data_misses 4 641441
prefetch_metadata_hits 4 194300
prefetch_metadata_misses 4 60909
mru_hits 4 3872046
mru_ghost_hits 4 87626
mfu_hits 4 8864932
mfu_ghost_hits 4 319701
deleted 4 1124818
mutex_miss 4 20
evict_skip 4 856
evict_not_enough 4 88
evict_l2_cached 4 0
evict_l2_eligible 4 233959578112
evict_l2_ineligible 4 35349286912
evict_l2_skip 4 0
hash_elements 4 1227049
hash_elements_max 4 1838292
hash_collisions 4 919697
hash_chains 4 42493
hash_chain_max 4 5
p 4 9745039972
c 4 84852810315
c_min 4 33554432
c_max 4 125000000000
size 4 84819256536
hdr_size 4 465331608
data_size 4 83892419584
metadata_size 4 262365184
other_size 4 199140160
anon_size 4 11997696
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 9740599296
mru_evictable_data 4 9614925824
mru_evictable_metadata 4 36272128
mru_ghost_size 4 73121461248
mru_ghost_evictable_data 4 72675520000
mru_ghost_evictable_metadata 4 445941248
mfu_size 4 74402187776
mfu_evictable_data 4 74269839872
mfu_evictable_metadata 4 5433344
mfu_ghost_size 4 153026560
mfu_ghost_evictable_data 4 152961024
mfu_ghost_evictable_metadata 4 65536
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_cdata_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 1048
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 926836952
arc_meta_limit 4 50658455040
arc_meta_max 4 1844651392
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2110767104

Apologies, but I don't know how to insert the command output and have everything line up correctly as the earlier poster did.

@lnicola
Contributor

lnicola commented Sep 5, 2016

Apologies, but I don't know how to insert the command output and have everything line up correctly as the earlier poster did.

You can add a line with "```" above it and one below:

6 1 0x01 91 4368 1609989890 154836914265905
name type data
hits 4 12917314

@dswartz
Contributor

dswartz commented Sep 5, 2016

Umm, that didn't work. I ended up with one pile of output smooshed together :(

@dswartz
Contributor

dswartz commented Sep 5, 2016

6 1 0x01 91 4368 1609989890 175020932728847 name type data hits 4 14476668 misses 4 7335953

@dswartz
Contributor

dswartz commented Sep 5, 2016

Like the above I just posted... Unless I did something wrong?

@lnicola
Contributor

lnicola commented Sep 5, 2016

Well, in the comment you've deleted you started with a single "`" instead of three and ended it with a ~ instead of the three backticks. The second try was fine, but it's only the first line.

Anyway, don't worry about it 😄.

@dweeezil
Contributor

dweeezil commented Sep 6, 2016

@dswartz Your ARC appears to have reached a stasis point with the ARC target likely growing and shrinking at the same rate. If you want to effectively give the ARC priority over other memory consumers, you can set zfs_arc_min which will be reflected as c_min. In the most extreme case, set it to the same value as zfs_arc_max and then nothing will be freed in response to memory pressure.
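
A sketch of that most extreme setting, using the 125000000000-byte zfs_arc_max visible in the arcstats above; the modprobe.d path is an assumption:

echo 125000000000 > /sys/module/zfs/parameters/zfs_arc_min
# to persist across reboots (assumed path), add to /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_min=125000000000 zfs_arc_max=125000000000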

@dswartz
Contributor

dswartz commented Sep 6, 2016

Lnicola, thanks. I think I misread the 3 backticks as a single backtick in quotes :)

@dswartz
Contributor

dswartz commented Sep 6, 2016

Dweeezil, thanks. This host is doing very little that would exert significant memory pressure. I just checked and it's now 53 GB! I can't, for the life of me, imagine what would have demanded 60 GB. For now though, I'll just set zfs_arc_min to 100 GB or some such...

behlendorf removed this from the 0.7.0 milestone Oct 11, 2016
@DeHackEd
Contributor

I've seen this happen on my own systems.

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

See if this helps.
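
Those settings don't survive a reboot. Two hedged ways to make them stick, assuming a GRUB-style bootloader and an rc.local-style boot script:

# option 1: append to the kernel command line in the bootloader config:
#   transparent_hugepage=never
# option 2: re-apply at boot, e.g. from /etc/rc.local:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag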

behlendorf added this to the 0.8.0 milestone Oct 11, 2016
@jonathanvaughn

jonathanvaughn commented Jan 19, 2017

We've seen similar behavior on several systems. Our working set and available RAM aren't as large, but similarly we see the ARC shrinking well below what would make sense. The constant ARC eviction when we perform git clones or git status, etc., really kills performance.

For example, this system:


# free -m
              total        used        free      shared  buff/cache   available
Mem:          32031        8519        6624        1663       16887       16296
Swap:         32767        3651       29116
# arcstat.py
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
14:56:34     0     0      0     0    0     0    0     0    0   2.6G  2.1G

Kernel is 4.8.2-1.el7.elrepo.x86_64 on CentOS7 (Kernel is from EL-Repo, not the default kernel of CentOS7)

zfs_arc_min is 512 MB and zfs_arc_max is 8 GB. There's 16 GB free, and yet it's constantly trying to shrink well below the max (which it could allocate and still have ~8 GB left over). There's no deduplication going on and no L2ARC to manage, but about 1.8 TB of the 2.6 TB ZFS pool is in use, mostly lots and lots of source code / images / etc. for projects (lots of small files).

I've just tentatively tried DeHackEd's suggestion to set transparent hugepages to never, and the ARC size appears to be (very slowly) growing at the moment, but I'm not sure what the trade-offs are if this fixes it.

@rob-c

rob-c commented Jun 19, 2017

I have just seen the same problem on a production system running ZFS to host a web forum (with the ZFS Docker storage driver, if that is of interest).
The only parameter that had been changed prior to this was zfs_arc_max, set to 10 GB, which seemed safe on a system with 24 GB of RAM.

I was seeing arc_need_free fixed at 1 while the ARC size dropped to 32 MB and stayed there for at least an hour, and it only increased when I changed zfs_arc_min.
Edit: Throughout the time the ARC was refusing to grow, there was at least 6 GB of RAM completely free on the system, with only a very small amount used by kernel caches.

After deactivating transparent_hugepage as suggested by @DeHackEd, I saw arc_need_free finally set itself to 0 and the ARC began to grow again.
The ARC reclaim thread seemed pinned at 100% of one CPU core throughout this period.

Is there some scenario where memory isn't being freed correctly, leading to the cache draining down to the minimum size?

@gregwalters

gregwalters commented Aug 28, 2017

I have a similar problem with a host that has 32 GiB of RAM and is an iSCSI/NFS target. I've got ~28 GiB dedicated to ZFS, and the ARC target size keeps shrinking despite seemingly low memory pressure; huge pages are disabled. Running 3.10.0-514.26.2.el7.x86_64 on CentOS 7 with zfs-0.7.0-1.el7_3.x86_64.

Free output:

              total        used        free      shared  buff/cache   available
Mem:          31986       10581         336          24       21068       13440
Swap:         15219           1       15218

zpool status

pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Aug 28 12:01:02 2017
	3.44T scanned out of 8.61T at 186M/s, 8h5m to go
	0B repaired, 40.03% done
config:

	NAME         STATE     READ WRITE CKSUM
	tank         ONLINE       0     0     0
	  raidz2-0   ONLINE       0     0     0
	    crypt-9  ONLINE       0     0     0
	    crypt-0  ONLINE       0     0     0
	    crypt-5  ONLINE       0     0     0
	    crypt-6  ONLINE       0     0     0
	    crypt-4  ONLINE       0     0     0
	    crypt-8  ONLINE       0     0     0
	    crypt-3  ONLINE       0     0     0
	    crypt-1  ONLINE       0     0     0
	    crypt-7  ONLINE       0     0     0
	    crypt-2  ONLINE       0     0     0

errors: No known data errors

arcstat.py 5 5

    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
17:25:33     0     0    100     0    0     0  100     0  100    13G   13G
17:25:38   271   140     51     1    2   139   68   140   51    13G   13G
17:25:43   214    74     34     1    1    73   56    74   34    13G   13G
17:25:48   216    82     38     1    1    80   61    82   38    13G   13G
17:25:53   209    83     39     2    2    81   61    83   39    13G   13G

meminfo, vmstat, and arcstats:

MemTotal:       32754556 kB
MemFree:          312836 kB
MemAvailable:   13751116 kB
Buffers:        12844304 kB
Cached:           515612 kB
SwapCached:          528 kB
Active:          6688140 kB
Inactive:        6714184 kB
Active(anon):      20428 kB
Inactive(anon):    47136 kB
Active(file):    6667712 kB
Inactive(file):  6667048 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      15585276 kB
SwapFree:       15583956 kB
Dirty:                 8 kB
Writeback:             0 kB
AnonPages:         42196 kB
Mapped:            39504 kB
Shmem:             25156 kB
Slab:            8237024 kB
SReclaimable:     510508 kB
SUnreclaim:      7726516 kB
KernelStack:        8816 kB
PageTables:         6076 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31962552 kB
Committed_AS:     441084 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      270712 kB
VmallocChunk:   34359232592 kB
HardwareCorrupted:     0 kB
AnonHugePages:      4096 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      152980 kB
DirectMap2M:    33374208 kB
nr_free_pages 77714
nr_alloc_batch 2473
nr_inactive_anon 11784
nr_active_anon 5107
nr_inactive_file 1666469
nr_active_file 1666428
nr_unevictable 0
nr_mlock 0
nr_anon_pages 9525
nr_mapped 9876
nr_file_pages 3339320
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 127604
nr_slab_unreclaimable 1931600
nr_page_table_pages 1519
nr_kernel_stack 409
nr_unstable 0
nr_bounce 0
nr_vmscan_write 671
nr_vmscan_immediate_reclaim 7
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 6289
nr_dirtied 6748689
nr_written 6744700
numa_hit 1417613244
numa_miss 0
numa_foreign 0
numa_interleave 28076
numa_local 1417613244
numa_other 0
workingset_refault 343023
workingset_activate 258681
workingset_nodereclaim 0
nr_anon_transparent_hugepages 2
nr_free_cma 0
nr_dirty_threshold 1000261
nr_dirty_background_threshold 333420
pgpgin 14074856909
pgpgout 27446361
pswpin 524
pswpout 670
pgalloc_dma 0
pgalloc_dma32 679466584
pgalloc_normal 6981146798
pgalloc_movable 0
pgfree 7660706688
pgactivate 2430223
pgdeactivate 91017
pgfault 25084946
pgmajfault 802
pgrefill_dma 0
pgrefill_dma32 74508
pgrefill_normal 17593
pgrefill_movable 0
pgsteal_kswapd_dma 0
pgsteal_kswapd_dma32 240219
pgsteal_kswapd_normal 1077263
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 11919
pgsteal_direct_normal 558107
pgsteal_direct_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 242605
pgscan_kswapd_normal 1078725
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 11983
pgscan_direct_normal 558980
pgscan_direct_movable 0
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 15078144
kswapd_inodesteal 35
kswapd_low_wmark_hit_quickly 178824
kswapd_high_wmark_hit_quickly 174320
pageoutrun 369485
allocstall 3620
pgrotated 762
drop_pagecache 0
drop_slab 0
numa_pte_updates 0
numa_huge_pte_updates 0
numa_hint_faults 0
numa_hint_faults_local 0
numa_pages_migrated 0
pgmigrate_success 11342
pgmigrate_fail 0
compact_migrate_scanned 2989350
compact_free_scanned 814953
compact_isolated 45855
compact_stall 6
compact_fail 4
compact_success 2
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 279079
unevictable_pgs_scanned 0
unevictable_pgs_rescued 280048
unevictable_pgs_mlocked 282570
unevictable_pgs_munlocked 282570
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 9
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
balloon_inflate 0
balloon_deflate 0
balloon_migrate 0
12 1 0x01 92 4416 69097315429 938040014394344
name                            type data
hits                            4    12480640
misses                          4    20122451
demand_data_hits                4    465375
demand_data_misses              4    11449129
demand_metadata_hits            4    6633623
demand_metadata_misses          4    1117724
prefetch_data_hits              4    123692
prefetch_data_misses            4    945730
prefetch_metadata_hits          4    5257950
prefetch_metadata_misses        4    6609868
mru_hits                        4    6040817
mru_ghost_hits                  4    978817
mfu_hits                        4    3981513
mfu_ghost_hits                  4    62618
deleted                         4    51936306
mutex_miss                      4    32666
evict_skip                      4    4466820
evict_not_enough                4    28854
evict_l2_cached                 4    0
evict_l2_eligible               4    56731549696
evict_l2_ineligible             4    98687961088
evict_l2_skip                   4    0
hash_elements                   4    2623205
hash_elements_max               4    18117912
hash_collisions                 4    47186611
hash_chains                     4    545919
hash_chain_max                  4    16
p                               4    395538832
c                               4    15053859264
c_min                           4    33554432
c_max                           4    30064771072
size                            4    15042669096
compressed_size                 4    11783371776
uncompressed_size               4    28226892800
overhead_size                   4    1029297664
hdr_size                        4    868818424
data_size                       4    2048253952
metadata_size                   4    10764415488
dbuf_size                       4    309333680
dnode_size                      4    759828992
bonus_size                      4    292018560
anon_size                       4    1230848
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    72796160
mru_evictable_data              4    1024
mru_evictable_metadata          4    757760
mru_ghost_size                  4    13356555776
mru_ghost_evictable_data        4    99496960
mru_ghost_evictable_metadata    4    13257058816
mfu_size                        4    12738642432
mfu_evictable_data              4    2016910336
mfu_evictable_metadata          4    9458328576
mfu_ghost_size                  4    1378353152
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    1378353152
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    414
memory_indirect_count           4    84597
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    12994415144
arc_meta_limit                  4    22548578304
arc_dnode_limit                 4    2254857830
arc_meta_max                    4    16680802792
arc_meta_min                    4    16777216
sync_wait_for_async             4    196473
demand_hit_predictive_prefetch  4    454417
arc_need_free                   4    0
arc_sys_free                    4    524072896

@lorenz
Contributor

lorenz commented Dec 17, 2017

Same thing here on Linux 4.13.9 / ZoL 0.7.3 / 32GB RAM / single socket (no NUMA) / Disabled THPs. Currently 12GB used, no defined arc_min/max, ARC sits around 1.3GB (which gives unacceptable performance).
The node runs anywhere between 100 and 200 containers with dedicated cgroup limits, so some memory cgroup OOMs happen (I can see some correlation between that and ARC tanking). Enabling THP basically kills these nodes, so there might be some interesting memory issues with high-density stuff at play here.

@Pascal666

Have any devs been able to reproduce this issue or do you still need more information? I'm running into this problem too.

@chrisrd
Contributor

chrisrd commented Mar 4, 2018

If you're on a v3.14+ kernel, it's worth seeing if e9a7729 helps. See also #7170

@Pascal666

Thank you chrisrd, but I am using v0.7.6 on Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017.

I did not see this problem until I upgraded from v0.6.5.9 on Linux version 3.10.0-229.20.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Tue Nov 3 19:10:07 UTC 2015 a few weeks ago.

Like most (all?) of the users above, I too am using containers, but mine do not have memory limits. I am using LXC, but I see OpenVZ and Docker listed above. No idea if this bug is actually related to containers or if there is simply a good percentage of ZFS users using containers.

A couple of days ago I set the minimum ARC size to the maximum ARC size:

# cat /sys/module/zfs/parameters/zfs_arc_min
0
# grep c_max /proc/spl/kstat/zfs/arcstats
c_max                           4    75964534784
# echo 75964534784 > /sys/module/zfs/parameters/zfs_arc_min

waited for the ARC to grow to the max, then set it back to 0. Over the space of 10 minutes, the target dropped to just over 92% of max and stayed there for a day, but then began slowly climbing again and is now back at 100%.

I will monitor the target to see if it falls again and, if so, attempt to determine why, but this box does not normally experience memory pressure.
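
A simple way to log the target over time (a sketch; the interval and log path are arbitrary):

while true; do
    echo "$(date +%s) $(awk '$1 == "c" {print $3}' /proc/spl/kstat/zfs/arcstats)"
    sleep 60
done >> /var/tmp/arc_c.log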

behlendorf closed this on Aug 4

@mailinglists35

@behlendorf any reason why this was closed?
observing this behaviour on 0.8.0-rc2

@behlendorf
Contributor

@mailinglists35 I was under the impression this had been resolved. If you're seeing behavior similar to this in 0.8.0-rc2, can you open a new issue with the relevant details?

@mailinglists35

Resolved in what commit? Why open a new issue and throw away all the valuable information from this one, since this issue was never solved and there are multiple users affected?

@Pascal666

Problem still exists in 0.7.12 on Linux version 3.10.0-957.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Nov 8 23:39:32 UTC 2018

@izzy

izzy commented Jul 22, 2019

Can confirm the problem still exists on 0.7.12-2+deb10u1 under Debian 10, running kernel 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux. We're not using containers on the system; it's just a plain file server storing mostly small files (a bit above 100M inodes total in the affected node's zpool).

No obvious memory pressure (8 GB free), but low ARC size and also super low L2ARC usage. Drops in both ARC and L2ARC occurred after the upgrade from 0.6.5 to 0.7.x; before, both were used extensively and were usually filled (even when primarycache was set to metadata). Now the ARC utilises ~30% of arc_max (or whatever arc_min was manually set to) and the L2ARC <5% of its size.

I really can't see any reason why this happens, and what's even worse, IMHO, is that I can't really tell whether it's even a problem. The system itself seems to perform similarly to before. I can't conclusively say that's good, though, since the upgrade happened to get around the bug in 0.6.5 that would eat a lot of performance with arc_prune processes; not sure if that's maybe also related?
