spl_taskq_thread_dynamic=1 zvol performance regression #470

Closed
akorn opened this issue Aug 12, 2015 · 9 comments

akorn commented Aug 12, 2015

Hi,

After upgrading from an earlier 0.6.3 build without spl_taskq_thread_dynamic to one that defaults to spl_taskq_thread_dynamic=1, I noticed very high load and poor interactive response times on one of my boxes.

Investigating further, I found that iostat -x 3 reported the zvols at 100% %util while the physical disks were idle; await and svctm for the zvols were also high, in the thousands (sometimes one, sometimes both).

There were 32 zvol threads running (the configured maximum).
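
(For reference, the running thread count can be confirmed from userspace, and the configured maximum is a zfs module parameter; a rough sketch, assuming the parameter exists on this build:)

ps -eo comm | grep -c '^zvol'                    # count running zvol worker threads
cat /sys/module/zfs/parameters/zvol_threads      # configured maximum, if present on this build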

@DeHackEd suggested on IRC that the problem may have to do with spl_taskq_thread_dynamic, so I set it to 0 and rebooted, and the problem is gone.
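
(For anyone wanting to apply the same workaround persistently, a minimal sketch, assuming the option is pinned via a modprobe config file; the file name is just an example:)

# /etc/modprobe.d/spl.conf (example file name)
options spl spl_taskq_thread_dynamic=0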

FWIW, some information about the box:

  • kernel 3.10.56-vs2.3.6.9 (linux-vserver patch)
  • 40GB RAM
  • 2 Xeon X5650 CPUs (12 physical cores, 24 with HT in total)
  • zpool consists of six 6TB SAS disks attached to an HP P410 with 512MB BBWC, arranged in 2 mirror vdevs of 3 disks each, plus a 200G SATA SSD as L2ARC.

The box runs twenty-odd vserver guests; its swap as well as the shared root directory of the guests are on zvols. These are the only two zvols; there are about 260 zfs filesystems. Most write I/O is asynchronous.

akorn commented Aug 14, 2015

OK, the problem is not quite gone; I still get load spikes like this occasionally:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.42    0.56    0.81    7.67    0.00   90.55

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.33    0.00     2.67     0.00    16.00     0.00   10.00   10.00    0.00  10.00   0.33
sdb               0.00     0.00    1.00    0.00     5.33     0.00    10.67     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    1.00    0.00     5.33     0.00    10.67     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    1.00    0.00     8.00     0.00    16.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    1.33    0.00     6.67     0.00    10.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   25.00    0.00    63.83     0.00     5.11     0.00    0.00    0.00    0.00   0.00   0.00
zd0               0.00     0.00    2.00    0.67    34.67     4.00    29.00     2.95 1823.75 2231.67  600.00 375.00 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.42    0.54    0.60    9.08    0.00   89.36

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    1.00    0.00     9.33     0.00    18.67     0.01    6.67    6.67    0.00   6.67   0.67
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.67    0.00    18.67     0.00    56.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    2.33    0.00    30.67     0.00    26.29     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    2.33    0.00    21.33     0.00    18.29     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    2.33    0.00    25.33     0.00    21.71     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   35.67    0.00   108.83     0.00     6.10     0.00    0.00    0.00    0.00   0.00   0.00
zd0               0.00     3.33    1.67    0.00    41.33     0.00    49.60     4.60 1718.00 1718.00    0.00 600.00 100.00
zd16              0.00     0.00    0.67    0.00     2.67     0.00     8.00     0.01   20.00   20.00    0.00  10.00   0.67

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.38    0.53    0.83   12.01    0.00   86.25

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.33    0.00     4.00     0.00    24.00     0.00   10.00   10.00    0.00  10.00   0.33
sdc               0.00     0.00    0.33    0.00    14.67     0.00    88.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.33    0.00     8.00     0.00    48.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    1.33    0.00    30.67     0.00    46.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    2.00    0.00    34.67     0.00    34.67     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   22.00    0.00    58.67     0.00     5.33     0.01    0.45    0.45    0.00   0.15   0.33
zd0               0.00     0.00    1.00    0.33    16.00    14.67    46.00     6.98 3120.00 3280.00 2640.00 750.00 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.50    0.63   12.24    0.00   86.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.33    0.00     1.33     0.00     8.00     0.01   20.00   20.00    0.00  20.00   0.67
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.33    0.00     1.33     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    2.00    0.00    58.67     0.00    58.67     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.33    0.00    14.67     0.00    88.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   33.33    0.00   106.83     0.00     6.41     0.00    0.00    0.00    0.00   0.00   0.00
zd0               0.00     0.33    1.00    0.00    14.67     0.00    29.33    11.69 4340.00 4340.00    0.00 1000.00 100.00
zd16              1.67     0.00    0.67    0.00     9.33     0.00    28.00     0.39  580.00  580.00    0.00 580.00  38.67

While this is happening, the following processes are in D state:

  PID USER      PR  NI    VIRT    RES   CODE    SHR    DATA   SWAP S nTH %MEM  %CPU     TIME+ WCHAN      COMMAND                                                                                                                      
 2364 root      39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:16.37 buf_hash_+ [z_wr_int_6]                                                                                                                 
 2667 root       0 -20       0      0      0      0       0      0 D   1  0.0   0.3   5:15.89 cv_wait_c+ [txg_sync]                                                                                                                   
 7088 root      20   0  176804 111940    456   1064  154144      0 D   1  0.3   0.0   1:31.78 cv_wait_c+ rsync -rvltDSH --no-g --perms --chmod=D2750,F640 --chmod=Dug+s,Dug+x,Fug+r,Dgo-w,Fgo-w --contimeout=60 --timeout=300 --parti+
13947 root      20   0   64796  32708    100   1220   32912      0 D   1  0.1   0.0   0:08.61 cv_wait_c+ zfs list -H -t snapshot -S creation -o name                                                                                  
14948 xxxxxxx   20   0  129516   8152   9644   6540    1424    196 D   1  0.0   0.0   0:00.58 cv_wait_c+ /usr/sbin/smbd -F                                                                                                            
22383 nobody    20   0  133612   5284   9644   3780    1284    192 D   1  0.0   0.0   0:00.03 cv_wait_c+ /usr/sbin/smbd -F                                                                                                            
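
(A listing like the one above can be reproduced with ps; a rough sketch, with a column layout differing slightly from top:)

ps axo pid,user,stat,wchan:30,args | awk '$3 ~ /^D/'    # tasks in uninterruptible sleep, with wait channel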

akorn commented Aug 15, 2015

Another similar occurrence:

  PID USER                PR  NI    VIRT    RES   CODE    SHR    DATA   SWAP S nTH %MEM  %CPU     TIME+ WCHAN                COMMAND                                                                                                  
 1243 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.12 call_rwsem_down_rea+ [zvol]                                                                                                   
 1247 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.49 call_rwsem_down_rea+ [zvol]                                                                                                   
 1248 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.56 call_rwsem_down_rea+ [zvol]                                                                                                   
 1249 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.29 call_rwsem_down_rea+ [zvol]                                                                                                   
 1250 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.95 call_rwsem_down_rea+ [zvol]                                                                                                   
 1254 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.01 call_rwsem_down_rea+ [zvol]                                                                                                   
 1256 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.80 call_rwsem_down_rea+ [zvol]                                                                                                   
 1257 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.70 call_rwsem_down_rea+ [zvol]                                                                                                   
 1260 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.60 call_rwsem_down_rea+ [zvol]                                                                                                   
 1261 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.74 call_rwsem_down_rea+ [zvol]                                                                                                   
 1262 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.86 call_rwsem_down_rea+ [zvol]                                                                                                   
 1263 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.53 call_rwsem_down_rea+ [zvol]                                                                                                   
 1264 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.02 call_rwsem_down_rea+ [zvol]                                                                                                   
 1265 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.69 call_rwsem_down_rea+ [zvol]                                                                                                   
 1266 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.57 call_rwsem_down_rea+ [zvol]                                                                                                   
 1267 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.97 call_rwsem_down_wri+ [zvol]                                                                                                   
 1268 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:15.08 call_rwsem_down_rea+ [zvol]                                                                                                   
 1270 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.90 cv_wait_common       [zvol]                                                                                                   
 1272 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.77 call_rwsem_down_rea+ [zvol]                                                                                                   
 1273 root                39  19       0      0      0      0       0      0 D   1  0.0   0.0   0:14.72 call_rwsem_down_rea+ [zvol]                                                                                                   
 2667 root                 0 -20       0      0      0      0       0      0 D   1  0.0   0.0  22:38.80 cv_wait_common       [txg_sync]                                                                                               
 3063 root                20   0   16900   4392    180   3360    1184      0 D   1  0.0   0.0   0:00.02 sleep_on_page_killa+ svn -q update     
 3958 root                20   0   11104   3144    688   1560    1864    336 D   1  0.0   0.0   0:00.84 sleep_on_buffer      /bin/zsh                                                                                                 
 4252 root                20   0       0      0      0      0       0      0 D   1  0.0   0.0   0:07.71 blkdev_issue_discard [jbd2/zd0-8]                                                                                             
 7072 root                20   0   33684   1472    100   1168    1800      0 D   1  0.0   0.0   0:00.01 cv_wait_common       zfs list -H -t filesystem volume -s name -o name com.sun:auto-snapshot com.sun:auto-snapshot:hourly      
22498 root                20   0   34616   2516    100   1216    2732      0 D   1  0.0   0.0   0:01.06 cv_wait_common       zfs list -H -t filesystem volume -s name -o name com.sun:auto-snapshot com.sun:auto-snapshot:frequent    
24685 postgres            20   0  119560    968   5736    520    1456   1380 D   1  0.0   0.0   0:22.84 sleep_on_buffer      postgres: stats collector process                                                                        
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.67    2.67     2.67    28.00    18.40     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    2.33     0.00    28.00    24.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    1.33    2.67     5.33    28.00    16.67     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    2.33     0.00    28.00    24.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    2.67     0.00    28.00    21.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.67    1.67     2.67    28.00    26.29     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   18.00    0.00    64.50     0.00     7.17     0.00    0.00    0.00    0.00   0.00   0.00
zd0               0.00     0.00    0.33    0.33    25.33     2.67    84.00    10.84 8600.00 8960.00 8240.00 1500.00 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sysrq-w stack traces:
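
(Captured with SysRq; with sysrq enabled, the same dump can be triggered without the keyboard:)

echo w > /proc/sysrq-trigger    # dump blocked (D-state) task stacks to the kernel log
dmesg | tail -n 300             # read them back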

[310548.219601] SysRq : Show Blocked State
[310548.220647]   task                        PC stack   pid father
[310548.220956] txg_sync        D ffff880a17a931c0     0  2667      2 0x00000000
[310548.220961]  ffff8809cc6a16e0 0000000000000000 0000000101d9610b ffff8809cc6a16a8
[310548.222911]  ffff88064f516598 ffff8809cc6a16e8 ffffffff8105e241 00000001cc6a1718
[310548.224780]  ffff880a178d31c0 ffffffff81607f40 ffff8809e3682450 ffff880a178d31c0
[310548.226664] Call Trace:
[310548.227200]  [<ffffffff8143c02f>] ? _raw_spin_lock+0x9/0xb
[310548.228440]  [<ffffffff8105e241>] ? idle_balance+0xce/0xf6
[310548.229667]  [<ffffffff8143b4f8>] schedule+0x5f/0x61
[310548.230796]  [<ffffffffa009b20c>] cv_wait_common+0xbb/0x11e [spl]
[310548.232212]  [<ffffffff8104e2fa>] ? finish_wait+0x5d/0x5d
[310548.233458]  [<ffffffffa009b27f>] __cv_wait+0x10/0x12 [spl]
[310548.234757]  [<ffffffffa013fdf1>] arc_get_data_buf+0x20e/0x31b [zfs]
[310548.236243]  [<ffffffffa0148a86>] ? dbuf_rele_and_unlock+0x2d4/0x2d4 [zfs]
[310548.237868]  [<ffffffffa01433eb>] arc_read+0x51a/0x8a9 [zfs]
[310548.239167]  [<ffffffffa0147d41>] dbuf_read+0x415/0x5a0 [zfs]
[310548.240530]  [<ffffffffa015dab4>] ? dnode_rele_and_unlock+0x4c/0x71 [zfs]
[310548.242142]  [<ffffffffa0150227>] dmu_buf_hold+0x3c/0x63 [zfs]
[310548.243531]  [<ffffffffa01a2786>] zap_lockdir+0x4b/0x4c0 [zfs]
[310548.244888]  [<ffffffffa0148b9d>] ? dbuf_rele+0x32/0x36 [zfs]
[310548.246250]  [<ffffffffa01a2ec9>] zap_lookup_norm+0x37/0x16f [zfs]
[310548.247741]  [<ffffffffa009573d>] ? kmem_asprintf+0x52/0x64 [spl]
[310548.249164]  [<ffffffffa01a3012>] zap_lookup+0x11/0x13 [zfs]
[310548.250507]  [<ffffffffa016f2be>] dsl_prop_get_dd+0xfc/0x1f1 [zfs]
[310548.251956]  [<ffffffffa016f56c>] dsl_prop_get_ds+0x1b9/0x1ce [zfs]
[310548.253451]  [<ffffffffa015192f>] ? copies_changed_cb+0xd/0xd [zfs]
[310548.254941]  [<ffffffffa016f59a>] dsl_prop_get_int_ds+0x19/0x1b [zfs]
[310548.256483]  [<ffffffffa016f5ca>] dsl_prop_register+0x2e/0x12b [zfs]
[310548.257968]  [<ffffffffa0151cdb>] dmu_objset_open_impl+0x201/0x64f [zfs]
[310548.259544]  [<ffffffffa0152236>] dmu_objset_from_ds+0x68/0xfe [zfs]
[310548.261037]  [<ffffffffa0172cb0>] dsl_scan_visitds+0x7e/0x3f3 [zfs]
[310548.262488]  [<ffffffffa017415f>] dsl_scan_sync+0x6ee/0x9e8 [zfs]
[310548.263931]  [<ffffffffa01caa00>] ? zio_destroy+0xc8/0xcb [zfs]
[310548.265525]  [<ffffffffa018446c>] spa_sync+0x464/0x934 [zfs]
[310548.266868]  [<ffffffffa01924df>] txg_sync_thread+0x294/0x4a4 [zfs]
[310548.268327]  [<ffffffff810ee0fa>] ? __cache_free.isra.48+0xea/0x111
[310548.269809]  [<ffffffffa019224b>] ? txg_fini+0x1e7/0x1e7 [zfs]
[310548.271158]  [<ffffffffa0097548>] thread_generic_wrapper+0x69/0x73 [spl]
[310548.272721]  [<ffffffffa00974df>] ? __thread_exit+0x12/0x12 [spl]
[310548.274142]  [<ffffffff8104d9d4>] kthread+0x9a/0xa2
[310548.275263]  [<ffffffff8104d93a>] ? __kthread_parkme+0x66/0x66
[310548.276560]  [<ffffffff8143cb6c>] ret_from_fork+0x7c/0xb0
[310548.277830]  [<ffffffff8104d93a>] ? __kthread_parkme+0x66/0x66
[310548.280370] apache2         D ffff880a178131c0     0 16167  14421 0x00000000
[310548.280374]  ffff8803962e7a18 0000000000000082 ffffffff818114c0 ffff8803962e7fd8
[310548.282256]  ffff8803962e7fd8 00000000000131c0 ffff8807d96d77f0 ffffffffa02828c0
[310548.284141]  ffffffffa0282990 ffffffffa02828c8 ffffffffa02828fc 0000000000000000
[310548.286050] Call Trace:
[310548.286586]  [<ffffffff8143b4f8>] schedule+0x5f/0x61
[310548.287732]  [<ffffffffa009b20c>] cv_wait_common+0xbb/0x11e [spl]
[310548.289188]  [<ffffffff8104e2fa>] ? finish_wait+0x5d/0x5d
[310548.290454]  [<ffffffffa009b27f>] __cv_wait+0x10/0x12 [spl]
[310548.291751]  [<ffffffffa013fdf1>] arc_get_data_buf+0x20e/0x31b [zfs]
[310548.293217]  [<ffffffffa0148a86>] ? dbuf_rele_and_unlock+0x2d4/0x2d4 [zfs]
[310548.294830]  [<ffffffffa01433eb>] arc_read+0x51a/0x8a9 [zfs]
[310548.296175]  [<ffffffffa0147d41>] dbuf_read+0x415/0x5a0 [zfs]
[310548.297532]  [<ffffffffa0148ead>] __dbuf_hold_impl+0x1d5/0x425 [zfs]
[310548.299027]  [<ffffffffa014916b>] dbuf_hold_impl+0x6e/0x8e [zfs]
[310548.300451]  [<ffffffffa01493f4>] dbuf_hold+0x18/0x27 [zfs]
[310548.301780]  [<ffffffffa014f68e>] dmu_buf_hold_array_by_dnode+0x157/0x2db [zfs]
[310548.303476]  [<ffffffffa014fb4d>] dmu_read_uio_dnode+0x3a/0xaf [zfs]
[310548.304961]  [<ffffffff8143c02f>] ? _raw_spin_lock+0x9/0xb
[310548.306217]  [<ffffffffa0150a3a>] dmu_read_uio_dbuf+0x38/0x4e [zfs]
[310548.307735]  [<ffffffffa01c0f42>] zfs_read+0x2ad/0x322 [zfs]
[310548.309053]  [<ffffffffa01d0702>] zpl_read_common_iovec+0x66/0xa8 [zfs]
[310548.310647]  [<ffffffffa01d1092>] zpl_read_common+0x29/0x2b [zfs]
[310548.312066]  [<ffffffffa01d10eb>] zpl_read+0x57/0x70 [zfs]
[310548.313335]  [<ffffffff810fc4f4>] vfs_read+0xa2/0xe6
[310548.314510]  [<ffffffff810fcbd6>] SyS_read+0x48/0x6e
[310548.315655]  [<ffffffff8143cc16>] system_call_fastpath+0x1a/0x1f
[310548.317093] zfs             D ffff880a178b31c0     0 22498  22491 0x00000000
[310548.317096]  ffff88011a0f3858 0000000000000086 ffff8809e3d227f0 ffff88011a0f3fd8
[310548.318969]  ffff88011a0f3fd8 00000000000131c0 ffff88094f7f97b0 ffffffffa02828c0
[310548.320832]  ffffffffa0282990 ffffffffa02828c8 ffffffffa02828fc 0000000000000000
[310548.322704] Call Trace:
[310548.323235]  [<ffffffff8143b4f8>] schedule+0x5f/0x61
[310548.324435]  [<ffffffffa009b20c>] cv_wait_common+0xbb/0x11e [spl]
[310548.325847]  [<ffffffff8104e2fa>] ? finish_wait+0x5d/0x5d
[310548.327078]  [<ffffffffa009b27f>] __cv_wait+0x10/0x12 [spl]
[310548.328328]  [<ffffffffa013fdf1>] arc_get_data_buf+0x20e/0x31b [zfs]
[310548.329814]  [<ffffffffa0148a86>] ? dbuf_rele_and_unlock+0x2d4/0x2d4 [zfs]
[310548.331468]  [<ffffffffa01433eb>] arc_read+0x51a/0x8a9 [zfs]
[310548.332821]  [<ffffffffa0147d41>] dbuf_read+0x415/0x5a0 [zfs]
[310548.334148]  [<ffffffffa015dab4>] ? dnode_rele_and_unlock+0x4c/0x71 [zfs]
[310548.335751]  [<ffffffffa0150227>] dmu_buf_hold+0x3c/0x63 [zfs]
[310548.337111]  [<ffffffffa01a2786>] zap_lockdir+0x4b/0x4c0 [zfs]
[310548.338429]  [<ffffffffa0148b9d>] ? dbuf_rele+0x32/0x36 [zfs]
[310548.339786]  [<ffffffffa01a2ec9>] zap_lookup_norm+0x37/0x16f [zfs]
[310548.341199]  [<ffffffffa009573d>] ? kmem_asprintf+0x52/0x64 [spl]
[310548.342645]  [<ffffffffa01a3012>] zap_lookup+0x11/0x13 [zfs]
[310548.344000]  [<ffffffffa016f2be>] dsl_prop_get_dd+0xfc/0x1f1 [zfs]
[310548.345477]  [<ffffffffa016f56c>] dsl_prop_get_ds+0x1b9/0x1ce [zfs]
[310548.346954]  [<ffffffffa0152129>] ? dmu_objset_open_impl+0x64f/0x64f [zfs]
[310548.348592]  [<ffffffffa016f59a>] dsl_prop_get_int_ds+0x19/0x1b [zfs]
[310548.350082]  [<ffffffffa016f5ca>] dsl_prop_register+0x2e/0x12b [zfs]
[310548.351512]  [<ffffffffa0151e01>] dmu_objset_open_impl+0x327/0x64f [zfs]
[310548.353098]  [<ffffffffa0152236>] dmu_objset_from_ds+0x68/0xfe [zfs]
[310548.354598]  [<ffffffffa015231a>] dmu_objset_hold+0x4e/0x7e [zfs]
[310548.356056]  [<ffffffffa01ae42b>] zfs_ioc_objset_stats+0x1c/0x47 [zfs]
[310548.357569]  [<ffffffffa01b19d3>] zfs_ioc_dataset_list_next+0xf8/0x119 [zfs]
[310548.359216]  [<ffffffffa01b3475>] zfsdev_ioctl+0x2f9/0x3b7 [zfs]
[310548.360619]  [<ffffffff8110a207>] vfs_ioctl+0x18/0x34
[310548.361758]  [<ffffffff8110aa20>] do_vfs_ioctl+0x37c/0x421
[310548.363016]  [<ffffffff810db2f2>] ? do_munmap+0x302/0x313
[310548.364255]  [<ffffffff8110ab08>] SyS_ioctl+0x43/0x61
[310548.365427]  [<ffffffff8143cc16>] system_call_fastpath+0x1a/0x1f
[310548.366833] zfs             D ffff880a178131c0     0  7072   7053 0x00000000
[310548.366836]  ffff8803388b7908 0000000000000086 ffffffff818114c0 ffff8803388b7fd8
[310548.368718]  ffff8803388b7fd8 00000000000131c0 ffff88094dfc1080 ffffffffa02828c0
[310548.370586]  ffffffffa0282990 ffffffffa02828c8 ffffffffa02828fc 0000000000000000
[310548.372465] Call Trace:
[310548.373015]  [<ffffffff8143b4f8>] schedule+0x5f/0x61
[310548.374149]  [<ffffffffa009b20c>] cv_wait_common+0xbb/0x11e [spl]
[310548.375575]  [<ffffffff8104e2fa>] ? finish_wait+0x5d/0x5d
[310548.376818]  [<ffffffffa009b27f>] __cv_wait+0x10/0x12 [spl]
[310548.378124]  [<ffffffffa013fdf1>] arc_get_data_buf+0x20e/0x31b [zfs]
[310548.379635]  [<ffffffffa01429bf>] arc_buf_alloc+0x142/0x15f [zfs]
[310548.381065]  [<ffffffffa0148a86>] ? dbuf_rele_and_unlock+0x2d4/0x2d4 [zfs]
[310548.382690]  [<ffffffffa014325b>] arc_read+0x38a/0x8a9 [zfs]
[310548.384015]  [<ffffffffa0147d41>] dbuf_read+0x415/0x5a0 [zfs]
[310548.385349]  [<ffffffffa015dab4>] ? dnode_rele_and_unlock+0x4c/0x71 [zfs]
[310548.386959]  [<ffffffffa0150227>] dmu_buf_hold+0x3c/0x63 [zfs]
[310548.388338]  [<ffffffffa01a2786>] zap_lockdir+0x4b/0x4c0 [zfs]
[310548.389706]  [<ffffffffa0148a74>] ? dbuf_rele_and_unlock+0x2c2/0x2d4 [zfs]
[310548.391306]  [<ffffffffa0147915>] ? RW_WRITE_HELD+0x2d/0x44 [zfs]
[310548.392737]  [<ffffffffa0147a79>] ? dbuf_read+0x14d/0x5a0 [zfs]
[310548.394096]  [<ffffffffa0095949>] ? spl_kmem_free_debug+0x11/0x13 [spl]
[310548.395670]  [<ffffffffa01a2c4c>] zap_cursor_retrieve+0x51/0x1ed [zfs]
[310548.397248]  [<ffffffffa0147915>] ? RW_WRITE_HELD+0x2d/0x44 [zfs]
[310548.398680]  [<ffffffffa0147a79>] ? dbuf_read+0x14d/0x5a0 [zfs]
[310548.400232]  [<ffffffffa015d96c>] ? dnode_hold_impl+0x390/0x3a8 [zfs]
[310548.401759]  [<ffffffffa0154d9f>] dmu_dir_list_next+0x8b/0x13c [zfs]
[310548.403281]  [<ffffffffa0150e0e>] ? dmu_object_info_from_db+0x33/0x37 [zfs]
[310548.404921]  [<ffffffffa016170c>] ? dsl_dataset_hold_obj+0x69/0x606 [zfs]
[310548.406520]  [<ffffffff81055d7d>] ? should_resched+0x9/0x25
[310548.407802]  [<ffffffff8143b6b1>] ? _cond_resched+0x9/0x19
[310548.409128]  [<ffffffffa0148b9d>] ? dbuf_rele+0x32/0x36 [zfs]
[310548.410477]  [<ffffffff81055d7d>] ? should_resched+0x9/0x25
[310548.411771]  [<ffffffff8143b6b1>] ? _cond_resched+0x9/0x19
[310548.413038]  [<ffffffff8143c02f>] ? _raw_spin_lock+0x9/0xb
[310548.414258]  [<ffffffffa01522b5>] ? dmu_objset_from_ds+0xe7/0xfe [zfs]
[310548.415807]  [<ffffffffa015231a>] ? dmu_objset_hold+0x4e/0x7e [zfs]
[310548.417300]  [<ffffffffa01b1987>] zfs_ioc_dataset_list_next+0xac/0x119 [zfs]
[310548.418980]  [<ffffffffa01b3475>] zfsdev_ioctl+0x2f9/0x3b7 [zfs]
[310548.420421]  [<ffffffff8110a207>] vfs_ioctl+0x18/0x34
[310548.421588]  [<ffffffff8110aa20>] do_vfs_ioctl+0x37c/0x421
[310548.422848]  [<ffffffff810db2f2>] ? do_munmap+0x302/0x313
[310548.424070]  [<ffffffff8110ab08>] SyS_ioctl+0x43/0x61
[310548.425202]  [<ffffffff8143cc16>] system_call_fastpath+0x1a/0x1f
[310548.426610] tryto           D ffff880a17a931c0     0  9035  22864 0x00000000
[310548.426614]  ffff880269cf9ab8 0000000000000082 ffff8809e3d5f140 ffff880269cf9fd8
[310548.428495]  ffff880269cf9fd8 00000000000131c0 ffff88097fd7a080 ffffffffa02828c0
[310548.430374]  ffffffffa0282990 ffffffffa02828c8 ffffffffa02828fc 0000000000000000
[310548.432261] Call Trace:
[310548.432794]  [<ffffffff8143b4f8>] schedule+0x5f/0x61
[310548.433943]  [<ffffffffa009b20c>] cv_wait_common+0xbb/0x11e [spl]
[310548.435363]  [<ffffffff8104e2fa>] ? finish_wait+0x5d/0x5d
[310548.436651]  [<ffffffffa009b27f>] __cv_wait+0x10/0x12 [spl]
[310548.437974]  [<ffffffffa013fdf1>] arc_get_data_buf+0x20e/0x31b [zfs]
[310548.439436]  [<ffffffffa0148a86>] ? dbuf_rele_and_unlock+0x2d4/0x2d4 [zfs]
[310548.441036]  [<ffffffffa01433eb>] arc_read+0x51a/0x8a9 [zfs]
[310548.442311]  [<ffffffffa0147d41>] dbuf_read+0x415/0x5a0 [zfs]
[310548.443657]  [<ffffffffa0095949>] ? spl_kmem_free_debug+0x11/0x13 [spl]
[310548.445225]  [<ffffffffa014f6ee>] dmu_buf_hold_array_by_dnode+0x1b7/0x2db [zfs]
[310548.446953]  [<ffffffffa014fb4d>] dmu_read_uio_dnode+0x3a/0xaf [zfs]
[310548.448438]  [<ffffffff8143c02f>] ? _raw_spin_lock+0x9/0xb
[310548.449740]  [<ffffffffa0150a3a>] dmu_read_uio_dbuf+0x38/0x4e [zfs]
[310548.451203]  [<ffffffffa01c0f42>] zfs_read+0x2ad/0x322 [zfs]
[310548.452538]  [<ffffffffa01d0702>] zpl_read_common_iovec+0x66/0xa8 [zfs]
[310548.454106]  [<ffffffffa01d1092>] zpl_read_common+0x29/0x2b [zfs]
[310548.455564]  [<ffffffffa01d10eb>] zpl_read+0x57/0x70 [zfs]
[310548.456823]  [<ffffffff810fc4f4>] vfs_read+0xa2/0xe6
[310548.457945]  [<ffffffff810fcbd6>] SyS_read+0x48/0x6e
[310548.459064]  [<ffffffff8143cc16>] system_call_fastpath+0x1a/0x1f

akorn commented Aug 16, 2015

A slowdown this morning lasted about 10 minutes (but I wasn't around to check what else could be seen):

----system---- sda--sdb--sdc--sdd--sde--sdf--sdg--sdi--sdj--zd0--zd16
     time     |util util util util util util util util util util util
16-08 03:48:11|9.07 9.00 8.45 9.37 9.67 10.1 0.83    0    0 40.6    0  
16-08 03:49:11|15.1 16.1 15.8 15.7 15.4 14.9 1.70    0 0.02 88.0    0  
16-08 03:50:11|16.9 17.3 16.0 16.2 15.7 15.6 0.07    0    0 98.5    0  
16-08 03:51:11|15.7 16.6 16.4 15.5 16.8 15.0    0    0 0.02 98.4    0  
16-08 03:52:11|18.4 18.1 17.8 17.5 17.9 17.6 2.00 0.90 0.93 98.8    0  
16-08 03:53:11|16.5 15.7 15.8 15.7 16.7 15.4 0.08    0    0 98.7    0  
16-08 03:54:11|15.9 15.8 16.1 16.7 15.4 16.4 0.02    0    0 98.6    0  
16-08 03:55:11|15.2 16.1 15.0 16.0 16.0 17.9 1.52    0    0 98.4    0  
16-08 03:56:11|16.0 16.2 15.8 15.6 16.6 15.8    0    0    0 98.5    0  
16-08 03:57:11|16.3 17.1 16.5 16.2 15.7 16.4 1.00 0.02    0 98.8    0  
16-08 03:58:11|17.6 19.2 17.7 18.3 18.2 18.7 2.03    0    0 98.8    0  
16-08 03:59:11|13.2 13.5 12.9 12.5 12.8 12.8 0.45    0    0 64.8    0  

arcstat.py from this period:

    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  ph%  l2miss%  l2hit%  hit%
03:46:41    36     6     16     6   18     0    0     3   18    27G   29G  100       98       1    83
03:47:41    19     8     41     8   41     0   50     3   42    27G   29G   50      100       0    58
03:48:41   831   575     69   562   70    12   35    30   15    27G   29G   64       99       0    30
03:49:41   148   123     83   123   83     0  100     2   19    27G   29G    0      100       0    16
03:50:41   166   114     68   114   68     0   54     3   38    27G   29G   45      100       0    31
03:51:41   275   174     63   164   64    10   50    13   23    27G   29G   49       99       0    36
03:52:41   125   115     92   114   92     0  100     2   43    27G   29G    0      100       0     7
03:53:41   155   114     73   113   73     0  100     2   36    27G   29G    0      100       0    26
03:54:41   119   110     92   110   92     0  100     2   44    27G   29G    0      100       0     7
03:55:41   152   113     74   113   74     0  100     3   37    27G   29G    0       99       0    25
03:56:41   124   112     89   111   89     0  100     2   32    27G   29G    0      100       0    10
03:57:41   682   334     49   320   47    14   95   207   39    27G   29G    4       99       0    50
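
(arcstat.py ships with ZoL; the same 60-second sampling, assuming the script is on the PATH, is roughly:)

arcstat.py 60    # default columns; the extra L2/hit columns above may have been selected with -f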

And some system level stats:

----system---- ---paging-- ---load-avg--- ------memory-usage----- ---procs--- ----swap--- ---system-- ----most-expensive-block-i/o-process---->
     time     |  in   out | 1m   5m  15m | used  buff  cach  free|run blk new| used  free| int   csw |process              pid  read write cpu>
16-08 03:45:55|   0     0 |1.76 1.06 1.10|43.2G  183M 1946M 13.8G|1.1 0.0  15|   0    16G|2390    19k|zsh                  32346 179k  15k0.0%>
16-08 03:46:55|   0     0 |0.65 0.87 1.03|43.2G  183M 1946M 13.8G|0.3   0  13|   0    16G|1342  3183 |run-parts            14517 192k  21k  0%>
16-08 03:47:55|   0     0 |0.82 0.84 1.01|43.3G  212M 1948M 13.7G|0.6 0.2  19|   0    16G|1505  4214 |vserver-hashify      3210  619k1829B  0%>
16-08 03:48:55|   0     0 |1.65 1.08 1.08|43.6G  311M 1962M 13.3G|1.1 0.8  27|   0    16G|2077  7818 |vserver-hashify      3210 1788k 221k0.0%>
16-08 03:49:55|   0     0 |2.00 1.29 1.15|43.6G  337M 1962M 13.2G|0.3 1.0  12|   0    16G|1446  4778 |find                 5857  434k   0 0.1%>
16-08 03:50:55|   0     0 |2.87 1.65 1.28|43.7G  362M 1962M 13.2G|0.7 1.0  13|   0    16G|1518  4732 |find                 5857  434k   0 0.1%>
16-08 03:51:55|   0     0 |3.29 2.03 1.44|43.6G  387M 1996M 13.2G|1.3 1.2  16|   0    16G|2029  5774 |find                 5857  420k   0 0.1%>
16-08 03:52:55|   0     0 |5.38 2.80 1.73|43.6G  412M 1996M 13.1G|0.2 1.0  12|   0    16G|1494  4681 |find                 5857  429k   0 0.1%>
16-08 03:53:55|   0     0 |3.72 2.81 1.81|43.6G  437M 1996M 13.1G|0.3 1.0  12|   0    16G|1468  4635 |find                 5857  433k   0 0.1%>
16-08 03:54:55|   0     0 |2.74 2.71 1.83|43.7G  463M 1996M 13.0G|0.3 1.0  12|   0    16G|1440  4589 |find                 5857  432k   0 0.1%>
16-08 03:55:55|   0     0 |2.37 2.61 1.86|43.7G  488M 1996M 12.9G|0.4 1.0  12|   0    16G|1454  4622 |find                 5857  434k   0 0.1%>
16-08 03:56:55|   0     0 |2.44 2.58 1.89|43.8G  513M 2024M 12.8G|0.8 1.0  13|   0    16G|1613  4705 |zsh                  12392 479k   3B0.0%>
16-08 03:57:55|   0     0 |2.32 2.52 1.91|43.9G  538M 2024M 12.7G|1.2 1.1  12|   0    16G|1744  5188 |rsync                124273156k   0 0.2%>
16-08 03:58:55|   0     0 |2.86 2.61 1.98|44.1G  562M 2025M 12.5G|1.1 0.9  12|   0    16G|1753  7689 |run-parts            145176831k 223k  0%>
16-08 03:59:55|   0     0 |2.61 2.59 2.01|44.3G  562M 2025M 12.2G|2.0 0.1  15|   0    16G|3461    25k|zsh                  123929476k   5B  0%>
16-08 04:00:55|   0     0 |3.22 2.77 2.11|44.4G  562M 1875M 12.3G|1.9 0.1  17|   0    16G|4118    30k|rsync                14982 888k   0 0.3%>
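
(These tables look like dstat output; something along the following lines, an assumption rather than the exact invocation used, produces similar columns:)

dstat -t --disk-util -D sda,sdb,sdc,sdd,sde,sdf,sdg,sdi,sdj,zd0,zd16 60
dstat -t -g -l -m -p -s -y --top-bio 60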

The vserver-hashify process was likely the cause of the high load on zd0, but I still don't understand how the zvol can be at 100% "bandwidth" utilization when all of the backing disks are under 20%.

Also, the entirety of data on zd0 would've fit into ARC:

Filesystem      Size  Used Avail Use% Mounted on
/dev/zd0         30G   21G  6.9G  76% /var/lib/vservers

dweeezil (Contributor) commented:

@akorn This may very well have been fixed by the numerous ARC changes made post-0.6.4.2. Your issue is being caused by the arc_adapt thread not being able to free memory, likely metadata. You may want to try current master code, but unfortunately there's a regression in the way it calculates freemem. If you do try current master, I'd suggest adding dweeezil/spl@08807f8 to the SPL, particularly if your system uses any non-zfs filesystems and/or has any heavy page-cache users.

As to your particular issue, it would be interesting to know what the arc_adapt thread is up to (which, btw, has been renamed back to arc_reclaim in current master code) and it would also be useful to see the full arcstats. That said, however, things are very different in current master code in these areas.
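
(For anyone wanting to collect that: the full arcstats live in procfs, and the kernel stack of arc_adapt can be sampled directly; a quick sketch:)

cat /proc/spl/kstat/zfs/arcstats        # full ARC statistics
cat /proc/$(pgrep arc_adapt)/stack      # current kernel stack of the arc_adapt thread (needs CONFIG_STACKTRACE)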

behlendorf (Contributor) commented:

@akorn, going back to the original problem: if setting spl_taskq_thread_dynamic=0 resolved the initial issue with the zvol threads, the problem must be caused by the creation/destruction of the taskq threads. That's all this module option controls.

From what you've described, my first guess would be that the dynamic taskq isn't ramping up the number of zvol threads fast enough. That would explain why disabling dynamic taskqs and always leaving 64 threads running helps. Could you try setting the following options and see if they help? This will cause new taskq threads to be created more aggressively.

spl_taskq_thread_dynamic=1      # Defaults to 1
spl_taskq_thread_sequential=1   # Defaults to 4

behlendorf added this to the 0.6.5 milestone Aug 18, 2015

akorn commented Aug 18, 2015

@behlendorf, I'll see what I can do but this is not a box I can easily experiment with, and I haven't had the same problem on any of the others.

behlendorf (Contributor) commented:

@akorn no problem, it was just a suggestion. I'm glad to hear the issue isn't more widespread. FWIW, these are both run-time tunables, so you don't need to restart anything to try them quickly.

echo 1 >/sys/module/spl/parameters/spl_taskq_thread_dynamic
echo 1 >/sys/module/spl/parameters/spl_taskq_thread_sequential
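
(To verify the new values took effect, or to persist them across reboots, something along these lines; the file name is just an example:)

grep . /sys/module/spl/parameters/spl_taskq_thread_*    # print current values
# /etc/modprobe.d/spl.conf (example file name):
#   options spl spl_taskq_thread_dynamic=1 spl_taskq_thread_sequential=1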

DeHackEd commented Sep 6, 2015

With the zvol rework merged this should be effectively fixed, right?

behlendorf (Contributor) commented:

Yes. Yes it is. Closing.
