Skip to content

Memory allocation spamming the system #917

Closed
cwedgwood opened this Issue Aug 31, 2012 · 23 comments

6 participants

@cwedgwood

During srub:

SPL: Fixing allocation for task txg_sync (6093) which used GFP flags 0x297bda7c with PF_NOFS set
SPL: Showing stack for process 6093
Pid: 6093, comm: txg_sync Tainted: G O 3.5.3-cw3 #5
Call Trace:
[] spl_debug_dumpstack+0x30/0x32 [spl]
[] sanitize_flags+0x73/0x82 [spl]
[] kmem_alloc_debug+0x13d/0x2d9 [spl]
[] ? load_balance+0xa8/0x5e2
[] fm_nvlist_create+0x36/0xb6 [zfs]
[] zfs_ereport_start+0x93/0x620 [zfs]
[] ? load_TLS+0xb/0xf
[] ? __switch_to+0x1a7/0x37c
[] ? paravirt_read_tsc+0x9/0xd
[] ? vdev_resilver_needed+0x8f/0x112 [zfs]
[] zfs_ereport_post+0x3f/0x5a [zfs]
[] spa_event_notify+0x22/0x24 [zfs]
[] dsl_scan_setup_sync+0xf6/0x1bd [zfs]
[] dsl_sync_task_group_sync+0x11a/0x175 [zfs]
[] dsl_pool_sync+0x1ac/0x3d2 [zfs]
[] spa_sync+0x475/0x82d [zfs]
[] txg_sync_thread+0x247/0x3b7 [zfs]
[] ? txg_do_callbacks+0x52/0x52 [zfs]
[] thread_generic_wrapper+0x71/0x7e [spl]
[] ? __thread_create+0x2db/0x2db [spl]
[] kthread+0x8b/0x93
[] kernel_thread_helper+0x4/0x10
[] ? retint_restore_args+0x13/0x13
[] ? kthread_worker_fn+0x149/0x149
[] ? gs_change+0x13/0x13

{ repeats over and over crushing the system beacuse i have a serial console enabled }

@dechamps

Confirmed here. Also while scrubbing.

@behlendorf
ZFS on Linux member

I knew that despite my best efforts to test the swap patches I'd miss a few instances of KM_SLEEP -> KM_PUSHPAGE. So I add the above warning which would correct this issue and log it to the console so it would get immediately fixed. The above patch will fix those particular allocations but there may still be a few more so please let me know if you see any.

@behlendorf behlendorf added a commit that referenced this issue Sep 1, 2012
@behlendorf behlendorf Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
b404a3f
@chrisrd chrisrd added a commit to chrisrd/zfs that referenced this issue Sep 1, 2012
@chrisrd chrisrd Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #917
1706452
@chrisrd
chrisrd commented Sep 1, 2012

With @b404a3f applied, on starting a replace:

[ 4201.014669] SPL: Fixing allocation for task txg_sync (13819) which used GFP flags 0x4ffbad8 with PF_NOFS set
[ 4201.014720] SPL: Showing stack for process 13819
[ 4201.014746] Pid: 13819, comm: txg_sync Tainted: G           O 3.4.10-otn-00021-g9eae348 #1
[ 4201.014791] Call Trace:
[ 4201.014821]  [<ffffffffa0471557>] spl_debug_dumpstack+0x27/0x40 [spl]
[ 4201.014854]  [<ffffffffa04834df>] sanitize_flags.part.12+0x68/0x77 [spl]
[ 4201.014887]  [<ffffffffa0476f22>] kmem_alloc_debug+0x3a2/0x3b0 [spl]
[ 4201.014919]  [<ffffffffa04a7bd9>] ? nvlist_common+0x129/0x1e0 [znvpair]
[ 4201.014980]  [<ffffffffa05d5c00>] ? zfs_post_remove+0x20/0x20 [zfs]
[ 4201.015032]  [<ffffffffa05d5c00>] ? zfs_post_remove+0x20/0x20 [zfs]
[ 4201.015081]  [<ffffffffa0582531>] zfs_zevent_post+0xd1/0x250 [zfs]
[ 4201.015132]  [<ffffffffa05d6418>] zfs_ereport_post+0x58/0x60 [zfs]
[ 4201.015182]  [<ffffffffa0590182>] spa_event_notify+0x22/0x30 [zfs]
[ 4201.015230]  [<ffffffffa057e918>] dsl_scan_setup_sync+0x1e8/0x340 [zfs]
[ 4201.015280]  [<ffffffffa0580347>] dsl_scan_sync+0x3b7/0xb90 [zfs]
[ 4201.015332]  [<ffffffffa0609b13>] ? zio_wait+0x203/0x430 [zfs]
[ 4201.015382]  [<ffffffffa0594fdc>] spa_sync+0x3dc/0xca0 [zfs]
[ 4201.015433]  [<ffffffffa05aa2b3>] txg_sync_thread+0x2c3/0x560 [zfs]
[ 4201.015484]  [<ffffffffa05a9ff0>] ? txg_quiesce_thread+0x4d0/0x4d0 [zfs]
[ 4201.015517]  [<ffffffffa0479968>] thread_generic_wrapper+0x78/0x90 [spl]
[ 4201.015549]  [<ffffffffa04798f0>] ? __thread_create+0x340/0x340 [spl]
[ 4201.015581]  [<ffffffff8106b493>] kthread+0x93/0xa0
[ 4201.015609]  [<ffffffff813d5c64>] kernel_thread_helper+0x4/0x10
[ 4201.015637]  [<ffffffff8106b400>] ? kthread_freezable_should_stop+0x70/0x70
[ 4201.015667]  [<ffffffff813d5c60>] ? gs_change+0x13/0x13

What's the approach to fixing these?

E.g. from the first non-'?' line in the call trace (what do the '?' lines actually mean?) after the kmem_alloc_debug, I can trace through:

zfs_zevent_post()
  => zfs_zevent_alloc()
    ev = kmem_zalloc(sizeof(zevent_t), KM_SLEEP);

So this KM_SLEEP should be changed to KM_PUSHPAGE?

If so, I can scan through the rest of the call traces with:

cat <<'END' > tmp.pl
$x=0;
while (<>) {
  if (m/kmem_alloc_debug/) { $x = 1; }
  elsif ($x && s/^.*\>\] (?!\?)//) { print; $x = 0; }
}
perl tmp.pl /var/log/kern.log | sort | uniq -c | sort -n
      1 dsl_scan_setup_sync+0x279/0x340 [zfs]
      1 dsl_scan_sync+0x1cd/0xb90 [zfs]
      1 dsl_scan_sync+0x1f2/0xb90 [zfs]
      1 dsl_scan_sync+0x736/0xb90 [zfs]
      1 dsl_scan_visit_rootbp+0x57/0x130 [zfs]
      1 zfs_zevent_post+0xd1/0x250 [zfs]
    137 dsl_scan_visitds+0xba/0x500 [zfs]
    176 dsl_scan_visitbp.isra.5+0xa30/0xdd0 [zfs]
 215635 dsl_scan_visitbp.isra.5+0x8a3/0xdd0 [zfs]
 487580 dsl_scan_visitbp.isra.5+0x5d/0xdd0 [zfs]

which gets me:

dsl_scan_sync()
  => dsl_scan_setup_sync()
    dp->dp_blkstats = kmem_alloc(sizeof (zfs_all_blkstats_t),
    KM_SLEEP | KM_NODEBUG);
  => dsl_scan_visit()
    => dsl_scan_visit_rootbp()
      => dsl_scan_visitbp()
        bp_toread = kmem_alloc(sizeof (blkptr_t), KM_SLEEP);
      => dsl_scan_visitds()
        dsname = kmem_alloc(ZFS_MAXNAMELEN, KM_SLEEP);
    zc = kmem_alloc(sizeof(zap_cursor_t), KM_SLEEP);
    za = kmem_alloc(sizeof(zap_attribute_t), KM_SLEEP);

...all encapsulated in chrisrd/zfs@1706452

@chrisrd chrisrd added a commit to chrisrd/zfs that referenced this issue Sep 1, 2012
@chrisrd chrisrd KM_PUSH => KM_PUSHPAGE
Fix typo in 1706452

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #917
ae3b38a
@chrisrd
chrisrd commented Sep 1, 2012

FWIW, with @b404a3f + chrisrd/zfs@1706452 + chrisrd/zfs@ae3b38a, my resilver has been running for 1.5 hrs & 2.3 TB without any "fixed allocation" kernel messages.

@behlendorf
ZFS on Linux member

Yes, that's exactly the right fix. We need to change those KM_SLEEPs to KM_PUSHPAGEs to disable direct memory reclaim for these paths. Otherwise we risk an unlikely but possible deadlock. The new warning we address to detect this possibility, fix the flags to prevent it, and log a message so we can get it fixed.

I'll get your fixes merged in to master. I'm glad it's running quietly now. Please make sure you report any new instances we make find some in less common code paths.

@behlendorf behlendorf added a commit that referenced this issue Sep 2, 2012
@chrisrd chrisrd Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
20a083c
@cwedgwood

More:

[40853.055382] SPL: Fixing allocation for task txg_sync (3871) which used GFP flags 0x1054758c with PF_NOFS set
[40853.065280] SPL: Showing stack for process 3871
[40853.069913] Pid: 3871, comm: txg_sync Tainted: G      D    O 3.4.10-cw3 #6
[40853.076861] Call Trace:
[40853.079372]  [<ffffffffa004b442>] spl_debug_dumpstack+0x30/0x32 [spl]
[40853.085902]  [<ffffffffa004f248>] sanitize_flags+0x73/0x82 [spl]
[40853.092012]  [<ffffffffa004f395>] kmalloc_nofail+0x21/0x3e [spl]
[40853.098094]  [<ffffffffa004f51f>] kmem_alloc_debug+0x16d/0x2d9 [spl]
[40853.104710]  [<ffffffffa01ba6ed>] ? zio_dva_free+0x23/0x23 [zfs]
[40853.110976]  [<ffffffffa01400ab>] arc_buf_data_free.isra.11.part.12+0x3a/0x89 [zfs]
[40853.118801]  [<ffffffffa0140505>] arc_buf_destroy+0x77/0x180 [zfs]
[40853.125212]  [<ffffffffa0141748>] arc_evict+0x23e/0x47c [zfs]
[40853.131173]  [<ffffffff810675db>] ? load_balance+0xb5/0x614
[40853.136899]  [<ffffffffa0141d8f>] arc_get_data_buf+0x290/0x3d5 [zfs]
[40853.143470]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.149735]  [<ffffffffa0143302>] arc_read_nolock+0x36a/0x676 [zfs]
[40853.156364]  [<ffffffffa01436ab>] arc_read+0x9d/0x103 [zfs]
[40853.162369]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.168799]  [<ffffffffa016da99>] dsl_read+0x31/0x33 [zfs]
[40853.174674]  [<ffffffffa0147005>] dbuf_read+0x397/0x4cc [zfs]
[40853.180873]  [<ffffffffa014ca6f>] dmu_buf_hold+0xaa/0xee [zfs]
[40853.187077]  [<ffffffffa004f51f>] ? kmem_alloc_debug+0x16d/0x2d9 [spl]
[40853.194118]  [<ffffffffa0195cd8>] zap_lockdir+0x4a/0x449 [zfs]
[40853.200417]  [<ffffffff814e0e89>] ? _raw_spin_unlock_irqrestore+0x32/0x37
[40853.207694]  [<ffffffffa0148050>] ? dmu_buf_rele+0x2a/0x31 [zfs]
[40853.214230]  [<ffffffffa0196eb9>] zap_lookup_norm+0x3f/0x156 [zfs]
[40853.220893]  [<ffffffffa019704c>] zap_lookup+0x33/0x35 [zfs]
[40853.226955]  [<ffffffffa0193733>] zap_increment+0x48/0x98 [zfs]
[40853.233351]  [<ffffffffa01937d2>] zap_increment_int+0x4f/0x5a [zfs]
[40853.240095]  [<ffffffffa004ef7a>] ? spl_kmem_cache_free+0x315/0x376 [spl]
[40853.247391]  [<ffffffffa01511c8>] do_userquota_update.part.7+0xa0/0xf5 [zfs]
[40853.254952]  [<ffffffffa01513cb>] dmu_objset_do_userquota_updates+0x100/0x213 [zfs]
[40853.263253]  [<ffffffffa0168ae7>] dsl_pool_sync+0xf5/0x3d2 [zfs]
[40853.269816]  [<ffffffffa017ac72>] spa_sync+0x475/0x82d [zfs]
[40853.275847]  [<ffffffff814e0f5c>] ? _raw_spin_lock+0xe/0x10
[40853.281906]  [<ffffffffa0186755>] txg_sync_thread+0x247/0x3b7 [zfs]
[40853.288592]  [<ffffffffa018650e>] ? txg_do_callbacks+0x52/0x52 [zfs]
[40853.295422]  [<ffffffffa0052470>] thread_generic_wrapper+0x71/0x7e [spl]
[40853.302678]  [<ffffffffa00523ff>] ? __thread_create+0x2db/0x2db [spl]
[40853.309618]  [<ffffffff8105496a>] kthread+0x8b/0x93
[40853.314787]  [<ffffffff814e8b34>] kernel_thread_helper+0x4/0x10
[40853.321182]  [<ffffffff814e125d>] ? retint_restore_args+0x13/0x13
[40853.327673]  [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[40853.334216]  [<ffffffff814e8b30>] ? gs_change+0x13/0x13


[40853.421300] SPL: Fixing allocation for task txg_sync (3871) which used GFP flags 0x105474fc with PF_NOFS set
[40853.431393] SPL: Showing stack for process 3871
[40853.436002] Pid: 3871, comm: txg_sync Tainted: G      D    O 3.4.10-cw3 #6
[40853.443146] Call Trace:
[40853.445680]  [<ffffffffa004b442>] spl_debug_dumpstack+0x30/0x32 [spl]
[40853.457463]  [<ffffffffa004f248>] sanitize_flags+0x73/0x82 [spl]
[40853.463608]  [<ffffffffa004f395>] kmalloc_nofail+0x21/0x3e [spl]
[40853.469792]  [<ffffffffa004f51f>] kmem_alloc_debug+0x16d/0x2d9 [spl]
[40853.476259]  [<ffffffff81064bf9>] ? cpumask_next+0x19/0x1b
[40853.481983]  [<ffffffffa01ba6ed>] ? zio_dva_free+0x23/0x23 [zfs]
[40853.488080]  [<ffffffffa01400ab>] arc_buf_data_free.isra.11.part.12+0x3a/0x89 [zfs]
[40853.495912]  [<ffffffffa0140505>] arc_buf_destroy+0x77/0x180 [zfs]
[40853.502217]  [<ffffffff8124de8a>] ? cpumask_next_and+0x2e/0x3b
[40853.508184]  [<ffffffffa0141748>] arc_evict+0x23e/0x47c [zfs]
[40853.514349]  [<ffffffffa0141d8f>] arc_get_data_buf+0x290/0x3d5 [zfs]
[40853.521229]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.527662]  [<ffffffffa0143302>] arc_read_nolock+0x36a/0x676 [zfs]
[40853.534408]  [<ffffffffa01436ab>] arc_read+0x9d/0x103 [zfs]
[40853.540511]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.546901]  [<ffffffffa016da99>] dsl_read+0x31/0x33 [zfs]
[40853.552792]  [<ffffffffa0147005>] dbuf_read+0x397/0x4cc [zfs]
[40853.558986]  [<ffffffffa01482cb>] __dbuf_hold_impl+0x19b/0x3b0 [zfs]
[40853.565765]  [<ffffffffa0148562>] dbuf_hold_impl+0x82/0xa4 [zfs]
[40853.572282]  [<ffffffffa0149203>] dbuf_hold+0x20/0x2e [zfs]
[40853.578244]  [<ffffffffa014cca1>] dmu_buf_hold_array_by_dnode+0x199/0x325 [zfs]
[40853.586236]  [<ffffffffa015942a>] ? dnode_hold_impl+0x350/0x369 [zfs]
[40853.593196]  [<ffffffffa014ce8a>] dmu_buf_hold_array+0x5d/0x7e [zfs]
[40853.600045]  [<ffffffffa014d234>] dmu_write+0x4d/0x121 [zfs]
[40853.606113]  [<ffffffffa0185849>] space_map_sync+0x2e2/0x344 [zfs]
[40853.612788]  [<ffffffffa0172920>] metaslab_sync+0x256/0x302 [zfs]
[40853.619398]  [<ffffffffa0189865>] vdev_sync+0xa5/0x11e [zfs]
[40853.625485]  [<ffffffffa017ad1d>] spa_sync+0x520/0x82d [zfs]
[40853.631546]  [<ffffffff814e0f5c>] ? _raw_spin_lock+0xe/0x10
[40853.637522]  [<ffffffffa0186755>] txg_sync_thread+0x247/0x3b7 [zfs]
[40853.644347]  [<ffffffffa018650e>] ? txg_do_callbacks+0x52/0x52 [zfs]
[40853.651187]  [<ffffffffa0052470>] thread_generic_wrapper+0x71/0x7e [spl]
[40853.658452]  [<ffffffffa00523ff>] ? __thread_create+0x2db/0x2db [spl]
[40853.665312]  [<ffffffff8105496a>] kthread+0x8b/0x93
[40853.670525]  [<ffffffff814e8b34>] kernel_thread_helper+0x4/0x10
[40853.676831]  [<ffffffff814e125d>] ? retint_restore_args+0x13/0x13
[40853.683372]  [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[40853.689995]  [<ffffffff814e8b30>] ? gs_change+0x13/0x13


[40853.696480] SPL: Showing stack for process 3871
[40853.701373] Pid: 3871, comm: txg_sync Tainted: G      D    O 3.4.10-cw3 #6
[40853.708413] Call Trace:
[40853.710910]  [<ffffffffa004b442>] spl_debug_dumpstack+0x30/0x32 [spl]
[40853.717486]  [<ffffffffa004f248>] sanitize_flags+0x73/0x82 [spl]
[40853.723644]  [<ffffffffa004f395>] kmalloc_nofail+0x21/0x3e [spl]
[40853.729796]  [<ffffffffa004f51f>] kmem_alloc_debug+0x16d/0x2d9 [spl]
[40853.736219]  [<ffffffff8105fb19>] ? resched_task+0x3b/0x64
[40853.741888]  [<ffffffff81060266>] ? check_preempt_curr+0x40/0x6e
[40853.747994]  [<ffffffff810657fa>] ? hrtick_update+0x1b/0x4e
[40853.753756]  [<ffffffff8106598a>] ? dequeue_task_fair+0x15d/0x16c
[40853.760103]  [<ffffffffa01ba6ed>] ? zio_dva_free+0x23/0x23 [zfs]
[40853.766273]  [<ffffffffa01400ab>] arc_buf_data_free.isra.11.part.12+0x3a/0x89 [zfs]
[40853.774186]  [<ffffffffa0140505>] arc_buf_destroy+0x77/0x180 [zfs]
[40853.780574]  [<ffffffffa0141748>] arc_evict+0x23e/0x47c [zfs]
[40853.786477]  [<ffffffffa0141d8f>] arc_get_data_buf+0x290/0x3d5 [zfs]
[40853.793120]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.799630]  [<ffffffffa0143302>] arc_read_nolock+0x36a/0x676 [zfs]
[40853.806346]  [<ffffffffa01436ab>] arc_read+0x9d/0x103 [zfs]
[40853.812308]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40853.818816]  [<ffffffffa016da99>] dsl_read+0x31/0x33 [zfs]
[40853.824690]  [<ffffffffa0147005>] dbuf_read+0x397/0x4cc [zfs]
[40853.830913]  [<ffffffffa0148fad>] dmu_buf_will_dirty+0x4f/0x62 [zfs]
[40853.837675]  [<ffffffffa014d2aa>] dmu_write+0xc3/0x121 [zfs]
[40853.843761]  [<ffffffffa0185849>] space_map_sync+0x2e2/0x344 [zfs]
[40853.850415]  [<ffffffffa0172920>] metaslab_sync+0x256/0x302 [zfs]
[40853.856922]  [<ffffffffa0189865>] vdev_sync+0xa5/0x11e [zfs]
[40853.863077]  [<ffffffffa017ad1d>] spa_sync+0x520/0x82d [zfs]
[40853.869132]  [<ffffffff814e0f5c>] ? _raw_spin_lock+0xe/0x10
[40853.875103]  [<ffffffffa0186755>] txg_sync_thread+0x247/0x3b7 [zfs]
[40853.881899]  [<ffffffffa018650e>] ? txg_do_callbacks+0x52/0x52 [zfs]
[40853.888654]  [<ffffffffa0052470>] thread_generic_wrapper+0x71/0x7e [spl]
[40853.895795]  [<ffffffffa00523ff>] ? __thread_create+0x2db/0x2db [spl]
[40853.902771]  [<ffffffff8105496a>] kthread+0x8b/0x93
[40853.908043]  [<ffffffff814e8b34>] kernel_thread_helper+0x4/0x10
[40853.914359]  [<ffffffff814e125d>] ? retint_restore_args+0x13/0x13
[40853.921005]  [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[40853.927461]  [<ffffffff814e8b30>] ? gs_change+0x13/0x13


[40948.108071] SPL: Fixing allocation for task txg_sync (3871) which used GFP flags 0x1054758c with PF_NOFS set
[40948.117994] SPL: Skipped 1 previous similar message
[40948.122920] SPL: Showing stack for process 3871
[40948.127564] Pid: 3871, comm: txg_sync Tainted: G      D    O 3.4.10-cw3 #6
[40948.134513] Call Trace:
[40948.137038]  [<ffffffffa004b442>] spl_debug_dumpstack+0x30/0x32 [spl]
[40948.143571]  [<ffffffffa004f248>] sanitize_flags+0x73/0x82 [spl]
[40948.149658]  [<ffffffffa004f395>] kmalloc_nofail+0x21/0x3e [spl]
[40948.155768]  [<ffffffffa004f51f>] kmem_alloc_debug+0x16d/0x2d9 [spl]
[40948.162247]  [<ffffffffa01ba6ed>] ? zio_dva_free+0x23/0x23 [zfs]
[40948.168425]  [<ffffffffa01400ab>] arc_buf_data_free.isra.11.part.12+0x3a/0x89 [zfs]
[40948.176254]  [<ffffffffa0140505>] arc_buf_destroy+0x77/0x180 [zfs]
[40948.182531]  [<ffffffffa0141748>] arc_evict+0x23e/0x47c [zfs]
[40948.188442]  [<ffffffff810675db>] ? load_balance+0xb5/0x614
[40948.194163]  [<ffffffffa0141d8f>] arc_get_data_buf+0x290/0x3d5 [zfs]
[40948.200670]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40948.206955]  [<ffffffffa0143302>] arc_read_nolock+0x36a/0x676 [zfs]
[40948.213654]  [<ffffffffa01436ab>] arc_read+0x9d/0x103 [zfs]
[40948.219584]  [<ffffffffa0148057>] ? dmu_buf_rele+0x31/0x31 [zfs]
[40948.225988]  [<ffffffffa016da99>] dsl_read+0x31/0x33 [zfs]
[40948.231853]  [<ffffffffa0147005>] dbuf_read+0x397/0x4cc [zfs]
[40948.237988]  [<ffffffffa014ca6f>] dmu_buf_hold+0xaa/0xee [zfs]
[40948.244188]  [<ffffffffa004f51f>] ? kmem_alloc_debug+0x16d/0x2d9 [spl]
[40948.251167]  [<ffffffffa0195cd8>] zap_lockdir+0x4a/0x449 [zfs]
[40948.257373]  [<ffffffff814e0e89>] ? _raw_spin_unlock_irqrestore+0x32/0x37
[40948.264567]  [<ffffffffa0148050>] ? dmu_buf_rele+0x2a/0x31 [zfs]
[40948.270983]  [<ffffffffa0196eb9>] zap_lookup_norm+0x3f/0x156 [zfs]
[40948.277566]  [<ffffffffa019704c>] zap_lookup+0x33/0x35 [zfs]
[40948.283623]  [<ffffffffa0193733>] zap_increment+0x48/0x98 [zfs]
[40948.289978]  [<ffffffffa01937d2>] zap_increment_int+0x4f/0x5a [zfs]
[40948.296676]  [<ffffffffa004ef7a>] ? spl_kmem_cache_free+0x315/0x376 [spl]
[40948.303897]  [<ffffffffa01511c8>] do_userquota_update.part.7+0xa0/0xf5 [zfs]
[40948.311446]  [<ffffffffa01513cb>] dmu_objset_do_userquota_updates+0x100/0x213 [zfs]
[40948.319559]  [<ffffffffa0168ae7>] dsl_pool_sync+0xf5/0x3d2 [zfs]
[40948.325945]  [<ffffffffa017ac72>] spa_sync+0x475/0x82d [zfs]
[40948.332031]  [<ffffffffa0186755>] txg_sync_thread+0x247/0x3b7 [zfs]
[40948.338677]  [<ffffffffa018650e>] ? txg_do_callbacks+0x52/0x52 [zfs]
[40948.345389]  [<ffffffffa0052470>] thread_generic_wrapper+0x71/0x7e [spl]
[40948.352542]  [<ffffffffa00523ff>] ? __thread_create+0x2db/0x2db [spl]
[40948.359378]  [<ffffffff8105496a>] kthread+0x8b/0x93
[40948.364535]  [<ffffffff814e8b34>] kernel_thread_helper+0x4/0x10
[40948.370844]  [<ffffffff814e125d>] ? retint_restore_args+0x13/0x13
[40948.377306]  [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[40948.383794]  [<ffffffff814e8b30>] ? gs_change+0x13/0x13
@behlendorf
ZFS on Linux member

Congrats, you found another. The above patch should resolve it, commit 17f3cc0648b8427d859c7573b3b2c9635d2eb8c8

@behlendorf behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 4, 2012
@behlendorf behlendorf Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
594b4dd
@behlendorf behlendorf added a commit that referenced this issue Sep 4, 2012
@behlendorf behlendorf Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
0ef0ff5
@behlendorf
ZFS on Linux member

Wash... rinse... repeat...

It has occurred to me that it might be wise to rate limit these warnings. Getting one warning with be enough to fix the next instance and there's no reason to spam the machine. Thoughts?

@chrisrd
chrisrd commented Sep 5, 2012

Whilst that's certainly better than filling people's log partition, filling the log partition has the nice side effect of
ensuring they complain about it so the problem can be fixed :-) Can it be rate limited to, say, one per minute or less rather than being a single "one shot" which may go unnoticed or unreported?

@cwedgwood

I was thinking maybe we log once without the stack-trace then halve the limit down to say order-1 (if you have a lots of those fail things are bad)

that way you'll get an order-5 faliure in most cases and nothing else, and in rare cases 4, 3, 2, then a series one order-1 failures which i expect to be rare (amd64 has 8k stacks and various other things so order-1 failures would be very noticable)

@behlendorf behlendorf added a commit that referenced this issue Sep 5, 2012
@behlendorf behlendorf Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
cafa970
@OmenWild
OmenWild commented Sep 5, 2012

[ Sorry this is so long, I am trying to include everything that might be relevant. ]

Last night I updated to the newest module available for Debian (v0.6.0.74-rc10) on my 64 bit machine, kernel 3.5.3 + Con Kolivas' BFS scheduler. About 3 hours after boot I got 5,507 stack traces after a single warning:

SPL: Fixing allocation for task mairix (9047) which used GFP flags 0xc9277bb4 with PF_NOFS set

The stack trace:

[12733.460212] SPL: Showing stack for process 9047
[12733.460214] Pid: 9047, comm: mairix Tainted: P           O 3.5.3-ck1-1 #1
[12733.460215] Call Trace:
[12733.460228]  [<ffffffffa03613e0>] ? spl_debug_dumpstack+0x23/0x29 [spl]
[12733.460231]  [<ffffffffa0364fcd>] ? sanitize_flags+0x6e/0x7b [spl]
[12733.460235]  [<ffffffffa0365249>] ? kmem_alloc_debug+0x139/0x2d7 [spl]
[12733.460247]  [<ffffffffa0000003>] ? usb_speed_string+0x3/0x1000 [usb_common]
[12733.460260]  [<ffffffffa053c84d>] ? zfs_get_data+0xd4/0x232 [zfs]
[12733.460263]  [<ffffffff810ec33d>] ? kfree+0x48/0xd7
[12733.460273]  [<ffffffffa05428c4>] ? zil_commit+0x391/0x5a6 [zfs]
[12733.460284]  [<ffffffffa0510b93>] ? txg_rele_to_sync+0x3d/0x4a [zfs]
[12733.460294]  [<ffffffffa053cb98>] ? zfs_putpage+0x1ed/0x225 [zfs]
[12733.460302]  [<ffffffffa05498b5>] ? zpl_putpage+0x22/0x2d [zfs]
[12733.460305]  [<ffffffff810ba8c9>] ? write_cache_pages+0x1d6/0x2e5
[12733.460313]  [<ffffffffa0549893>] ? zpl_readpage+0x4d/0x4d [zfs]
[12733.460315]  [<ffffffff810b3c80>] ? __filemap_fdatawrite_range+0x50/0x55
[12733.460317]  [<ffffffff810b3ca9>] ? filemap_write_and_wait_range+0x24/0x49
[12733.460325]  [<ffffffffa0549666>] ? zpl_fsync+0x2e/0x63 [zfs]
[12733.460327]  [<ffffffff8111bf28>] ? do_fsync+0x24/0x49
[12733.460328]  [<ffffffff8111c0b7>] ? sys_fsync+0x9/0xe
[12733.460331]  [<ffffffff81362c39>] ? system_call_fastpath+0x16/0x1b

A couple hours after that I started getting page allocation falures from z_wr_iss, though one was from a java backup system.

> grep 'page allocation failure:' /var/log/kern.log | sed 's/.*kernel: //'
[19877.031077] z_wr_iss/4: page allocation failure: order:7, mode:0x4050
[40271.941269] z_wr_iss/5: page allocation failure: order:7, mode:0x4050
[40271.941363] z_wr_iss/0: page allocation failure: order:7, mode:0x4050
[41233.902899] java: page allocation failure: order:5, mode:0x4030
[42013.999475] z_wr_iss_h/2: page allocation failure: order:7, mode:0x4050
[42013.999479] z_wr_iss_h/1: page allocation failure: order:7, mode:0x4050
[42013.999487] z_wr_iss_h/4: page allocation failure: order:7, mode:0x4050
[42013.999517] z_wr_iss_h/0: page allocation failure: order:7, mode:0x4050
[42013.999580] z_wr_iss_h/3: page allocation failure: order:7, mode:0x4050
[42014.217156] z_wr_iss_h/1: page allocation failure: order:7, mode:0x4050
[42014.217172] z_wr_iss_h/4: page allocation failure: order:7, mode:0x4050
[42014.271887] z_wr_iss_h/3: page allocation failure: order:7, mode:0x4050
[42014.271889] z_wr_iss_h/0: page allocation failure: order:7, mode:0x4050
[42240.143238] z_wr_iss/5: page allocation failure: order:7, mode:0x4050
[42240.144021] z_wr_iss/3: page allocation failure: order:7, mode:0x4050
[44867.039734] z_wr_iss/6: page allocation failure: order:7, mode:0x4050
[44867.039738] z_wr_iss/4: page allocation failure: order:7, mode:0x4050
[44867.097893] z_wr_iss/6: page allocation failure: order:7, mode:0x4050
[48598.524302] z_wr_iss/7: page allocation failure: order:7, mode:0x4050

If I understand order correctly, the system was trying to allocate 512MB? That seems huge. My kernel.shmmax is set to 512MB for zoneminder, but I have no ideas if this could be related. zoneminder is currently working correctly, but the java backup system is hung.

A full dump from the most recent allocation failure:

[48598.524302] z_wr_iss/7: page allocation failure: order:7, mode:0x4050
[48598.524309] Pid: 861, comm: z_wr_iss/7 Tainted: P           O 3.5.3-ck1-1 #1
[48598.524312] Call Trace:
[48598.524325]  [<ffffffff810b7021>] ? warn_alloc_failed+0x111/0x123
[48598.524332]  [<ffffffff81358189>] ? __alloc_pages_direct_compact+0xaa/0x193
[48598.524338]  [<ffffffff810b9d64>] ? __alloc_pages_nodemask+0x793/0x85e
[48598.524347]  [<ffffffff810e5055>] ? alloc_pages_current+0xb2/0xcd
[48598.524353]  [<ffffffff810b68aa>] ? __get_free_pages+0x7/0x34
[48598.524359]  [<ffffffff810e9c1f>] ? kmalloc_order_trace+0x22/0x56
[48598.524382]  [<ffffffffa0367159>] ? spl_kmem_cache_alloc+0x54d/0xa73 [spl]
[48598.524390]  [<ffffffff810537e7>] ? abort_exclusive_wait+0x79/0x79
[48598.524402]  [<ffffffffa036fa5e>] ? z_compress_level+0x43/0xd4 [spl]
[48598.524439]  [<ffffffffa04fd51a>] ? gzip_compress+0x2e/0x5a [zfs]
[48598.524471]  [<ffffffffa0547ea2>] ? zio_compress_data+0x68/0xa4 [zfs]
[48598.524503]  [<ffffffffa054463a>] ? zio_write_bp_init+0x1a7/0x3c1 [zfs]
[48598.524537]  [<ffffffffa0543f4f>] ? zio_execute+0xad/0xcf [zfs]
[48598.524550]  [<ffffffffa036908d>] ? taskq_thread+0x2b7/0x4eb [spl]
[48598.524556]  [<ffffffff8105cd53>] ? try_to_wake_up+0x91/0x91
[48598.524567]  [<ffffffffa0368dd6>] ? task_done+0xfd/0xfd [spl]
[48598.524573]  [<ffffffff810532a7>] ? kthread+0x7d/0x85
[48598.524580]  [<ffffffff81363f24>] ? kernel_thread_helper+0x4/0x10
[48598.524586]  [<ffffffff8105322a>] ? flush_kthread_worker+0x92/0x92
[48598.524591]  [<ffffffff81363f20>] ? gs_change+0x13/0x13
[48598.524593] Mem-Info:
[48598.524595] Node 0 DMA per-cpu:
[48598.524600] CPU    0: hi:    0, btch:   1 usd:   0
[48598.524603] CPU    1: hi:    0, btch:   1 usd:   0
[48598.524605] CPU    2: hi:    0, btch:   1 usd:   0
[48598.524608] CPU    3: hi:    0, btch:   1 usd:   0
[48598.524610] CPU    4: hi:    0, btch:   1 usd:   0
[48598.524613] CPU    5: hi:    0, btch:   1 usd:   0
[48598.524616] CPU    6: hi:    0, btch:   1 usd:   0
[48598.524618] CPU    7: hi:    0, btch:   1 usd:   0
[48598.524620] Node 0 DMA32 per-cpu:
[48598.524624] CPU    0: hi:  186, btch:  31 usd:   0
[48598.524627] CPU    1: hi:  186, btch:  31 usd:   0
[48598.524630] CPU    2: hi:  186, btch:  31 usd:   0
[48598.524632] CPU    3: hi:  186, btch:  31 usd:   0
[48598.524635] CPU    4: hi:  186, btch:  31 usd:   0
[48598.524637] CPU    5: hi:  186, btch:  31 usd:   0
[48598.524640] CPU    6: hi:  186, btch:  31 usd:   0
[48598.524643] CPU    7: hi:  186, btch:  31 usd:   0
[48598.524644] Node 0 Normal per-cpu:
[48598.524648] CPU    0: hi:  186, btch:  31 usd:   0
[48598.524650] CPU    1: hi:  186, btch:  31 usd:   0
[48598.524653] CPU    2: hi:  186, btch:  31 usd:   0
[48598.524656] CPU    3: hi:  186, btch:  31 usd: 157
[48598.524658] CPU    4: hi:  186, btch:  31 usd:   0
[48598.524661] CPU    5: hi:  186, btch:  31 usd:   0
[48598.524664] CPU    6: hi:  186, btch:  31 usd:   0
[48598.524666] CPU    7: hi:  186, btch:  31 usd:   0
[48598.524675] active_anon:553149 inactive_anon:186176 isolated_anon:0
[48598.524675]  active_file:730149 inactive_file:419837 isolated_file:0
[48598.524675]  unevictable:3165 dirty:43 writeback:1 unstable:0
[48598.524675]  free:149339 slab_reclaimable:130757 slab_unreclaimable:110307
[48598.524675]  mapped:120658 shmem:144341 pagetables:10805 bounce:0
[48598.524680] Node 0 DMA free:15884kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15644kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[48598.524692] lowmem_reserve[]: 0 3158 16008 16008
[48598.524698] Node 0 DMA32 free:306912kB min:51668kB low:64584kB high:77500kB active_anon:102720kB inactive_anon:221184kB active_file:809640kB inactive_file:463108kB unevictable:2220kB isolated(anon):0kB isolated(file):0kB present:3234292kB mlocked:2220kB dirty:0kB writeback:4kB mapped:1928kB shmem:42320kB slab_reclaimable:146080kB slab_unreclaimable:41360kB kernel_stack:152kB pagetables:1720kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[48598.524710] lowmem_reserve[]: 0 0 12850 12850
[48598.524715] Node 0 Normal free:274560kB min:210220kB low:262772kB high:315328kB active_anon:2109876kB inactive_anon:523520kB active_file:2110956kB inactive_file:1216240kB unevictable:10440kB isolated(anon):0kB isolated(file):0kB present:13158432kB mlocked:10440kB dirty:172kB writeback:0kB mapped:480704kB shmem:535044kB slab_reclaimable:376948kB slab_unreclaimable:399852kB kernel_stack:5304kB pagetables:41500kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:95 all_unreclaimable? no
[48598.524726] lowmem_reserve[]: 0 0 0 0
[48598.524731] Node 0 DMA: 1*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15884kB
[48598.524746] Node 0 DMA32: 2142*4kB 7814*8kB 4946*16kB 2283*32kB 927*64kB 179*128kB 9*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 307816kB
[48598.524760] Node 0 Normal: 15810*4kB 15690*8kB 5096*16kB 26*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 275224kB
[48598.524775] 1294922 total pagecache pages
[48598.524778] 0 pages in swap cache
[48598.524781] Swap cache stats: add 0, delete 0, find 0/0
[48598.524783] Free swap  = 4194300kB
[48598.524784] Total swap = 4194300kB
[48598.581868] 4193776 pages RAM
[48598.581873] 107028 pages reserved
[48598.581877] 1165502 pages shared
[48598.581880] 2346504 pages non-shared

Did I miss any needed info?

Thanks,
Omen

@behlendorf
ZFS on Linux member

The first issue, the stack traces, is already fixed in the latest source. You may need to pull from master.

The second memory allocations are a more interesting side effect. Your right, they are huge (blame gzip), and they are usually on a slab. But the emergency slab object look like the kicked in here. Probably the most straight forward fix is to simply add an slab flag to optionally disable this support for the gzip work slabs. That would basically revert this to the previous code which was working well. I'll make patch.

@OmenWild
OmenWild commented Sep 6, 2012

I installed v0.6.0.75-rc10 last night and got neither error. Thanks!

@behlendorf behlendorf added a commit to zfsonlinux/spl that referenced this issue Sep 10, 2012
@behlendorf behlendorf Set KMC_NOEMERGENCY for zlib workspaces
The workspace required by zlib to perform compression is roughly
512MB (order-7).  These allocations are so large that we should
never attempt to directly kmalloc an emergency object for them.

It is far preferable to asynchronously vmalloc an additional slab
in case it's needed.  Then simply block waiting for an existing
object to be released or for the new slab to be allocated.

This can be accomplished by disabling emergency slab objects by
passing the KMC_NOEMERGENCY flag at slab creation time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zfsonlinux/zfs#917
95331f4
@behlendorf
ZFS on Linux member

Has anyone running the latest source observed any more of these warnings?

@OmenWild

I have seen no warning from the SPL after upgrading to 0.6.0.75-0ubuntu1~oneiric1 (about 6 days ago).

@cwedgwood

The issue seems to have gone away. We can reopen this if it occurs again.

@cwedgwood cwedgwood closed this Sep 11, 2012
@behlendorf
ZFS on Linux member

Good. To ensure we don't accidentally introduce any new instances of this, when the packages are built with --enable-debug this messages are fatal. That means I should catch most of them in my automated testing.

@chrisrd
chrisrd commented Sep 12, 2012

Fatal? Ouch. Perhaps the fatal should be enabled with a specific non-default compile time flag for a release or two?

@behlendorf
ZFS on Linux member

It's part of --enable-debug which is disabled by default. it's also fatal to ensure I notice. But it could have its own option if this becomes a problem.

@chrisrd
chrisrd commented Sep 12, 2012

I've been running with --enable-debug, but I'd prefer not to have avoidable fatals (although perhaps I'm already subject to these). A possible feature suggestion would be to have --enable-debug as "slower with additional checks, but just as solid" and a new --enable-debug-fatal for "you'll really notice these!".

@behlendorf
ZFS on Linux member

@chrisrd If your enabling debug then your already subject to these. A large number of fatal assertions will be enabled in the code. These are all for things that should never happen, but if they do they are fatal. The same is true for this debugging. It should never happen.

@dremon
dremon commented Sep 17, 2012

Running latest daily build (from 14-09-2012), noticed this message in the kernel log:

SPL: Fixing allocation for task txg_sync (782) which used GFP flags 0xbc7bb1c with PF_NOFS set
[ 8503.870218] SPL: Showing stack for process 782
[ 8503.870222] Pid: 782, comm: txg_sync Tainted: P O 3.5.0-14-generic #19-Ubuntu
[ 8503.870224] Call Trace:
[ 8503.870250] [] spl_debug_dumpstack+0x27/0x40 [spl]
[ 8503.870256] [] sanitize_flags.part.12+0x68/0x956 [spl]
[ 8503.870262] [] kmem_alloc_debug+0x387/0x3b0 [spl]
[ 8503.870285] [] ? dsl_prop_get_ds+0x16e/0x230 [zfs]
[ 8503.870303] [] ? zap_remove_norm+0x13c/0x1c0 [zfs]
[ 8503.870319] [] dsl_prop_set_sync+0x3a1/0x6a0 [zfs]
[ 8503.870326] [] ? cv_wait_common+0xb7/0x190 [spl]
[ 8503.870330] [] ? _raw_spin_lock+0xe/0x20
[ 8503.870345] [] dsl_sync_task_group_sync+0x127/0x220 [zfs]
[ 8503.870361] [] dsl_pool_sync+0x1db/0x460 [zfs]
[ 8503.870378] [] spa_sync+0x3c8/0x9e0 [zfs]
[ 8503.870396] [] txg_sync_thread+0x2c3/0x4a0 [zfs]
[ 8503.870414] [] ? txg_init+0x250/0x250 [zfs]
[ 8503.870419] [] thread_generic_wrapper+0x78/0x90 [spl]
[ 8503.870424] [] ? __thread_create+0x340/0x340 [spl]
[ 8503.870427] [] kthread+0x93/0xa0
[ 8503.870430] [] kernel_thread_helper+0x4/0x10
[ 8503.870431] [] ? flush_kthread_worker+0x80/0x80
[ 8503.870433] [] ? gs_change+0x13/0x13

@behlendorf
ZFS on Linux member

@dremon Thank you, fixed.

@behlendorf behlendorf added a commit that referenced this issue Sep 17, 2012
@behlendorf behlendorf Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
ba36727
This was referenced Oct 4, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.