Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VERIFY(zio->io_children[c][w] == 0) failed with 7.0-RC2 #5429

Closed
tuomari opened this issue Nov 27, 2016 · 10 comments
Closed

VERIFY(zio->io_children[c][w] == 0) failed with 7.0-RC2 #5429

tuomari opened this issue Nov 27, 2016 · 10 comments
Milestone

Comments

@tuomari
Copy link

tuomari commented Nov 27, 2016

System is same as in #5226
Kernel version: 4.7.6

Nov 22 10:31:46 helvi kernel: [   19.528130] ZFS: Loaded module v0.7.0-rc2_43_gce4197c (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Nov 22 10:31:46 helvi kernel: [   17.858508] SPL: Loaded module v0.7.0-rc2_2_gcbba714 (DEBUG mode)
[Sat Nov 26 22:49:00 2016] VERIFY(zio->io_children[c][w] == 0) failed
[Sat Nov 26 22:49:00 2016] PANIC at zio.c:3662:zio_done()
[Sat Nov 26 22:49:00 2016] Showing stack for process 1349
[Sat Nov 26 22:49:00 2016] CPU: 4 PID: 1349 Comm: z_wr_iss Tainted: P           O    4.7.6.kvm.ovs.1 #2
[Sat Nov 26 22:49:00 2016] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[Sat Nov 26 22:49:00 2016]  0000000000000000 ffff8808263eb708 ffffffff8131012b ffffffffa0e6b994
[Sat Nov 26 22:49:00 2016]  0000000000000e4e ffff8808263eb718 ffffffffa0316a5f ffff8808263eb898
[Sat Nov 26 22:49:00 2016]  ffffffffa0316b17 0236ac5b3fc16940 ffff880800000028 ffff8808263eb8a8
[Sat Nov 26 22:49:00 2016] Call Trace:
[Sat Nov 26 22:49:00 2016]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0316a5f>] spl_dumpstack+0x3f/0x50 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0316b17>] spl_panic+0xa7/0xd0 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[Sat Nov 26 22:49:00 2016]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[Sat Nov 26 22:49:00 2016]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9f33b>] zio_done+0xb8b/0x18d0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0da02b3>] ? zio_ready+0x233/0x5e0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9b84e>] zio_nowait+0x10e/0x310 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0cbf484>] dbuf_prefetch+0x4d4/0x610 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0ccab7c>] dmu_prefetch+0x21c/0x240 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d347d6>] space_map_load+0xe6/0x6c0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d0f813>] metaslab_load+0x63/0x1a0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d0fbfd>] metaslab_activate+0xcd/0x150 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d12342>] metaslab_alloc+0x7c2/0x12d0 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d9bd00>] zio_dva_allocate+0x170/0x900 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[Sat Nov 26 22:49:00 2016]  [<ffffffffa031a3a0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa031a48d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0312af3>] ? taskq_member+0x13/0x20 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0d95d73>] zio_execute+0x103/0x320 [zfs]
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0313fe9>] taskq_thread+0x299/0x590 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
[Sat Nov 26 22:49:00 2016]  [<ffffffffa0313d50>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c0b98>] kthread+0xe8/0x100
[Sat Nov 26 22:49:00 2016]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
[Sat Nov 26 22:49:00 2016]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190
@tuomari
Copy link
Author

tuomari commented Dec 29, 2016

I hit this again with newer ZFS version:

ZFS: Loaded module v0.7.0-rc2_68_ge8a2014 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.7.0-rc2_3_g4934925 (DEBUG mode)
Dec 28 09:07:50 helvi kernel: [485283.704908] VERIFY(zio->io_children[c][w] == 0) failed
Dec 28 09:07:50 helvi kernel: [485283.704962] PANIC at zio.c:3719:zio_done()
Dec 28 09:07:50 helvi kernel: [485283.705018] Showing stack for process 1367
Dec 28 09:07:50 helvi kernel: [485283.705021] CPU: 6 PID: 1367 Comm: z_wr_iss Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
Dec 28 09:07:50 helvi kernel: [485283.705023] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
Dec 28 09:07:50 helvi kernel: [485283.705025]  0000000000000000 ffff88082caef6f8 ffffffff8131012b ffffffffa0ec3e7c
Dec 28 09:07:50 helvi kernel: [485283.705029]  0000000000000e87 ffff88082caef708 ffffffffa03b4a6f ffff88082caef888
Dec 28 09:07:50 helvi kernel: [485283.705031]  ffffffffa03b4b27 ffff88083fc96940 ffff880800000028 ffff88082caef898
Dec 28 09:07:50 helvi kernel: [485283.705034] Call Trace:
Dec 28 09:07:50 helvi kernel: [485283.705044]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
Dec 28 09:07:50 helvi kernel: [485283.705062]  [<ffffffffa03b4a6f>] spl_dumpstack+0x3f/0x50 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705068]  [<ffffffffa03b4b27>] spl_panic+0xa7/0xd0 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705074]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705077]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
Dec 28 09:07:50 helvi kernel: [485283.705080]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
Dec 28 09:07:50 helvi kernel: [485283.705082]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
Dec 28 09:07:50 helvi kernel: [485283.705084]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705161]  [<ffffffffa0de6c23>] zio_done+0xbd3/0x1920 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705163]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705213]  [<ffffffffa0de7ba3>] ? zio_ready+0x233/0x5e0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705215]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705264]  [<ffffffffa0de305e>] zio_nowait+0x10e/0x310 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705297]  [<ffffffffa0d08ed4>] dbuf_prefetch+0x4d4/0x610 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705332]  [<ffffffffa0d1460c>] dmu_prefetch+0x21c/0x240 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705381]  [<ffffffffa0d7e414>] space_map_load+0xf4/0x6d0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705385]  [<ffffffff810c9f6d>] ? wake_up_q+0x2d/0x70
Dec 28 09:07:50 helvi kernel: [485283.705426]  [<ffffffffa0d592e3>] metaslab_load+0x63/0x1a0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705475]  [<ffffffffa0d596cd>] metaslab_activate+0xcd/0x150 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705523]  [<ffffffffa0d5be12>] metaslab_alloc+0x7c2/0x12d0 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705583]  [<ffffffffa0ddd433>] ? zio_execute+0x103/0x320 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705638]  [<ffffffffa0de3510>] zio_dva_allocate+0x170/0x940 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705646]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
Dec 28 09:07:50 helvi kernel: [485283.705657]  [<ffffffffa03b83b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705669]  [<ffffffffa03b849d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705681]  [<ffffffffa03b0b03>] ? taskq_member+0x13/0x20 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705732]  [<ffffffffa0ddd433>] zio_execute+0x103/0x320 [zfs]
Dec 28 09:07:50 helvi kernel: [485283.705741]  [<ffffffffa03b1ff9>] taskq_thread+0x299/0x590 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705744]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
Dec 28 09:07:50 helvi kernel: [485283.705749]  [<ffffffffa03b1d60>] ? taskq_thread_should_stop+0x80/0x80 [spl]
Dec 28 09:07:50 helvi kernel: [485283.705752]  [<ffffffff810c0b98>] kthread+0xe8/0x100
Dec 28 09:07:50 helvi kernel: [485283.705762]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
Dec 28 09:07:50 helvi kernel: [485283.705770]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190

@tuomari
Copy link
Author

tuomari commented Jan 1, 2017

I seem to hit this every few days at the moment. These have all occured while resilvering the pool.

[   30.084573] SPL: Loaded module v0.7.0-rc2_8_g8e99d66 (DEBUG mode)
[   31.758485] ZFS: Loaded module v0.7.0-rc2_99_g3500a14 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 12 21:42:20 2016
	16.2T scanned out of 29.5T at 4.49M/s, (scan is slow, no estimated time)
	1.10T resilvered, 54.89% done
config:

	NAME                                                       STATE     READ WRITE CKSUM
	tank                                                       DEGRADED     0     0     0
	  raidz1-0                                                 DEGRADED     0     0     0
	    ata-ST3000DM001-9YN166_W1F3AEZG                        ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F38MTB-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1ER166_Z5009QEX                        ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0HRLESN-part4         ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N2SED25L-part4         ONLINE       0     0     0
	    spare-5                                                DEGRADED     0     0     0
	      9714062319235286728                                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F4CKB1-part1
	      ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3ELL01X             ONLINE       0     0     0  (resilvering)
	  raidz1-1                                                 ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F4REQ1-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F4S6CX-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F3XC4M-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F4S822-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F4S6D9-part4                  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F4S7Z6-part4                  ONLINE       0     0     0
	logs
	  mirror-2                                                 ONLINE       0     0     0
	    ata-KINGSTON_SH103S3120G_50026B7334000234-part4        ONLINE       0     0     0
	    ata-Kingston_SSDNow_V_Series_128GB_06P970009771-part4  ONLINE       0     0     0
	cache
	  ata-KINGSTON_SH103S3120G_50026B7334000234-part5          ONLINE       0     0     0
	  ata-Kingston_SSDNow_V_Series_128GB_06P970009771-part5    ONLINE       0     0     0
	spares
	  ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3ELL01X                 INUSE     currently in use

errors: No known data errors
[196471.609690] VERIFY(zio->io_children[c][w] == 0) failed
[196471.609742] PANIC at zio.c:3719:zio_done()
[196471.609801] Showing stack for process 20437
[196471.609805] CPU: 12 PID: 20437 Comm: zmaudit.pl Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
[196471.609807] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[196471.609809]  0000000000000000 ffff8801d8ceb8f8 ffffffff8131012b ffffffffa0ecd054
[196471.609812]  0000000000000e87 ffff8801d8ceb908 ffffffffa01f2b1f ffff8801d8ceba88
[196471.609815]  ffffffffa01f2bd7 ffff88083fd16940 ffff880100000028 ffff8801d8ceba98
[196471.609817] Call Trace:
[196471.609826]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[196471.609841]  [<ffffffffa01f2b1f>] spl_dumpstack+0x3f/0x50 [spl]
[196471.609852]  [<ffffffffa01f2bd7>] spl_panic+0xa7/0xd0 [spl]
[196471.609860]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[196471.609867]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[196471.609873]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[196471.609880]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[196471.609882]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.609966]  [<ffffffffa0dea983>] zio_done+0xbd3/0x1920 [zfs]
[196471.609974]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610030]  [<ffffffffa0deb903>] ? zio_ready+0x233/0x5e0 [zfs]
[196471.610034]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610084]  [<ffffffffa0de6dbe>] zio_nowait+0x10e/0x310 [zfs]
[196471.610138]  [<ffffffffa0d0bee4>] dbuf_prefetch+0x4d4/0x610 [zfs]
[196471.610199]  [<ffffffffa0d1754c>] dmu_prefetch+0x14c/0x240 [zfs]
[196471.610285]  [<ffffffffa0dc9817>] zfs_readdir+0x2f7/0x400 [zfs]
[196471.610293]  [<ffffffff811b0cf2>] ? alloc_pages_vma+0xc2/0x260
[196471.610299]  [<ffffffff811de6bf>] ? terminate_walk+0x5f/0xd0
[196471.610307]  [<ffffffff811cd6e7>] ? mem_cgroup_commit_charge+0x67/0xa0
[196471.610310]  [<ffffffff811a1a0e>] ? page_add_new_anon_rmap+0x9e/0xd0
[196471.610313]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[196471.610319]  [<ffffffff81196e3f>] ? handle_mm_fault+0xb8f/0x1a50
[196471.610388]  [<ffffffffa0df0de7>] zpl_iterate+0x67/0x150 [zfs]
[196471.610393]  [<ffffffff811e8176>] iterate_dir+0x176/0x190
[196471.610395]  [<ffffffff811e85a4>] SyS_getdents+0x84/0xf0
[196471.610397]  [<ffffffff811e8190>] ? iterate_dir+0x190/0x190
[196471.610400]  [<ffffffff816ad11b>] entry_SYSCALL_64_fastpath+0x13/0x8f
```

@behlendorf behlendorf added this to the 0.7.0 milestone Jan 1, 2017
@tuomari
Copy link
Author

tuomari commented Jan 4, 2017

And again, after three days. Same versions as in the previous stack. Also resilver is still running.

[224731.486285] VERIFY(zio->io_children[c][w] == 0) failed
[224731.486312] PANIC at zio.c:3719:zio_done()
[224731.486328] Showing stack for process 1358
[224731.486333] CPU: 13 PID: 1358 Comm: z_wr_iss Tainted: P           O    4.7.6.remod.kvm.ovs.1 #2
[224731.486335] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[224731.486338]  0000000000000000 ffff880825f3b6f8 ffffffff8131012b ffffffffa2106054
[224731.486343]  0000000000000e87 ffff880825f3b708 ffffffffa0292b1f ffff880825f3b888
[224731.486346]  ffffffffa0292bd7 ffff88083fd56940 ffff880800000028 ffff880825f3b898
[224731.486350] Call Trace:
[224731.486362]  [<ffffffff8131012b>] dump_stack+0x4d/0x72
[224731.486381]  [<ffffffffa0292b1f>] spl_dumpstack+0x3f/0x50 [spl]
[224731.486390]  [<ffffffffa0292bd7>] spl_panic+0xa7/0xd0 [spl]
[224731.486397]  [<ffffffff816aceb9>] ? _raw_spin_unlock_irq+0x9/0x10
[224731.486403]  [<ffffffff810c622a>] ? finish_task_switch+0x7a/0x200
[224731.486407]  [<ffffffff816a82f6>] ? __schedule+0x1d6/0x680
[224731.486505]  [<ffffffffa201d52f>] ? zio_create+0x44f/0x6d0 [zfs]
[224731.486510]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[224731.486513]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486581]  [<ffffffffa2023983>] zio_done+0xbd3/0x1920 [zfs]
[224731.486586]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486652]  [<ffffffffa2024903>] ? zio_ready+0x233/0x5e0 [zfs]
[224731.486657]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.486714]  [<ffffffffa201fdbe>] zio_nowait+0x10e/0x310 [zfs]
[224731.486753]  [<ffffffffa1f44ee4>] dbuf_prefetch+0x4d4/0x610 [zfs]
[224731.486790]  [<ffffffffa1f5061c>] dmu_prefetch+0x21c/0x240 [zfs]
[224731.486847]  [<ffffffffa1fba424>] space_map_load+0xf4/0x6d0 [zfs]
[224731.486851]  [<ffffffff816aaef9>] ? __mutex_unlock_slowpath+0xa9/0x140
[224731.486894]  [<ffffffffa1f952d3>] metaslab_load+0x63/0x1a0 [zfs]
[224731.486937]  [<ffffffffa1f956bd>] metaslab_activate+0xcd/0x150 [zfs]
[224731.486980]  [<ffffffffa1f97e02>] metaslab_alloc+0x7c2/0x12d0 [zfs]
[224731.487030]  [<ffffffffa201a193>] ? zio_execute+0x103/0x320 [zfs]
[224731.487081]  [<ffffffffa2020270>] zio_dva_allocate+0x170/0x940 [zfs]
[224731.487083]  [<ffffffff816ace89>] ? _raw_spin_unlock+0x9/0x10
[224731.487091]  [<ffffffffa0296460>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[224731.487097]  [<ffffffffa029654d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[224731.487103]  [<ffffffffa028eb03>] ? taskq_member+0x13/0x20 [spl]
[224731.487153]  [<ffffffffa201a193>] zio_execute+0x103/0x320 [zfs]
[224731.487159]  [<ffffffffa028fff9>] taskq_thread+0x299/0x590 [spl]
[224731.487162]  [<ffffffff810c9fb0>] ? wake_up_q+0x70/0x70
[224731.487167]  [<ffffffffa028fd60>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[224731.487170]  [<ffffffff810c0b98>] kthread+0xe8/0x100
[224731.487173]  [<ffffffff816ad30f>] ret_from_fork+0x1f/0x40
[224731.487175]  [<ffffffff810c0ab0>] ? kthread_create_on_node+0x190/0x190

@dweeezil
Copy link
Contributor

dweeezil commented Jan 4, 2017

Referencing #5226 due to the likelihood this involves gang blocks in some way. @tuomari The stacks involving metaslabs don't quite make sense. Are you running something newer than ce4197c now? Could you run gdb on your kernel module and list *(metaslab_alloc+0x7c2).

@tuomari
Copy link
Author

tuomari commented Jan 8, 2017

It seems that although I did have zfs built with --enable-debug and kernel was compiled DEBUG_KERNEL, zfs.ko did not have debug symbols when compiled with make deb and installed from those packages.

I have now installed the module with make install and confirmed that debugging symbols exist.
I will report back if I hit this again.

@tuomari
Copy link
Author

tuomari commented Jan 17, 2017

@dweeezil I forgot to answer your question about the version. I was running SPL openzfs/spl@8e99d66 and ZFS 3500a14

Also, if the gdb output for the recompiled module works correctly, list *(metaslab_alloc+0x7c2) gives the following:

(gdb) list *(metaslab_alloc+0x7c2)
0x72e82 is in metaslab_alloc (/usr/src2/zfs/module/zfs/metaslab.c:2295).
2290				    msp->ms_weight & ~METASLAB_ACTIVE_MASK);
2291				mutex_exit(&msp->ms_lock);
2292				continue;
2293			}
2294	
2295			if (metaslab_activate(msp, activation_weight) != 0) {
2296				mutex_exit(&msp->ms_lock);
2297				continue;
2298			}
2299	
(gdb) 

@tuomari
Copy link
Author

tuomari commented Jan 17, 2017

It seems that I'm hitting this less frequently after the pool finished resilvering a disk.

This time ZFS version is 42b64e5 and SPL version openzfs/spl@8e99d66

[474171.260844] VERIFY(zio->io_children[c][w] == 0) failed
[474171.260942] PANIC at zio.c:3719:zio_done()
[474171.261017] Showing stack for process 29747
[474171.261021] CPU: 6 PID: 29747 Comm: zfs Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[474171.261023] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[474171.261025]  ffffc90040727540 ffffffff813262fb ffffffffa0e214f1 0000000000000e87
[474171.261030]  ffffc90040727550 ffffffffa0275b2f ffffc900407276d0 ffffffffa0275be7
[474171.261033]  ffff88083fc98038 ffffc90000000028 ffffc900407276e0 ffffc90040727680
[474171.261036] Call Trace:
[474171.261045]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[474171.261065]  [<ffffffffa0275b2f>] spl_dumpstack+0x3f/0x50 [spl]
[474171.261074]  [<ffffffffa0275be7>] spl_panic+0xa7/0xd0 [spl]
[474171.261079]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[474171.261086]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[474171.261090]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[474171.261094]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[474171.261096]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261213]  [<ffffffffa0d42a23>] zio_done+0xbd3/0x1920 [zfs]
[474171.261216]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261283]  [<ffffffffa0d439a3>] ? zio_ready+0x233/0x5e0 [zfs]
[474171.261286]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[474171.261348]  [<ffffffffa0d3ee5e>] zio_nowait+0x10e/0x310 [zfs]
[474171.261398]  [<ffffffffa0c63ef6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[474171.261451]  [<ffffffffa0c6f630>] dmu_prefetch+0x220/0x240 [zfs]
[474171.261506]  [<ffffffffa0c7d622>] receive_read_prefetch+0x162/0x1c0 [zfs]
[474171.261563]  [<ffffffffa0c808f4>] dmu_recv_stream+0xb24/0xff0 [zfs]
[474171.261625]  [<ffffffffa0d13a4c>] zfs_ioc_recv_impl+0x11c/0xd50 [zfs]
[474171.261689]  [<ffffffffa0d147fd>] zfs_ioc_recv+0x17d/0x220 [zfs]
[474171.261725]  [<ffffffffa0c61d26>] ? dbuf_rele+0x36/0x70 [zfs]
[474171.261762]  [<ffffffffa0c899cc>] ? dnode_rele_and_unlock+0x7c/0xc0 [zfs]
[474171.261809]  [<ffffffffa0d15bee>] zfsdev_ioctl+0x60e/0x720 [zfs]
[474171.261815]  [<ffffffff811a6b92>] ? handle_mm_fault+0x342/0xfc0
[474171.261819]  [<ffffffff811fad0b>] do_vfs_ioctl+0x8b/0x5b0
[474171.261822]  [<ffffffff810996d0>] ? __do_page_fault+0x220/0x430
[474171.261823]  [<ffffffff811fb2a4>] SyS_ioctl+0x74/0x80
[474171.261826]  [<ffffffff816d5aa0>] entry_SYSCALL_64_fastpath+0x13/0x94

@tuomari
Copy link
Author

tuomari commented Jan 27, 2017

This time a little different path, and about the same 5 day uptime as with the previous panics

ZFS version 0.7.0-rc3_2_gbb7ffda
SPL version 0.7.0-rc3
[511461.454420] VERIFY(zio->io_children[c][w] == 0) failed
[511461.454494] PANIC at zio.c:3729:zio_done()
[511461.454553] Showing stack for process 1400
[511461.454557] CPU: 4 PID: 1400 Comm: z_wr_iss Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[511461.454559] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[511461.454561]  ffffc90005a23720 ffffffff813262fb ffffffffa0e5ce41 0000000000000e91
[511461.454566]  ffffc90005a23730 ffffffffa01c6b2f ffffc90005a238b0 ffffffffa01c6be7
[511461.454569]  0001d12be649372e 0000000000000028 ffffc90005a238c0 ffffc90005a23860
[511461.454572] Call Trace:
[511461.454582]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[511461.454598]  [<ffffffffa01c6b2f>] spl_dumpstack+0x3f/0x50 [spl]
[511461.454606]  [<ffffffffa01c6be7>] spl_panic+0xa7/0xd0 [spl]
[511461.454611]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[511461.454618]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[511461.454622]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[511461.454625]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[511461.454627]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454777]  [<ffffffffa0d7db75>] zio_done+0xbc5/0x1910 [zfs]
[511461.454780]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454848]  [<ffffffffa0d7eaf1>] ? zio_ready+0x231/0x5e0 [zfs]
[511461.454850]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.454918]  [<ffffffffa0d79fae>] zio_nowait+0x10e/0x310 [zfs]
[511461.454975]  [<ffffffffa0c9def6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[511461.455033]  [<ffffffffa0ca9630>] dmu_prefetch+0x220/0x240 [zfs]
[511461.455100]  [<ffffffffa0d14384>] space_map_load+0xf4/0x6d0 [zfs]
[511461.455103]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[511461.455167]  [<ffffffffa0cee717>] metaslab_load+0x67/0x1e0 [zfs]
[511461.455231]  [<ffffffffa0ceeb4a>] metaslab_activate+0xda/0x160 [zfs]
[511461.455296]  [<ffffffffa0cf18fe>] metaslab_alloc+0x82e/0x15a0 [zfs]
[511461.455302]  [<ffffffff810dba62>] ? enqueue_task_fair+0xb62/0x1260
[511461.455368]  [<ffffffffa0d7a46a>] zio_dva_allocate+0x17a/0x930 [zfs]
[511461.455371]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[511461.455379]  [<ffffffffa01ca3b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[511461.455387]  [<ffffffffa01ca49d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[511461.455393]  [<ffffffffa01c2b33>] ? taskq_member+0x13/0x20 [spl]
[511461.455461]  [<ffffffffa0d74363>] zio_execute+0x103/0x320 [zfs]
[511461.455469]  [<ffffffffa01c402b>] taskq_thread+0x29b/0x590 [spl]
[511461.455472]  [<ffffffff810d1170>] ? wake_up_q+0x70/0x70
[511461.455479]  [<ffffffffa01c3d90>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[511461.455482]  [<ffffffff810c7079>] kthread+0xe9/0x100
[511461.455484]  [<ffffffff810c6f90>] ? kthread_park+0x60/0x60
[511461.455487]  [<ffffffff816d5cd2>] ret_from_fork+0x22/0x30

@tuomari
Copy link
Author

tuomari commented Feb 4, 2017

Same version as before.

[Thu Feb  2 17:43:22 2017] VERIFY(zio->io_children[c][w] == 0) failed
[Thu Feb  2 17:43:22 2017] PANIC at zio.c:3729:zio_done()
[Thu Feb  2 17:43:22 2017] Showing stack for process 1423
[Thu Feb  2 17:43:22 2017] CPU: 6 PID: 1423 Comm: z_wr_iss Tainted: P           O    4.9.1.remod.kvm.ovs.1 #1
[Thu Feb  2 17:43:22 2017] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
[Thu Feb  2 17:43:22 2017]  ffffc900062ab720 ffffffff813262fb ffffffffa1306e41 0000000000000e91
[Thu Feb  2 17:43:22 2017]  ffffc900062ab730 ffffffffa0346b2f ffffc900062ab8b0 ffffffffa0346be7
[Thu Feb  2 17:43:22 2017]  ffff88083fc98001 001da3a900000028 ffffc900062ab8c0 ffffc900062ab860
[Thu Feb  2 17:43:22 2017] Call Trace:
[Thu Feb  2 17:43:22 2017]  [<ffffffff813262fb>] dump_stack+0x4d/0x72
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0346b2f>] spl_dumpstack+0x3f/0x50 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0346be7>] spl_panic+0xa7/0xd0 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d5829>] ? _raw_spin_unlock_irq+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffff810cd1fa>] ? finish_task_switch+0x7a/0x210
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d030b>] ? __schedule+0x17b/0x610
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d2eb1>] ? __mutex_unlock_slowpath+0xa1/0x140
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d2eb9>] ? __mutex_unlock_slowpath+0xa9/0x140
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1227b75>] zio_done+0xbc5/0x1910 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1228af1>] ? zio_ready+0x231/0x5e0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1223fae>] zio_nowait+0x10e/0x310 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1147ef6>] dbuf_prefetch+0x4d6/0x610 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1153630>] dmu_prefetch+0x220/0x240 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa11be384>] space_map_load+0xf4/0x6d0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff811cb36c>] ? __kmalloc_node+0xbc/0x2c0
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1198717>] metaslab_load+0x67/0x1e0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa1198b4a>] metaslab_activate+0xda/0x160 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa119b8fe>] metaslab_alloc+0x82e/0x15a0 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa122446a>] zio_dva_allocate+0x17a/0x930 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d57f9>] ? _raw_spin_unlock+0x9/0x10
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034a3b0>] ? tsd_hash_search.isra.0+0x70/0x90 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034a49d>] ? tsd_get_by_thread+0x3d/0x90 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0342b33>] ? taskq_member+0x13/0x20 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa121e363>] zio_execute+0x103/0x320 [zfs]
[Thu Feb  2 17:43:22 2017]  [<ffffffffa034402b>] taskq_thread+0x29b/0x590 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff810d1170>] ? wake_up_q+0x70/0x70
[Thu Feb  2 17:43:22 2017]  [<ffffffffa0343d90>] ? taskq_thread_should_stop+0x80/0x80 [spl]
[Thu Feb  2 17:43:22 2017]  [<ffffffff810c7079>] kthread+0xe9/0x100
[Thu Feb  2 17:43:22 2017]  [<ffffffff810c6f90>] ? kthread_park+0x60/0x60
[Thu Feb  2 17:43:22 2017]  [<ffffffff816d5cd2>] ret_from_fork+0x22/0x30

@tuomari
Copy link
Author

tuomari commented Jun 20, 2017

Have not hit this in a while. Closing as this is most likely fixed...

@tuomari tuomari closed this as completed Jun 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants