Hang importing pool in 0.6.4-1 with zvol present #3330

Closed
dswartz opened this Issue Apr 22, 2015 · 9 comments

3 participants

dswartz (Contributor) commented Apr 22, 2015

Discovered this while testing ZFS support in ESOS (a USB-booted block storage appliance). The developer says he built from source using the vanilla 0.6.4 tarball. Anyway, here is the reproducer: in a vSphere VM, create two vdisks, one 8GB (for the OS) and one 16GB (for the zpool). Install ESOS on the 8GB vdisk using the documented install script. Boot ESOS, log in, and do this:

zpool create -f test sdb
zfs create -V 12G test/foo
zpool export test
zpool import test

The virtual machine hangs: no keyboard input, nothing. As this is a vSphere VM (I don't have a physical host to try with at the moment), I have no idea how to send any magic SysRq to get a trace (if that's even possible).

dswartz (Contributor) commented Apr 22, 2015

Forgot to mention: this seems specific to zvols. An empty pool imports fine, and so does a pool with a single dataset. A pool with a single zvol hangs every time for me.

dswartz (Contributor) commented Apr 22, 2015

Okay, I figured out how to send magic SysRq, but the backtrace shows nothing. I did send a crash and it dutifully rebooted. Any ideas?
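For reference, a minimal sketch of driving magic SysRq from a still-responsive shell (these are the standard Linux procfs interfaces; it won't help once the console is fully dead, where the keyboard Alt+SysRq sequence or the hypervisor's key-injection facility is needed):

```shell
# Enable all magic SysRq functions (standard Linux procfs knob).
echo 1 > /proc/sys/kernel/sysrq

# 't' dumps the state and stack of every task to the kernel log.
echo t > /proc/sysrq-trigger

# Look for the hung "zpool import" task in the resulting dump.
dmesg | grep -A 25 zpool
```

These writes require root; on a fully wedged machine a kernel crash dump (as used above) is the fallback.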

dswartz (Contributor) commented Apr 22, 2015

I did manage to capture a backtrace while it was doing spa_import. Now what?

dswartz (Contributor) commented Apr 22, 2015

#9 [ffff88013fc03b78] __handle_sysrq at ffffffff813c71a2
#10 [ffff88013fc03bb0] sysrq_filter at ffffffff813c734b
#11 [ffff88013fc03be8] input_to_handler at ffffffff81769ab9
#12 [ffff88013fc03c28] input_pass_values at ffffffff8176b0f4
#13 [ffff88013fc03c70] input_handle_event at ffffffff8176c637
#14 [ffff88013fc03cb0] input_event at ffffffff8176c8ca
#15 [ffff88013fc03cf0] atkbd_interrupt at ffffffff81772401
#16 [ffff88013fc03d40] serio_interrupt at ffffffff817671d5
#17 [ffff88013fc03d78] i8042_interrupt at ffffffff81767dd1
#18 [ffff88013fc03dc8] handle_irq_event_percpu at ffffffff81077167
#19 [ffff88013fc03e08] handle_irq_event at ffffffff81077279
#20 [ffff88013fc03e30] handle_edge_irq at ffffffff81079722
#21 [ffff88013fc03e48] handle_irq at ffffffff8100436a
#22 [ffff88013fc03e60] do_IRQ at ffffffff81003e7f
#23 [ffff88013fc03f08] __do_softirq at ffffffff8104114a
#24 [ffff88013fc03f88] irq_exit at ffffffff810413f1
#25 [ffff88013fc03f98] smp_apic_timer_interrupt at ffffffff810281fc
#26 [ffff88013fc03fb0] apic_timer_interrupt at ffffffff8198bf0a
--- ---
#27 [ffff88011e71d8e8] apic_timer_interrupt at ffffffff8198bf0a
[exception RIP: disk_clear_events+15]
RIP: ffffffff81337cda RSP: ffff88011e71d990 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88011e71da70 RCX: ffff880067f11400
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff880067f11400
RBP: ffff88011e71d9b8 R8: 0000000000014000 R9: ffff88013ac24680
R10: ffff88011e71da70 R11: 0000000000000001 R12: ffff88011e71da70
R13: ffff88011e71d9a0 R14: ffff88013b00b3c8 R15: ffff88011e71d980
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#28 [ffff88011e71d9c0] check_disk_change at ffffffff811181f6
#29 [ffff88011e71d9e0] zvol_open at ffffffffa0351aa4 [zfs]
#30 [ffff88011e71da38] __blkdev_get at ffffffff81118a60
#31 [ffff88011e71da90] blkdev_get at ffffffff81118ebc
#32 [ffff88011e71db18] add_disk at ffffffff813376be
#33 [ffff88011e71db70] __zvol_create_minor at ffffffffa03516e0 [zfs]
#34 [ffff88011e71dbc0] zvol_create_minor at ffffffffa0352aef [zfs]
#35 [ffff88011e71dbe0] zvol_create_minors_cb at ffffffffa0352b2c [zfs]
#36 [ffff88011e71dbf0] dmu_objset_find_impl at ffffffffa02d2b13 [zfs]
#37 [ffff88011e71dca8] dmu_objset_find_impl at ffffffffa02d2978 [zfs]
#38 [ffff88011e71dd60] dmu_objset_find at ffffffffa02d2b65 [zfs]
#39 [ffff88011e71dd98] zvol_create_minors at ffffffffa0352c69 [zfs]
#40 [ffff88011e71dda8] spa_import at ffffffffa03049c7 [zfs]
#41 [ffff88011e71de40] zfs_ioc_pool_import at ffffffffa032afbb [zfs]
#42 [ffff88011e71de78] zfsdev_ioctl at ffffffffa032f94e [zfs]
#43 [ffff88011e71dee0] do_vfs_ioctl at ffffffff810fdb51
#44 [ffff88011e71df48] sys_ioctl at ffffffff810fdc3d
#45 [ffff88011e71df80] system_call_fastpath at ffffffff8198b3c2
RIP: 00007f74f1beab27 RSP: 00007ffdd6275fb8 RFLAGS: 00010246
RAX: 0000000000000010 RBX: ffffffff8198b3c2 RCX: 0000000000000000
RDX: 00007ffdd6276810 RSI: 0000000000005a02 RDI: 0000000000000003
RBP: 00000000008ed090 R8: 0000000000000048 R9: 0000000000100000
R10: 00007ffdd6275d70 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000008f7a88 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
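Read bottom-up, the [zfs] frames in the trace above show the import path: zfs_ioc_pool_import → spa_import → zvol_create_minors → dmu_objset_find → zvol_create_minor → zvol_open → check_disk_change, which suggests the zvol's block device is being opened synchronously while the import is still in progress. A small awk filter pulls just the module frames out of a saved crash(8) trace; the excerpt file below is a hypothetical stand-in for the full trace:

```shell
# Filter the [zfs] frames out of a saved crash(8) backtrace to see the
# import path at a glance. bt.txt here holds a short excerpt of the trace.
cat > bt.txt <<'EOF'
#29 [ffff88011e71d9e0] zvol_open at ffffffffa0351aa4 [zfs]
#33 [ffff88011e71db70] __zvol_create_minor at ffffffffa03516e0 [zfs]
#39 [ffff88011e71dd98] zvol_create_minors at ffffffffa0352c69 [zfs]
#40 [ffff88011e71dda8] spa_import at ffffffffa03049c7 [zfs]
#44 [ffff88011e71df48] sys_ioctl at ffffffff810fdc3d
EOF

# $3 is the function name; keep only lines ending in "[zfs]".
awk '/\[zfs\]$/ {print $3}' bt.txt
# prints: zvol_open, __zvol_create_minor, zvol_create_minors, spa_import
```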


dswartz (Contributor) commented Apr 24, 2015

No comment in two days? On a 100% reproducible hang? Seriously?

behlendorf (Member) commented Apr 24, 2015

@dswartz can you include the kernel version? I'm not able to reproduce this using your reproducer, and this is the first report we've seen.

dswartz (Contributor) commented Apr 24, 2015

It was 3.14.36 with SCST patches applied. The dev (Marc Smith) says he downloaded the straight-up 0.6.4 tarball from zfsonlinux.org and put zfs.ko and spl.ko on the install image. The easiest repro, so you don't have to jump through hoops: I have created a VM with an 8GB boot disk and a 16GB data disk (with a zpool on it, along with a single 12GB zvol) and exported it as an OVA. All you need to do is deploy it on a vSphere host (it might work with vbox or kvm too, I don't know). If you want to take a look, I'll put it on my webserver for you to DL...

FransUrbo (Member) commented May 22, 2015

I've seen the same on a brand-new Gentoo-installed VM. I've tried to debug this too, but to no avail. It just hangs for me. The kernel seems to be up (at least the network works: I get a ping back), but any login via ssh or console doesn't work. The consoles are "dead".

My kernel was a standard Gentoo kernel and almost-latest git masters.

The strange thing is that it used to work. It worked for almost a day and then all of a sudden, with no changes in packages or anything, blammo! Only a whole bunch of imports and exports back and forth (I was debugging the init scripts in my PR).

behlendorf added this to the 0.7.0 milestone Sep 21, 2015

behlendorf (Member) commented Mar 25, 2016

This issue was addressed in 0.6.5.6 with the asynchronous zvol changes.
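A quick way to check whether an installed ZFS already has the fix is to compare its module version against 0.6.5.6. A sketch using GNU sort -V for the comparison; the hard-coded installed string is a stand-in for the output of `modinfo -F version zfs`:

```shell
installed="0.6.4"     # stand-in for: modinfo -F version zfs
fixed="0.6.5.6"       # first release with the async zvol changes

# sort -V orders version strings numerically; if the fixed version
# sorts first, the installed version is >= fixed.
lowest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
    echo "has the async zvol fix"
else
    echo "needs upgrade"
fi
# prints: needs upgrade   (0.6.4 sorts before 0.6.5.6)
```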

behlendorf closed this Mar 25, 2016
