NULL dereference in __blkdev_get() #949
wait many seconds
I thought perhaps there was a race/quirk with zvols, but this happens even in cases where there are none.
I've seen this reported once or twice as well. I also thought it was probably associated with zvols but you seem to have ruled that out.
The really interesting thing here is that we're just calling the user space sys_open() on a normal block device. Nothing in the ZFS kernel stack is involved here. I suppose it's possible we're racing with a concurrent operation on the bdev and things aren't locked properly, but most of that is handled by the kernel.
If you still have your vmlinux kernel image around, can you resolve the offending line the NULL deref occurred on? That would help narrow down exactly what's getting damaged.
gdb /boot/vmlinux
(gdb) list *(__blkdev_get+0x75)
I rebuilt w/ CONFIG_DEBUG_INFO=y but I don't think that would have caused any insns to move about.
Assuming gcc didn't shuffle things about much:
No, it's 1137, __blkdev_get + 0x75 == 0xffffffff8112e308
The insn before is from: if (!disk) goto out;
disk not null
@cwedgwood Then the thing to do is probably to add an IS_ERR() check so it doesn't NULL deref, and then reproduce the issue under SystemTap to get a full trace.
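To make the suggestion concrete, here is a minimal user-space sketch of why a plain `if (!disk)` test lets an error pointer through and what an added IS_ERR() check would catch. The macros mirror the helpers in the kernel's include/linux/err.h; `struct fake_disk` and `open_checked()` are hypothetical stand-ins for illustration, not the actual __blkdev_get() code or a proposed patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space copies of the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE((unsigned long)ptr);
}

/* Hypothetical stand-in for the real disk structure. */
struct fake_disk {
    int first_minor;
};

/*
 * Hypothetical caller: returns 0 on success or a negative errno.
 * The NULL test alone would let an ERR_PTR() value reach the
 * dereference below; the IS_ERR() test rejects it first.
 */
static int open_checked(struct fake_disk *disk)
{
    if (!disk)
        return -ENXIO;
    if (IS_ERR(disk))
        return (int)PTR_ERR(disk);
    return disk->first_minor >= 0 ? 0 : -EINVAL;
}
```

With only the first test, `open_checked(ERR_PTR(-ENOMEM))` would go on to dereference an address in the last page of the address space instead of returning the error.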
referenced this issue
Sep 16, 2012
This was referenced
Sep 17, 2012
@behlendorf While doing zfs destroy ... to remove a slew of zvols, I managed to hit this or something similar:
and i see:
I had debugging for the latter as well, but clearly not enough.
As mentioned above, 00000000000002b6 is smaller than the struct size, so I think what we're seeing is ERR_PTR + struct offset.
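The arithmetic behind that guess can be sketched in user space. An ERR_PTR() value is a small negative number cast to a pointer, so adding a field offset to it wraps around to a small positive address. The errno and the offset below are assumptions chosen only so the sum comes out to 0x2b6; the thread does not establish which error or which field was actually involved.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: a field sitting 0x2bc bytes into a structure.
 * The real structure and field are not identified in this thread.
 */
struct fake_obj {
    char pad[0x2bc];
    int field;
};

/*
 * Address the CPU would fault on when code reads a field at
 * `field_offset` through a pointer that is really ERR_PTR(err).
 * Unsigned arithmetic makes the wrap-around well defined.
 */
static uintptr_t fault_address(long err, size_t field_offset)
{
    return (uintptr_t)err + (uintptr_t)field_offset;
}
```

For example, `fault_address(-6, offsetof(struct fake_obj, field))` evaluates to 0x2b6, where 6 is the value of ENXIO on Linux. Other errno/offset pairs give the same sum, so this only shows the pattern is plausible, not which pair occurred.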
So in this case it is zvol-related. Looking at zvol.c we have:
Now, why zvol_find_by_dev fails I'm not sure (in this case it might be that destroyed zvols didn't go away, which I've seen, so something accessing them tripped this up).
Pretty sure we can't return ERR_PTR(...) from here though, as the rest of the code doesn't like it.