NULL pointer dereference hardlock #2238

Closed
fling- opened this issue Apr 4, 2014 · 3 comments

@fling-
Contributor

fling- commented Apr 4, 2014

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048

Got another hard lock on another box while running this command:
vds2 ~ # zfs set volsize=24G vds2/volumes/root/rdp2

Side effects:

  1. Unable to kill most processes.
  2. zfs/zpool commands stopped working.
  3. I/O slowed down.

The volume size was not changed, as seen after rebooting via /proc/sysrq-trigger:
vds2/volumes/root/rdp2 42,5G 1,60T 8,26G -

vds2 ~ # cat /sys/module/{spl,zfs}/version 
0.6.2-23_g4c99541
0.6.2-195_g0ad85ed
vds2 ~ # uname -a
Linux vds2 3.13.5-hardened-gnu_fling #4 SMP Tue Mar 4 10:00:47 OMST 2014 x86_64 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz GenuineIntel GNU/Linux
vds2 ~ # zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vds2  2,66T   749G  1,92T    27%  1.90x  ONLINE  -
vds2 ~ # zpool status
  pool: vds2
 state: ONLINE
  scan: scrub repaired 0 in 0h48m with 0 errors on Tue Mar 18 14:40:25 2014
config:

        NAME        STATE     READ WRITE CKSUM
        vds2        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sda3    ONLINE       0     0     0

errors: No known data errors

dmesg -> https://gist.github.com/anonymous/9968315

@fling-
Contributor Author

fling- commented Apr 4, 2014

I hit the same bug previously on another box, with the same symptoms, while doing a recursive chmod and/or chown on a clone with a few million files.

worker0 ~ # cat /sys/module/{spl,zfs}/version 
0.6.2-23_g4c99541
0.6.2-224_g4d8c78c
worker0 ~ # uname -a
Linux worker0 3.13.5-hardened-gnu_fling #4 SMP PREEMPT Wed Mar 26 12:12:16 OMST 2014 x86_64 Intel(R) Xeon(R) CPU E31230 @ 3.20GHz GenuineIntel GNU/Linux
worker0 ~ # zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
worker0  3,50T   164G  3,34T     4%  1.27x  ONLINE  -
worker0 ~ # zpool status
  pool: worker0
 state: ONLINE
  scan: resilvered 193M in 0h1m with 0 errors on Tue Mar  4 15:01:27 2014
config:

        NAME        STATE     READ WRITE CKSUM
        worker0     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc3    ONLINE       0     0     0
            sdd3    ONLINE       0     0     0

errors: No known data errors

dmesg -> https://gist.github.com/9968389
full dmesg -> https://gist.github.com/9968395

@dweeezil
Contributor

dweeezil commented Apr 4, 2014

@fling- This is the "in 3.13 kthread_create() is interruptible and can also return ENOMEM" issue as discussed in #2230, openzfs/spl#331, openzfs/spl#339 and openzfs/spl#340.

The patch I worked up in openzfs/spl#340 addresses part of the problem and may very well fix your particular cases. We still need to handle the SIGKILL situation in some way. I've also been working on getting these failures to propagate back to the caller properly within ZFS and have pushed dweeezil/zfs@1070e6f as a WIP patch. That patch doesn't "fix" anything, but it should prevent the bogus pointer from being dereferenced and cause the error code to be returned properly to the caller. I just worked up this patch yesterday morning and have not tested it at all.

EDIT: @fling-, to clarify, I do think the patch in openzfs/spl#340 will fix your immediate problem; however, it's not a complete fix.
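
For anyone following along, here is a rough userspace sketch of the general pattern discussed above: check the result of a thread-creation call before using it, retry when the failure is -ENOMEM, and hand an error code back to the caller instead of a bogus pointer. This is not the code from openzfs/spl#339, openzfs/spl#340 or dweeezil/zfs@1070e6f. The names fake_task, fake_kthread_create, fail_count and create_thread_checked are hypothetical, and the ERR_PTR/IS_ERR/PTR_ERR helpers are userspace stand-ins modeled on the kernel's linux/err.h. The interrupted (SIGKILL) case is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical thread handle; stands in for struct task_struct. */
struct fake_task { int id; };

/* Hypothetical creator: fails with -ENOMEM for the next fail_count calls. */
static int fail_count;
static struct fake_task the_task = { 1 };

static struct fake_task *fake_kthread_create(void)
{
	if (fail_count > 0) {
		fail_count--;
		return ERR_PTR(-ENOMEM);
	}
	return &the_task;
}

/*
 * Try up to max_tries times. On success, store the task in *out and
 * return 0. On failure, set *out to NULL and return a negative errno,
 * so the caller gets an error code rather than a pointer it might
 * dereference.
 */
static int create_thread_checked(int max_tries, struct fake_task **out)
{
	struct fake_task *tsk = ERR_PTR(-ENOMEM);

	assert(max_tries > 0 && out != NULL);

	for (int i = 0; i < max_tries; i++) {
		tsk = fake_kthread_create();
		if (!IS_ERR(tsk)) {
			*out = tsk;
			return 0;
		}
		if (PTR_ERR(tsk) != -ENOMEM)
			break;	/* not treated as transient in this sketch */
	}

	*out = NULL;
	return (int)PTR_ERR(tsk);
}
```

A caller would check the return value of create_thread_checked() and only touch the task when it is 0. Whether a bounded retry or an unbounded one is appropriate in the real code is a separate design question that this sketch does not settle.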

@behlendorf
Contributor

This issue was resolved by openzfs/spl#339.
