
Scrub Passes but rm Panics #4083

Closed
gordan-bobic opened this issue Dec 10, 2015 · 7 comments
Labels
Status: Inactive Not being actively updated Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

gordan-bobic commented Dec 10, 2015

 # unzip broken-fs.zfs-send.zip
 # pv broken-fs.zfs-send | zfs receive testpool/testfs
 # zpool scrub testpool
 (no errors are detected)
 # rm -rf /testpool/testfs/*
 (kernel panics)

broken-fs.zfs-send.zip

@behlendorf behlendorf added this to the 0.7.0 milestone Dec 10, 2015
@behlendorf (Contributor)

If you still have the backtrace from the panic, could you post it to this issue?

@gordan-bobic (Contributor, Author)

I don't have it, but it is reproducible. You should be able to gather your own by doing what I described above. Does it not trigger the kernel panic for you?

@behlendorf (Contributor)

@gordan-bobic I haven't had a chance to try yet, but I don't doubt it will. I just thought you might have it handy. Thanks for the reproducer.


Gregy commented Jan 26, 2016

Hi, I just played with this and thought I would post the panic

[  653.015750] SPL: Loaded module v0.6.5-1
[  653.040102] ZFS: Loaded module v0.6.5.2-2, ZFS pool version 5000, ZFS filesystem version 5
[  657.432534]  vdb: unknown partition table
[  657.589028]  vdb: vdb1 vdb9
[  657.591007]  vdb: vdb1 vdb9
[  657.619277] SPL: using hostid 0x007f0101
[ 1631.766142] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f8
[ 1631.766267] IP: [<ffffffffa059c6de>] zfs_remove+0x23e/0x4d0 [zfs]
[ 1631.766359] PGD 3e396067 PUD 794a0067 PMD 0 
[ 1631.766429] Oops: 0000 [#1] SMP 
[ 1631.766478] Modules linked in: zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) xt_multiport iptable_filter ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc joydev hid_generic crc32_pclmul usbhid hid ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev psmouse pcspkr serio_raw virtio_balloon vmwgfx ttm drm_kms_helper drm i2c_piix4 i2c_core pvpanic parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net crct10dif_pclmul crct10dif_common crc32c_intel floppy ata_piix uhci_hcd ehci_hcd virtio_pci virtio_ring virtio usbcore usb_common libata scsi_mod
[ 1631.767509] CPU: 0 PID: 23108 Comm: rm Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u3
[ 1631.767636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1631.767759] task: ffff880036e5f4b0 ti: ffff88003e0a4000 task.ti: ffff88003e0a4000
[ 1631.767857] RIP: 0010:[<ffffffffa059c6de>]  [<ffffffffa059c6de>] zfs_remove+0x23e/0x4d0 [zfs]
[ 1631.768006] RSP: 0018:ffff88003e0a7d98  EFLAGS: 00010246
[ 1631.768006] RAX: 0000000000000000 RBX: ffff88007a108e80 RCX: ffff880036e03f48
[ 1631.768006] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007a108e80
[ 1631.768006] RBP: ffff88007bec6000 R08: 0000000000000000 R09: ffff88007a5357c0
[ 1631.768006] R10: ffff88003e0a7bb0 R11: 001e01bf10000000 R12: ffff880079c85168
[ 1631.768006] R13: ffff88003e0f26b8 R14: 0000000000000000 R15: ffff880079c85380
[ 1631.768006] FS:  00007f2968bfe700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1631.768006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1631.768006] CR2: 00000000000001f8 CR3: 000000003e398000 CR4: 00000000001407f0
[ 1631.768006] Stack:
[ 1631.768006]  0000000000000000 ffff8800428ffa10 ffff880079e66580 00000000000000cd
[ 1631.768006]  ffff88007bec63b0 ffff88003e072800 ffff880079e66580 ffff88003e0f24a0
[ 1631.768006]  0000000000000000 000000000000003e 0000000000000000 ffff88003e3a7140
[ 1631.768006] Call Trace:
[ 1631.768006]  [<ffffffffa05b6d7c>] ? zpl_unlink+0x4c/0x70 [zfs]
[ 1631.768006]  [<ffffffff811b51a2>] ? vfs_unlink+0xd2/0x140
[ 1631.768006]  [<ffffffff811b5485>] ? do_unlinkat+0x275/0x2c0
[ 1631.768006]  [<ffffffff810852a1>] ? task_work_run+0x91/0xb0
[ 1631.768006]  [<ffffffff81012e99>] ? do_notify_resume+0x69/0xa0
[ 1631.768006]  [<ffffffff81513d0d>] ? system_call_fast_compare_end+0x10/0x15
[ 1631.768006] Code: 48 89 ef e8 75 a7 00 00 48 8b 44 24 38 48 89 df ba 01 00 00 00 48 8b b0 f8 01 00 00 e8 bc ac f8 ff 48 8b 44 24 40 31 d2 48 89 df <48> 8b b0 f8 01 00 00 e8 a6 ac f8 ff e9 31 fe ff ff 90 48 8b 7c 
[ 1631.768006] RIP  [<ffffffffa059c6de>] zfs_remove+0x23e/0x4d0 [zfs]
[ 1631.768006]  RSP <ffff88003e0a7d98>
[ 1631.768006] CR2: 00000000000001f8
[ 1631.773443] ---[ end trace c0524f6b10302010 ]---

@behlendorf behlendorf modified the milestones: 0.8.0, 0.7.0 Mar 26, 2016

loli10K commented Nov 9, 2016

I don't know how you managed to get the pool into this state, but I suspect it is possible to delete an xattr object (on a dataset with xattr=on or xattr=dir) without updating the file pointing to it: this panics the kernel when you later try to delete that file in zfs_remove().

This file here is expected to have an xattr object 67:

root@debian-8-zfs:~# ls -i /tank/recv/etc/alternatives/mta-sendmailman
472 /tank/recv/etc/alternatives/mta-sendmailman
root@debian-8-zfs:~# zdb -dddd tank/recv 472
Dataset tank/recv [ZPL], ID 72, cr_txg 5, 5.03M, 1099 objects, rootbp DVA[0]=<0:556200:200> DVA[1]=<0:556400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=17L/17P fill=1099 cksum=fbe1049f5:5f10cb4a5a2:128b78defd487:27c145bc1891b3

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       472    1    16K    512      0     512    512    0.00  ZFS plain file
                                               225   bonus  System attributes
    dnode flags: USERUSED_ACCOUNTED 
    dnode maxblkid: 0
    path    /etc/alternatives/mta-sendmailman
    uid     0
    gid     0
    atime   Wed Oct 26 21:48:58 2016
    mtime   Sun Sep 20 18:34:20 2015
    ctime   Wed Nov  4 00:53:00 2015
    crtime  Wed Nov  4 00:52:39 2015
    gen 175
    mode    120777
    size    41
    parent  212
    links   1
    pflags  40800000104
    xattr   67

Too bad we already deleted it:

root@debian-8-zfs:~# zdb -dddd tank/recv 67
Dataset tank/recv [ZPL], ID 72, cr_txg 5, 5.03M, 1099 objects, rootbp DVA[0]=<0:556200:200> DVA[1]=<0:556400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=17L/17P fill=1099 cksum=fbe1049f5:5f10cb4a5a2:128b78defd487:27c145bc1891b3

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
zdb: dmu_bonus_hold(67) failed, errno 2

When compiled with --enable-debug, ZFS trips the ASSERT0(error); otherwise it drives on and dereferences xzp->z_sa_hdl with xzp still NULL.

(gdb) bt
#0  zfs_remove (dip=0xffff88002a2d35c8, name=0xffff88002af0c278 "mta-sendmailman", cr=0xffff88002a98a9c0, flags=0) at /usr/src/zfs-zfs-0.7.0-rc1/module/zfs/zfs_vnops.c:1616
#1  0xffffffffc06a4d3b in zpl_unlink (dir=0xffff88002a2d35c8, dentry=0xffff88002af0c240) at /usr/src/zfs-zfs-0.7.0-rc1/module/zfs/zpl_inode.c:223
#2  0xffffffff812019a7 in vfs_unlink (dir=0xffff88002a2d35c8, dentry=0xffff88002af0c240, delegated_inode=0xffff880059a87ef0) at fs/namei.c:3842
#3  0xffffffff81203919 in do_unlinkat (dfd=-100, pathname=<optimized out>) at fs/namei.c:3907
#4  0xffffffff815cca36 in system_call_fast_compare_end () at arch/x86/entry/entry_64.S:211
#5  0x00007f2a3c105678 in ?? ()
#6  0x0000000000001100 in irq_stack_union ()
#7  0x0000000000001111 in irq_stack_union ()
#8  0x00007ffc1988faa0 in ?? ()
#9  0x0000000000000000 in ?? ()
(gdb) list zfs_vnops.c:1613
1608    
1609        /* are there any extended attributes? */
1610        error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zsb),
1611            &xattr_obj, sizeof (xattr_obj));
1612        if (error == 0 && xattr_obj) {
1613            error = zfs_zget(zsb, xattr_obj, &xzp);
1614            ASSERT0(error);
1615            dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
1616            dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);
1617        }
(gdb) p error
$2 = 2
(gdb) p xattr_obj
$3 = 67
(gdb) p zp->z_id
$4 = 472
(gdb)

This is reproducible on Illumos too:

panic[cpu0]/thread=ffffff00e6d6e800: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff0002957af0 addr=e8 occurred in module "zfs" due to a NULL pointer dereference


rm: 
#pf Page fault
Bad kernel fault at addr=0xe8
pid=3775, pc=0xfffffffff7d785d5, sp=0xffffff0002957be0, eflags=0x10246
cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 406f8<osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: e8
cr3: 1ddd6000
cr8: 0

        rdi: ffffff00f7a67e40 rsi: ffffff00e70f5b58 rdx:                0
        rcx:              15c  r8:   a2c6d13eacb74d  r9: fffffffff7d94910
        rax:                0 rbx: ffffff00f7a67e40 rbp: ffffff0002957d10
        r10: fffffffffb8569bc r11: fffffffffbc32fc0 r12:                0
        r13: ffffff00ec48a000 r14: ffffff00ea581340 r15: ffffff00f827c9d8
        fsb:                0 gsb: fffffffffbc32fc0  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:                0 rip: fffffffff7d785d5
         cs:               30 rfl:            10246 rsp: ffffff0002957be0
         ss:               38

ffffff00029579d0 unix:die+df ()
ffffff0002957ae0 unix:trap+e18 ()
ffffff0002957af0 unix:cmntrap+e6 ()
ffffff0002957d10 zfs:zfs_remove+365 ()
ffffff0002957d90 genunix:fop_remove+5b ()
ffffff0002957e60 genunix:vn_removeat+382 ()
ffffff0002957eb0 genunix:unlinkat+59 ()
ffffff0002957f10 unix:brand_sys_sysenter+1d3 ()

addr=0xe8 is just the z_sa_hdl offset applied to our NULL xzp (0x0):

> ::offsetof znode_t z_sa_hdl
offsetof (znode_t, z_sa_hdl) = 0xe8, sizeof (...->z_sa_hdl) = 8

I'm no ZFS expert and I may be wrong here (I haven't studied all the code yet), but if we batch-process xattrs separately from their files, xattr deletes could land in a different txg than the corresponding file deletes.
Now suppose the system crashes between those two txgs: the "xattrs" txg has already committed and the "files" txg is lost in the crash. ZFS is then still theoretically consistent on disk and will complete a scrub without errors, but it will promptly panic again when we try to delete files from the "lost" txg that point to already-deleted xattrs.

@behlendorf (Contributor)

@loli10K thanks for looking into this and for the clear analysis.

I also don't see how it's possible to get a pool into this state. During an unlink, the xattr directory object id is removed from the file's system attributes (SAs) and added to the unlinked set in the same transaction, which should prevent us from ever reaching this inconsistent state... yet this pool clearly exists. Perhaps it was possible in an earlier version of the code; we did address several issues in this area over the last year.

The good news is that it should be fairly straightforward to add error handling to cover this case cleanly. An ENOENT error can be treated the same way as a file with no xattrs. All other errors could be left as fatal, but while we're here we might as well cleanly unwind the function and return them to the caller.

behlendorf pushed a commit to openzfs/zfs-images that referenced this issue Jan 13, 2017
Adds a test pool designed to be a minimal reproducer of
openzfs/zfs#4083.  The pool contains
a single filesystem "tank" with only a file "file2" in it. When we try
to delete the file we trigger an assertion in zfs_remove(): this is
because the file's xattr object was already deleted. This data discrepancy
is not detected by a scrub, which completes successfully.

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3
@behlendorf behlendorf removed this from the 0.8.0 milestone Feb 9, 2018

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@stale stale bot closed this as completed Nov 25, 2020