
VERIFY(0 == dmu_buf_hold_array()) in dmu_write() #1440

Closed
behlendorf opened this issue May 3, 2013 · 2 comments
@behlendorf
Contributor

Andrei Mikhailovsky wrote:

I was wondering if someone could help me with sorting out an issue when trying to mount a zfs filesystem.

I had a disk failure on a raidz1. The disk has been replaced, resilvered, and a zpool scrub has been run. Unfortunately, the scrub was unable to repair the errors and there is still a large number of errors in the pool. Despite that, I am still able to mount 2 out of 3 filesystems. The trouble is that the majority of the important data is on the third filesystem, which I am unable to mount: I get an SPL PANIC. Please see the information below:

Server is Ubuntu Server 12.10 64bit with up to date patch level and zfs version 0.6.1 installed from the stable ppa.

ZFS Pool details (compression = lz4; deduplication = off):

zpool status
  pool: zfs-mirror
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 6h33m with 15722 errors on Thu May  2 03:44:56 2013
config:

        NAME                        STATE     READ WRITE CKSUM
        zfs-mirror                  ONLINE       0     0 31.0K
          raidz1-0                  ONLINE       0     0 63.4K
            scsi-20004d927fffff800  ONLINE       0     0     0
            scsi-20004d927fffff810  ONLINE       0     0     0
            scsi-20004d927fffff820  ONLINE       0     0     0
            scsi-20004d927fffff830  ONLINE       0     0     0
            scsi-20004d927fffff840  ONLINE       0     0     0
            scsi-20004d927fffff850  ONLINE       0     0     0
            scsi-20004d927fffff860  ONLINE       0     0     0
            sdh                     ONLINE       0     0     0

errors: 15723 data errors, use '-v' for a list
SPL PANIC error when trying to zfs mount in dmesg:

[107118.564070] VERIFY(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed
[107118.564137] SPLError: 1380:0:(dmu.c:791:dmu_write()) SPL PANIC
[107118.564168] SPL: Showing stack for process 1380
[107118.564173] Pid: 1380, comm: txg_sync Tainted: P           O 3.5.0-27-generic #46-Ubuntu
[107118.564175] Call Trace:
[107118.564214]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[107118.564226]  [] spl_debug_bug+0x7f/0xe0 [spl]
[107118.564301]  [] dmu_write+0x167/0x170 [zfs]
[107118.564316]  [] ? kmem_alloc_debug+0x96/0x3b0 [spl]
[107118.564361]  [] space_map_sync+0x236/0x3c0 [zfs]
[107118.564400]  [] ? space_map_remove+0x320/0x320 [zfs]
[107118.564436]  [] metaslab_sync+0x12b/0x360 [zfs]
[107118.564470]  [] ? dsl_scan_sync+0x4f/0xa90 [zfs]
[107118.564482]  [] ? mutex_lock+0x1d/0x50
[107118.564520]  [] vdev_sync+0x73/0x140 [zfs]
[107118.564557]  [] spa_sync+0x453/0xa60 [zfs]
[107118.564595]  [] txg_sync_thread+0x32b/0x590 [zfs]
[107118.564633]  [] ? txg_init+0x250/0x250 [zfs]
[107118.564647]  [] thread_generic_wrapper+0x78/0x90 [spl]
[107118.564659]  [] ? __thread_create+0x340/0x340 [spl]
[107118.564670]  [] kthread+0x93/0xa0
[107118.564678]  [] kernel_thread_helper+0x4/0x10
[107118.564684]  [] ? kthread_freezable_should_stop+0x70/0x70
[107118.564689]  [] ? gs_change+0x13/0x13


[107280.428061] INFO: task txg_sync:1380 blocked for more than 120 seconds.
[107280.428068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[107280.428073] txg_sync        D ffff88052fc939c0     0  1380      2 0x00000000
[107280.428080]  ffff880503ca5a70 0000000000000046 ffff88050638c500 ffff880503ca5fd8
[107280.428087]  ffff880503ca5fd8 ffff880503ca5fd8 ffff880512d1ae00 ffff88050638c500
[107280.428092]  ffffffff81c13347 0000000000000000 0000000000000000 0000000000039fe0
[107280.428098] Call Trace:
[107280.428115]  [] schedule+0x29/0x70
[107280.428147]  [] spl_debug_bug+0xb5/0xe0 [spl]
[107280.428213]  [] dmu_write+0x167/0x170 [zfs]
[107280.428228]  [] ? kmem_alloc_debug+0x96/0x3b0 [spl]
[107280.428272]  [] space_map_sync+0x236/0x3c0 [zfs]
[107280.428312]  [] ? space_map_remove+0x320/0x320 [zfs]
[107280.428348]  [] metaslab_sync+0x12b/0x360 [zfs]
[107280.428382]  [] ? dsl_scan_sync+0x4f/0xa90 [zfs]
[107280.428389]  [] ? mutex_lock+0x1d/0x50
[107280.428427]  [] vdev_sync+0x73/0x140 [zfs]
[107280.428464]  [] spa_sync+0x453/0xa60 [zfs]
[107280.428502]  [] txg_sync_thread+0x32b/0x590 [zfs]
[107280.428540]  [] ? txg_init+0x250/0x250 [zfs]
[107280.428553]  [] thread_generic_wrapper+0x78/0x90 [spl]
[107280.428565]  [] ? __thread_create+0x340/0x340 [spl]
[107280.428575]  [] kthread+0x93/0xa0
[107280.428582]  [] kernel_thread_helper+0x4/0x10
[107280.428588]  [] ? kthread_freezable_should_stop+0x70/0x70
[107280.428592]  [] ? gs_change+0x13/0x13

As per my conversation with a couple of people on the IRC #zfsonlinux channel, it was suggested that I try booting into the latest FreeBSD and also OpenIndiana to see if the filesystem in question could be mounted. I exported the pool (zpool export) on the Ubuntu server and tried to import it from a bootable live USB stick. FreeBSD 9.1 doesn't support zfs v5000, so I was unable to import it. The OpenIndiana live CD didn't support lz4 compression, so once again I was unable to import the pool.

I have also tried booting into SmartOS, but even though the pool was visible and supported by the version of zfs that came with SmartOS, I got an OS panic and a reboot when trying to import the pool with zpool import.

I am running out of ideas on what else to try and I really need to get the data from that partition. Whatever is salvageable. Is there any other way I could get to the data? I am happy to try any patches that could address the SPL Panic or do more debugging using your instructions.

Many thanks in advance for your suggestions.

Andrei

@behlendorf
Contributor Author

Here's a tiny debug patch to report the offending error code in the VERIFY error message.

diff --git a/module/zfs/dmu.c b/module/zfs/dmu.c
index e856356..de946fd 100644
--- a/module/zfs/dmu.c
+++ b/module/zfs/dmu.c
@@ -787,7 +787,7 @@ dmu_write(objset_t *os, uint64_t object, uint64_t offset, ui
        if (size == 0)
                return;

-       VERIFY(0 == dmu_buf_hold_array(os, object, offset, size,
+       VERIFY3S(0, ==, dmu_buf_hold_array(os, object, offset, size,
            FALSE, FTAG, &numbufs, &dbp));

        for (i = 0; i < numbufs; i++) {

@behlendorf behlendorf removed this from the 0.6.5 milestone Oct 3, 2014
behlendorf added a commit to behlendorf/zfs that referenced this issue Oct 3, 2014
This is a debug patch designed to ensure an error code is logged
to the console when this VERIFY() is hit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1440
behlendorf added a commit to behlendorf/zfs that referenced this issue Oct 8, 2014
This is a debug patch designed to ensure an error code is logged
to the console when this VERIFY() is hit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs#1440
@behlendorf
Contributor Author

The additional debugging has been merged. If this issue occurs again the error code will be logged.

ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
This is a debug patch designed to ensure an error code is logged
to the console when this VERIFY() is hit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue openzfs#1440