Unable to clear permanent errors. #2617

Open
fling- opened this Issue Aug 21, 2014 · 10 comments

@fling-
Contributor

fling- commented Aug 21, 2014

studio ~ # zpool clear studio-striped
studio ~ # zpool clear studio-striped studio-striped_leg_a
studio ~ # zpool clear studio-striped studio-striped_leg_b
studio ~ # zpool status -v studio-striped
  pool: studio-striped
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 13h25m with 0 errors on Wed Aug 20 10:34:36 2014
config:

        NAME                    STATE     READ WRITE CKSUM
        studio-striped          ONLINE       0     0     0
          studio-striped_leg_a  ONLINE       0     0     0
          studio-striped_leg_b  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x39c>:<0xb>

I had a UNC sector (represented as 8 logical 512B sectors on the drive).
I lost ~50K of data in the middle of a single file (more than 4K, perhaps because of compression?).
I rewrote the 8 affected sectors, reimported the pool, destroyed the dataset containing the file, and then tried to clear the error, without luck.
Clear does not work even after a scrub.

@behlendorf behlendorf added this to the 0.8.0 milestone Aug 25, 2014

@behlendorf

Member

behlendorf commented Aug 25, 2014

@fling- Can you verify that after removing dataset 0x39c (including its snapshots/clones) and then scrubbing the pool the error persists? The scrub should rotate the log and if all references to the damaged blocks were removed you should no longer see the error.
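To compare the error list before and after destroying the dataset and scrubbing, it can help to pull just the listed entries out of the status output. The following is a minimal sketch, not part of ZFS: `list_permanent_errors` is a hypothetical helper name, and it assumes the `zpool status -v` layout shown in this issue, where the entries follow the "errors: Permanent errors" line.

```shell
# Print the entries listed under "errors: Permanent errors ..." in
# `zpool status -v` output read from stdin, one entry per line.
# Assumption: the layout matches the status output quoted in this issue.
list_permanent_errors() {
    awk '
        /^[ \t]*pool:/ { in_list = 0 }                      # a new pool section ends the list
        /^errors: Permanent errors/ { in_list = 1; next }   # the list starts after this line
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }     # strip indentation, skip blank lines
    '
}

# Hypothetical usage (needs a real pool and sufficient privileges):
#   zpool status -v studio-striped | list_permanent_errors > before.txt
#   ... destroy the dataset, scrub, wait for the scrub to finish ...
#   zpool status -v studio-striped | list_permanent_errors > after.txt
#   diff before.txt after.txt
```

If the second listing is empty, all references to the damaged blocks are gone from the error log.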

@fling-

Contributor

fling- commented Aug 25, 2014

@behlendorf Scrub does not help.

@behlendorf behlendorf removed this from the 0.8.0 milestone Oct 11, 2016

@montanaviking

montanaviking commented Apr 19, 2017

I have a similar issue, namely I got the following errors:
  pool: zbackup4
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 13h47m with 0 errors on Wed Apr 19 12:03:40 2017
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup4    ONLINE       0     0     0
          sdd       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x1d780>
        /zbackup4/home/

#################
The ZFS version is:
$ dmesg | grep ZFS
[ 5.719507] ZFS: Loaded module v0.6.5.9-1~trusty, ZFS pool version 5000, ZFS filesystem version 5
and I'm running Ubuntu 14.04
Sibyl 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

A scrub did not help to clear the errors, nor did zpool clear zbackup4.
zbackup4 is a USB-connected backup drive with copies=2 to provide some degree of redundancy (for a single drive).
This external drive shows a good SMART status with no reallocated sectors.

I suspect the ZFS errors were caused by a momentary USB interruption and/or a ZFS-caused error, and are likely not physical.

Do I need to go ahead and rebuild this backup pool? Is there a way to clear the error?
Thanks so much,
Phil

@richardelling

Contributor

richardelling commented Apr 19, 2017

The "permanent errors" list contains information from up to two scrubs (current and previous). If you believe the errors to be transient, then try another scrub.
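As a rough sketch of that two-scrub check: the helper names `scrub_and_wait` and `two_scrub_check` and the `ZPOOL` and `SCRUB_POLL_SECONDS` variables are made up for illustration. The polling loop assumes `zpool status` prints "scrub in progress" on its scan line while a scrub is running.

```shell
# ZPOOL can be overridden (for example with a stub) for a dry run.
ZPOOL=${ZPOOL:-zpool}

# Start a scrub, then poll until `zpool status` stops reporting one in progress.
# Assumption: a running scrub shows up as "scrub in progress" in the status output.
scrub_and_wait() {
    pool=$1
    "$ZPOOL" scrub "$pool" || return 1
    while "$ZPOOL" status "$pool" | grep -q 'scrub in progress'; do
        sleep "${SCRUB_POLL_SECONDS:-60}"
    done
}

# Run two consecutive scrubs, then print the verbose status so the
# remaining error list (if any) can be inspected.
two_scrub_check() {
    pool=$1
    scrub_and_wait "$pool" && scrub_and_wait "$pool" || return 1
    "$ZPOOL" status -v "$pool"
}

# Hypothetical usage (needs sufficient privileges; each scrub may take hours):
#   two_scrub_check zbackup4
```

If the errors were transient, the final status should no longer list them; if the same entries are still there, they are probably not transient.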

@montanaviking

montanaviking commented Apr 19, 2017

@richardelling

Contributor

richardelling commented Apr 19, 2017

... if they are checksum errors, that really depends on if there were errors reported by the underlying drivers.

If you want to look at the details, zpool events is your friend.

@montanaviking

montanaviking commented Apr 19, 2017

@richardelling

Contributor

richardelling commented Apr 20, 2017

If a checksum or other error is found, there should be an e[rror]report describing its details. These events only go back to the last boot or to when the zfs driver was loaded, whereas the error log is stored on disk. So it is possible that the event reports do not tell the whole story.

@montanaviking

montanaviking commented Apr 20, 2017

@richardelling

Contributor

richardelling commented Apr 20, 2017

We're glad to hear the good news!

For the archives, it is not uncommon to see transient errors on USB or similar hot-pluggable interfaces. From ZFS's perspective, such errors appear as drive errors and are treated as such. But if they are transient, the two-scrub process should clear them out of the error log. If they are permanent (also common on USB disks :-( ) then the details can be displayed by zpool events [-v].
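For a quick overview before reading the verbose output, something like the following sketch can tally events by class. `count_event_classes` is a hypothetical helper, and it assumes the non-verbose `zpool events` layout: a "TIME CLASS" header followed by one event per line, with the class as the last field.

```shell
# Count events per class from non-verbose `zpool events` output on stdin.
# Assumption: each event line ends with its class, e.g. ereport.fs.zfs.checksum.
count_event_classes() {
    awk '
        $1 == "TIME" && $NF == "CLASS" { next }   # skip the header line
        NF > 0 { count[$NF]++ }                   # tally by the last field
        END { for (c in count) print count[c], c }
    ' | sort -k2
}

# Hypothetical usage:
#   zpool events | count_event_classes
#   zpool events -v    # full details for each event
```

Keep in mind that the events only cover the time since boot or module load, so an empty tally does not mean the on-disk error log is empty.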
