improper STATE reporting when disk UNAVAIL due to corruption #4653

Closed
jimsalterjrs opened this issue May 16, 2016 · 9 comments


jimsalterjrs commented May 16, 2016

Accidentally discovered today that zpool status improperly reports pool and vdev as ONLINE, not DEGRADED, when a disk has been failed entirely out of the vdev as UNAVAIL due to corrupt metadata.

http://jrs-s.net/2016/05/16/zfs-practicing-failures/

TL;DR - pool with two 2-disk mirror vdevs:

root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    ONLINE       0     0     0
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

Corrupt all blocks on disk nbd0, scrub, and check status:

root@banshee:~# pv < /dev/zero > /dev/nbd0
pv: write failed: No space left on device
root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:36:38 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    UNAVAIL      0     0 1.40K  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

The human-readable status message for the pool is correct, but the STATE data for both pool test and vdev mirror-0 are showing ONLINE, where they should be showing DEGRADED.

Physically removing disk nbd0 does cause both pool and vdev to show DEGRADED status properly:

root@banshee:~# qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected
root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:42:35 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        DEGRADED     0     0     0
      mirror-0  DEGRADED     0     0     0
        nbd0    UNAVAIL      0     0 1.40K  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

I'm guessing this is a corner case that nobody really tested? Most automated alerting/monitoring systems are going to be looking at the STATE flags, not the human-readable error message. Could be bad if a disk blows out this way in production and nobody knows because the monitoring system never sends an alarm.
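To make the monitoring concern concrete, here is a minimal sketch of the kind of health check many alerting setups run (hypothetical script; zpool list -H -o health reads only the pool's overall STATE, so in the scenario above it would never fire):

    #!/bin/sh
    # Hypothetical monitoring probe: alert only when the pool STATE is not ONLINE.
    # Against the output above this stays silent, because the pool still reports
    # ONLINE even though nbd0 is UNAVAIL.
    POOL=test
    HEALTH=$(zpool list -H -o health "$POOL")
    if [ "$HEALTH" != "ONLINE" ]; then
        logger -p daemon.err "zpool $POOL health is $HEALTH"
    fi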


dasjoe commented May 16, 2016

Out of interest, what does zpool status -x report for the degraded pool?


jimsalterjrs commented May 19, 2016

> Out of interest, what does zpool status -x report for the degraded pool?

root@locutus:/data/test# zpool status -x test
  pool: test
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu May 19 16:29:24 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    UNAVAIL      0     0     0  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

Same output as without the -x.

At least it doesn't just say "pool is healthy". Still, not great.


joshuaimmanuel commented Mar 13, 2017

Likewise, when a disk in a raidz configuration is removed physically, zpool status still reports the removed disk as online, and zpool status -x reports "all pools are healthy".

Only after running a zpool clear <pool-name> does zpool status report the disk as unavailable.

Similarly, when reconnecting the drive, the disk only comes back online after another zpool clear <pool-name>; until then, zpool status reports the disk as unavailable.
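In other words, the sequence looks roughly like this (pool name tank is hypothetical):

    # Drive physically pulled; status still claims everything is fine:
    zpool status -x tank      # reports "all pools are healthy"
    # Force ZFS to re-examine the vdevs:
    zpool clear tank
    zpool status tank         # the pulled disk now shows UNAVAIL
    # After reattaching the drive, another clear is needed before it returns to online:
    zpool clear tank
    zpool status tank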

So, who is responsible for maintaining the disk status? I thought the zfs kernel module would do it.

@behlendorf

@joshuaimmanuel the issue here is that the kernel module doesn't receive any notification of the drive removal until it attempts to perform some kind of IO to it; only then can it realize the drive was removed. The good news is that this has been addressed in master. The ZED now monitors udev device add/remove events for the system and manages the drives accordingly.
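As a rough illustration of that mechanism (pool name tank is hypothetical), any IO that touches the missing device will surface the failure, and a scrub is a simple way to generate it:

    # Until some IO hits the removed device, zpool status may still show it ONLINE.
    zpool status tank
    # A scrub issues IO to every vdev, forcing the module to notice the removal:
    zpool scrub tank
    zpool status tank    # the pulled disk should now show up as UNAVAIL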

@joshuaimmanuel

@behlendorf Thanks

@behlendorf

Closing. This issue was resolved in master with changes to the ZED.

@joshuaimmanuel

@behlendorf I couldn't find this issue mentioned in v0.7.0-rc4. Is this fix available in this release?

@behlendorf

This functionality was merged in several parts to extend the ZED's drive management. This work is all in 0.7.0-rc4, with the critical bit you're interested in merged in commit d02ca37. The ZED now actively monitors udev events (via libudev), so it will detect things like idle drive removal/addition.

If you have a chance, it would be great if you could try it out and open new issues if you discover problems.

@tonyhutter

Note that while zed can detect drive removals via udev, it doesn't currently do anything about it. That is, if zed sees a drive removed it doesn't offline or fault the drive. That may be something we want to look into for future releases. Right now though, the vdev will eventually fault when it gets issued IO, which gives you the same result. I believe the "fault drive on bad IOs" action requires you to be running zed, so make sure you're running it.
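For completeness, one way to check both points above on a systemd-based system (the service name zfs-zed is an assumption; adjust for your distribution):

    # Make sure the ZFS Event Daemon is enabled and running (assumed unit name).
    systemctl enable --now zfs-zed
    systemctl status zfs-zed
    # Optionally follow the event stream ZED consumes; device removal and the
    # subsequent probe failures should appear here:
    zpool events -f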
