improper STATE reporting when disk UNAVAIL due to corruption #4653

Closed
jimsalterjrs opened this Issue May 16, 2016 · 9 comments


jimsalterjrs commented May 16, 2016

Accidentally discovered today that zpool status improperly reports pool and vdev as ONLINE, not DEGRADED, when a disk has been failed entirely out of the vdev as UNAVAIL due to corrupt metadata.

http://jrs-s.net/2016/05/16/zfs-practicing-failures/

TL;DR - pool with two 2-disk mirror vdevs:

root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    ONLINE       0     0     0
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

Corrupt all blocks on disk nbd0, scrub, and check status:

root@banshee:~# pv < /dev/zero > /dev/nbd0
pv: write failed: No space left on device
root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:36:38 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    UNAVAIL      0     0 1.40K  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

The human-readable status message for the pool is correct, but the STATE data for both pool test and vdev mirror-0 are showing ONLINE, where they should be showing DEGRADED.

Physically removing disk nbd0 does cause both pool and vdev to show DEGRADED status properly:

root@banshee:~# qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected
root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:42:35 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        DEGRADED     0     0     0
      mirror-0  DEGRADED     0     0     0
        nbd0    UNAVAIL      0     0 1.40K  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

I'm guessing this is a corner case that nobody really tested? Most automated alerting/monitoring systems are going to be looking at the STATE flags, not the human-readable error message. Could be bad if a disk blows out this way in production and nobody knows because the monitoring system never sends an alarm.
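To make the monitoring concern above concrete: alerting tools generally key off the machine-readable pool health rather than the human-readable status paragraph, which is exactly why this bug matters. A minimal sketch in Python (the alerting logic itself is an illustrative assumption about such a monitor, not part of ZFS; `zpool list -H -o health` is the standard way to read just the health column):

```python
import subprocess

def health_is_alertworthy(health: str) -> bool:
    """Treat any pool health other than ONLINE as alert-worthy."""
    return health.strip().upper() != "ONLINE"

def check_pool(pool: str) -> bool:
    # `zpool list -H -o health <pool>` prints only the health column,
    # e.g. "ONLINE" or "DEGRADED", with no header line.
    health = subprocess.run(
        ["zpool", "list", "-H", "-o", "health", pool],
        capture_output=True, text=True, check=True,
    ).stdout
    return health_is_alertworthy(health)
```

A monitor built this way would have stayed silent on the pool above: the health property tracked the (incorrect) ONLINE state, not the UNAVAIL vdev.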

dasjoe commented May 16, 2016

Out of interest, what does zpool status -x report for the degraded pool?

jimsalterjrs commented May 19, 2016

Out of interest, what does zpool status -x report for the degraded pool?

root@locutus:/data/test# zpool status -x test
  pool: test
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu May 19 16:29:24 2016
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        nbd0    UNAVAIL      0     0     0  corrupted data
        nbd1    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        nbd2    ONLINE       0     0     0
        nbd3    ONLINE       0     0     0

errors: No known data errors

Same output as without the -x.

At least it doesn't just say "pool is healthy". Still, not great.

joshuaimmanuel commented Mar 13, 2017

Likewise, when a disk in a raidz configuration is physically removed, zpool status still reports the removed disk as ONLINE, and zpool status -x reports "all pools are healthy".

Only after running zpool clear <pool-name> does zpool status report the disk as unavailable.

Similarly, when the drive is reconnected, the disk only comes back online after another zpool clear <pool-name>; until then, zpool status reports it as unavailable.

So, who is responsible for maintaining the disk status? I thought the zfs module would do it.

behlendorf commented Mar 13, 2017

@joshuaimmanuel the issue here is that the kernel module doesn't receive any notification of the drive removal until it attempts to perform some kind of IO to it; only then can it realize the drive was removed. The good news is that this has been addressed in master: the ZED now monitors udev device add/remove events for the system and manages the drives accordingly.
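The mechanism described here can be sketched in userspace: listen for kernel block-device uevents and react when a device backing a managed vdev disappears or reappears. The sketch below uses the third-party pyudev bindings, and the handler logic is an illustrative assumption, not ZED's actual implementation:

```python
def classify_event(action: str, devnode: str, managed_vdevs: set) -> str:
    """Decide how a ZED-like daemon might react to a udev block event."""
    if devnode not in managed_vdevs:
        return "ignore"        # not a device backing one of our vdevs
    if action == "remove":
        return "mark-unavail"  # drive pulled: fault the vdev now, not at next IO
    if action == "add":
        return "reattach"      # drive returned: bring the vdev back online
    return "ignore"

def monitor(managed_vdevs: set):
    import pyudev  # third-party libudev bindings, the same event source ZED watches

    mon = pyudev.Monitor.from_netlink(pyudev.Context())
    mon.filter_by(subsystem="block")
    for device in iter(mon.poll, None):
        reaction = classify_event(device.action, device.device_node, managed_vdevs)
        print(device.device_node, device.action, "->", reaction)
```

The key design point is the first branch of classify_event: without an event source like this, removal of an idle drive goes unnoticed until ZFS happens to issue IO to it.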

behlendorf commented Mar 14, 2017

Closing. This issue was resolved in master with changes to the ZED.

@behlendorf behlendorf closed this Mar 14, 2017

joshuaimmanuel commented Jun 7, 2017

@behlendorf I couldn't find this issue mentioned in v0.7.0-rc4. Is this fix available in this release?


behlendorf commented Jun 7, 2017

This functionality was merged in several parts to extend the ZED's drive management. It is all in 0.7.0-rc4, with the critical bit you're interested in merged in commit d02ca37. The ZED now actively monitors udev events (via libudev), so it will detect things like idle drive removal/addition.

If you have a chance, it would be great if you could try it out and open new issues if you discover problems.

tonyhutter commented Jun 7, 2017

Note that while zed can detect drive removals via udev, it doesn't currently do anything about them. That is, if zed sees a drive removed, it doesn't offline or fault the drive. That may be something we want to look into for future releases. Right now, though, the vdev will eventually fault when it is issued IO, which gives you the same result. I believe the "fault drive on bad IOs" action requires you to be running zed, so make sure you're running it.
