Reproducible CKSUM errors after 2 drives taken OFFLINE on raidz2 #5806

thegreatgazoo · 2017-02-18T05:16:37Z

I was able to reproduce this 100% of the time on current vanilla master (spl at 9704820, zfs at 100790a). It is simple to reproduce:

# zpool create -f -o ashift=12 tank raidz2 sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq
# zpool offline tank sde
# zpool offline tank sdn
# cp -a spl zfs /tank
# zpool online tank sde
# zpool status
  scan: resilvered **39.0M** in 0h0m with 0 errors on Sat Feb 18 04:47:19 2017
# zpool scrub tank
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Feb 18 04:47:50 2017
# zpool online tank sdn
# zpool status
  scan: resilvered **12K** in 0h0m with 0 errors on Sat Feb 18 04:48:26 2017
# zpool scrub tank
# zpool status
  scan: scrub repaired **38.7M** in 0h0m with 0 errors on Sat Feb 18 04:49:02 2017
config:
        NAME        STATE     READ WRITE CKSUM
            sdn     ONLINE       0     0 5.04K

The two drives were taken offline right after pool creation, so the amount of data/parity missing from each of them should be roughly the same. Looking at the bold text above, the sde resilver fixed 39.0M but the sdn resilver fixed only 12K, so it looked like the second resilver missed quite a few blocks. The second scrub fixed 38.7M; adding that to the 12K fixed by the second resilver gets fairly close to the 39.0M fixed by the first resilver. So it looked like the second scrub was actually fixing the blocks the second resilver had missed.

Since the difference between a resilver and a scrub is that a resilver looks at DTL_PARTIAL to decide whether to check a block, I guessed that something messed up the DTLs before the second resilver, which made the first scrub look very fishy. Then I did the same thing again, except that I didn't do the scrub between the two resilvers:

# zpool create -f -o ashift=12 tank raidz2 sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq
# zpool offline tank sde
# zpool offline tank sdn
# cp -a spl zfs /tank
# zpool online tank sde
# zpool status
  scan: resilvered 38.8M in 0h0m with 0 errors on Sat Feb 18 05:02:24 2017
# zpool online tank sdn
# zpool status
  scan: resilvered 38.8M in 0h0m with 0 errors on Sat Feb 18 05:03:00 2017
# zpool status
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Feb 18 05:03:30 2017

This time there were no errors, and the two resilvers fixed about the same amount of data/parity. Everything above is 100% repeatable, which seemed to confirm my guess that the scrub between the resilvers somehow messed up the DTL.

The resilver/scrub and raidz code really hasn't changed much (I also used zfs_vdev_raidz_impl="original" to disable the new fancy parity routines), so I suspect this affects older ZFS versions as well, maybe even ZFS on other operating systems. This is probably something we'd want to fix before the next release. I realize the 14-drive raidz2 I used in the tests is not a common configuration, but it's not crazy either.


behlendorf · 2017-02-21T19:55:03Z

> Which seemed to verify my guess that the scrub between resilvers messed the DTL somehow.

It definitely appears that way. It looks as if dsl_scan_done()->vdev_dtl_reassess()->vdev_dtl_should_excise() can update the vdev's DTL in memory even when the leaf vdev is offline. If the DTL isn't updated properly when the leaf vdev is reopened, that could explain what's going on.

I agree we should address this before the next tag.

pcd1193182 · 2017-02-24T16:36:59Z

I can confirm that this reproduces on Illumos.

ahrens · 2017-02-24T17:00:12Z

@grwilson may also be interested in this.

loli10K · 2017-04-25T17:34:00Z

Is anyone already working on this? I think this is also reproducible on 2+ disks raidz1, which seems quite troubling.

thegreatgazoo · 2017-04-26T17:58:39Z

@loli10K Do you have a way to reproduce it on raidz1? I had to take 2 drives offline and do IO to reproduce on raidz2, but raidz1 can't do any IO if two drives have been taken offline.

loli10K · 2017-04-26T18:13:59Z

@thegreatgazoo I've never used raidz-n before (all my pools are mirrors), so I may be doing something wrong. That said, the reproducer is here: https://gist.github.com/loli10K/cc5b56612aa74871397066c2f6ac75d8.

gamanakis · 2017-04-27T01:25:46Z

I can reproduce this (raidz2) on FreeBSD 11.0, too.

ahrens · 2017-04-28T19:30:05Z

@loli10K Thanks for the great script. I was able to reproduce it and I understand what's causing the problem. @grwilson and I are discussing what the best fix will be. The basic problem is:

1. A scrub runs while one leaf of a RAIDZ vdev is offline.
2. vdev_dtl_reassess is called with scrub_txg nonzero and scrub_done=1.
3. vdev_dtl_should_excise returns TRUE because vdev_resilver_txg=0 (this was not a resilver).
4. vdev_dtl[DTL_SCRUB] is empty.

Reviewed by: George Wilson george.wilson@delphix.com If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806

Reviewed by: George Wilson george.wilson@delphix.com If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". Closes openzfs#5806

Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Matthew Ahrens <mahrens@delphix.com> If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". OpenZFS-issue: https://www.illumos.org/issues/8166 OpenZFS-commit: openzfs/openzfs#372 Closes #5806 Closes #6103

Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 Closes #372

illumos/illumos-gate@2d2f193 illumos/illumos-gate@2d2f193 https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> git-svn-id: svn+ssh://svn.freebsd.org/base/head@318943 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

illumos/illumos-gate@2d2f193 illumos/illumos-gate@2d2f193 https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>

Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Matthew Ahrens <mahrens@delphix.com> If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". OpenZFS-issue: https://www.illumos.org/issues/8166 OpenZFS-commit: openzfs/openzfs#372 Closes openzfs#5806 Closes openzfs#6103

illumos/illumos-gate@2d2f193 illumos/illumos-gate@2d2f193 https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> git-svn-id: https://svn.freebsd.org/base/vendor-sys/illumos/dist@318942 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

illumos/illumos-gate@2d2f193 illumos/illumos-gate@2d2f193 https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> git-svn-id: https://svn.freebsd.org/base/head@318943 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

Reviewed by: George Wilson george.wilson@delphix.com Reviewed by: Brad Lewis <brad.lewis@delphix.com> If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806

MFV r318942: 8166 zpool scrub thinks it repaired offline device https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 PR: 219537 Sponsored by: The FreeBSD Foundation

MFV r318942: 8166 zpool scrub thinks it repaired offline device https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 PR: 219537 Approved by: re (kib) Sponsored by: The FreeBSD Foundation

MFV r318942: 8166 zpool scrub thinks it repaired offline device https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 PR: 219537 Sponsored by: The FreeBSD Foundation git-svn-id: https://svn.freebsd.org/base/stable/10@319625 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

MFV r318942: 8166 zpool scrub thinks it repaired offline device https://www.illumos.org/issues/8166 If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also openzfs/zfs#5806 PR: 219537 Approved by: re (kib) Sponsored by: The FreeBSD Foundation git-svn-id: https://svn.freebsd.org/base/stable/11@319624 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

behlendorf added this to the 0.7.0 milestone Feb 21, 2017

This was referenced May 5, 2017

OpenZFS 8166 - zpool scrub thinks it repaired offline device #6103 (Merged)

8166 zpool scrub thinks it repaired offline device openzfs/openzfs#372 (Closed)

behlendorf closed this as completed in #6103 May 10, 2017
