Resilver restarting? #840

Open
chrisrd opened this Issue Jul 19, 2012 · 47 comments

Comments

@chrisrd
Contributor

chrisrd commented Jul 19, 2012

I used zpool replace ${pool} ${guid} ${dev} to replace one dead and one dying disk, which started a resilver. I've been monitoring the progress and have seen it apparently restart, e.g.:

while true; do ts=$(date +"%F-%T"); echo -e "$ts \c"; zpool status pool2 | egrep scanned; sleep 30; done
...
2012-07-19-13:55:09     1.59T scanned out of 29.9T at 286M/s, 28h49m to go
2012-07-19-13:55:39     1.60T scanned out of 29.9T at 286M/s, 28h48m to go
2012-07-19-13:56:09     53.7M scanned out of 29.9T at 1.99M/s, (scan is slow, no estimated time)
2012-07-19-13:56:39     89.4M scanned out of 29.9T at 1.57M/s, (scan is slow, no estimated time)
...

In conjunction with the apparent restart there's a disk error reported in the kern.log:

Jul 19 13:55:35 b5 kernel: [81414.117130] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 19 13:55:35 b5 kernel: [81414.117194] sd 0:0:57:0: [sdbc] Unhandled sense code
Jul 19 13:55:35 b5 kernel: [81414.117223] sd 0:0:57:0: [sdbc]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Jul 19 13:55:35 b5 kernel: [81414.117276] sd 0:0:57:0: [sdbc]  Sense Key : Medium Error [current] 
Jul 19 13:55:35 b5 kernel: [81414.117311] Info fld=0x5066f2a4
Jul 19 13:55:35 b5 kernel: [81414.117333] sd 0:0:57:0: [sdbc]  Add. Sense: Unrecovered read error
Jul 19 13:55:35 b5 kernel: [81414.117366] sd 0:0:57:0: [sdbc] CDB: Read(10): 28 00 50 66 f2 6c 00 00 39 00
Jul 19 13:55:35 b5 kernel: [81414.117422] end_request: critical target error, dev sdbc, sector 1348924012
Jul 19 13:55:35 b5 kernel: [81414.117455] ZFS: zio error=121 type=1 offset=688647362560 size=29184 flags=20668 delay=7000

...where sdbc is one of the drives being replaced.

This has happened at least 4 or 5 times since the replaces were originally done some 24 hours ago, and the 3 times I've actually watched it happen have all been with errors on sdbc (at different sectors).

On the other hand, there have been other disk errors, including (I think) on sdbc, that haven't caused any apparent resilver restart.

Is this normal, and/or can I expect the resilver to actually finish, or is it stuck in a loop?

Oh, I've just realised I have a few dumps of the complete zpool status taken during all this, which show that, as far as the "resilver in progress since..." message is concerned, the resilver really is restarting. E.g. from this morning vs now:

scan: resilver in progress since Thu Jul 19 05:08:45 2012
scan: resilver in progress since Thu Jul 19 14:47:10 2012

The complete current zpool status, with the /dev/mapper devices annotated with their matching /dev device:

  pool: pool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Jul 19 14:47:10 2012
    653G scanned out of 29.9T at 392M/s, 21h43m to go
    39.4G resilvered, 2.13% done
config:

    NAME                                        STATE     READ WRITE CKSUM
    pool2                                  DEGRADED     0     0     0
      raidz2-0                                  ONLINE       0     0     0
        sdbk  Hitachi_HDS5C30_ML0220F30E2DMD    ONLINE       0     0     0
        sdbi  Hitachi_HDS5C30_ML0221F30B7ZND    ONLINE       0     0     0
        sdac  Hitachi_HDS7230_MN1240F33XBYYD    ONLINE       0     0     0
        sdad  Hitachi_HDS7230_MN1240FA03E4HD    ONLINE       0     0     0
        sdab  Hitachi_HDS7230_MN1240FA0HUZAD    ONLINE       0     0     0
        sdaa  Hitachi_HDS7230_MN1240FA0JXU2D    ONLINE       0     0     0
        sdag  Hitachi_HDS7230_MN1240FA0K07YD    ONLINE       0     0     0
        sdaf  Hitachi_HDS7230_MN1240FA0KP7YD    ONLINE       0     0     0
        sdah  Hitachi_HDS7230_MN1240FA0LA36D    ONLINE       0     0     0
        sdae  Hitachi_HDS7230_MN1240FA0LMP9D    ONLINE       0     0     0
        sdz   Hitachi_HDS7230_MN1270FA0SV8ED    ONLINE       0     0     0
      raidz2-1                                  ONLINE       0     0     0
        sdl   ST2000DL003-9VT_5YD1V6ZL          ONLINE       0     0     0
        sdaj  ST2000DL003-9VT_5YD1W04H          ONLINE       0     0     0
        sdn   ST2000DL003-9VT_5YD1XEDS          ONLINE       0     0     0
        sdi   ST2000DL003-9VT_5YD1XF0K          ONLINE       0     0     0
        sdak  ST2000DL003-9VT_5YD1XQ15          ONLINE       0     0     0
        sdo   ST2000DL003-9VT_5YD1Z9BZ          ONLINE       0     0     0
        sdbf  WDC_WD20EARX-00_WD-WMAZA5589913   ONLINE       0     0  816K
        sdai  ST2000DL003-9VT_5YD563JW          ONLINE       0     0     0
        sds   ST2000DL003-9VT_5YD56MEZ          ONLINE       0     0     0
        sdy   ST2000DL003-9VT_5YD5FWQ6          ONLINE       0     0     0
        sdbs  WDC_WD20EARX-00_WD-WMAZA5567696   ONLINE       0     0     0
      raidz2-2                             DEGRADED     3     0     0
        sdbe  WDC_WD20EADS-00_WD-WCAVY1129416   ONLINE     110     0     1  (resilvering)
        sdaz  WDC_WD20EADS-00_WD-WCAVY1162803   ONLINE     120     0     2
              replacing-2                       ONLINE       0     0    39
          sdbc  WDC_WD20EADS-00_WD-WCAVY1184604 ONLINE      70     0     0  (resilvering)
          sdbj  WDC_WD20EARX-00_WD-WMAZA5603921 ONLINE       0     0     0  (resilvering)
        sdbh  WDC_WD20EADS-00_WD-WCAVY1188798   ONLINE     158     0     0
              replacing-4                       UNAVAIL      0     0     0
                17433600781902467301            UNAVAIL      0     0     0  was /dev/mapper/WDC_WD20EADS-00_WD-WCAVY1188828
          sdba  Hitachi_HDS7230_MN1240FA0LHJLD  ONLINE       0     0     0  (resilvering)
        sdbr  WDC_WD20EADS-00_WD-WCAVY1188933   ONLINE     153     0     2
        sdbt  WDC_WD20EADS-00_WD-WCAVY7404957   ONLINE       0     0     0
        sdbg  WDC_WD20EARX-00_WD-WCAZA9871198   ONLINE       0     0     0
        sdbq  WDC_WD20EARX-00_WD-WMAZA5346235   ONLINE       0     0     0
        sdbp  WDC_WD20EARX-00_WD-WMAZA5534735   ONLINE       0     0     0
        sdbl  WDC_WD20EARX-00_WD-WMAZA5561260   ONLINE       0     0     0

errors: No known data errors
@atonkyra

atonkyra commented Jul 19, 2012

I actually had this happen on vanilla Linux mdraid once. Basically the kernel tries to read the sector and the HDD fails to return it because the sector has become corrupted/unreadable for some reason. This in turn caused the raid sync to fail.

There is a dangerous "fix" available. To force a reallocation of the broken HDD sector, you can for example do hdparm --write-sector <sector> <device>. Note that this is really very dangerous as it effectively zeroes out a whole HDD sector, and I don't really know how ZFS will deal with the situation. Your pool is already degraded, which potentially worsens the situation; for example, data in those sectors might not be redundant anymore (you have several read errors across raidz2-2).
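For the record, a minimal sketch of that workflow, assuming the sector number from the kernel log earlier in this thread (1348924012 on sdbc) and a drive that accepts hdparm's sector commands; hdparm refuses the destructive write unless you also pass its safety flag:

# check whether the sector really is unreadable (read-only)
hdparm --read-sector 1348924012 /dev/sdbc

# force the drive to reallocate it by overwriting with zeros -- destroys that sector's contents
hdparm --write-sector 1348924012 --yes-i-know-what-i-am-doing /dev/sdbc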

I've seen something similar happening on another storage server with WD disks, a kind of "WD bomb" if I may. I'd consider swapping them out for something else ASAP.

I'm not really sure how ZFS should act in the case of unreadable sectors, since Linux will never reallocate the sector by itself, so potentially this could kill a pool if unreadable sectors accumulate. In essence this is a form of silent data corruption that we aren't protected from, and it requires periodic, active monitoring of pool health by the sysadmin. Perhaps @behlendorf can comment on this.

edit: Or could this be a bug where ZFS on Linux does not try to get the broken sector's data from another device when the primary read fails, or could the redundancy already be exhausted here?

@atonkyra

atonkyra commented Jul 19, 2012

A fast way to check this on a single device is of course to run badblocks -s -c256 /dev/device (this is read-only, so it should be safe; check the manpage for options).
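For example, a hedged variant that also records the failing blocks to a file for later remapping (the output path is illustrative; -b 512 makes the reported block numbers line up with 512-byte sectors):

# read-only surface scan, logging bad blocks as 512-byte sector numbers
badblocks -b 512 -s -v -c 256 -o /root/sdbc-badblocks.txt /dev/sdbc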

@chrisrd
Contributor

chrisrd commented Jul 19, 2012

Thanks for your response. I'm not actually concerned about the individual disk errors; I see them quite a bit here, and it's one of the things the redundancy in "RAID" is supposed to save you from. I've definitely seen ZFS recover from these other times - it's designed to fill in the missing info from the other disks and re-write the info back onto the errored disk. I think it rewrites the data back to the exact same sector, relying on the disk remapping a dead sector rather than writing the data to a different sector and updating its own block pointers.

The issue is the resilver restarting when it hits an error on that specific disk.

FYI, since first posting I've had another 5 errors on sdbc, each correlated with a resilver restart (and every restart correlated with an sdbc error), plus another 9 errors on other disks which haven't caused any noticeable issues:

2012-07-19-15:42:47     1.28T scanned out of 29.9T at 403M/s, 20h40m to go
2012-07-19-15:43:17     47.4M scanned out of 29.9T at 2.37M/s, (scan is slow, no estimated time)

2012-07-19-15:57:19     150G scanned out of 29.9T at 178M/s, 48h37m to go
2012-07-19-15:57:49     47.7M scanned out of 29.9T at 2.27M/s, (scan is slow, no estimated time)

2012-07-19-17:01:56     1.47T scanned out of 29.9T at 399M/s, 20h42m to go
2012-07-19-17:02:26     11.0M scanned out of 29.9T at 5.49M/s, (scan is slow, no estimated time)

2012-07-19-18:00:02     854G scanned out of 29.9T at 253M/s, 33h24m to go
2012-07-19-18:00:32     11.3M scanned out of 29.9T at 5.64M/s, (scan is slow, no estimated time)

2012-07-19-18:19:04     416G scanned out of 29.9T at 383M/s, 22h25m to go
2012-07-19-18:19:34     26.5M scanned out of 29.9T at 2.21M/s, (scan is slow, no estimated time)
Jul 19 15:42:50 b5 kernel: [87839.654210] end_request: critical target error, dev sdbc, sector 89985170
Jul 19 15:57:21 b5 kernel: [88709.279875] end_request: critical target error, dev sdbc, sector 1242265885
Jul 19 16:55:10 b5 kernel: [92172.837473] end_request: critical target error, dev sdbr, sector 92411471
Jul 19 16:55:23 b5 kernel: [92185.968422] end_request: critical target error, dev sdbr, sector 92616591
Jul 19 17:02:06 b5 kernel: [92588.136047] end_request: critical target error, dev sdbe, sector 103705189
Jul 19 17:02:17 b5 kernel: [92599.486257] end_request: critical target error, dev sdbc, sector 103884710
Jul 19 17:21:29 b5 kernel: [93749.776540] end_request: critical target error, dev sdbr, sector 1249209968
Jul 19 17:26:01 b5 kernel: [94021.482272] end_request: critical target error, dev sdaz, sector 1255414684
Jul 19 17:26:16 b5 kernel: [94035.966784] end_request: critical target error, dev sdbe, sector 1255592141
Jul 19 17:27:39 b5 kernel: [94118.680380] end_request: critical target error, dev sdbh, sector 1257310971
Jul 19 17:28:40 b5 kernel: [94179.921022] end_request: critical target error, dev sdbe, sector 1258878328
Jul 19 18:00:23 b5 kernel: [96080.118181] end_request: critical target error, dev sdbc, sector 1306147533
Jul 19 18:19:15 b5 kernel: [97210.681538] end_request: critical target error, dev sdbc, sector 31382981
Jul 19 18:25:51 b5 kernel: [97606.085044] end_request: critical target error, dev sdaz, sector 1230811160
@shapemaker

shapemaker commented Jul 19, 2012

Firstly, I hope you have backups.

Note that this is really very dangerous as it effectively zeroes out a whole HDD sector and I don't really know how ZFS will deal with the situation.

ZFS deals with the destroyed sector just fine, provided that there's redundancy available on other disk(s).

Your pool is already degraded which potentially worsens the situation, for example data in those sectors might not be redundant anymore (you have several read errors across the raidz2-2).

This is what worries me also. That many errors across the same vdev, and already 2 disks out of operation - a dangerous situation for sure. It's also interesting that only the WD disks are erroring like that, and racking up that many errors in the scrub too. For future reference, that's why one shouldn't put all similar disks in one redundant vdev if at all possible, but spread them around.

In essence this is a form of silent data corruption that we aren't protected from and requires a periodic active monitoring of the pool health from the sysadmin.

ZFS is designed to protect your data specifically from errors like this. It isn't "silent" since ZFS reports it, now is it? Linux mdraid and normal hardware RAID solutions, however, are affected by silent corruption.

edit: Or could this be a bug that ZFS On Linux does not try to get the broken sector's data from another device in case the primary method fails, or could the redundancy already be exhausted here?

Not a bug, ZFS works perfectly fine. Might be that redundancy is lost already.

I've definitely seen ZFS recover from these other times - it's designed to fill in the missing info from the other disks, and re-writes the info back onto the errored disk. I think it rewrites the data back to the exact same sector, relying on the disk remapping a dead sector rather than writing the data to a different sector and updating it's own block pointers.

That's correct. See the implementation details specific to raidz here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c - line 2109. IIRC, this is the only case where copy-on-write isn't used.

Now, for what is happening... most likely the resilver restarts because ZFS gets errors from sdbc. If you're lucky, the read/checksum errors on other disks don't correspond to those sectors on sdbc and redundancy is intact. If you have backups, yank out sdbc completely and you should see the resilver finish using parity data. Or stop ZFS, run badblocks -sv on sdbc, and remap the sectors it reports as bad with hdparm or dd (DANGEROUS).

This of course only works if parity data corresponding to those sectors you destroy is intact - hence the backups. But it beats yanking the whole disk, since most of sdbc is still readable I'd guess.

@shapemaker

shapemaker commented Jul 19, 2012

I'm not actually concerned about the individual disk errors, I see that quite a bit here and it's one of the things the redundancy in "RAID" is supposed to save you from.
[...]
The issue is the resilver restarting when it hits an error on that specific disk.

Tbh, I'd be very concerned about all those read errors. Even though ZFS should be able to save your bacon most of the time, that's just spectacularly flaky, garbage hardware right there from WD :-/

At the very least, errors like that will slow down resilvers and pool performance. At worst, they will cause complete pool loss if they overlap just so. The middle ground is issues like the one you're having here - however, this should still be recoverable.

My 2 cents: toss those WDs as far as you can as soon as possible. No sense in using faulty hardware...

@chrisrd
Contributor

chrisrd commented Jul 19, 2012

Oh, yes, those WDs will be replaced as soon as possible! But first I have an operational interest in how well this goes with flakey hardware and all. Note that it's raid-z2 and only one disk has actually disappeared altogether so there's still a parity disk left. Of course, if there are 2 or more disks with errors on the same ZFS block then it's lost data. However even in that case it should only mean a lost file if it's on a data block, which should then be reported at the end with zpool status -v. I guess if I'm unlucky and lose part of the block pointer tree I could lose a great deal more than a single file.

Statistically... thus far a total of 169 sector errors have been reported since the resilver originally started, so 84.5 kB of data. There's nearly 25 TB of used data in the pool, so assuming it's evenly distributed over the 3 vdevs there's 8.3 TB on the failing vdev, and spread over the 10 remaining disks (+ 1 dead) that's around 830 GB per disk. So grossly speaking, 84.5 kB out of 830 GB means there's a pretty good chance the errors aren't in the same ZFS block. At least that's what I'm hoping!

...if the resilver is actually progressing and not simply going over the same ground again and again.
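(For anyone checking the back-of-envelope maths above, a quick sketch of the same numbers:)

# 169 errored sectors x 512 bytes each
echo $((169 * 512))            # 86528 bytes, i.e. ~84.5 kB
# ~25 TB used, spread over 3 vdevs, then over the 10 remaining disks in the failing vdev
echo "scale=2; 25/3" | bc      # ~8.33 TB per vdev
echo "scale=0; 8330/10" | bc   # ~833 GB per disk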

Thanks for the suggestion of badblocks, that may indeed work if the disk errors can be removed before the resilver sees them. At this point I'd rather see if the resilver is progressing.

Cheers!

@chrisrd
Contributor

chrisrd commented Jul 19, 2012

Just to clarify, when I said "I'm not actually concerned about the individual disk errors", I should have continued on with "with respect to the immediate issue". Obviously a disk with a high error rate is something to be concerned about, and even more so a bunch of disks! However right now, with the errors in play, I'm wondering if the resilver is actually progressing or not.

@shapemaker

shapemaker commented Jul 19, 2012

I also noticed you have mixed 512-byte (EADS) and 4k (EARX) sector disks there. That could complicate sector overwrite operations, as one faulty 4k internal sector could make 8 emulated sectors unreadable. Was the pool made with the ashift=12 option?
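(A hedged way to check an existing pool, assuming zdb can read the pool's cached config:)

# show the ashift recorded for each top-level vdev
zdb -C pool2 | grep ashift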

Hmm, only the EADS disks emit read errors so far? Interesting too, maybe the better ECC in "advanced format" disks helps...

@chrisrd
Contributor

chrisrd commented Jul 19, 2012

It's ashift=9. I did some initial testing and for my data set there was only a marginal speed loss and a significant space advantage over ashift=12.

I hadn't really considered the ECC on the 4k disks before, but yes, that would make sense.

The other thing about the EADS is they have TLER / SCTERC (set to 7s), as do the Hitachis, whereas the EARX and Seagates don't. So they're more likely to report bad sectors rather than try really, really, really hard (for minutes) to get the data back. But the EADS are also the oldest disks in the box which may have something to do with the errors seen there.
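(For reference, a hedged sketch of how to query/set that timeout with smartctl, assuming the drive actually supports SCT ERC; the device name is a placeholder:)

# show the current SCT ERC (error recovery control) read/write timeouts
smartctl -l scterc /dev/sdX

# set both timeouts to 7.0 seconds (the value is in units of 100 ms)
smartctl -l scterc,70,70 /dev/sdX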

@behlendorf
Owner

behlendorf commented Jul 19, 2012

Thanks for the detailed information guys. I haven't looked into why the resilver keeps restarting, but if I were to venture a somewhat educated guess it would be because the vdev is disappearing/reappearing. The lower-level mpt2sas driver may be attempting various recovery techniques to get access to the sector. These could involve resetting the device or the channel, which would percolate back up the stack to ZFS.

But as I said that's just a guess. Still we should leave the bug open so we can determine exactly what's causing it and decide what to do about it.

@chrisrd
Contributor

chrisrd commented Jul 20, 2012

TL;DR: having moved the disks to a different chassis, the resilver is progressing nicely (7 hrs, 10 TB and counting) without any disk errors.

So my immediate issue is resolved, and thanks to those who've responded!

It's still of interest as to why errors from sdbc, and only sdbc, were causing the resilver to restart. I don't think the vdev itself was disappearing - the error messages reported (per the first post in this issue) were the same for multiple disks, but only sdbc, and always sdbc, would cause the resilver to restart. Possibly something to do with it being the source disk in a zpool replace? Perhaps combined with the fact that there was another zpool replace running at the same time in the same vdev, although in that case the "source" disk had actually disappeared altogether (dead disk). Now that my pool is behaving itself I can no longer reproduce the problem, but the "dm-flakey" device-mapper target might be useful if someone gets inspired to chase this problem down.

For those interested...

I finally gave up on the resilvering as it really didn't seem to be getting anywhere, and decided to do a disk-to-disk copy of the problem disk sdbc to another unused drive:

dd if=/dev/sdbc of=/dev/${new} bs=64k conv=noerror

This was a variation on the badblocks suggestion: the theory was to get a copy of as much data as I could onto a disk that wasn't erroring all the time, then restart the resilvering with the "good" disk in place of the problem disk. There would still be holes in the data where the original disk was flaky, but then the resilver could recover that data using the parity, and hopefully, without disk errors, the resilver would actually complete.
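(One hedged caveat on that dd invocation: with conv=noerror alone, a failed 64k read is simply skipped without padding the output, which shifts everything after it on the copy; adding sync pads the failed block with zeros so the rest of the data stays aligned, leaving zero-filled holes for the resilver to repair from parity. A sketch, using the same hypothetical ${new} device:)

# keep going on read errors, but pad failed blocks so later data stays aligned
dd if=/dev/sdbc of=/dev/${new} bs=64k conv=noerror,sync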

Much to my surprise the dd completed 8.5 hours later, without a single disk error. Huh?!?!

The difference was that in the case of resilvering all 33 disks in the pool were being hit at the same time, versus the dd reading from a single disk and writing to a separate disk, i.e. heavy traffic on the SAS bus vs. no contention.

The actual physical layout is a SuperMicro 847E26-R1400LPB head box with an LSI SAS 9211-8i HBA, connected to another identical SuperMicro wired as a simple JBOD, then daisy-chained to a Norco RPC-4224 JBOD. All the disks that were erroring, i.e. the WD EADSes, were in the Norco, along with a bunch of the other disks that weren't erroring.

On a hunch, I moved all the disks in the pool I was trying to resilver out of the Norco and into the 2nd Supermicro.

It's now been resilvering away for 7 hours with 10 TB done, and not a single disk error from any of the disks. Only 20 TB to go.

Augh! Suddenly I'm not looking kindly upon that Norco. Hmmm, I wonder, if I had a big enough blender, "will it blend"?

@shapemaker

shapemaker commented Jul 20, 2012

Long story short: use nearline SAS disks instead of SATA with SAS expanders.

Longer version: your SuperMicro 847E26-R1400LPB has a SAS2 backplane+2 expanders, correct? I'm playing with those at my current job, and that exact same HBA... good hardware. Anyhow, how's the 2nd chassis wired to the Norco's 6 iPASS connectors?

Reports say that SAS expanders + SATA disks are a possible recipe for disaster, especially under moderate to heavy load. Even with the newest HBA+expander firmware, there's a possibility of reset storms (or other weird error conditions) that take out a whole backplane's worth of disks either momentarily or until power cycle, depending on how badly a misbehaving SATA disk crashes the SAS subsystem. Had you described that system layout in the 1st post... :-)

Most likely the problem isn't your Norco chassis; the "backplanes" in it are simple passthrough devices... nothing fancy in them. Possibly you just moved the failure domain so that the HBA/expander(s) can better deal with flaky SATA disks.

@shapemaker

shapemaker commented Jul 20, 2012

Mind, directly attaching SATA disks to SAS HBAs should work just fine. It's when you add SAS expanders (and the SATA Tunneling Protocol) to the mix that you start encountering weird errors with misbehaving SATA disks. SATA simply isn't designed with a switching fabric in mind, and it looks like firmwares still aren't robust enough to deal with command tunneling reliably in the presence of errors.

Too bad that nearline SAS disks are so much more expensive than SATA disks. One can hope prices come down sooner rather than later...

@chrisrd
Contributor

chrisrd commented Jul 20, 2012

Norco is connected using a Chenbro CK23601. I can definitely blend that if I need to! It uses the same LSI SAS2X36 expander as the Supermicro SC847E26-RJBOD1. I'm interested in further discussing the hardware side of things, perhaps move that part over to zfs-discuss for a wider audience?

On the immediate issue, 27 hours in and 21 TB of 30 TB completed. 2 disk errors reported, both on EADS disks, neither resulting in a resilver restart:

Jul 21 04:17:54 b5 kernel: [39085.454182] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 04:17:54 b5 kernel: [39085.454209] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 04:17:54 b5 kernel: [39085.454232] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 04:17:54 b5 kernel: [39085.454255] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 04:17:54 b5 kernel: [39085.454278] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 04:17:54 b5 kernel: [39085.454318] sd 0:0:36:0: [sdaj] Unhandled sense code
Jul 21 04:17:54 b5 kernel: [39085.454343] sd 0:0:36:0: [sdaj]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Jul 21 04:17:54 b5 kernel: [39085.454400] sd 0:0:36:0: [sdaj]  Sense Key : Medium Error [current] 
Jul 21 04:17:54 b5 kernel: [39085.454461] Info fld=0x41d9e878
Jul 21 04:17:54 b5 kernel: [39085.454497] sd 0:0:36:0: [sdaj]  Add. Sense: Unrecovered read error
Jul 21 04:17:54 b5 kernel: [39085.454549] sd 0:0:36:0: [sdaj] CDB: Read(10): 28 00 41 d9 e7 92 00 00 e7 00
Jul 21 04:17:54 b5 kernel: [39085.454628] end_request: critical target error, dev sdaj, sector 1104799634
Jul 21 04:17:54 b5 kernel: [39085.454681] ZFS: zio error=121 type=1 offset=563655681024 size=118272 flags=20668 delay=7010

Jul 21 08:06:20 b5 kernel: [52770.786179] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 08:06:20 b5 kernel: [52770.786234] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 08:06:20 b5 kernel: [52770.786285] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 08:06:20 b5 kernel: [52770.786362] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Jul 21 08:06:20 b5 kernel: [52770.786481] sd 0:0:32:0: [sdaf] Unhandled sense code
Jul 21 08:06:20 b5 kernel: [52770.786530] sd 0:0:32:0: [sdaf]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Jul 21 08:06:20 b5 kernel: [52770.786611] sd 0:0:32:0: [sdaf]  Sense Key : Medium Error [current] 
Jul 21 08:06:20 b5 kernel: [52770.786669] Info fld=0x58678f34
Jul 21 08:06:20 b5 kernel: [52770.786705] sd 0:0:32:0: [sdaf]  Add. Sense: Unrecovered read error
Jul 21 08:06:20 b5 kernel: [52770.786757] sd 0:0:32:0: [sdaf] CDB: Read(10): 28 00 58 67 8f 33 00 00 02 00
Jul 21 08:06:20 b5 kernel: [52770.786836] end_request: critical target error, dev sdaf, sector 1483181875
Jul 21 08:06:20 b5 kernel: [52770.786888] ZFS: zio error=121 type=1 offset=757387388416 size=1024 flags=20668 delay=7010

I guess that means I've just gotten lucky that the problem disk hasn't suffered an error and caused a resilver restart. Fingers crossed for the next 7 hrs!

Current pool status:

  pool: pool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Jul 20 06:01:47 2012
    21.3T scanned out of 29.9T at 369M/s, 6h43m to go
    1.38T resilvered, 71.49% done
config:

    NAME                                        STATE     READ WRITE CKSUM
    pool2                                  DEGRADED     0     0     0
      raidz2-0                                  ONLINE       0     0     0
        sdbi  Hitachi_HDS5C30_ML0220F30E2DMD    ONLINE       0     0     0
        sdbg  Hitachi_HDS5C30_ML0221F30B7ZND    ONLINE       0     0     0
        sdac  Hitachi_HDS7230_MN1240F33XBYYD    ONLINE       0     0     0
        sdad  Hitachi_HDS7230_MN1240FA03E4HD    ONLINE       0     0     0
        sdab  Hitachi_HDS7230_MN1240FA0HUZAD    ONLINE       0     0     0
        sdaa  Hitachi_HDS7230_MN1240FA0JXU2D    ONLINE       0     0     0
        sdba  Hitachi_HDS7230_MN1240FA0K07YD    ONLINE       0     0     0
        sdbp  Hitachi_HDS7230_MN1240FA0KP7YD    ONLINE       0     0     0
        sdah  Hitachi_HDS7230_MN1240FA0LA36D    ONLINE       0     0     0
        sdae  Hitachi_HDS7230_MN1240FA0LMP9D    ONLINE       0     0     0
        sdz   Hitachi_HDS7230_MN1270FA0SV8ED    ONLINE       0     0     0
      raidz2-1                                  ONLINE       0     0     0
        sdl   ST2000DL003-9VT_5YD1V6ZL          ONLINE       0     0     0
        sday  ST2000DL003-9VT_5YD1W04H          ONLINE       0     0     0
        sdn   ST2000DL003-9VT_5YD1XEDS          ONLINE       0     0     0
        sdi   ST2000DL003-9VT_5YD1XF0K          ONLINE       0     0     0
        sdbb  ST2000DL003-9VT_5YD1XQ15          ONLINE       0     0     0
        sdo   ST2000DL003-9VT_5YD1Z9BZ          ONLINE       0     0     0
        sdbe  WDC_WD20EARX-00_WD-WMAZA5589913   ONLINE       0     0  860K  (resilvering)
        sdbd  ST2000DL003-9VT_5YD563JW          ONLINE       0     0     0
        sds   ST2000DL003-9VT_5YD56MEZ          ONLINE       0     0     0
        sdy   ST2000DL003-9VT_5YD5FWQ6          ONLINE       0     0     0
        sdbq  WDC_WD20EARX-00_WD-WMAZA5567696   ONLINE       0     0     0
      raidz2-2                             DEGRADED     0     0     0
        sdaj  WDC_WD20EADS-00_WD-WCAVY1129416   ONLINE       7     0     0  (resilvering)
        sdaf  WDC_WD20EADS-00_WD-WCAVY1162803   ONLINE       3     0     0  (resilvering)
              replacing-2                       ONLINE       0     0     0
          sdak  WDC_WD20EADS-00_WD-WCAVY1184604 ONLINE       0     0     0  (resilvering)
          sdbh  WDC_WD20EARX-00_WD-WMAZA5603921 ONLINE       0     0     0  (resilvering)
        sdag  WDC_WD20EADS-00_WD-WCAVY1188798   ONLINE       0     0     0
              replacing-4                       DEGRADED     0     0     0
                17433600781902467301            UNAVAIL      0     0     0  was /dev/mapper/WDC_WD20EADS-00_WD-WCAVY1188828
          sdaz  Hitachi_HDS7230_MN1240FA0LHJLD  ONLINE       0     0     0  (resilvering)
        sdai  WDC_WD20EADS-00_WD-WCAVY1188933   ONLINE       0     0     0
        sdbr  WDC_WD20EADS-00_WD-WCAVY7404957   ONLINE       0     0     0
        sdbf  WDC_WD20EARX-00_WD-WCAZA9871198   ONLINE       0     0     0
        sdbo  WDC_WD20EARX-00_WD-WMAZA5346235   ONLINE       0     0     0
        sdbn  WDC_WD20EARX-00_WD-WMAZA5534735   ONLINE       0     0     0
        sdbj  WDC_WD20EARX-00_WD-WMAZA5561260   ONLINE       0     0     0

errors: No known data errors

Note that, with the move to the Supermicro chassis, the sdbc disk previously triggering the restarts is now sdak (thus demonstrating the benefits of using the disk ids for the pool rather than the ephemeral /dev/sd*).
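(For example, a hedged sketch of what the replace looks like when the new member is addressed by a stable /dev/disk/by-id name instead of an sdX name; the path is illustrative, following the ${pool}/${guid} notation used earlier in this thread:)

# replace the failed member (referenced by the guid shown in zpool status)
# with a new disk addressed by its persistent by-id path
zpool replace ${pool} ${guid} /dev/disk/by-id/ata-MODEL_SERIAL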

@chrisrd
Contributor

chrisrd commented Jul 21, 2012

FYI, the resilvering completed with several more disk errors, but none on sdak, and no further restarts.

@chrisrd
Contributor

chrisrd commented Aug 13, 2012

Just confirming this is more than a one-off: I'm seeing this "resilver restarts on error from replacement disk" problem happening again with different disks.

Once again I was replacing a faulted-out disk, using zpool replace ${pool} ${guid} ${new_device} (where ${guid} was the GUID of the now-inaccessible failed disk). The resilver was originally started several days ago, but zpool status is now showing me it restarted only an hour ago:

scan: resilver in progress since Mon Aug 13 22:54:11 2012

...which precisely matches the timestamp of a disk error seen on the new replacement disk:

Aug 13 22:54:04 b5 kernel: [715848.288843] sd 0:0:62:0: [sdbh] Unhandled sense code
Aug 13 22:54:04 b5 kernel: [715848.288873] sd 0:0:62:0: [sdbh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 13 22:54:04 b5 kernel: [715848.288922] sd 0:0:62:0: [sdbh]  Sense Key : Medium Error [current] 
Aug 13 22:54:04 b5 kernel: [715848.288962] Info fld=0x5f7eb4a9
Aug 13 22:54:04 b5 kernel: [715848.288993] sd 0:0:62:0: [sdbh]  Add. Sense: Unrecovered read error
Aug 13 22:54:04 b5 kernel: [715848.289036] sd 0:0:62:0: [sdbh] CDB: Read(10): 28 00 5f 7e b4 a9 00 00 c8 00
Aug 13 22:54:04 b5 kernel: [715848.289099] end_request: critical target error, dev sdbh, sector 1602139305
Aug 13 22:54:04 b5 kernel: [715848.289137] ZFS: zio error=121 type=1 offset=818293592576 size=102400 flags=20640 delay=6900

The relevant section of the zpool status:

                   replacing-0                      DEGRADED     0     0     0
                     Hitachi_HDS7230_MN1270FA11PATD FAULTED     66 1.36K     0  too many errors
              sdbh   Hitachi_HDS7230_MN1270FA0SNEDD ONLINE       1     1     0  (resilvering)

@chrisrd
Contributor

chrisrd commented Aug 16, 2012

FYI, "resilver restarts on error from replacement disk" also occurs when the original disk is still present (previously I'd only seen it when the original disk had been faulted out).

@ryao
Member

ryao commented Jul 11, 2014

@chrisrd 6594735 might have fixed this.

@chrisrd
Contributor

chrisrd commented Jul 12, 2014

Yep, that looks like it addresses the problem. I don't have a test case to confirm, but I'd say the ticket can be closed.

For anyone following the saga above: in the end my problems with multiple EADS disks going bad were traced to a single dodgy EADS. It would do "something", and other EADS disks would experience errors. Once that one disk was replaced (during a program to replace all the EADS disks), all the problems went away. Go figure. I guess the EADS disks are marginal on their bus specs, perhaps only in conjunction with SATA behind SAS expanders. Since then the replaced EADS disks (minus the dodgy one) have been used in test pools and we've not seen any similar issues, even during prolonged stress testing.

@chrisrd chrisrd closed this Jul 14, 2014

@thisisnotmyrealname

thisisnotmyrealname commented Jun 16, 2016

I really hate bumping old tickets, but I'm not sure what to do.

I have the exact same issue running 0.6.5-304_gf74b821

I had a problem where a failing drive was still present in the machine and repeatedly caused the pool to restart resilvering, possibly due to this issue. My resilver restarted about 5 times before I finally found this, offlined the bad device, and removed it from the chassis.

My setup is a supermicro enclosure with SATA disks behind SAS HBA/Expanders.

@tobias-k

tobias-k commented Jun 20, 2016

Hey,
I also hate bumping..

0.6.5.6-1trusty on 3.19.0-58-generic #64~14.04.1-Ubuntu
In my case, 4 TB SAS NL disks in a Supermicro expansion chassis.. :(

Some I/O errors in the log:

[646135.203418] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[646135.203504] sd 1:0:26:0: [sdaa] FAILED Result: hostbyte=DID_SOFT_ERROR 
driverbyte=DRIVER_OK
[646135.203508] sd 1:0:26:0: [sdaa] CDB:
[646135.203509] Write(16): 8a 00 00 00 00 01 9f 82 e9 ff 00 00 00 18 00 00
[646135.203520] blk_update_request: I/O error, dev sdaa, sector 6971124223

root@fshh7:~# zpool status
pool: fshh7
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Jun 19 22:54:38 2016
14.1T scanned out of 99.2T at 408M/s, 60h43m to go
314G resilvered, 14.17% done
config:

    NAME                          STATE     READ WRITE CKSUM
    fshh7                         DEGRADED     0     0     0
      raidz2-0                    ONLINE       0     0     0
        sdau                      ONLINE       0     0     0
        sdav                      ONLINE       0     0     0
        sdaw                      ONLINE       0     0     0
        sday                      ONLINE       0     0     0
        sdax                      ONLINE       0     0     0
        sdaz                      ONLINE       0     0     0
        sdba                      ONLINE       0     0     0
        sdbb                      ONLINE       0     0     0
        sdbc                      ONLINE       0     0     0
        sdbd                      ONLINE       0     0     0
        sdbe                      ONLINE       0     0     0
        sdbf                      ONLINE       0     0     0
        sdbg                      ONLINE       0     0     0
        sdbh                      ONLINE       0     0     0
        sdbi                      ONLINE       0     0     0
        sdbj                      ONLINE       0     0     0
        sdbk                      ONLINE       0     0     0
        sdbl                      ONLINE       0     0     0
        sdbm                      ONLINE       0     0     0
        sdbn                      ONLINE       0     0     0
        sdbo                      ONLINE       0     0     0
        sdbp                      ONLINE       0     0     0
      raidz2-2                    DEGRADED     0     0     0
        sdb                       ONLINE       0     0     0
        sdc                       ONLINE       0     0     0
        sdd                       ONLINE       0     0     0
        sde                       ONLINE       0     0     0
        sdf                       ONLINE       0     0     0
        sdg                       ONLINE       0     0     0
        sdh                       ONLINE       0     0     0
        sdi                       ONLINE       0     0     0
        sdj                       ONLINE       0     0     0
        sdk                       ONLINE       0     0     0
        sdl                       ONLINE       0     0     0
        sdm                       ONLINE       0     0     0
        sdn                       ONLINE       0     0     0
        sdo                       ONLINE       0     0     0
        sdp                       ONLINE       0     0     0
        sdq                       ONLINE       0     0     0
        sdr                       ONLINE       0     0     0
        sds                       ONLINE       0     0     0
        sdt                       ONLINE       0     0     0
        sdu                       ONLINE       0     0     0
        sdv                       ONLINE       0     0     0
        sdw                       ONLINE       0     0     0
        sdx                       ONLINE       0     0     0
        sdy                       ONLINE       0     0     0
        sdz                       ONLINE       0     0     0
        replacing-25              DEGRADED     0     0     0
          6262255769558981431     FAULTED      0     0     0  was /dev/sdaa1
          scsi-35000c50083acd193  ONLINE       0     0     0  (resilvering)
        sdab                      ONLINE       0     0     0
        sdac                      ONLINE       0     0     0
        sdad                      ONLINE       0     0     0
        sdae                      ONLINE       0     0     0
        sdaf                      ONLINE       0     0     0
        sdag                      ONLINE       0     0     0
        sdah                      ONLINE       0     0     0
        sdai                      ONLINE       0     0     0
        sdaj                      ONLINE       0     0     0
        sdak                      ONLINE       0     0     0
        sdal                      ONLINE       0     0     0
        sdam                      ONLINE       0     0     0
        sdan                      ONLINE       0     0     0
        sdao                      ONLINE       0     0     0
        sdap                      ONLINE       0     0     0
        sdaq                      ONLINE       0     0     0
        sdar                      ONLINE       0     0     0
        sdas                      ONLINE       0     0     0
        sdat                      ONLINE       0     0     0
    logs
      mirror-1                    ONLINE       0     0     0
        sdbq1                     ONLINE       0     0     0
        sdbr1                     ONLINE       0     0     0
    cache
      sdbq2                       ONLINE       0     0     0
      sdbr2                       ONLINE       0     0     0

@rincebrain
Contributor

rincebrain commented Jun 20, 2016

There's probably a bug to be found with this, but as a mitigation for either of you, do you have TLER configured on the failing drives? (e.g. smartctl -l scterc,RNUM,WNUM [drive] - RNUM/WNUM are tenths of a second to attempt a read/write before giving up, while 0 generally means the drive never gives up... which, if it takes too long, can be when other parts of the I/O chain decide to reset the device forcibly)

I'd suggest trying it if you can, because presuming the implementation isn't buggy, most parts of IO stacks respond better to drives replying after X seconds that they've had an error than to drives which stop responding entirely until you reset the device.
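
As a rough illustration of the suggestion above (device paths are placeholders; 70,70 is just the commonly used 7-second setting mentioned later in this thread, not a recommendation):

    # Query whether SCT ERC (TLER) is supported and what it is currently set to
    smartctl -l scterc /dev/sdX

    # Limit read/write error recovery to 7.0 seconds (values are in tenths of a second)
    smartctl -l scterc,70,70 /dev/sdX

    # The setting is usually volatile, so it typically needs to be reapplied to each
    # member disk after every power cycle (e.g. from a boot script)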

@tobias-k

tobias-k commented Jun 20, 2016

Hey, thanks for the reply.

I can test it. Which values for scterc would you recommend?
I will set them asap and see if the rebuild finally gets through..

Thank you!

@raxxy

raxxy commented Jun 20, 2016

Tobias, I'm also suffering this issue at the moment.

I've been talking to the blokes on IRC and was given values of 70,70 to try (smartctl -l scterc,70,70)

At this stage I'm on my 5th resilver restart. I'll report back on the array type and failure/resolution with this setting (I'm applying it to all disks, however) in the near future..

@tobias-k

tobias-k commented Jun 20, 2016

Thank you.
So I'll set 80,80 and give feedback. Maybe you can skip at least one resilvering.
60 hours to go.. in my case..
Fingers crossed..

@tobias-k

tobias-k commented Jun 20, 2016

Ok. Again restarted. :(
Next try 150,150

@rincebrain
Contributor

rincebrain commented Jun 20, 2016

Changing the values to different numbers shouldn't affect whether it helps or not, unfortunately.

@tobias-k

tobias-k commented Jun 20, 2016

So @rincebrain you would suggest value 0 anyway?

Thank you.

@rincebrain
Contributor

rincebrain commented Jun 20, 2016

No, I still claim having it set to nonzero values in a RAID array is a good idea, but if one set of nonzero values didn't work around this issue, a different set probably won't either.

@raxxy

raxxy commented Jun 20, 2016

Reset again for me, setting this on all disks had no effect.

Might remove the new drive and copy as much data as I can off before catastrophic failure and just rebuild; hopefully removing the new replacement drive cancels the resilver..

@chrisrd
Contributor

chrisrd commented Jun 20, 2016

Original problem reporter here...

If your pool is raid-z2 (e.g. @tobias-k), I'd suggest removing the faulty drive from the pool to stop it producing errors that cause the resilver to restart. You still have a remaining parity drive so your data is still safe and this allows you to get back to 2 x parity as quickly as possible.

If you have a faulty drive on raid-z1 you can either remove the faulty drive so the resilver completes as quickly as possible (your data is still all there, but you may lose some blocks if another drive has problems during the resilver), or, if you have another spare drive (i.e. in addition to the one you're trying to resilver to), do as I did per my "TLDR" post above: stop the resilvering (e.g. export the pool), "dd" the faulty drive to the second spare (don't worry too much about occasional errors), remove the faulty drive, then bring the pool back up with the new copied drive and let the resilver complete. This second approach is a bit safer than simply removing the faulty drive because if another drive has problems during the subsequent resilver it will hopefully be covered by the new copied drive - assuming the new problem is on different blocks to where the original faulty drive was having problems.
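
A very rough sketch of that second approach (not a verbatim procedure; ${pool} is a placeholder, /dev/sdX is the faulty drive and /dev/sdY the second spare):

    # Stop the resilver by taking the pool down
    zpool export ${pool}

    # Copy the faulty drive onto the spare; conv=noerror,sync keeps going past
    # read errors, padding the unreadable blocks with zeros instead of aborting
    dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync

    # Physically remove the faulty drive, then bring the pool back up with the
    # copied drive in its place and let the resilver run to completion
    zpool import ${pool}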

@chrisrd chrisrd reopened this Jun 21, 2016

@raxxy

raxxy commented Jun 22, 2016

Removed old disk. Resilver completed, did NOT remove the offlined disk (still stuck there) reboots still causing a resilver restart even though it does complete - never removing the offlined disk.

I have no more data errors (I had 5 files affected last time, now showing as zero).

I'm just going to complete moving data off array and blow it away.

@chrisrd
Contributor

chrisrd commented Jun 22, 2016

@raxxy Can you clarify what you mean by "removed old disk", then "did NOT remove the offlined disk" please? If you mean the old failing disk is still in the machine, and a reboot causes another resilver to start again, what happens if you let the resilver complete and remove the failing disk from the machine and reboot?

@rincebrain
Contributor

rincebrain commented Jun 23, 2016

I believe he means that it claims to be done resilvering/scrubbing, but the device is still marked REPLACING {offline disk, new disk}, and kicks off a resilver anew every reboot.

@raxxy

raxxy commented Jun 23, 2016

Rincebrain is correct, sorry I was in a rush to leave for work, clearly I should proofread more often.. ;)

Dead disk + new replacement disk -> IO errors triggering resilver re-start loop
Dead disk physically removed + new replacement disk in -> resilver completes but never removes the disk it was replacing (sits there 'UNAVAIL', or after a reboot shows up as a long numeric string)

Any event on the zfs pool (export/import, reboot) will cause the resilver to restart; it likely completes but never removes the disk. (This is the 30th resilver all up; the first ones were doing the same thing - I put the disk back in to see if it would fix it, then removed it again so I can get some speed to move the data.)

@chrisrd
Contributor

chrisrd commented Jun 23, 2016

@raxxy OK, I don't know what's going on there. Does your "zpool events -v poolname" have anything interesting in it relating to the scrubs? If you're happy compiling your own ZoL you might try adding in this patch which provides a little more "zpool events" information on the scrub completion: chrisrd/zfs@92da32e

It's also worth bringing up the pool without /etc/zfs/zpool.cache. Something like: export the pool, remove /etc/zfs/zpool.cache, and import again (or reboot and import).
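
Spelled out, those two suggestions look roughly like this (a sketch only; ${pool} is a placeholder for the pool name):

    # Dump the pool's recent events, including scan/resilver related ones
    zpool events -v ${pool}

    # Bring the pool up without the cached configuration
    zpool export ${pool}
    rm /etc/zfs/zpool.cache
    zpool import ${pool}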

@behlendorf
Owner

behlendorf commented Jul 15, 2016

Commit d14fa5d which was merged to master may have resolved this. Has anyone tested with this change?

@thisisnotmyrealname

thisisnotmyrealname commented Sep 2, 2016

Not fixed; still getting restarts with drives that have seek errors.

filename: /lib/modules/4.4.0-36-generic/kernel/zfs/zfs/zfs.ko
version: 0.6.5.6-0ubuntu10
license: CDDL
author: OpenZFS on Linux
description: ZFS
srcversion: BAF128A524F0974CFA37071
depends: spl,znvpair,zunicode,zcommon,zavl
vermagic: 4.4.0-36-generic SMP mod_unload modversions

[Fri Sep 2 11:31:28 2016] blk_update_request: critical medium error, dev sdz, sector 4106662640
[Fri Sep 2 11:31:33 2016] sd 2:0:20:0: device_block, handle(0x001e)
[Fri Sep 2 11:31:33 2016] mpt2sas_cm2: log_info(0x31110610): originator(PL), code(0x11), sub_code(0x0610)
[Fri Sep 2 11:31:33 2016] mpt2sas_cm2: log_info(0x31110610): originator(PL), code(0x11), sub_code(0x0610)
[Fri Sep 2 11:31:33 2016] sd 2:0:20:0: device_unblock and setting to running, handle(0x001e)
[Fri Sep 2 11:31:34 2016] sd 2:0:20:0: [sdz] tag#1 CDB: Read(16) 88 00 00 00 00 00 f4 c6 b3 70 00 00 00 20 00 00
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: sas_address(0x5003048000d25b63), phy(35)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: enclosure_logical_id(0x5003048000d25b7f),slot(23)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: handle(0x001e), ioc_status(success)(0x0000), smid(162)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: request_len(16384), underflow(16384), resid(16384)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: tag(65535), transfer_count(0), sc->result(0x00000000)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[Fri Sep 2 11:31:34 2016] mpt2sas_cm2: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)

blah blah blah

@rincebrain
Contributor

rincebrain commented Sep 2, 2016

@thisisnotmyrealname That's not surprising, none of the 0.6.5.X releases have the commit that @behlendorf mentioned above might fix your issue.

You'd need to either build from git master or just apply that particular commit to 0.6.5.X and build, and see if the problem persists for you.

@thisisnotmyrealname

thisisnotmyrealname commented Sep 9, 2016

Apologies on the previous report. Seems Ubuntu snuck the stock ZFS kernel module in on me.

filename: /lib/modules/4.4.0-36-generic/extra/zfs/zfs.ko
version: 0.6.5-437_gd02ca37
license: CDDL
author: OpenZFS on Linux
description: ZFS
srcversion: E4190FDAA7A1780753688F9
depends: spl,znvpair,zunicode,zcommon,zavl
vermagic: 4.4.0-36-generic SMP mod_unload modversions

I compared the latest git against what I have compiled, and only saw changes to the zfs test suite.

status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Sep 9 01:15:26 2016
39.2G scanned out of 84.6T at 5.45M/s, (scan is slow, no estimated time)
1.52G resilvered, 0.05% done

I started a resilver 7 days ago, and it's been restarting since then. Is there any other information I can provide that is useful/will help?

@behlendorf behlendorf removed the Bug label Sep 30, 2016

@behlendorf behlendorf removed this from the 0.7.0 milestone Oct 11, 2016

@TyberiusPrime

TyberiusPrime commented Dec 19, 2016

Can confirm having had a resilver-restarts-over-and-over-again issue that disappeared when switching to Thursday's zfs-daily PPA. My pool is whole again.

@thegreatgazoo
Member

thegreatgazoo commented Apr 6, 2017

It may have something to do with #5970.

@muff1nman

muff1nman commented Jan 12, 2018

I think I have run into this same issue, but I'm not entirely sure..

I have a raidz2 pool that I had set up with misaligned partitions. I wanted to fix this, so I tried offlining the drives one by one, clearing the partition table, and then replacing them, this time using the whole drive. However, after replacing the drives the status for the resilver process seems to restart and never progresses. For one example, zpool status gave me:

action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 11 18:19:54 2018
        252M scanned out of 6.99T at 12.6M/s, 161h24m to go
        42.3M resilvered, 0.00% done

And a couple seconds later I got:


action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 11 18:20:15 2018
        227M scanned out of 6.99T at 13.4M/s, 152h27m to go
        37.6M resilvered, 0.00% done

Unsure what to do and noticing that it seemed to always cycle when it hit a certain point, I rebooted the server, at which point the resilver started progressing and got past the sticking point. This has happened on both of the drives I have tried repartitioning in my pool. Attached are the logs of the attempt for the second drive zpool_replace.log.redacted.gz (also pasted at https://paste.fedoraproject.org/paste/HbUQxhWB6L7j59FNLCFzHA). I did run a scrub after doing the first drive (which was successful) before going on to the second drive.

The one clue I did find was this in the journal logs around the time of a resilver restart:

Jan 11 18:18:54 ZFSHOST zed[8820]: eid=488 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:55 ZFSHOST zed[9045]: eid=489 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:55 ZFSHOST zed[9047]: eid=490 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:56 ZFSHOST zed[9270]: eid=491 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:56 ZFSHOST zed[9280]: eid=492 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:56 ZFSHOST kernel: Alternate GPT is invalid, using primary GPT.
Jan 11 18:18:56 ZFSHOST kernel:  sda:
Jan 11 18:18:56 ZFSHOST kernel:  sda: sda1 sda9
Jan 11 18:18:56 ZFSHOST zed[9382]: eid=493 class=vdev_attach pool_guid=0x89A3B4EB7BFA7DBF vdev_path=/dev/disk/by-id/ata-WDC_WD4000F9YZ-09N20L1_WD-WMC5D0D1T4V6-part1 vdev_state=ONLINE
Jan 11 18:18:57 ZFSHOST zed[9539]: eid=494 class=resilver_start pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:18:57 ZFSHOST zed[9545]: eid=495 class=history_event pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:01 ZFSHOST zed[11291]: eid=496 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:06 ZFSHOST zed[13682]: eid=497 class=history_event pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:20 ZFSHOST zed[18764]: eid=498 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:20 ZFSHOST zed[18773]: eid=499 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:35 ZFSHOST zed[26064]: eid=500 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:35 ZFSHOST zed[26071]: eid=501 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:39 ZFSHOST zed[27625]: eid=502 class=history_event pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:39 ZFSHOST zed[27627]: eid=503 class=resilver_start pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:39 ZFSHOST zed[27637]: eid=504 class=history_event pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:50 ZFSHOST zed[32020]: eid=505 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:50 ZFSHOST zed[32025]: eid=506 class=config_sync pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:54 ZFSHOST zed[1317]: eid=507 class=history_event pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:54 ZFSHOST zed[1323]: eid=508 class=resilver_start pool_guid=0x89A3B4EB7BFA7DBF
Jan 11 18:19:54 ZFSHOST zed[1327]: eid=509 class=history_event pool_guid=0x89A3B4EB7BFA7DBF

EDIT: fixed a log line being chopped off

@rincebrain
Contributor

rincebrain commented Jan 12, 2018

If I had to guess, it's getting confused by the fact that you didn't tell it you were clobbering the old device, and might be stuck trying to relabel/import the "old" drive.

I'm reasonably confident that what's happening is something is triggering a relabeling of the drive, so the drive vanishes and reappears, which in turn triggers the resilver to restart.

I'm less confident about what is triggering it, but I'd guess that it's eventually noticing the part1 "old" drive reappearing when it relabels, attempting to online it, and in doing so, clobbering the resilver.

I'd use zpool offline to explicitly offline the "old" device, which should probably convince it to stop trying this; if not, then making sure autoreplace=off on the pool should; if that doesn't, then that's another bug, I would think.
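
Concretely, something like this (a sketch; the pool and device names are placeholders):

    # Explicitly offline the "old" device, using the name or GUID shown by zpool status
    zpool offline ${pool} ${old_device_or_guid}

    # Make sure the pool isn't configured to auto-replace devices that reappear
    zpool set autoreplace=off ${pool}
    zpool get autoreplace ${pool}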

(Which ZoL version, distro+kernel version, etc?)

@muff1nman

muff1nman commented Jan 12, 2018

I did do a zpool offline as per the log history I attached. Also autoreplace is set to off:

# zpool get all | grep autoreplace
tank  autoreplace                    off                            default

This is version 0.7.5-1 on Centos 7 kernel 3.10.0-693.11.6.el7.x86_64.

@StefanCohen

StefanCohen commented Feb 4, 2018

I'm on version 0.7.3-1 and I encountered the same issue yesterday after a zpool replace. I noticed that the resilvering restarted repeatedly, every ten seconds, with a ton of these in the zfs history:

2018-02-04.00:58:42 [txg:7268183] scan setup func=2 mintxg=3 maxtxg=7264178
2018-02-04.00:59:08 [txg:7268189] scan aborted, restarting errors=0

I searched and found the following issue: #6613. It was mentioned that it could be related to zed, so I stopped zfs-zed and it resilvered in about an hour after that. I could reproduce the problem on another disk, so this might in fact be a bug related to ZED.
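
For anyone hitting the same loop, the workaround looks roughly like this (a sketch; ${pool} is a placeholder, and ZED should be re-enabled once the resilver has finished):

    # The internal history (-i) shows the "scan setup" / "scan aborted, restarting" entries
    zpool history -i ${pool} | grep scan

    # Stop ZED for the duration of the resilver, then start it again afterwards
    systemctl stop zfs-zed
    # ... wait for zpool status to report the resilver as complete ...
    systemctl start zfs-zed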

@as-com

as-com commented Mar 12, 2018

I'm using archzfs-git (probably a very recent version) and I was having issues replacing a drive, with these logs in the journal:

Mar 11 22:43:47 hubble zed[19428]: eid=9385 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:43:50 hubble zed[19788]: eid=9386 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:43:51 hubble zed[19867]: eid=9387 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee262071313-part3
Mar 11 22:43:51 hubble zed[19892]: eid=9388 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee20cb16a07-part3
Mar 11 22:43:52 hubble zed[19912]: eid=9389 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:43:52 hubble zed[19916]: eid=9390 class=resilver_start pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:43:52 hubble zed[19918]: eid=9391 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:02 hubble zed[20670]: eid=9392 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:06 hubble zed[20941]: eid=9393 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:06 hubble zed[21032]: eid=9394 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee262071313-part3
Mar 11 22:44:06 hubble zed[21035]: eid=9395 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee20cb16a07-part3
Mar 11 22:44:07 hubble zed[21053]: eid=9396 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:07 hubble zed[21057]: eid=9397 class=resilver_start pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:07 hubble zed[21062]: eid=9398 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:18 hubble zed[21754]: eid=9399 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:21 hubble zed[22114]: eid=9400 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:21 hubble zed[22190]: eid=9401 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee20cb16a07-part3
Mar 11 22:44:22 hubble zed[22201]: eid=9402 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee262071313-part3
Mar 11 22:44:22 hubble zed[22204]: eid=9403 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:22 hubble zed[22208]: eid=9404 class=resilver_start pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:22 hubble zed[22210]: eid=9405 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:36 hubble zed[23236]: eid=9406 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:36 hubble zed[23309]: eid=9407 class=config_sync pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:42 hubble zed[23562]: eid=9408 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee262071313-part3
Mar 11 22:44:42 hubble zed[23569]: eid=9409 class=vdev.bad_ashift pool_guid=0x38BF9374E2B7FD2D vdev_path=/dev/disk/by-id/wwn-0x50014ee20cb16a07-part3
Mar 11 22:44:43 hubble zed[23581]: eid=9410 class=history_event pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:43 hubble zed[23584]: eid=9411 class=resilver_start pool_guid=0x38BF9374E2B7FD2D
Mar 11 22:44:43 hubble zed[23588]: eid=9412 class=history_event pool_guid=0x38BF9374E2B7FD2D

systemctl stop zfs-zed solved the problem, and resilvering is continuing normally (I hope).
