-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report long delays in writing MMP blocks #7212
Conversation
b8cf5bc
to
9c01b92
Compare
9c01b92
to
5a63955
Compare
8b57257
to
94bb49e
Compare
include/sys/mmp.h
Outdated
* MMP_TREE_RO At least one leaf was found, but none were writeable. | ||
*/ | ||
typedef enum mmp_error { | ||
MMP_WRITE_TREE_RO = ENETDOWN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tethering these to EHWPOISON + 1
might be a better choice since then they won't overlap with any legitimate existing errno
for the architecture. Mind you this value will vary between architectures but that's also true for ENETDOWN
. Alternately, you could start they at say 500
which is above most values I'm aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to returning bitmap instead as discussed.
module/zfs/mmp.c
Outdated
|
||
*error = 0; | ||
if (out_vd == NULL) { | ||
if (fail_mask && MMP_FAIL_WRITE_PENDING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be &
not &&
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block of code removed entirely
module/zfs/mmp.c
Outdated
if (out_vd == NULL) { | ||
if (fail_mask && MMP_FAIL_WRITE_PENDING) | ||
*error = MMP_WRITE_PENDING; | ||
else if (fail_mask && MMP_FAIL_WRITE_PENDING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be &
not &&
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block of code removed entirely
module/zfs/mmp.c
Outdated
*/ | ||
|
||
static vdev_t * | ||
mmp_random_leaf(vdev_t *in_vd, int *error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels wrong, since we're already changing this prototype let's make it a bit more conventional.
static int mmp_random_leaf(vdev_t *in_vd, vdev_t **out_vd);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
module/zfs/mmp.c
Outdated
vd = mmp_random_leaf(spa->spa_root_vdev, &error); | ||
|
||
mutex_enter(&mmp->mmp_io_lock); | ||
|
||
if (vd == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a brief comment before this block describing the skipped history entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
module/zfs/spa_stats.c
Outdated
(u_longlong_t)smh->mmp_kstat_id, (u_longlong_t)smh->txg, | ||
(u_longlong_t)smh->timestamp, (longlong_t)smh->io_error, | ||
(longlong_t)smh->duration, (u_longlong_t)smh->mmp_delay, | ||
(u_longlong_t)smh->vdev_guid, (u_longlong_t)smh->vdev_label, | ||
(smh->vdev_path ? smh->vdev_path : "-")); | ||
(smh->vdev_path ? smh->vdev_path : "-")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] accidentally added extra white space before smh->vdev_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
module/zfs/spa_stats.c
Outdated
* Set duration in existing "skip" record to how long we have waited for a leaf | ||
* vdev to become available. | ||
* | ||
* Important that we start search at the end of the list where new records are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/at the end of the list/at the head of the list/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
module/zfs/spa_stats.c
Outdated
@@ -730,6 +730,7 @@ typedef struct spa_mmp_history { | |||
char *vdev_path; | |||
uint64_t vdev_label; /* vdev label */ | |||
int io_error; /* error status of MMP write */ | |||
uint64_t error_start; /* hrtime of start of error period */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be hrtime_t
.
It looks like |
43089bc
to
21ce6e8
Compare
Re: the ztest failure, I am not taking any form of config lock while walking the tree to count leaves. Looks like SCL_VDEV is for exactly that purpose, but I propose I do that in a separate patch with any other locking changes that I find are required as I continue to investigate MMP. OK? |
21ce6e8
to
8dd07e8
Compare
I'll add a test to trigger the "skipped MMP write" entries in multihost_history, which should address the code coverage. Consider me duly chastised. |
module/zfs/mmp.c
Outdated
* issued mmp write: records time issued, error status, etc. | ||
* skipped mmp write: an mmp write could not be issued because no | ||
* suitable leaf vdev was available. See comment | ||
* above struct spa_mmp_history for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] formating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
8dd07e8
to
46def72
Compare
Refreshed the patch with changes to address feedback. |
include/sys/mmp.h
Outdated
MMP_WRITE_PENDING, | ||
MMP_WRITE_OTHER | ||
} mmp_error_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deadcode, please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
module/zfs/mmp.c
Outdated
static int | ||
mmp_random_leaf(vdev_t *in_vd, vdev_t **out_vd) | ||
{ | ||
int error = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe int error_mask
would be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
46def72
to
99d6842
Compare
If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
99d6842
to
cf117a6
Compare
Rebased on master |
Codecov Report
@@ Coverage Diff @@
## master #7212 +/- ##
==========================================
+ Coverage 76.35% 76.53% +0.17%
==========================================
Files 327 327
Lines 103874 103922 +48
==========================================
+ Hits 79314 79536 +222
+ Misses 24560 24386 -174
Continue to review full report at Codecov.
|
Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7212
If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes openzfs#7212
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes openzfs#7212
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Fix "--enable-code-coverage" debug build openzfs#6674 - Update codecov.yml openzfs#6669 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7212
Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes openzfs#7212
Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes openzfs#7212
Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7212
Description
Check time taken to acquire
SCL_STATE
inmmp_write_uberblock()
and if that time is greater than 1/10 ofMMP_MIN_INTERVAL
report itviz zfs_dbgmsg()
.Add an entry to multihost_history when no leaf can be found to write an MMP block to, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex:
0x1 At least one vdev (interior or leaf) was not writeable.
0x2 At least one writeable leaf vdev was found, but it had a pending MMP write.
timestamp = the time in seconds since the epoch when no leaf could be found originally.
duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period.
vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred.
For records of skipped MMP writes the right-most column, vdev_path, is
reported as "-".
Sample output, truncated to fit:
Motivation and Context
To assist troubleshooting #7118 and similar problems.
How Has This Been Tested?
I ran ztest locally. I also loaded the modules, imported a pool, injected delays via zinject, and extracted multihost_history. I confirmed that the error value reported and the time spans reported were consistent with the injected delays and with the timestamps and durations of the successful MMP writes.
Types of changes
Checklist:
Signed-off-by
.