Update mmp_delay on skipped/failed writes and on txg syncs #7330

ofaaland · 2018-03-22T06:43:07Z

Description

When an MMP write is skipped or fails, and the time since mts->mmp_last_write already exceeds mts->mmp_delay, increase mts->mmp_delay. The original code updated mts->mmp_delay only when a write succeeded, so the write(s) that followed a delay or a failed write reported a ub_mmp_delay that was too low.
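The rule for skipped or failed writes can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the patch itself: the struct, the `hrtime_ns_t` typedef, and the function `mmp_delay_on_missed_write()` are hypothetical stand-ins, and setting the delay to the elapsed time is one plausible reading of "increase". Only the field names `mmp_last_write` and `mmp_delay` come from the description.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's nanosecond timestamp type. */
typedef int64_t hrtime_ns_t;

/* Hypothetical subset of the MMP thread state; field names follow the PR text. */
struct mmp_state {
	hrtime_ns_t mmp_last_write;	/* time of the last successful MMP write */
	hrtime_ns_t mmp_delay;		/* value later reported as ub_mmp_delay */
};

/*
 * Called at time `now` when an MMP write was skipped or failed.
 * If more time has passed since the last successful write than the
 * currently advertised delay, raise the delay to that elapsed time
 * (assumption: the real code may choose a different new value).
 * The delay is never lowered here.
 */
static void
mmp_delay_on_missed_write(struct mmp_state *mts, hrtime_ns_t now)
{
	hrtime_ns_t elapsed = now - mts->mmp_last_write;

	if (elapsed > mts->mmp_delay)
		mts->mmp_delay = elapsed;
}
```

With this shape, a miss that happens while the elapsed time is still within the advertised delay leaves `mmp_delay` untouched, which matches the "already greater than" condition in the description.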

Update mmp_last_write and mmp_delay if a txg sync was successful. At least one uberblock was written, thus extending the time we can be sure the pool will not be imported by another host.

Do not allow mmp_delay to go below (MSEC2NSEC(zfs_multihost_interval) / vdev_count_leaves()). Otherwise a period of frequent successful MMP writes, e.g. due to frequent txg syncs, could result in an import activity check so short that it is not reliable based on mmp thread writes alone.
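The lower bound amounts to a simple clamp. The sketch below is illustrative only: `MSEC2NSEC_SKETCH` mirrors the milliseconds-to-nanoseconds conversion that `MSEC2NSEC` performs, `mmp_delay_floor()` is a hypothetical helper, and the leaf count is passed in as a plain integer rather than obtained from `vdev_count_leaves()`. Treating a leaf count of zero as one is an assumption made to avoid dividing by zero.

```c
#include <stdint.h>

/* Milliseconds to nanoseconds, as MSEC2NSEC does (sketch-local name). */
#define	MSEC2NSEC_SKETCH(ms)	((int64_t)(ms) * 1000000LL)

/*
 * Return delay_ns, but never less than interval / leaves.
 * interval_ms stands in for zfs_multihost_interval and leaves for the
 * result of vdev_count_leaves(); both are plain parameters here.
 */
static int64_t
mmp_delay_floor(int64_t delay_ns, uint64_t interval_ms, uint64_t leaves)
{
	int64_t floor_ns;

	if (leaves == 0)
		leaves = 1;	/* assumption: guard against division by zero */

	floor_ns = MSEC2NSEC_SKETCH(interval_ms) / (int64_t)leaves;

	return (delay_ns < floor_ns ? floor_ns : delay_ns);
}
```

For example, with an interval of 1000 ms and 4 leaf vdevs the floor is 250 ms, so a measured delay of 10 ms would be reported as 250 ms, while a measured delay of 400 ms would pass through unchanged.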

Remove unnecessary local variable, start. We do not use the start time of the loop iteration.

Add a debug message in spa_activity_check() to allow verification of the import_delay value and to prove the activity check occurred.

Alter the tests that import pools and attempt to detect an activity check. Calculate the expected duration of spa_activity_check() based on module parameters at the time the import is performed, rather than a fixed time set in mmp.cfg. The fixed time may be wrong. Also, use the default zfs_multihost_interval value so the activity check is longer and easier to recognize.
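As a rough illustration of computing the expected duration from module parameters instead of a fixed value, the C sketch below multiplies the write interval by the number of import intervals. This is an assumption about the baseline only: `expected_activity_check_ms()` is a hypothetical helper, the second parameter is assumed to correspond to `zfs_multihost_import_intervals`, and any contribution from `ub_mmp_delay` or from randomization inside `spa_activity_check()` is deliberately ignored.

```c
#include <stdint.h>

/*
 * Baseline expected activity check duration in milliseconds.
 * interval_ms stands in for zfs_multihost_interval; import_intervals
 * is assumed to stand in for zfs_multihost_import_intervals. Both are
 * read by the caller at the time the import is performed.
 */
static uint64_t
expected_activity_check_ms(uint64_t interval_ms, uint64_t import_intervals)
{
	return (interval_ms * import_intervals);
}
```

A test would read both parameters immediately before the import and compare the measured import time against this baseline, rather than against a constant that may no longer match the loaded module's settings.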

Motivation and Context

Under some circumstances, the mmp_delay value written in uberblocks was low due to being out-of-date. In others, the uberblock writes during txg sync were not being taken into account even though their uberblocks act as MMP writes.

Misc fixes to clean up minor issues found during testing.

How Has This Been Tested?

Ran zloop and the mmp portion of the test suite locally. Examined debug logs to verify the spa activity check's import_delay was set appropriately and that the test change more closely matches the real behavior.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.
Change has been approved by a ZFS on Linux member.

codecov · 2018-03-23T10:49:48Z

Codecov Report

Merging #7330 into master will decrease coverage by 0.05%.
The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master    #7330      +/-   ##
==========================================
- Coverage   76.38%   76.32%   -0.06%     
==========================================
  Files         329      329              
  Lines      104214   104200      -14     
==========================================
- Hits        79606    79533      -73     
- Misses      24608    24667      +59

| Flag | Coverage Δ | |
|---|---|---|
| #kernel | `76.41% <86.95%> (-0.01%)` | ⬇️ |
| #user | `65.73% <83.33%> (+0.01%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 10adee2...91d4fcf.

ofaaland · 2018-04-03T15:54:08Z

@dinatale2 @tonyhutter
Please take a look when you get a chance. Thanks!

tonyhutter · 2018-04-03T17:28:07Z

tests/zfs-tests/tests/functional/mmp/mmp.kshlib

@@ -163,17 +163,29 @@ function mmp_pool_set_hostid # pool hostid
 	return 0
 }

+function activity_check_duration


Can you comment this function's return value, or give it a more verbose name like total_seconds_mmp_has_been_active?

When an MMP write is skipped, or fails, and time since mts->mmp_last_write is already greater than mts->mmp_delay, increase mts->mmp_delay. The original code only updated mts->mmp_delay when a write succeeded, but this results in the write(s) after delays and failed write(s) reporting an ub_mmp_delay which is too low.

Update mmp_last_write and mmp_delay if a txg sync was successful. At least one uberblock was written, thus extending the time we can be sure the pool will not be imported by another host.

Do not allow mmp_delay to go below (MSEC2NSEC(zfs_multihost_interval) / vdev_count_leaves()) so that a period of frequent successful MMP writes, e.g. due to frequent txg syncs, does not result in an import activity check so short it is not reliable based on mmp thread writes alone.

Remove unnecessary local variable, start. We do not use the start time of the loop iteration.

Add a debug message in spa_activity_check() to allow verification of the import_delay value and to prove the activity check occurred.

Alter the tests that import pools and attempt to detect an activity check. Calculate the expected duration of spa_activity_check() based on module parameters at the time the import is performed, rather than a fixed time set in mmp.cfg. The fixed time may be wrong. Also, use the default zfs_multihost_interval value so the activity check is longer and easier to recognize.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7330

ofaaland added the Status: Work in Progress Not yet ready for general review label Mar 22, 2018

ofaaland force-pushed the b_mmp_delay_skipped_writes branch 5 times, most recently from a376570 to 347bc5e Compare March 23, 2018 05:46

ofaaland force-pushed the b_mmp_delay_skipped_writes branch from 347bc5e to 098f086 Compare March 23, 2018 17:28

ofaaland changed the title ~~[WIP] Update mmp_delay on skipped/failed writes and on txg syncs~~ Update mmp_delay on skipped/failed writes and on txg syncs Mar 26, 2018

ofaaland requested review from tonyhutter, behlendorf and dinatale2 March 26, 2018 15:46

ofaaland removed the Status: Work in Progress Not yet ready for general review label Apr 2, 2018

tonyhutter reviewed Apr 3, 2018

ofaaland force-pushed the b_mmp_delay_skipped_writes branch from 098f086 to 91d4fcf Compare April 3, 2018 17:42

tonyhutter approved these changes Apr 3, 2018

dinatale2 approved these changes Apr 4, 2018

behlendorf approved these changes Apr 4, 2018

behlendorf merged commit 533ea04 into openzfs:master Apr 4, 2018

ofaaland deleted the b_mmp_delay_skipped_writes branch April 4, 2018 23:43
