Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend deadman logic #6999

Closed
wants to merge 3 commits into from
Closed

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Dec 28, 2017

Description

The intent of this patch is extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include:

  • Added a new zfs_deadman_failmode module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independant from, the pool failmode property. It can be set to wait, continue, or panic.

    • wait - Wait for the "hung" I/O (default)
    • continue - Attempt to recover from a "hung" I/O
    • panic - Panic the system
  • Added a new zfs_deadman_ziotime_ms module option which is analogous to zfs_deadman_synctime_ms` except instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio.

  • The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected.

  • The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f8. Update zloop.sh to unconditionally pass -G to obtain additional debugging.

  • The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detect a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event.

  • The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an IO has been submitted to the physical device
    but for some reasonable the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs.

  • The 'zfs_deadman_synctime_ms' default value was reduced from 1000s to 600s.

  • Depending on how ztest fails there may be no cache file to move. This should not be considered fatal, collect the logs which are available and carry on.

Motivation and Context

Add some of the needed infrastructure to make it possible to root cause ztest "hangs" observed during automated testing. With this change applied at least basic debugging information will be collected for any "hangs". This change can be further augmented with improvements to the debugging infrastructure.

Issue #6901.

How Has This Been Tested?

Locally by running zloop.sh in-tree for approximated 4 days. Over this time period the deadman behaved as expected and properly terminated ztest when it appeared to be hung. Further analysis of the debug logs and cores obtained is still needed. The expectation is they will provide some statistical insight in the most often observed failures.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Dec 28, 2017
Copy link
Contributor

@dweeezil dweeezil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behlendorf I'll get this reviewed within the next day. Based on a reading of the commit comments, it all sounds very good to me. The one thing I looked for right away was the setting of zfs_deadman_synctime_ms which I'm glad to see you lowered. Do you think we should have yet another module parameter to differentiate between the spa deadman and the vdev deadman?

@behlendorf
Copy link
Contributor Author

@dweeezil thanks, it's appreciated. Any additional testing you can offer or recommended values for the tunings would be welcome.

Do you think we should have yet another module parameter to differentiate between the spa deadman and the vdev deadman?

I'm not sure I follow. What did you have in mind?

module/zfs/zio.c Outdated

if (error == -1) {
uint64_t delta = gethrtime() - zio->io_queued_timestamp;
if (delta > spa_deadman_ziotime(zio->io_spa))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to add an additional check here such that zio_deadman() can only run once for a hung zio. Or alternately it runs at the lower frequency of zfs_deadman_ziotime_ms intervals, instead of every zfs_deadman_checktime_ms which is what the code does now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to leave this as-is such that it's consistent with the zfs_deadman_synctime_ms behavior. The document ion has been updated accordingly.

Copy link
Contributor

@dweeezil dweeezil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM save for some additional documentation we ought to have as indicated in the individual comments.

I think the whole deadman logic is sufficiently important that it might eventually warrant its own man page, maybe zfs_deadman(5), to explain the overall system.

Also, in the commit comment regarding zio_interrupt(), we ought to have a mention of something like "flaky hardware" as potentially being a cause of missing completion events.

@@ -37,6 +37,7 @@ extern "C" {
#define FM_EREPORT_ZFS_IO "io"
#define FM_EREPORT_ZFS_DATA "data"
#define FM_EREPORT_ZFS_DELAY "delay"
#define FM_EREPORT_ZFS_DEADMAN "deadman"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to add this to the zfs-events(5) man page and also probably beef up the existing documentation for the plain "delay" event. In particular, we'll want to mention its (the plain 'delay' event) interaction with zfs_deadman_ziotime_ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I also opted simply to expand on the deadman behavior in the existing modules man page rather than add a new one.

MSEC2NSEC(zfs_deadman_synctime_ms);

(void) poll(NULL, 0, (int)NSEC2MSEC(delta));
total += zfs_deadman_synctime_ms / 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this after the "overdue = " line below to better align with the upstream code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

* continue - Attempt to recover from a "hung" I/O
* panic - Panic the system
*/
char *zfs_deadman_failmode = "wait";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the rationale for using "wait" as the default instead of "continue" (which would probably be more useful) is not to change the current behavior (much)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That and possible that the "continue" behavior might have side effects or result in additional instability. We can revisit the default setting once we're more comfortable with the proposed recovery logic. But for now let's keep the core behavior the same aside from improving the logging.

int error = 0;

if (strcmp(failmode, "wait") == 0)
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_WAIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the default should be to set to ZIO_FAILURE_MODE_WAIT and then this can become a void-returning function and the return value check in spa_add() can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@sempervictus
Copy link
Contributor

Hasnt blown up yet in zloop on a 4.9.74-unofficial_grsec kernel (kernexec and such tend to catch runtime issues pretty well, nothing yet).

@behlendorf
Copy link
Contributor Author

Refreshed. After discovering one more possible way ztest could hang, fixed by 733a041, I'm no longer seeing the deadman fail to take action when a deadlock is encountered. Locally zloop.sh has been running for 7 days and it's always been able to use gdb to automatically collect the requested debugging from the core dump.

@behlendorf behlendorf removed the Status: Work in Progress Not yet ready for general review label Jan 12, 2018
@behlendorf behlendorf force-pushed the zio-deadman branch 2 times, most recently from 6e09952 to c8b3299 Compare January 17, 2018 18:44
@codecov
Copy link

codecov bot commented Jan 18, 2018

Codecov Report

Merging #6999 into master will decrease coverage by 0.02%.
The diff coverage is 70.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6999      +/-   ##
==========================================
- Coverage   75.37%   75.34%   -0.03%     
==========================================
  Files         296      296              
  Lines       95539    95633      +94     
==========================================
+ Hits        72014    72059      +45     
- Misses      23525    23574      +49
Flag Coverage Δ
#kernel 74.64% <78.87%> (-0.17%) ⬇️
#user 67.55% <28.08%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bc4a23...8adbd54. Read the comment docs.

@behlendorf behlendorf force-pushed the zio-deadman branch 3 times, most recently from 954520f to 8adbd54 Compare January 23, 2018 00:01
behlendorf added a commit to behlendorf/zfs-buildbot that referenced this pull request Jan 23, 2018
Save debugging information automatically collected from the last
three core dumps generated by ztest.  This change does not modify
the existing behavior when those log files are missing.

Requires openzfs/zfs#6999.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f8.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Requires-spl: refs/pull/674/head
TEST_ZTEST_TIMEOUT=3600
The zdb(8) command may not terminate in the case where the pool
gets suspended and there is a caller in zio_wait() blocking on
an outstanding read I/O that will never complete.  This can in
turn cause ztest(1) to block indefinitely despite the deadman.

Resolve the issue by setting the default failure mode for zdb(8)
to panic.  In user space we always want the command to terminate
when forward progress is no longer possible.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
In order to debug issues encounted by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Requires-spl: refs/pull/674/head
TEST_ZTEST_TIMEOUT=3600
Copy link
Contributor

@tcaputi tcaputi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch looks great to me. 2 minor nitpicks, but otherwise I can't wait to get these changes on our systems.

\fBzfs_deadman_synctime_ms\fR milliseconds, continue to check for slow
operations every \fBzfs_deadman_checktime_ms\fR milliseconds.
Check time in milliseconds. This defines the frequency at which we check
for hung I/O and invoke the \fBzfs_deadman_failmode\fR behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and potentially invoke....

export SYNCTIME_DEFAULT=600000
export ZIOTIME_DEFAULT=300000
export CHECKTIME_DEFAULT=60000
export FAILMODE_DEFAULT="wait"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be detected just by cating the /sys/ directory? Might make it a bit more future-proof. Just a nitpick.

behlendorf added a commit to behlendorf/zfs that referenced this pull request Jan 25, 2018
The zdb(8) command may not terminate in the case where the pool
gets suspended and there is a caller in zio_wait() blocking on
an outstanding read I/O that will never complete.  This can in
turn cause ztest(1) to block indefinitely despite the deadman.

Resolve the issue by setting the default failure mode for zdb(8)
to panic.  In user space we always want the command to terminate
when forward progress is no longer possible.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
behlendorf added a commit to behlendorf/zfs that referenced this pull request Jan 25, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
dinatale2 pushed a commit to openzfs/zfs-buildbot that referenced this pull request Jan 25, 2018
Save debugging information automatically collected from the last
three core dumps generated by ztest.  This change does not modify
the existing behavior when those log files are missing.

Requires openzfs/zfs#6999.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f8.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
The zdb(8) command may not terminate in the case where the pool
gets suspended and there is a caller in zio_wait() blocking on
an outstanding read I/O that will never complete.  This can in
turn cause ztest(1) to block indefinitely despite the deadman.

Resolve the issue by setting the default failure mode for zdb(8)
to panic.  In user space we always want the command to terminate
when forward progress is no longer possible.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 31, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999

Conflicts:
	scripts/zloop.sh
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Feb 1, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999

Conflicts:
	scripts/zloop.sh
tonyhutter pushed a commit that referenced this pull request Feb 6, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6999

Conflicts:
	scripts/zloop.sh
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f8.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
The zdb(8) command may not terminate in the case where the pool
gets suspended and there is a caller in zio_wait() blocking on
an outstanding read I/O that will never complete.  This can in
turn cause ztest(1) to block indefinitely despite the deadman.

Resolve the issue by setting the default failure mode for zdb(8)
to panic.  In user space we always want the command to terminate
when forward progress is no longer possible.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible by dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit of the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for latter analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
dweeezil pushed a commit to dweeezil/zfs that referenced this pull request Apr 9, 2018
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f8.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
dweeezil pushed a commit to dweeezil/zfs that referenced this pull request Apr 10, 2018
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f8.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6999
@behlendorf behlendorf deleted the zio-deadman branch April 19, 2021 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants