Add pool health /proc entry, "SUSPENDED" pools #7563
module/zcommon/zfs_comutil.c
Outdated
@@ -207,10 +208,53 @@ const char *zfs_history_event_names[ZFS_NUM_LEGACY_HISTORY_EVENTS] = {
	"pool split",
};

#if defined(_KERNEL)
/* Dummy gettext() for kernel builds */
The preferred method is to use:
#if defined(_KERNEL)
#define gettext(x) x
#endif
Fixed in latest push
Force-pushed from b9e2785 to c98aa0b.
lib/libzfs/libzfs_pool.c
Outdated
 * "SUSPENDED", etc).
 */
const char *
zpool_get_health_str_from_zhp(zpool_handle_t *zhp)
nit: I'd suggest dropping the _from_zhp suffix.
module/zcommon/zfs_comutil.c
Outdated
@@ -207,10 +208,48 @@ const char *zfs_history_event_names[ZFS_NUM_LEGACY_HISTORY_EVENTS] = {
	"pool split",
};

#if defined(_KERNEL)
#define gettext(x) x
#endif
We should avoid allowing the localization logic to seep down into the shared user/kernel code. This means that in this case we're going to have to live with a little code duplication. I'd suggest moving zpool_state_to_name back to where it was and adding a new spa_state_to_name(spa_t *) function without gettext to spa_misc.c. This has the additional advantage of not disrupting the existing libzfs API.
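A minimal sketch of the kernel-side helper suggested here, assuming a simplified stand-in enum rather than the real vdev_state_t and spa_t (every name prefixed with sketch_ is hypothetical). It returns plain string literals, so any gettext wrapping would stay in the libzfs caller:

```c
/* Hypothetical stand-in for the vdev states; not the real ZFS enum. */
typedef enum sketch_state {
	SKETCH_STATE_FAULTED,
	SKETCH_STATE_DEGRADED,
	SKETCH_STATE_HEALTHY
} sketch_state_t;

/*
 * Map a state to its display name. No gettext() here: this flavor is
 * meant for shared user/kernel code, so it returns untranslated strings.
 */
static const char *
sketch_state_to_name(sketch_state_t state)
{
	switch (state) {
	case SKETCH_STATE_FAULTED:
		return ("FAULTED");
	case SKETCH_STATE_DEGRADED:
		return ("DEGRADED");
	case SKETCH_STATE_HEALTHY:
		return ("ONLINE");
	}
	/* Fallback for values outside the enum. */
	return ("UNKNOWN");
}
```

A userland caller could then translate the result itself, which keeps the existing libzfs function and its gettext calls untouched.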
module/zfs/spa_stats.c
Outdated
{
	return (spa_suspended(spa) && (spa_get_failmode(spa)
	    != ZIO_FAILURE_MODE_CONTINUE));
}
After adding spa_state_to_name you might find it clearer to move this check and the associated strlcpy into the new function. Then you wouldn't need a conditional at all in spa_health_data.
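One way to read this suggestion, sketched with a hypothetical, heavily simplified pool descriptor (the real code would query spa_suspended() and spa_get_failmode() on a spa_t): the suspension check lives inside the naming function, so the caller only copies the returned string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pool descriptor; the real spa_t is far larger. */
typedef struct sketch_pool {
	bool suspended;		/* stand-in for spa_suspended(spa) */
	bool failmode_continue;	/* failmode == ZIO_FAILURE_MODE_CONTINUE */
	bool degraded;		/* stand-in for a degraded root vdev */
} sketch_pool_t;

/*
 * Return the pool's display name, checking suspension first so that
 * callers need no conditional of their own.
 */
static const char *
sketch_pool_state_to_name(const sketch_pool_t *p)
{
	assert(p != NULL);

	/* Same condition as the reviewed helper: suspended and not "continue". */
	if (p->suspended && !p->failmode_continue)
		return ("SUSPENDED");

	return (p->degraded ? "DEGRADED" : "ONLINE");
}
```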
@behlendorf thanks for looking at it. I included your changes in my latest push.
Force-pushed from bb0e94f to 214c865.
Force-pushed from 2558e24 to 0783f1b.
	str = gettext("FAULTED");
} else if (status == ZPOOL_STATUS_IO_FAILURE_WAIT ||
    status == ZPOOL_STATUS_IO_FAILURE_MMP) {
	str = gettext("SUSPENDED");
since suspended state is in addition to other states, do we need a separate kstat for suspension?
I don't think so. The kstat is there to be a quick "is the pool healthy?" check for pacemaker to use. If you wanted to know why the pool was suspended, you could always look at the zpool status output.
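As a rough illustration of such a quick check, here is a sketch of a reader for the kstat file. It assumes the file holds a single line such as ONLINE, as in the /proc/spl/kstat/zfs/tank/state example from the commit message; the function names are hypothetical, and treating only ONLINE as healthy is a policy assumption, not something this PR prescribes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a state file into buf and strip the newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int
read_pool_state(const char *path, char *buf, size_t len)
{
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return (-1);
	if (fgets(buf, (int)len, fp) == NULL) {
		fclose(fp);
		return (-1);
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0';
	return (0);
}

/* Returns 1 only when the file reads exactly "ONLINE", otherwise 0. */
static int
pool_is_healthy(const char *path)
{
	char state[64];

	if (read_pool_state(path, state, sizeof (state)) != 0)
		return (0);
	return (strcmp(state, "ONLINE") == 0);
}
```

A caller would pass the kstat path for its pool, for example pool_is_healthy("/proc/spl/kstat/zfs/tank/state"), and decide separately how to treat DEGRADED or SUSPENDED.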
I wish it were that easy, in the zpool command suspension message is sent to stderr, so most screen-scrapers miss it.
An implementation could be as easy as exposing the value of spa->spa_suspended and it can wait for another PR.
From a practical implementation of HA perspective, it is not always desirable to failover if a pool is suspended. Hence we do want to know why, in addition to the rest of the state, of course.
> the zpool command suspension message is sent to stderr, so most screen-scrapers miss it.
@richardelling which zpool status suspension message are you referring to? My recollection is that this has always been a point of confusion for users because a pool may be suspended yet appear totally healthy in the zpool status output.
yes, it is confusing :-( zfs_standard_error[_fmt]() has many callers and buried in there is the case where users are notified over stderr that the pool is suspended. In any case, I think exposing spa_suspended can be useful, but belongs in another PR.
module/zfs/spa_misc.c
Outdated
	return ("DEGRADED");
case VDEV_STATE_HEALTHY:
	return ("ONLINE");

nit: extra whitespace
include/spl/sys/kstat.h
Outdated
@@ -69,6 +69,7 @@
#define KSTAT_FLAG_WRITABLE 0x04
#define KSTAT_FLAG_PERSISTENT 0x08
#define KSTAT_FLAG_DORMANT 0x10
#define KSTAT_FLAG_NO_HEADERS 0x20
We should keep this aligned with the Illumos ones for consistency and add yours to the end.
#define KSTAT_FLAG_VIRTUAL 0x01
#define KSTAT_FLAG_VAR_SIZE 0x02
#define KSTAT_FLAG_WRITABLE 0x04
#define KSTAT_FLAG_PERSISTENT 0x08
#define KSTAT_FLAG_DORMANT 0x10
#define KSTAT_FLAG_INVALID 0x20
#define KSTAT_FLAG_LONGSTRINGS 0x40
@behlendorf my latest push fixes your last two comments.
include/spl/sys/kstat.h
Outdated
@@ -69,6 +69,9 @@
#define KSTAT_FLAG_WRITABLE 0x04
#define KSTAT_FLAG_PERSISTENT 0x08
#define KSTAT_FLAG_DORMANT 0x10
#define KSTAT_FLAG_INVALID 0x20
#define KSTAT_FLAG_LONGSTRINGS 0x40
#define KSTAT_FLAG_NO_HEADERS 0x60
s/0x60/0x80
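The reason 0x60 cannot work as a flag is that it equals 0x20 | 0x40, so any kstat with both KSTAT_FLAG_INVALID and KSTAT_FLAG_LONGSTRINGS set would also test positive for KSTAT_FLAG_NO_HEADERS. The sketch below uses the flag values listed in this review with the corrected 0x80; the two helper functions are hypothetical and exist only to demonstrate the overlap check.

```c
/* Flag values as listed in this review, with the corrected new flag. */
#define KSTAT_FLAG_VIRTUAL	0x01
#define KSTAT_FLAG_VAR_SIZE	0x02
#define KSTAT_FLAG_WRITABLE	0x04
#define KSTAT_FLAG_PERSISTENT	0x08
#define KSTAT_FLAG_DORMANT	0x10
#define KSTAT_FLAG_INVALID	0x20
#define KSTAT_FLAG_LONGSTRINGS	0x40
#define KSTAT_FLAG_NO_HEADERS	0x80

/* Nonzero when exactly one bit is set, i.e. the value is a standalone flag. */
static int
is_single_bit(unsigned int v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/* Nonzero when KSTAT_FLAG_NO_HEADERS shares no bit with the older flags. */
static int
no_headers_is_distinct(void)
{
	unsigned int others = KSTAT_FLAG_VIRTUAL | KSTAT_FLAG_VAR_SIZE |
	    KSTAT_FLAG_WRITABLE | KSTAT_FLAG_PERSISTENT |
	    KSTAT_FLAG_DORMANT | KSTAT_FLAG_INVALID |
	    KSTAT_FLAG_LONGSTRINGS;

	return (is_single_bit(KSTAT_FLAG_NO_HEADERS) &&
	    (KSTAT_FLAG_NO_HEADERS & others) == 0);
}
```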
include/spl/sys/kstat.h
Outdated
@@ -69,6 +69,9 @@
#define KSTAT_FLAG_WRITABLE 0x04
#define KSTAT_FLAG_PERSISTENT 0x08
#define KSTAT_FLAG_DORMANT 0x10
#define KSTAT_FLAG_INVALID 0x20
#define KSTAT_FLAG_LONGSTRINGS 0x40
#define KSTAT_FLAG_NO_HEADERS 0x60
#define KSTAT_FLAG_UNSUPPORTED \
	(KSTAT_FLAG_VAR_SIZE | KSTAT_FLAG_WRITABLE | \
	KSTAT_FLAG_PERSISTENT | KSTAT_FLAG_DORMANT)
Can you also remove KSTAT_FLAG_UNSUPPORTED, and its one use in __kstat_create too?
lib/libspl/include/sys/kstat.h
Outdated
@@ -304,6 +304,8 @@ typedef struct kstat32 {
#define KSTAT_FLAG_PERSISTENT 0x08
#define KSTAT_FLAG_DORMANT 0x10
#define KSTAT_FLAG_INVALID 0x20
#define KSTAT_FLAG_LONGSTRINGS 0x40
#define KSTAT_FLAG_NO_HEADERS 0x60
s/0x60/0x80
Codecov Report
@@ Coverage Diff @@
## master #7563 +/- ##
==========================================
+ Coverage 77.44% 77.53% +0.08%
==========================================
Files 361 362 +1
Lines 110322 110338 +16
==========================================
+ Hits 85442 85551 +109
+ Misses 24880 24787 -93
Continue to review full report at Codecov.
Looks good after the ZTS fix.
@@ -0,0 +1,5 @@
pkgdatadir = $(datadir)/@PACKAGE@/zfs-tests/tests/functional/health
Should be functional/kstat instead of functional/health, which caused them to be installed in the wrong location.
Thanks, I'll fix that
Would it be possible to use this to generate a proc file like mdstat, maybe /proc/zfsstat, that has some abbreviated info on pool status? Maybe things like # vdevs and/or members, health, scrubbing vs resilvering, degraded, etc.? For example:
rpool: scrubbing [22%] zraid1 sdf1 sdj1 sda1 + mirror sdn1 sdd1
Something more compact than zpool status but more than zpool list.
@beren12 can you start a conversation on the mailing list regarding zpool status and procfs?
One more last minute change - I'm going to rename the proc name from
Looks great! I have one question, about zpool_get_health_str().
I agree with the plan to call the kstat 'state' instead of 'health'.

if (zpool_get_state(zhp) == POOL_STATE_UNAVAIL) {
	str = gettext("FAULTED");
} else if (status == ZPOOL_STATUS_IO_FAILURE_WAIT ||
Are the contents of status really valid here? I don't see an assignment or a pointer being passed.
Good gravy. The assignment is right there. Nevermind.
LGTM
1. Add a proc entry to display the pool's state:

   $ cat /proc/spl/kstat/zfs/tank/state
   ONLINE

   This is done without using the spa config locks, so it will never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#7331
Can this be added to the 0.7.10 todo list?
1. Add a proc entry to display the pool's state:

   $ cat /proc/spl/kstat/zfs/tank/state
   ONLINE

   This is done without using the spa config locks, so it will never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#7331
Closes openzfs#7563
JFYI: the updated ZFS resource agent is released in resource-agents v4.2.0: ClusterLabs/resource-agents@2bdeee4#diff-2f9687bda2dc6253e000b30aaea222d9
Description
Add a proc entry to display the pool's health:
This is done without using the spa config locks, so it will never hang.
Also, fix zpool status and zpool list -o health output to print SUSPENDED instead of ONLINE for suspended pools.
Motivation and Context
#7331
How Has This Been Tested?
Added test case