Add PENDING type to healthchecks #360
base: main
gateway-ha/src/main/java/io/trino/gateway/ha/resource/EntityEditorResource.java
some small nitpicks, mostly around use of a 'healthy' variable that has > 2 states
@@ -46,7 +47,7 @@ static class LocalStats
 {
     private int runningQueryCount;
     private int queuedQueryCount;
-    private boolean healthy;
+    private TrinoHealthStateType healthy;
question: what are your thoughts on having this be healthState instead of healthy? The same applies to the other places where healthy is used.
I agree. healthy kinda has a binary implication
@@ -34,7 +34,7 @@ public HealthChecker(Notifier notifier)
 public void observe(List<ClusterStats> clustersStats)
 {
     for (ClusterStats clusterStats : clustersStats) {
-        if (!clusterStats.healthy()) {
+        if (clusterStats.healthy() == TrinoHealthStateType.UNHEALTHY) {
nitpick: it feels weird to read this as "does healthy() = UNHEALTHY?". Would healthState() == UNHEALTHY read better?
or maybe just clusterStats.Health()?
Sure, open to whichever naming - seeing the ‘y’ suffix (to me) on healthy implies boolean
I like healthState() == UNHEALTHY
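For reference, a minimal sketch of how the check could read after the rename. Only the enum name and the `healthState() == UNHEALTHY` comparison come from this thread; the simplified `ClusterStats` record, the `Notifier.sendNotification` method, and the message text are assumptions made for illustration and may not match the real trino-gateway classes.

```java
import java.util.ArrayList;
import java.util.List;

class HealthStateNamingSketch {
    // Three-valued health state, as proposed in this PR.
    enum TrinoHealthStateType { PENDING, HEALTHY, UNHEALTHY }

    // Hypothetical, simplified stand-in for ClusterStats.
    record ClusterStats(String clusterId, TrinoHealthStateType healthState) {}

    // Hypothetical notifier interface; the real Notifier API may differ.
    interface Notifier {
        void sendNotification(String message);
    }

    static void observe(List<ClusterStats> clustersStats, Notifier notifier) {
        for (ClusterStats clusterStats : clustersStats) {
            // Only an explicit UNHEALTHY triggers a notification; PENDING does not.
            if (clusterStats.healthState() == TrinoHealthStateType.UNHEALTHY) {
                notifier.sendNotification("Cluster " + clusterStats.clusterId() + " is unhealthy");
            }
        }
    }

    // Demonstration helper: collects the messages observe() would send.
    static List<String> collectNotifications(List<ClusterStats> clustersStats) {
        List<String> sent = new ArrayList<>();
        observe(clustersStats, sent::add);
        return sent;
    }

    public static void main(String[] args) {
        List<ClusterStats> stats = List.of(
                new ClusterStats("trino1", TrinoHealthStateType.HEALTHY),
                new ClusterStats("trino2", TrinoHealthStateType.PENDING),
                new ClusterStats("trino3", TrinoHealthStateType.UNHEALTHY));
        System.out.println(collectNotifications(stats));
    }
}
```

With the accessor named `healthState()`, the comparison no longer reads like a boolean test against an enum constant.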
gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/TrinoHealthStateType.java
How about renaming TrinoHealthStateType to TrinoHealthType? I find it quite redundant to say healthState, as health is already a state.
gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStatsInfoApiMonitor.java
Please resolve conversations after the fixes have been made.
It makes the PR easier to review (we know that a fix has been made for the comment).
LGTM 👍
@@ -68,6 +68,9 @@ public abstract class BaseApp
 {
     private static final Logger logger = Logger.get(BaseApp.class);
     private final ImmutableList.Builder<Module> appModules = ImmutableList.builder();
+    // this injector reference is needed to use reflection in
+    // TestGatewayHaSingleBackend and TestGatewayMultipleBackend
Please avoid using reflection. We should find different ways.
If we don't use reflection, another way is to add a timeout so that the healthcheck actually runs. TestGatewayHaSingleBackend and TestGatewayMultipleBackend are integration tests.
TestGatewayMultipleBackend uses the TrinoContainer from TestContainers for trino1 and trino2, which does not finish its startup() until SELECT 1 returns. So the health check should succeed. customBackend is a MockWebServer, so you will need to add an endpoint to satisfy the healthcheck. I do not believe you should need to set health status manually through injection.
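A rough sketch of the "add an endpoint" idea, under stated assumptions. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for MockWebServer so that it runs without extra dependencies. The `/v1/info` path, the `{"starting": false}` body, and the `probe` method are assumptions about what an info-API style health check might request and accept; check them against the monitor that is actually configured.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class MockInfoEndpointSketch {
    // Assumed health-check path and response body (not taken from this PR).
    static final String INFO_PATH = "/v1/info";
    static final String INFO_BODY = "{\"starting\": false}";

    // Starts a mock backend on a free local port that answers the assumed health-check path.
    static HttpServer startMockBackend() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext(INFO_PATH, exchange -> {
            byte[] body = INFO_BODY.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Hypothetical probe: treats the backend as healthy when the endpoint returns
    // HTTP 200 and reports that it is no longer starting.
    static boolean probe(int port) {
        try {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://127.0.0.1:" + port + INFO_PATH)).GET().build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    && response.body().replace(" ", "").contains("\"starting\":false");
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Starts the mock backend, probes it once, and shuts it down again.
    static boolean mockBackendPassesProbe() {
        HttpServer server;
        try {
            server = startMockBackend();
        } catch (IOException e) {
            return false;
        }
        try {
            return probe(server.getAddress().getPort());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println("mock backend passes probe: " + mockBackendPassesProbe());
    }
}
```

With MockWebServer the same idea would presumably be expressed by answering the health-check request in the mock's response setup, so that the backend becomes healthy through the normal check instead of through injection.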
gateway-ha/src/main/java/io/trino/gateway/ha/router/RoutingManager.java
@@ -181,7 +182,7 @@ protected String findBackendForUnknownQueryId(String queryId)
 }

 // Predicate helper function to remove the backends from the list
-// We are returning the unhealthy (not healthy)
+// We are returning the unhealthy (not healthState)
ditto.
gateway-ha/src/test/java/io/trino/gateway/ha/router/TestStochasticRoutingManager.java
/**
 * PENDING is for ui/observability purpose and functionally it's unhealthy
 * We should use PENDING when Trino clusters are still spinning up
 * HEALTHY is when health checks report clusters as up
 * UNHEALTHY is when health checks report clusters as down
 */
this should be added to the docs. As to placement I think a section on health checks should be added, and linked to from the routing logic and operation sections. Wdyt @mosabua ?
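As a starting point for such a docs section, the semantics in the Javadoc above could be illustrated roughly as follows. This is a sketch only: the `Backend` record and the `isRoutable` and `routableBackends` helpers are hypothetical names for illustration, not the gateway's routing code. The one rule taken from the Javadoc is that PENDING is functionally treated as unhealthy.

```java
import java.util.List;

class HealthStateSemanticsSketch {
    enum TrinoHealthStateType {
        PENDING,   // cluster is still spinning up; for UI/observability, not routable
        HEALTHY,   // health checks report the cluster as up
        UNHEALTHY  // health checks report the cluster as down
    }

    // Hypothetical, simplified view of a backend and its current state.
    record Backend(String name, TrinoHealthStateType healthState) {}

    // Only HEALTHY backends may receive queries; PENDING behaves like UNHEALTHY.
    static boolean isRoutable(TrinoHealthStateType state) {
        return state == TrinoHealthStateType.HEALTHY;
    }

    static List<String> routableBackends(List<Backend> backends) {
        return backends.stream()
                .filter(backend -> isRoutable(backend.healthState()))
                .map(Backend::name)
                .toList();
    }

    public static void main(String[] args) {
        List<Backend> backends = List.of(
                new Backend("trino1", TrinoHealthStateType.HEALTHY),
                new Backend("trino2", TrinoHealthStateType.PENDING),
                new Backend("trino3", TrinoHealthStateType.UNHEALTHY));
        System.out.println(routableBackends(backends));
    }
}
```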
Sorry for the confusion. In trino-gateway/gateway-ha/src/test/java/io/trino/gateway/ha/TestGatewayHaSingleBackend.java (lines 65 to 70 in d45c64f), the default health state when clusters are first added to the gateway is PENDING when the test cases run. Unless we wait until the first round of health checks kicks in and changes the states from PENDING to HEALTHY, no clusters are available.
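One way to wait for that first round without reflection is to poll the observed state with a timeout. The sketch below is an assumption-laden illustration: `awaitHealthy` and the `Supplier` that reports the current state are hypothetical, and a real test would need some supported way to read a backend's state from the gateway.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class AwaitHealthySketch {
    enum TrinoHealthStateType { PENDING, HEALTHY, UNHEALTHY }

    // Polls the supplied state until it becomes HEALTHY or the timeout elapses.
    // Returns true if HEALTHY was observed, false on timeout or interruption.
    static boolean awaitHealthy(
            Supplier<TrinoHealthStateType> currentState,
            Duration timeout,
            Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (currentState.get() == TrinoHealthStateType.HEALTHY) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated backend: PENDING for the first two polls, HEALTHY afterwards.
        AtomicInteger polls = new AtomicInteger();
        Supplier<TrinoHealthStateType> simulated = () -> polls.incrementAndGet() >= 3
                ? TrinoHealthStateType.HEALTHY
                : TrinoHealthStateType.PENDING;
        boolean healthy = awaitHealthy(simulated, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("healthy=" + healthy + " after " + polls.get() + " polls");
    }
}
```

A bounded poll like this keeps the integration tests from routing queries while every backend is still PENDING, and fails with a clear timeout if the health check never succeeds.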
Description
Resolves #222 part 1
Additional context and related issues
Release notes
(X) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
*