Add PENDING type to healthchecks #360

Open
wants to merge 2 commits into
base: main

Conversation

andythsu
Member

@andythsu andythsu commented May 24, 2024

Description

Resolves #222 part 1

Additional context and related issues

Release notes

(X) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* 

@cla-bot cla-bot bot added the cla-signed label May 24, 2024
Contributor

@rdsarvar rdsarvar left a comment

some small nitpicks, mostly around use of a 'healthy' variable that has > 2 states

@@ -46,7 +47,7 @@ static class LocalStats
 {
     private int runningQueryCount;
     private int queuedQueryCount;
-    private boolean healthy;
+    private TrinoHealthStateType healthy;
Contributor

question: what are your thoughts on naming this healthState instead of healthy, along with the respective places where healthy is used?

Member Author

I agree. healthy kinda has a binary implication

@@ -34,7 +34,7 @@ public HealthChecker(Notifier notifier)
 public void observe(List<ClusterStats> clustersStats)
 {
     for (ClusterStats clusterStats : clustersStats) {
-        if (!clusterStats.healthy()) {
+        if (clusterStats.healthy() == TrinoHealthStateType.UNHEALTHY) {
Contributor

nitpick: it feels weird to read healthy() == UNHEALTHY; would healthState() == UNHEALTHY read better?

Member

or maybe just clusterStats.Health() ?

Contributor

Sure, open to whichever naming - seeing the ‘y’ suffix (to me) on healthy implies boolean

Member Author

I like healthState() == UNHEALTHY
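
For reference, the naming agreed on in this thread might look roughly like the sketch below. It is a simplified illustration, not the PR's code: ClusterStats is reduced to a hypothetical two-field record, and the notifier is replaced by a plain list so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Three-state type named after the TrinoHealthStateType discussed in this PR.
enum TrinoHealthStateType { PENDING, HEALTHY, UNHEALTHY }

// Hypothetical, reduced stand-in for ClusterStats; the real class carries more fields.
record ClusterStats(String clusterId, TrinoHealthStateType healthState) {}

class HealthChecker
{
    // Assumption: a list of cluster ids stands in for the real Notifier.
    private final List<String> notified = new ArrayList<>();

    public void observe(List<ClusterStats> clustersStats)
    {
        for (ClusterStats clusterStats : clustersStats) {
            // Only UNHEALTHY triggers a notification; PENDING and HEALTHY do not.
            if (clusterStats.healthState() == TrinoHealthStateType.UNHEALTHY) {
                notified.add(clusterStats.clusterId());
            }
        }
    }

    public List<String> notified()
    {
        return notified;
    }
}
```

With an accessor named healthState(), the comparison reads as a state check rather than as a boolean test.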

Member

@Chaho12 Chaho12 left a comment

How about renaming TrinoHealthStateType to TrinoHealthType? I find it quite redundant to say healthState, as health is already a state.

Member

@Chaho12 Chaho12 left a comment

Please resolve conversations after the fixes have been made.
It makes the PR easier to review (we know that a fix has been made for the comment).

LGTM 👍

@@ -68,6 +68,9 @@ public abstract class BaseApp
 {
     private static final Logger logger = Logger.get(BaseApp.class);
     private final ImmutableList.Builder<Module> appModules = ImmutableList.builder();
+    // this injector reference is needed to use reflection in
+    // TestGatewayHaSingleBackend and TestGatewayMultipleBackend
Member

Please avoid using reflection. We should find different ways.

Member Author

@andythsu andythsu May 31, 2024

If we don't use reflection, another way is to add a timeout so that the healthcheck actually runs. TestGatewayHaSingleBackend and TestGatewayMultipleBackend are integration tests.

Contributor

TestGatewayMultipleBackend uses the TrinoContainer from TestContainers for trino1 and trino2, which does not finish its startup() until SELECT 1 returns. So the health check should succeed. customBackend is a MockWebServer so you will need to add an endpoint to satisfy the healthcheck. I do not believe you should need to set health status manually through injection
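
One way to picture the "add an endpoint to satisfy the healthcheck" suggestion is the sketch below. It uses the JDK's built-in HttpServer as a stand-in for MockWebServer so that it runs without extra dependencies. The /v1/info path and the JSON body are assumptions about what the gateway's health check requests, and StubBackend is a hypothetical name; neither is taken from this PR.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical stub backend that answers a health probe with HTTP 200.
class StubBackend implements AutoCloseable
{
    private final HttpServer server;

    StubBackend() throws IOException
    {
        // Port 0 asks the OS for any free port.
        server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        // Assumption: the health check issues a GET against /v1/info.
        server.createContext("/v1/info", exchange -> {
            byte[] body = "{\"starting\": false}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (var out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    int port()
    {
        return server.getAddress().getPort();
    }

    // Sends the same kind of probe a health check might send and returns the status code.
    static int probe(int port) throws IOException, InterruptedException
    {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:" + port + "/v1/info")).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }

    @Override
    public void close()
    {
        server.stop(0);
    }
}
```

In the actual tests the same idea would be expressed by enqueueing or dispatching a response on the existing MockWebServer instead of starting a second server.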

@@ -181,7 +182,7 @@ protected String findBackendForUnknownQueryId(String queryId)
     }

     // Predicate helper function to remove the backends from the list
-    // We are returning the unhealthy (not healthy)
+    // We are returning the unhealthy (not healthState)
Member

ditto.

Comment on lines +16 to +21
/**
* PENDING is for ui/observability purpose and functionally it's unhealthy
* We should use PENDING when Trino clusters are still spinning up
* HEALTHY is when health checks report clusters as up
* UNHEALTHY is when health checks report clusters as down
*/
Contributor

this should be added to the docs. As to placement I think a section on health checks should be added, and linked to from the routing logic and operation sections. Wdyt @mosabua ?
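
As a starting point for such a docs section, the semantics in the Javadoc above could be illustrated with a sketch like the following. The isRoutable() helper, the Backend record, and the BackendFilter class are hypothetical names introduced only for this example; the PR itself may express the rule differently.

```java
import java.util.List;

enum TrinoHealthStateType
{
    PENDING,    // cluster is still spinning up; shown for UI/observability
    HEALTHY,    // health checks report the cluster as up
    UNHEALTHY;  // health checks report the cluster as down

    // Hypothetical helper: PENDING is functionally unhealthy, so only HEALTHY is routable.
    boolean isRoutable()
    {
        return this == HEALTHY;
    }
}

// Hypothetical, reduced view of a backend for the purpose of this example.
record Backend(String name, TrinoHealthStateType healthState) {}

class BackendFilter
{
    // Returns the names of the backends that may receive queries.
    static List<String> routable(List<Backend> backends)
    {
        return backends.stream()
                .filter(backend -> backend.healthState().isRoutable())
                .map(Backend::name)
                .toList();
    }
}
```

Keeping PENDING distinct from UNHEALTHY changes what operators see, not which clusters receive traffic.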

@andythsu
Member Author

andythsu commented Jun 6, 2024

@willmostly

TestGatewayMultipleBackend uses the TrinoContainer from TestContainers for trino1 and trino2, which does not finish its startup() until SELECT 1 returns. So the health check should succeed. customBackend is a MockWebServer so you will need to add an endpoint to satisfy the healthcheck. I do not believe you should need to set health status manually through injection

Sorry for the confusion.

In TestGatewayMultipleBackend and TestGatewaySingleBackend, the clusters are added by calling the POST API.

The default health state when clusters are first added to the gateway is PENDING (and should be). Because PENDING is functionally treated as unhealthy, test cases that send a request such as

Request request =
        new Request.Builder()
                .url("http://localhost:" + routerPort + "/v1/statement")
                .addHeader("X-Trino-User", "test")
                .post(requestBody)
                .build();

will fail, since all clusters are still in PENDING when the test cases run. Unless we wait until the first round of health checks kicks in and changes the states from PENDING to HEALTHY, no clusters are available.
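
The "wait until the first round of healthcheck kicks in" option could be sketched as a small polling helper with a timeout, as below. HealthWaiter and awaitHealthy are hypothetical names, and the Supplier stands in for however a test would read a cluster's current state from the gateway; this is an illustration of the idea, not code from the PR.

```java
import java.time.Duration;
import java.util.function.Supplier;

enum TrinoHealthStateType { PENDING, HEALTHY, UNHEALTHY }

class HealthWaiter
{
    // Polls the supplied state until it is HEALTHY or the timeout elapses.
    // Returns true if HEALTHY was observed, false on timeout.
    static boolean awaitHealthy(
            Supplier<TrinoHealthStateType> currentState,
            Duration timeout,
            Duration pollInterval)
            throws InterruptedException
    {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (currentState.get() == TrinoHealthStateType.HEALTHY) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

A test would call this once after registering the clusters and before sending the first /v1/statement request, failing fast if it returns false.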

Development

Successfully merging this pull request may close these issues.

Trino gateway health state design
5 participants