Skip to content

warning should not be a fail status#4537

Merged
traefiker merged 5 commits intotraefik:v1.7from
saez0pub:master
Aug 5, 2019
Merged

warning should not be a fail status#4537
traefiker merged 5 commits intotraefik:v1.7from
saez0pub:master

Conversation

@saez0pub
Copy link

I would like to have a behavior like the DNS provider of consul.
Even if the service is warning, It is considered as UP.

Moreover, another difference is that if any service not served by traefik is in warning state, traefik remove it from consul catalog.

What does this PR do?

Change failure detection for consul catalog

Motivation

A warning on a service on a node disable all frontends of this node on consul

More

  • Not Added/updated tests
  • Not Added/updated documentation

Additional Notes

I have tested it manualy, I don't know what test I could write.
related to #4536

@ldez
Copy link
Contributor

ldez commented Apr 5, 2019

Hello @trondhindenes what's your opinion about this PR ?

@trondhindenes
Copy link

If I understand this PR correctly, it makes the following change:
Previously, Traefik would treat a backend node as "unhealthy" if it was in any of the following states: critical, maintenance
This PR changes the behavior so that the following states disables the backend node in Traefik: critical, warning, maintenance

Assuming I understand it correctly, I don't have an opinion - we don't use the warning status for anything internally.
It is interesting tho. For example if a backend contains 10 nodes and 2 of them are in warning while 8 are healthy I would probably prefer to only send traffic to the 8 healthy ones. However, if all 10 nodes were in a warning state, I would prefer traffic to be sent to all of them.

Traefik doesn't currently have any logic to decide routing based on the health of individual nodes in the backend (I'm pretty sure HaProxy has something like this), so for now I guess it comes down to whether or not to treat "warning" as a failure state or not. I don't really have a strong opinion, but since "warning" is a degraded state, I guess backwards compatibility should be most important (meaning this setting shouldn't be changed just like that). I guess the natural thing to do is to introduce a config setting to control the behavior of the "warning" state but keep the default as-is.

That's just my two cents :-)

@pgrange
Copy link

pgrange commented Apr 8, 2019

Hi,

This is the way consul manages healthyness: while a node has all its checks either passing or warning, the node is part of the service. Only when a check becomes critical or the node is in maintenance, is the node unable to serve.

Since it is the way consul handles consul checks, I would say this is the way they have to be handled. So I believe this is the way traefik has to handle consul checks.

Regarding backward compatibility, I would be very surprised that anyone using consul, with the way it handles checks, would be badly impacted by this pr since it reflects the expected behavior.

See also #4536

Regards,

@ldez
Copy link
Contributor

ldez commented Apr 8, 2019

Thank you for this PR.

Sorry for the late answer.

Could you rebase this pull request on v1.7 branch?

@saez0pub saez0pub changed the base branch from master to v1.7 April 9, 2019 08:19
@saez0pub saez0pub force-pushed the master branch 2 times, most recently from 30e74d1 to 813ad93 Compare April 9, 2019 08:26
@saez0pub
Copy link
Author

saez0pub commented Apr 9, 2019

Hello, I have rebased on 1.7.

Thank you.

@saez0pub
Copy link
Author

saez0pub commented Jul 9, 2019

Hello,

Any status update ?

@mmatur
Copy link
Member

mmatur commented Jul 9, 2019

Hello @saez0pub,

Sorry for the late reply. Could you please, like suggested by @trondhindenes, add a parameter to keep the default as-is.

@saez0pub
Copy link
Author

Hello,

I've added an option, and tested it manually.

with the option not set (by default WarningIsCritical = true)
Start :

  • passing : in backend
  • warning : not in backend
  • critical : not in backend

Node Check state transition :

  • passing > warning : removed from backend
  • passing > critical : removed from backend
  • warning > passing : added in backend
  • warning > critical : stil not in backend
  • critical > passing : added in backend
  • critical > warning : stil not in backend

with the consulcatalog option WarningIsCritical = false
Start :

  • passing : in backend
  • warning : in backend
  • critical : not in backend

Node Check state transition :

  • passing > warning : still in backend
  • passing > critical : removed from backend
  • warning > passing : still in backend
  • warning > critical : removed from backend
  • critical > passing : added in backend
  • critical > warning : added in backend

This option is by default on the old behaviour.

I suggest you to match consul's logic in the next major release and invert the default value of this parameter. Using a consul catalog provider and changing the healthyness detection is something hard for users comming from consul.

Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍

Copy link
Member

@mmatur mmatur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @saez0pub 👏
LGTM

Copy link
Contributor

@dtomcej dtomcej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
:shipit:

saez0pub and others added 5 commits August 5, 2019 15:24
I would like to have a behavior like the DNS provider of consul.
Even if the service is warning, It is considered as UP.

Moreover, another difference is that if any service not served by traefik is in warning state, traefik remove it from consul catalog.
If you start a service with warning status, it was not be added unless a critical status occurs

If your node has a critcal service then warning, it was not added unless OK

Now warning is considered as passing if the option is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants