Make sure each provider keeps frontends even if backends are missing/empty #1689

Open
2 of 3 tasks
timoreimann opened this issue May 31, 2017 · 7 comments
Labels
kind/proposal a proposal that needs to be discussed. priority/P2 need to be fixed in the future

Comments

@timoreimann
Contributor

timoreimann commented May 31, 2017

In order to achieve #1688, all providers must leave frontends around even if backends are missing or empty.

Kubernetes already does this correctly, Marathon presumably not; we need to go through each provider and fix things where needed. The fix could range from simply updating the template to making changes to the provider implementation.

It's worth noting that for the case where a provider API isn't available temporarily, all implementations should back off and let Traefik reuse the previous configuration. Thus, there seem to be only two cases to still get a 404 when we wanted a 503 instead, namely:

  1. A fresh Traefik instance is started while the provider API is currently unavailable.
  2. The provider conceptually cannot yield empty frontends when backends are missing or unavailable.
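
To illustrate why keeping the frontend matters, here is a minimal Go sketch of the routing decision. The `Frontend`, `Backend`, and `Config` types and the `statusFor` function are simplified, hypothetical stand-ins and not Traefik's actual types; the sketch only assumes that an unmatched request yields a 404 while a matched frontend with no servers yields a 503.

```go
package main

import (
	"fmt"
	"net/http"
)

// Backend, Frontend and Config are hypothetical, simplified stand-ins;
// they are not Traefik's real configuration types.
type Backend struct {
	Servers []string
}

type Frontend struct {
	Host    string
	Backend string
}

type Config struct {
	Frontends []Frontend
	Backends  map[string]Backend
}

// statusFor reports which status a proxy would answer with for host:
// 404 if no frontend matches, 503 if a frontend matches but its backend
// has no servers, and 200 as a placeholder for "forward the request".
func statusFor(cfg Config, host string) int {
	for _, fe := range cfg.Frontends {
		if fe.Host != host {
			continue
		}
		if len(cfg.Backends[fe.Backend].Servers) == 0 {
			return http.StatusServiceUnavailable
		}
		return http.StatusOK
	}
	return http.StatusNotFound
}

func main() {
	// The provider kept the frontend although the backend is empty.
	kept := Config{
		Frontends: []Frontend{{Host: "app.example.com", Backend: "app"}},
		Backends:  map[string]Backend{"app": {}},
	}
	// The provider dropped the frontend together with the backend.
	dropped := Config{}

	fmt.Println(statusFor(kept, "app.example.com"))    // 503
	fmt.Println(statusFor(dropped, "app.example.com")) // 404
}
```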

List of providers that already pass the requirement (please extend as progress is made):

  • Kubernetes
  • Marathon
  • Docker

Refs #1077.

@timoreimann timoreimann added the kind/enhancement a new or improved feature. label May 31, 2017
@ldez ldez added kind/proposal a proposal that needs to be discussed. and removed kind/enhancement a new or improved feature. labels May 31, 2017
@timoreimann
Contributor Author

Case 1 seems to be somewhat of a corner case. It could be remedied by means of exposing an "unready" state until the API has been accessed at least once successfully. An external process could then leverage the state information to mark the bootstrapping as incomplete until the ready state turns successful.

This should be considered only if we really deem it necessary, and if so be covered by a separate issue.
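
For reference, the "unready" state could look roughly like the Go sketch below. `Readiness`, `MarkReady`, and `probe` are hypothetical names introduced here for illustration, not an existing Traefik API; the assumption is that the provider calls `MarkReady` after its first successful API access and that an external process polls the handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Readiness is a hypothetical helper, not an existing Traefik type.
// It answers 503 until MarkReady has been called once.
type Readiness struct {
	ready int32
}

// MarkReady would be called after the first successful provider API access.
func (r *Readiness) MarkReady() { atomic.StoreInt32(&r.ready, 1) }

// ServeHTTP lets an external process poll the readiness state.
func (r *Readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if atomic.LoadInt32(&r.ready) == 0 {
		http.Error(w, "provider not reached yet", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe sends one in-memory request to h and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	r := &Readiness{}
	fmt.Println(probe(r)) // 503: provider API not reached yet
	r.MarkReady()
	fmt.Println(probe(r)) // 200: bootstrapping complete
}
```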

@timoreimann
Contributor Author

Case 2 seems to be one that Consul is affected by; at least that's my understanding from #1077 (comment).

@grobinson-blockchain and @bsphere, can you confirm? If so, we can continue working towards a solution. One idea that @emilevauge had was to use custom error pages as designed in #1634 and have a special backend return just the desirable error code. That'd keep Traefik free from special-casing for providers that cannot be changed according to our needs.
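
A "special backend" of this kind could be as small as the Go sketch below. `fixedStatus` and `probe` are hypothetical names; the sketch does not show how the error pages from #1634 would be wired up in Traefik, only a service that always answers with the desired code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fixedStatus returns a handler that answers every request with code.
// It is a hypothetical stand-in for the "special backend" idea.
func fixedStatus(code int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, http.StatusText(code), code)
	})
}

// probe sends one in-memory GET request for path to h and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := fixedStatus(http.StatusServiceUnavailable)
	fmt.Println(probe(h, "/any/path")) // 503
}
```

Run as a standalone service, the same handler could be passed to `http.ListenAndServe`; the listen address would be a deployment choice.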

@timoreimann
Contributor Author

Found to be working for Marathon as well.

@beniwtv

beniwtv commented Sep 7, 2017

"It's worth noting that for the case where a provider API isn't available temporarily, all implementations should back off and let Traefik reuse the previous configuration. Thus, there seem to be only two cases to still get a 404 when we wanted a 503 instead, ..."

There seems to be one more case of 404s. I am evaluating a 3-node cluster and doing outage testing. If Consul has a small hiccup (say, due to a network issue) and returns error 500, Traefik currently removes all configuration. This makes the services running behind it unavailable for a few seconds, returning a 404, until Consul recovers.

This is on Traefik v1.3.7.
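
The behavior I would expect instead is roughly what this Go sketch does: a failed fetch leaves the last good configuration in place. `Holder`, `Refresh`, and `errProviderDown` are hypothetical names, and this is not how Traefik's Consul provider is actually implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a simplified, hypothetical configuration snapshot.
type Config struct {
	Frontends []string
}

// errProviderDown simulates the provider answering with an error such as a 500.
var errProviderDown = errors.New("provider returned 500")

// Holder keeps the last configuration that was fetched successfully.
type Holder struct {
	current Config
}

// Refresh replaces the current configuration only if fetch succeeds.
// On error the previous configuration stays in place and the error is
// returned so the caller can back off and retry later.
func (h *Holder) Refresh(fetch func() (Config, error)) error {
	cfg, err := fetch()
	if err != nil {
		return fmt.Errorf("keeping previous configuration: %w", err)
	}
	h.current = cfg
	return nil
}

func main() {
	h := &Holder{}
	ok := func() (Config, error) { return Config{Frontends: []string{"web"}}, nil }
	failing := func() (Config, error) { return Config{}, errProviderDown }

	_ = h.Refresh(ok)
	err := h.Refresh(failing)
	fmt.Println(len(h.current.Frontends), err != nil) // 1 true
}
```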

@urosgruber

Any news, or is there a plan to do anything around Consul? I'm also seeing the same problem if a backend goes away while the service is still registered in Consul: Traefik returns 404.

@mback2k

mback2k commented Jul 10, 2019

@timoreimann Has there been any update regarding this issue and Docker Swarm? Currently the temporary 404 errors break my WebDAV sync clients connected to a Nextcloud instance running in the Docker Swarm behind Traefik. This also applies to standalone containers with a health check.

To me it looks like this code completely filters out unhealthy containers and therefore causes 404 errors to be returned for them instead of 502/503. Are there any plans / possibilities to change this behavior in v1.7? Same applies to this code in the master/v2 branches.
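
What I have in mind is roughly the following Go sketch: unhealthy containers are left out of the server list, but the routing entry for their service is still emitted. `Container`, `Config`, and `buildConfig` are hypothetical and much simpler than the real Docker provider; the service name and address are made-up sample values.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of a discovered container.
type Container struct {
	Service string
	Addr    string
	Healthy bool
}

// Config is a simplified stand-in for the generated configuration.
type Config struct {
	Frontends []string            // one routing entry per service name
	Backends  map[string][]string // service name -> healthy server addresses
}

// buildConfig emits a frontend for every service it sees, even when all
// of that service's containers are unhealthy. Only healthy containers
// are added as servers, so such a service ends up with an empty backend.
func buildConfig(containers []Container) Config {
	cfg := Config{Backends: map[string][]string{}}
	for _, c := range containers {
		if _, seen := cfg.Backends[c.Service]; !seen {
			cfg.Frontends = append(cfg.Frontends, c.Service)
			cfg.Backends[c.Service] = nil
		}
		if c.Healthy {
			cfg.Backends[c.Service] = append(cfg.Backends[c.Service], c.Addr)
		}
	}
	return cfg
}

func main() {
	cfg := buildConfig([]Container{
		{Service: "nextcloud", Addr: "10.0.0.5:80", Healthy: false},
	})
	fmt.Println(cfg.Frontends, len(cfg.Backends["nextcloud"])) // [nextcloud] 0
}
```

With the frontend still present, the proxy has a route to match and can answer 503 for the empty backend instead of falling through to a 404.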

breunigs added a commit to breunigs/traefik that referenced this issue Apr 30, 2020
Without it, any service that is unavailable will make Traefik return
a 404 or use the next matching IngressRoute, instead of returning a
503 as would be expected.

Related tickets:
traefik#1689
traefik#5332
@jinnatar

jinnatar commented Jun 3, 2021

Is there any reference to follow up on why this is impossible with Consul? Upstream bug?

Projects
Status: Done
Development

No branches or pull requests

6 participants