
Support Failover Load-balancing #1007

Closed
Tester98 opened this issue Dec 31, 2016 · 21 comments · Fixed by #8825
Labels
kind/enhancement a new or improved feature. priority/P2 need to be fixed in the future status/5-frozen-due-to-age
Milestone

Comments

@Tester98

Tester98 commented Dec 31, 2016

What version of Traefik are you using (traefik version)?

v1.1.2

What is your environment & configuration (arguments, toml...)?

linux, simple file backend

[backends]
  [backends.testing]
    [backends.testing.servers.server1]
    url = "http://primary:80"
    weight = 1
    [backends.testing.servers.server2]
    url = "http://backup:80"
    weight = 1

What did you do?

I want to do simple active/passive load balancing. However, I want to use the second server only as a backup (in case of failure or network issues).

What did you expect to see?

I wanted to see a simple case working where failover is supported instead of load balancing.

What did you see instead?

I couldn't figure out how to make this work, despite reading the documentation many times.

The use case is simple: I have 2 servers in a single backend, one primary and one backup. However, I would like to send traffic to the backup only when the primary is not working (a TCP connection could not be made).

I have tested some odd setups, where I add an abnormally high weight to the primary and enable retry. In this case, if the primary is not responding, the retry again chooses the primary instead of the backup. So retry should skip the primary server to actually retry on another one. This is broken.

Can someone help me configure such a case?

@digipigeon

I am having the exact same problem. I want to deploy Traefik to fail over to a different zone (e.g. a different data center).

It actually seems to compound the problem if the weights are set high as it appears that the weights consume retry.attempts.

If I add 8 servers (5 primary, 3 secondary) but all with equal weights, then take the primary ones offline it continues to field requests. But requests will also go to the backup servers.

If I set a higher weight (10) on the primary servers, then take them offline, it seems like the number of retries is "spent" on the weights of the primaries it keeps trying, so it never gets a chance to try the backup servers (well, actually 3/8 requests succeed, as they land on the backups first).

If I set the primary weight as 2 and the secondary as 1, then take the primaries offline, the ratio goes to 3/4 successful. The number of attempts is automatically set to 7, so if a request lands on the first server, all 7 retries are spent on the weights of the 5 failed servers. All of the others work.

Lastly, if I set the primary weight considerably higher (1000) and a high retry count (10000), with the primaries offline it takes 50 seconds to complete the first request.

If I was serving a static file then I could set the primary weight to 50 and secondaries to 1 then set 1000 as the retry count, and maybe drr strategy would help. The only problem is if there is an application level failure I have just DoS'd my own system, so no real solution here.

Would love to hear what I am doing wrong, or how I can configure this better. Thanks.

@mattcollier
Contributor

A HealthCheck feature was added in v1.2 by way of #918 and #1132

https://github.com/containous/traefik/blob/master/docs/basics.md#backends
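For reference, a minimal sketch of what that health check looked like in the v1 file provider, extending the configuration from the original report. Key names varied slightly between v1 releases, so treat this as an approximation and check the docs for your version:

```toml
[backends]
  [backends.testing]
    # Servers failing the check are removed from rotation until they recover.
    # Note: this does not make server2 a passive backup; both servers still
    # receive traffic while healthy, which is why the discussion continued.
    [backends.testing.healthcheck]
    path = "/health"
    interval = "10s"
    [backends.testing.servers.server1]
    url = "http://primary:80"
    weight = 1
    [backends.testing.servers.server2]
    url = "http://backup:80"
    weight = 1
```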

@ldez ldez added the kind/question a question label Apr 21, 2017
@ldez ldez added kind/enhancement a new or improved feature. priority/P2 need to be fixed in the future and removed kind/question a question labels Jun 8, 2017
@mr-manuel

Is there now a way to configure active/passive failover in v2 without setting the weight to a very high value?

@luklss

luklss commented Apr 24, 2020

It would be really nice to have this! Traefik rocks, but I will probably have to choose another solution given this is not supported.

@mr-manuel

Can someone address this issue please?

@sparkacus

I am in need of a similar feature.

An example I gave in #6856 is an AWS target group: if there are no healthy instances, it routes to all registered instances.

Currently if all instances fail a Consul health check, the route is completely removed.

@rdxmb
Contributor

rdxmb commented Jul 27, 2020

I've read about all the mechanisms currently available and am wondering how this could be implemented:

  • CircuitBreaker with a customizable fallback mechanism
  • Supporting a fallback mechanism within LoadBalancing
  • WeightedRoundRobin with a special weight (e.g. a negative integer weight like -1 which would only be used when servers with positive weights like 1 and 3 are not available)

@jdoss

jdoss commented Aug 13, 2020

I've read about all the mechanisms currently available and am wondering how this could be implemented:

  • CircuitBreaker with a customizable fallback mechanism
  • Supporting a fallback mechanism within LoadBalancing
  • WeightedRoundRobin with a special weight (e.g. a negative integer weight like -1 which would only be used when servers with positive weights like 1 and 3 are not available)

I think the Fallback mechanism within LoadBalancing itself or the WeightedRoundRobin with special weight are the best ideas here. I want to be able to load a static page if my downstream servers are having issues. I can't seem to find a way to do this with Traefik in its current form.

@tvld

tvld commented Sep 9, 2020

Would a weight of 0 or -1 not be enough? It would simply mean: never use this server unless the other(s) fail.

@rdxmb
Contributor

rdxmb commented Sep 10, 2020

The weight in RoundRobin is already defined, I would not try to mix that.

As a user, I would prefer to have the weight within LoadBalancer. Maybe it would be even possible to work with positive numbers here - so you could do things like

5 - (main service)
5 - (second main service)
3 - (fallback service - optional)
1 - (maintenance site)

@mr-manuel

Can we push this a little bit more? The first request was almost 4 years ago... There were already good solutions here. Can this be assigned to someone in the Traefik team?

@SantoDE SantoDE self-assigned this Nov 26, 2020
@Ankurkh1

Ankurkh1 commented Feb 25, 2021

Any update on this request please?
Traefik is leaps and bounds better than any reverse proxy solution available. Yet it is missing something as simple and straightforward as an active/passive configuration.
Nginx achieves this with the simple backup keyword on a server:

upstream backend {
    server backend1.example.com;
    server backup1.example.com backup;
}

The definition of backup, per the documentation at https://nginx.org/en/docs/http/ngx_http_upstream_module.html:

backup

  • marks the server as a backup server. It will be passed requests when the primary servers are unavailable.

Can you please consider implementing this in some form? Just because of this one functionality, we would have to use NGINX instead of Traefik :(.

@sbrattla

I'm very much cheering for this feature as well! The use case is multiple data centers: I'd like not to send traffic to other regions unless the servers in the local region are down.

@Per0x

Per0x commented Apr 29, 2021

Any update on this feature? I spent days reading the whole documentation and learning Traefik, only to realize in the end that such a basic but essential function was not possible. For now it seems to be round-robin only :/

@stefaanv

I'm looking for the same feature.
Any idea if/when this will be available?

@jazzmuesli

jazzmuesli commented Oct 29, 2021

I wanted to implement blue/green deployments with resilience on a nomad cluster with consul and traefik, I hope my example helps someone.

I have a nomad cluster with blue/green instances in tomcats/docker and consul for service discovery that is used by traefik. Green instances register themselves in consul with tags
traefik.http.routers.website-green.priority=123
traefik.http.routers.website-green.rule=Host("website") || Host("green.website")

similar for blue instance:

traefik.http.routers.website-blue.priority=120
traefik.http.routers.website-blue.rule=Host("website") || Host("blue.website")

This way traefik routes to green instance by default. In case the green instance dies, traefik will route to the blue instance. This mechanism can also be used for blue/green deployments in general: you deploy the blue instance, check it by accessing blue.website and if it's ok, promote it by increasing blue priority to 126 and later deploy green/increase green priority to 124.

@rdxmb
Contributor

rdxmb commented Oct 30, 2021 via email

@tobiasb

tobiasb commented Nov 5, 2021

How about changing the Retry middleware to retry not on the same backend but on a different one? This would also finally enable us to do zero-downtime deployments while using Docker service discovery.

@ldez ldez unassigned SantoDE Nov 5, 2021
@tobiasb

tobiasb commented Nov 8, 2021

I wanted to implement blue/green deployments with resilience on a nomad cluster with consul and traefik, I hope my example helps someone.

I have a nomad cluster with blue/green instances in tomcats/docker and consul for service discovery that is used by traefik. Green instances register themselves in consul with tags traefik.http.routers.website-green.priority=123 traefik.http.routers.website-green.rule=Host("website") || Host("green.website")

similar for blue instance:

traefik.http.routers.website-blue.priority=120 traefik.http.routers.website-blue.rule=Host("website") || Host("blue.website")

This way traefik routes to green instance by default. In case the green instance dies, traefik will route to the blue instance. This mechanism can also be used for blue/green deployments in general: you deploy the blue instance, check it by accessing blue.website and if it's ok, promote it by increasing blue priority to 126 and later deploy green/increase green priority to 124.

@jazzmuesli this seems to work great, thanks for sharing ❤️ . In addition to the router I had to also "namespace" the service, otherwise Traefik gets confused about differences in configuration of the same service between different backends.

@mannharleen

mannharleen commented Nov 18, 2021

Almost 5 years and no love from the Traefik team. 😕
The workaround from @jazzmuesli does work though

@tobiasb

tobiasb commented Nov 18, 2021

@mannharleen @jazzmuesli It didn't end up working for me after all because backends were added before they were healthy, see #8570

@tomMoulard tomMoulard mentioned this issue Mar 8, 2022
@ddtmachado ddtmachado added this to the next milestone Mar 10, 2022
@kevinpollet kevinpollet changed the title Simple Failover Load-balancing Not Possible ? Support Failover Load-balancing Mar 11, 2022
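For readers landing here later: the fix referenced above (#8825) introduced a dedicated failover service. A sketch of what the original active/passive use case could look like with it in the v2 file provider follows; key names are per the Traefik v2 documentation at the time of writing, so verify them against the docs for your version:

```toml
# A failover service sends all traffic to "main" and switches to
# "backup" only when "main" has no healthy servers left, so the
# health check on "main" is what drives the failover decision.
[http.services.app.failover]
  service = "main"
  fallback = "backup"

[http.services.main.loadBalancer]
  [http.services.main.loadBalancer.healthCheck]
    path = "/health"
    interval = "10s"
  [[http.services.main.loadBalancer.servers]]
    url = "http://primary:80"

[http.services.backup.loadBalancer]
  [[http.services.backup.loadBalancer.servers]]
    url = "http://backup:80"
```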
@traefik traefik locked and limited conversation to collaborators Apr 17, 2022