
Support sticky sessions under SWARM Mode. #1024 #1033

Merged
merged 2 commits into traefik:master from the feature-1024 branch on Feb 6, 2017

Conversation

@foleymic commented Jan 11, 2017

Swarm Mode has its own built-in load balancer, so if we want to leverage sticky sessions, or if we would just prefer to bypass it and go directly to the containers (a.k.a. tasks), via
--label traefik.backend.bypass.swarm.loadbalancer=true
then we need to let Traefik know about the underlying tasks and register them as services within its backend.

See #1024 for more details

@emilevauge (Member) left a comment

Hey @foleymic! Thanks a lot for your contribution!
I have a few comments:

  • I don't really like the option name traefik.backend.bypass.swarm.loadbalancer ;) It's too long.
  • I would make this behavior the default, and rename the option to traefik.backend.loadbalancer.swarm instead.
  • Could you complete the documentation in docs/?

@vdemeester WDYT?


for _, service := range serviceList {
	dockerData := parseService(service, networkMap)

	dockerDataList = append(dockerDataList, dockerData)
	sticky, _ := strconv.ParseBool(provider.getSticky(dockerData))
Member

I think we should remove this line. We don't need to know the stickiness to decide whether or not to bypass the swarm LB.

Author

I did this because you need to automatically bypass the swarm LB if you want to use sticky sessions. Basically, if you explicitly set the flag to bypass, or you are using sticky sessions, then you should bypass the swarm LB.
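For illustration, here is a minimal, self-contained sketch of that rule in Go (label names follow the discussion above; this is not the exact code from the PR):

package main

import (
	"fmt"
	"strconv"
)

// bypassSwarmLB returns true when Traefik should skip Docker Swarm's built-in
// VIP load balancer and register the individual tasks as servers instead:
// either the bypass label is set explicitly, or sticky sessions are requested.
func bypassSwarmLB(labels map[string]string) bool {
	bypass, _ := strconv.ParseBool(labels["traefik.backend.bypass.swarm.loadbalancer"])
	sticky, _ := strconv.ParseBool(labels["traefik.backend.loadbalancer.sticky"])
	return bypass || sticky
}

func main() {
	labels := map[string]string{"traefik.backend.loadbalancer.sticky": "true"}
	fmt.Println(bypassSwarmLB(labels)) // true: sticky sessions imply bypassing the Swarm VIP
}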

Author

I made this change. Thanks.

@joke commented Jan 11, 2017

Hi,

Out of curiosity, how does this relate to the Swarm dnsrr endpoint mode? From a Docker Swarm point of view, that mode is supposed to be used for load balancing. There is also a Traefik issue regarding this topic: #833.

@foleymic (Author)

@emilevauge - thanks for the option name suggestion, I agree that it is way too verbose. Just so I'm clear, with the new name, are you suggesting that when traefik.backend.loadbalancer.swarm=true we use Swarm's internal load balancer (and sticky won't work), and that we make bypassing it the default? I think that makes sense and would fit in better with what Traefik is doing, but I wanted to make sure this was your intent.

@foleymic (Author)

@joke - The intention of this PR is to bypass Docker's Swarm endpoint dnsrr. Without this change, Traefik will only register the swarm service as a single-server backend, even if that service has multiple containers behind it. In that case, there is no way for Traefik to honor sticky sessions, because it is always directing traffic to a single VIP. From there, Swarm would do the round-robin load balancing between the individual containers/tasks.
Regarding #833, with this option set, it will essentially avoid the issue seen in that ticket, but only because you no longer hit the swarm VIP and instead let Traefik round-robin the requests to the containers.

@emilevauge (Member)

@foleymic

are you suggesting that when traefik.backend.loadbalancer.swarm=true we use Swarm's internal load balancer (and sticky won't work), and that we make bypassing it the default? I think that makes sense and would fit in better with what Traefik is doing, but I wanted to make sure this was your intent.

Exactly :)

@foleymic (Author)

@emilevauge got it. Thanks. I will make the change today.

@foleymic force-pushed the feature-1024 branch 2 times, most recently from 3e5827c to 59cec1c on January 13, 2017 02:07
docs/toml.md Outdated
@@ -795,6 +795,7 @@ Labels can be used on containers to override default behaviour:
- `traefik.backend.maxconn.extractorfunc=client.ip`: set the function to be used against the request to determine what to limit maximum connections to the backend by. Must be used in conjunction with the above label to take effect.
- `traefik.backend.loadbalancer.method=drr`: override the default `wrr` load balancer algorithm
- `traefik.backend.loadbalancer.sticky=true`: enable backend sticky sessions
- `traefik.backend.loadbalancer.swarm=true `: use Swarm's inbuilt load balncer (only relevant under Swarm Mode).
Member

*balancer ;)

Author

@emilevauge - thanks for catching that. Fixing it now.

@emilevauge (Member) left a comment

One small comment ;)
Other than that LGTM!
Could you also rebase your PR?
/cc @containous/traefik

@foleymic (Author)

Thanks @emilevauge. I rebased and corrected toml.md.

@vdemeester (Contributor) left a comment

A few comments. I still need to try this out 👼

docker-machine ssh manager "docker service create \
--name whoami1 \
--label traefik.port=80 \
--network traefik-net \
--label traefik.frontend.rule=Host:whoam1.traefik \
--label traefik.backend=whoami1 \
--label traefik.backend.loadbalancer.sticky=true \
Contributor

Is this up-to-date? (Isn't it supposed to be traefik.backend.loadbalancer.swarm?)

Author

This is correct. Default behavior is traefik.backend.loadbalancer.swarm=true, so I left it out here but used the existing sticky flag to demonstrate that we can maintain sticky sessions under swarm mode. Let me know if you think I should include traefik.backend.loadbalancer.swarm=true in the guide as well.

Contributor

Ohh I see 😅

@foleymic (Author) commented Jan 19, 2017

@vdemeester - I pushed your other recommended changes.

Thanks,
Mike


func listTasks(ctx context.Context, dockerClient client.APIClient, serviceID string,
	serviceDockerData dockerData, networkMap map[string]*dockertypes.NetworkResource) ([]dockerData, error) {
	taskList, err := dockerClient.TaskList(ctx, dockertypes.TaskListOptions{})
Contributor

You should be able to use filters here (and thus avoid getting all the tasks and having to iterate over them).

filters := filters.NewArgs()
filters.Add("service", serviceID)
taskList, err := dockerClient.TaskList(ctx, dockertypes.TaskListOptions{Filters: filters})
// […]

Author

Good idea - I'll make that change.

if task.NetworksAttachments != nil {
	dockerData.NetworkSettings.Networks = make(map[string]*networkData)
	for _, virtualIP := range task.NetworksAttachments {
		networkService := networkMap[virtualIP.Network.ID]
Contributor

Could be the following

if networkService, present := networkMap[virtualIP.Network.ID]; present {
    // […]
}

Author

Thanks @vdemeester. I'm trying this out now.

@emilevauge force-pushed the feature-1024 branch 2 times, most recently from 8ba0b66 to 6acd687 on January 24, 2017 20:32
@foleymic (Author)

@vdemeester - I just wanted to reach out to see if you have any other concerns or changes you would like on this PR.

@vdemeester (Contributor) left a comment

  • This is currently not working as expected: any value for the traefik.backend.loadbalancer.swarm label means true, which is wrong. getIsBackendLBSwarm should return the value of the label if present.
  • This changes the default behavior: the swarm load balancer is disabled by default with this PR. Ping @emilevauge @containous/traefik to discuss whether that's what we want or not.

When the swarm load balancer is disabled, I'm getting 404 … not sure why…

@@ -463,6 +463,13 @@ func (provider *Docker) getSticky(container dockerData) string {
return "false"
}

func (provider *Docker) getIsBackendLBSwarm(container dockerData) string {
if _, err := getLabel(container, "traefik.backend.loadbalancer.swarm"); err == nil {
return "true"
Contributor

I'm not sure that's right here. It will always return true, whatever the value of the label is — and false otherwise 😅
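For illustration, one way to return the label's actual value would be along these lines (a sketch only — it assumes getLabel returns the raw label value and an error when the label is absent, as the other helpers in the Docker provider do):

func (provider *Docker) getIsBackendLBSwarm(container dockerData) string {
	// Return whatever the user set on the label (e.g. "true" or "false"),
	// and fall back to "false" when the label is absent.
	if label, err := getLabel(container, "traefik.backend.loadbalancer.swarm"); err == nil {
		return label
	}
	return "false"
}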

@foleymic (Author) commented Feb 1, 2017

Yeah, I meant to question that when I noticed that was how the sticky label was handled (getSticky) - I ended up doing the same thing here to be consistent. Shall I correct them both?

I'll look into the 404 error as well.

Contributor

ohhh 😓 yeah, I would think it needs to be fixed (both)

@emilevauge (Member)

@vdemeester

This changes the default behavior: the swarm load balancer is disabled by default with this PR. Ping @emilevauge @containous/traefik to discuss whether that's what we want or not.

#1033 (comment)

I think this is a good idea to NOT use Swarm LB by default.

@vdemeester (Contributor)

I think this is a good idea to NOT use Swarm LB by default.

SGTM, was asking just to make sure 👼

@emilevauge (Member)

@foleymic we are going to freeze code for v1.2 tomorrow. If you want your PR to be in this release, please address #1033 (review) :)

@foleymic (Author) commented Feb 2, 2017

@emilevauge & @vdemeester - I just pushed the requested changes. I was not able to reproduce the 404 errors that Vincent mentioned. Let me know if you're still seeing it.

@foleymic (Author) commented Feb 3, 2017

@vdemeester - I see the 404 now. I get it if I don't include the frontend rule. I'm looking at it now. It happens because, when not using the Swarm LB, I'm setting dockerData.Name to the service name + the task slot number (e.g. whoami0.1), so if the frontend rule is not provided, Traefik will generate one for you from this dockerData.Name (as the container name). I guess that means it's not matching with the backend somewhere because of this.

I fixed this by adding a new string field called ServiceName to the dockerData struct. In Swarm mode, when using Traefik to load balance between containers (traefik.backend.loadbalancer.swarm=false), ServiceName will be the name of the service, while Name will be the name of the task (e.g. whoami0.1). If traefik.backend.loadbalancer.swarm=true, then ServiceName and Name will be the same.
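Roughly, the shape of that change looks like the sketch below (illustrative only — the real dockerData struct carries more fields, and the exact parsing code differs):

// dockerData (simplified): the new ServiceName field sits next to Name.
type dockerData struct {
	ServiceName string // Swarm service name; used for default frontend/backend rule generation
	Name        string // task name such as "whoami0.1" when the Swarm LB is bypassed
	// ... labels, network settings, etc.
}

// When parsing a task with traefik.backend.loadbalancer.swarm=false, the two differ:
//   data.ServiceName = service.Spec.Name                                 // e.g. "whoami0"
//   data.Name        = service.Spec.Name + "." + strconv.Itoa(task.Slot) // e.g. "whoami0.1"
// With traefik.backend.loadbalancer.swarm=true, both hold the service name, and default
// rules are generated from ServiceName when no traefik.frontend.rule label is provided.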

@emilevauge (Member)

@foleymic Docker integration tests are failing

@foleymic (Author) commented Feb 3, 2017

@emilevauge Sorry for not catching the test errors before I pushed last night. The issue is that when testing the frontend and backend rules, the dockerData struct is manually created and is still relying on the container name rather than the service name. Any concerns with me updating the tests to use ServiceName instead?

@emilevauge (Member)

@foleymic Not sure, seems like a breaking change, no?

@foleymic (Author) commented Feb 3, 2017

@emilevauge Yeah, I was hesitant to suggest it at first. The whole issue I had was due to the way the rules were generated from the container name versus the service name. Prior to this PR, they were the same in Swarm mode because it never went down to the task/container level. Now, with this PR, I added ServiceName and left Name to be the container name (which is what I thought the original intent was, looking at the code). My first pass at fixing this was to add a task name to the struct and leave Name as is, but I was worried about not knowing when we needed the actual container instance (task name) versus the service name. Let me take a look down that path again - it may be a safer approach.

Update - I found a much simpler solution: just set dockerData.ServiceName to dockerData.Name in parseContainer. Now that I understand the tests a bit better, I will also try to add some Swarm mode tests as a separate PR.
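In other words, something along these lines in the plain-container path (a sketch under the assumption that parseContainer builds the dockerData from dockertypes.ContainerJSON; its full body is elided here):

func parseContainer(container dockertypes.ContainerJSON) dockerData {
	data := dockerData{
		Name: container.Name,
		// ... labels, network settings, health, etc.
	}
	// Mirror Name into ServiceName so default rule generation keeps working
	// for plain (non-Swarm) containers, which have no Swarm service name.
	data.ServiceName = data.Name
	return data
}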

@emilevauge (Member)

ping @vdemeester :)

@emilevauge modified the milestone: 1.2 on Feb 6, 2017
Mike Foley added 2 commits February 6, 2017 14:44
SWARM Mode has its own built-in load balancer, so if we want to leverage sticky sessions,
 or if we would just prefer to bypass it and go directly to the containers (aka tasks), via
	--label traefik.backend.disable.swarm.loadbalancer=true
 then we need to let Traefik know about the underlying tasks and register them as
 services within its backend.
In Swarm mode with Docker Swarm's load balancer disabled (traefik.backend.loadbalancer.swarm=false),
service name will be the name of the Docker service and name will be the container task name
(e.g. whoami0.1). When generating backend and frontend rules, we will use service name instead of name if a
rule is not provided.

Initialize dockerData.ServiceName to dockerData.Name to support non-swarm mode.
@vdemeester (Contributor) left a comment

LGTM 🐸 🐳 🐝

@emilevauge merged commit db63e84 into traefik:master on Feb 6, 2017
@ldez added the kind/enhancement (a new or improved feature) label and removed the status/2-needs-review label on Apr 29, 2017
@kevtainer (Contributor)

I'm fairly confused by this ticket. Are we able to use dnsrr mode with the Traefik network when the Swarm LB is disabled? I see a reference to #833 but no clear indication whether or not it is resolved by this change.
