Add a clustering example with Docker Swarm #2589

Merged
merged 3 commits into from Jan 7, 2018

Conversation

jmaitrehenry
Contributor

@jmaitrehenry jmaitrehenry commented Dec 18, 2017

What does this PR do?

Add a clustering example in the user-guide documentation

Motivation

The first time I looked into what cluster mode does, I couldn't find the information.
I think it's a really cool feature, and adding more information about it is important.

More

  • Added/updated documentation

Additional Notes

It's based on my blog post: https://jmaitrehenry.ca/2017/12/15/using-traefik-with-docker-swarm-and-consul-as-your-load-balancer/ but I changed the format and rewrote a good part of it for the documentation.

Initially, I wanted to improve the Clustering/HA user guide, but as this example uses Docker Swarm and Consul with a docker-compose file specific to Docker Swarm, I think it's better to have a dedicated page for it.

Fixes #1200, #736

Member

@ldez ldez left a comment

Great job 👍 👍 👍

Could you change the base branch to v1.5?

```

For listening on different ports, we need to create an entrypoint for each. The CLI syntaxe is `--entrypoints=Name:a_name Address:an_ip_or_empty:a_port options`.
If you want to redirect traffic from one entrypoint to another, it's the option Redirect.EntryPoint:entrypoint_name`.
Member

missing `

For more information about challenge: [Automatic Certificate Management Environment (ACME)](https://github.com/ietf-wg-acme/acme/blob/master/draft-ietf-acme-acme.md#tls-with-server-name-indication-tls-sni)

## Prerequisites
You will need a working Docker Swarm cluster.
Member

could you add one empty line between all titles and the content

@jmaitrehenry
Contributor Author

@ldez Done!

Contributor

@nmengin nmengin left a comment

Hello @jmaitrehenry.

Many thanks for this really useful PR! 👍
This kind of example will surely help a lot of users!

I have a few remarks, in particular about the ACME + KV store part.
A PR fixes the behavior and is included in the v1.5-rc3 version.


Why we need Traefik in cluster mode? Running multiple instances should work out of the box?

If you don't use Let's Encrypt with Traefik, you may not need Traefik cluster/HA. But, if you use Let's Encrypt, you need to store certificates somewhere shared by all the Traefik instances.
Contributor

I don't agree with "If you don't use Let's Encrypt with Traefik, you may not need Traefik cluster/HA."
IMHO, you can use cluster mode to share configuration and TLS certificates, not only ACME certificates.

WDYT? Can you change this sentence?

Can you split it into two lines, please? One line = one sentence.


What Traefik should do:
- Listen to 80 and 443
- Redirect HTTP traffic to HTTPs
Contributor

HTTPS

--acme.email=contact@mydomain.ca
```

Let's Encrypt need 3 parameters: en entrypoint to listen on, a storage for certificates, and en email for the registration.
Contributor

needs, s/en entrypoint to listen on/an entryPoint to listen to/


For activing Let's Encrypt support, you need to add `--acme` flag.
Contributor

s/For activing/To enable


Now, Traefik need to know where to store the certificates, we can choose between a key in a Key-Value store, or a file path: `--acme.storage=my/key` or `--acme.storage=/path/to/acme.json`.
Contributor

needs
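
For readers following the thread, the three Let's Encrypt parameters discussed in the hunks above (entrypoint, storage, email) could be combined roughly as follows. This is only a sketch: the `acme_flags` helper is hypothetical, the entrypoint name `https` is an assumption, and the `--acme.entryPoint` flag name is the one quoted later in this review. The storage and email values are the examples from the quoted diff.

```shell
#!/bin/sh
# Hypothetical helper: prints the ACME flags discussed in the review, one per line.
# $1: certificate storage, either a KV key (my/key) or a file path (/path/to/acme.json)
# $2: registration email
# $3: entrypoint name (assumed to be "https" when omitted)
acme_flags() {
  printf '%s\n' \
    "--acme" \
    "--acme.storage=$1" \
    "--acme.entryPoint=${3:-https}" \
    "--acme.email=$2"
}

# Example with the values quoted in the diff (storage as a key in a KV store).
acme_flags "my/key" "contact@mydomain.ca"
```

Passing `/path/to/acme.json` as the first argument instead gives the file-based variant mentioned in the same sentence of the diff.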

[...]
```

If you have some update to do, update the initializer service and re-deploy it. The new configuration will be store on Consul, and you need to restart the Traefik node: `docker service update --force traefik_traefik`.
Contributor

s/will be store on Consul/will be stored in Consul
s/Traefik/Træfik

Can you split in two lines please? One line = one sentence


## Complete Docker compose file
Contributor

WDYT of Full docker-compose file?

version: "3.4"
services:
  traefik_init:
    image: traefik:1.4@sha256:9c299d9613cb01564c8219f4bc56ecc55f30d8f06d35cf3ecf83a85426c13225
Contributor

Can you use træfik 1.5-rc3 please?

    depends_on:
      - consul
  traefik:
    image: traefik:1.4@sha256:9c299d9613cb01564c8219f4bc56ecc55f30d8f06d35cf3ecf83a85426c13225
Contributor

Can you use træfik 1.5-rc3 please?

- "--consul"
- "--consul.endpoint=consul:8500"
- "--consul.prefix=traefik"
- "--acme.storage=traefik/acme/account"
Contributor

Can you delete this argument?

@jmaitrehenry
Contributor Author

jmaitrehenry commented Dec 21, 2017

Hello @nmengin, thanks for your feedback.
I have a question about using Traefik 1.5-rc3. If this page is merged along with the Traefik 1.5 release, why not just use 1.5 in the documentation?

Otherwise, we will need to update the documentation once version 1.5 is out.

@nmengin
Contributor

nmengin commented Dec 21, 2017

@jmaitrehenry

Yes you're right, 1.5 is better than the full version name 👍

@jmaitrehenry
Contributor Author

I made the requested changes.

One of the changes is:

-If you don't use Let's Encrypt with Traefik, you may not need Traefik cluster/HA.
-But, if you use Let's Encrypt, you need to store certificates somewhere shared by all the Traefik instances.
+If you want to use Let's Encrypt with Traefik, sharing configuration or TLS certificates, you need Traefik cluster/HA.

What do you think?

@nmengin
Contributor

nmengin commented Dec 22, 2017

+If you want to use Let's Encrypt with Traefik, sharing configuration or TLS certificates, you need Traefik cluster/HA.

Can you add "sharing configuration or TLS certificates between many Træfik instances"?

@jmaitrehenry
Contributor Author

@nmandery done!

Contributor

@nmengin nmengin left a comment

Hello @jmaitrehenry and happy new year ;)

The PR SGTM, but I still have a few minor comments.


This guide explains how to use Træfik in high availability mode in a Docker Swarm and with Let's Encrypt.

Why we need Traefik in cluster mode? Running multiple instances should work out of the box?
Contributor

Why do we need

Running multiple instances should work out of the box?
I'm not sure I understand your sentence. Do you mean "How should a cluster work out of the box?"?

Contributor Author

Running multiple instances should work out of the box?

It's because you can run multiple instances of Traefik and it works, but you miss things such as sharing the Let's Encrypt (LE) challenge between instances, and more. That's what I want to explain in this guide: you can't just start Traefik instances and hope they will magically work with LE.

--acme.email=contact@mydomain.ca
```

Let's Encrypt needs 3 parameters: an entryPoint to listen to, a storage for certificates, and en email for the registration.
Contributor

s/en email/an email


Now, Traefik needs to know where to store the certificates, we can choose between a key in a Key-Value store, or a file path: `--acme.storage=my/key` or `--acme.storage=/path/to/acme.json`.

For your email and the entrypoints, it's `--acme.entryPoint` and `--acme.email` flags.
Contributor

s/entrypoints/entryPoint/

--docker.domain=mydomain.ca \
--docker.watch
```
To enable docker and support, you need to add `--docker` and `--docker.swarmmode` flags.
Contributor

I guess the word swarm-mode is missing?

--docker.watch
```
To enable watch docker changes, add `--docker.watch`.
Contributor

WDYT about simplifying the sentence: "To watch docker changes"?

Contributor Author

WDYT of: To watch docker events ?
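
To make the wording discussion concrete, the Docker provider flags quoted in the hunk above can be listed side by side. This is a minimal sketch: the `docker_provider_flags` helper is hypothetical, and the comments paraphrase the diff text rather than the official flag reference.

```shell
#!/bin/sh
# Hypothetical helper: prints the Docker provider flags from the quoted hunk.
# $1: domain passed to --docker.domain (mydomain.ca in the diff)
docker_provider_flags() {
  # --docker            enables the Docker provider
  # --docker.swarmmode  enables Swarm mode support
  # --docker.domain     domain value shown in the diff
  # --docker.watch      watches Docker changes/events
  printf '%s\n' \
    "--docker" \
    "--docker.swarmmode" \
    "--docker.domain=$1" \
    "--docker.watch"
}

docker_provider_flags "mydomain.ca"
```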

@jmaitrehenry
Contributor Author

@nmengin Happy new year 🎉!

I made the requested changes, but for the last one I added a comment:

WDYT about simplifying the sentence: To watch docker changes?

WDYT of: To watch docker events?

@jmaitrehenry
Contributor Author

@wryfi you're right, it's really focused on Docker Swarm; it's in the name of the menu entry: Traefik cluster example with Swarm (I could prefix Swarm with Docker).

And the title of the guide shows it clearly:

+# Clustering / High Availability on Docker Swarm with Consul
 +
 +This guide explains how to use Træfik in high availability mode in a Docker Swarm and with Let's Encrypt.

Could we make another guide for another use case? Sure we can, but I have never built it without Docker, and I prefer to write about something I have built and tested :).

I don't cover the virtual IP, how Traefik is exposed, or why I expose the port in host mode, and I could complete the guide on those points. I wrote it based on my blog post, where I give more detail about the architecture. You can read it here: https://jmaitrehenry.ca/2017/12/15/using-traefik-with-docker-swarm-and-consul-as-your-load-balancer/

But maybe we can merge this one first and complete it afterwards; what do you think @nmengin @mmatur @ldez ?

@wryfi

wryfi commented Jan 4, 2018

@jmaitrehenry sure, no arguments from me. I put my comments here because the title of the issue is "DOCS - Add a clustering example." So maybe if this ticket is about a specific docker example, the title could be updated to reflect reality. ;)

@jmaitrehenry jmaitrehenry changed the title DOCS - Add a clustering example DOCS - Add a clustering example with Docker Swarm Jan 4, 2018
@jmaitrehenry
Contributor Author

@wryfi Right, I just updated the title :)

@dtomcej
Contributor

dtomcej commented Jan 4, 2018

@wryfi I understand why you would think that the Docker method would be different; however, there are more similarities than you might think:

  1. The binary in the docker container is the same binary you would build for your linux hosts.
  2. The configuration file (toml) used for the docker binary is the same for the linux binary you would build for your linux hosts.
  3. The recommendations regarding the KV store for ACME HA storage are exactly the same for both.

The issues you are raising are in regards to the configuration of the network upstream of the binary (aka the IP/etc that traefik listens on). Whether you choose containerization (and use docker/kubernetes), Virtualization (and use vms), or baremetal (and use VRRP [keepalived/pacemaker etc]), the upstream networking is up to you or your devop/sysadmin.

Given that Traefik is at its core "a modern HTTP reverse proxy and load balancer made to deploy microservices with ease", the number of people who will run it as a bare binary is probably small. Given that we have over 5 million pulls on our Docker image, I would hazard a guess that this is the more popular route.

To answer your question in regards to pacemaker, sure, you could float an IP through that, you could also round robin between nodes, as traefik clusters in an active-active state. You could also use VRRP and use keepalived to float IPs from node to node. All will work, but again, we will probably not come out with any "recommended" guide for baremetal installations, due to the specificity of baremetal installations and networking requirements that are abstracted when using containers.

Finally, I don't feel that there is any reason that Traefik cannot or should not be marketed as having built-in HA. We understand that currently to implement the full featureset (including letsencrypt) you will require a KV store to handle the transactional creation and updating of letsencrypt certificates, but for those people that want HA, running a KV store should not be out of the realm of possibility.

I would like to give a huge 🎉 to @jmaitrehenry for writing this, as it has been a much requested document.

Contributor

@dtomcej dtomcej left a comment

LGTM
:shipit:

@wryfi

wryfi commented Jan 4, 2018

@dtomcej I think you take my comments a bit too critically. I just came here trying to understand how your software works.

IMHO, more precise use of the terms "highly available" and "clustered" would be helpful for new users trying to understand what your software does (and does not). Those terms are not interchangeable. The concept of HA encompasses the entire service architecture, including the networking layer. Clustering or shared configuration is a requirement for an HA architecture, but does not itself provide high availability.

I did not come here to disparage your project, express any opinion about Docker, or tell @jmaitrehenry to change his documentation. I posted here to ask what features your code supports, because it is not clear in the existing docs.

That github does not provide a better forum for this type of question is another (unfortunate) issue.

Thanks!

@dtomcej
Contributor

dtomcej commented Jan 4, 2018

@wryfi I apologize for the trite response, we get a fair bit of naysaying through these tickets sometimes, and I might have misinterpreted your intentions for your comments.

You would not believe how many times we get told "traefik doesn't work for my use case X, therefore you can't say it has feature Y".

You are correct on all accounts about the difference between HA and clustering. Hopefully having better documentation (such as the current ticket) will allow us to move forward to better implementations, and use cases for further development.

Sorry again for the argumentative tone of my previous response.

Please hit us up on slack if you would like to further discuss your use case (I have run Traefik in production without docker as well).

Thanks!

@wryfi

wryfi commented Jan 4, 2018

@dtomcej thanks for your reply, and no worries. I understand the trolling that projects get.

Assuming my preliminary tests are positive, you will likely see more of me (maybe even some contributions). Cheers! :-)

@ldez ldez changed the title DOCS - Add a clustering example with Docker Swarm Add a clustering example with Docker Swarm Jan 5, 2018
Member

@ldez ldez left a comment

LGTM

@actraiser

Hi there, thanks for the docs on the integration Swarm/Consul/Traefik. I have set up Traefik on Docker Swarm in HA mode with Consul as KV-Store as described.

It works fine when starting the setup for the first time. However, when I need to reboot the host that runs the Consul container, e.g. for critical system updates, Consul is not able to find a leader after the reboot; it remains in a loop and does not recover. In that case Traefik is not able to fetch the ACME certificates anymore and all my HTTPS clients are unreachable.

I can reset everything by deleting all my consul data and rebuild the stack but then the previously stored traefik configuration-key in consul including the certificates is of course empty and traefik restarts requesting all the Let's Encrypt certificates one by one once again.

So my question is with the documented setup how to properly restart a consul container on docker swarm so consul will actually get properly back into action when restarted, that is, selects itself as a leader and serves the existing kv-store.

Greets -act

@jmaitrehenry
Contributor Author

@actraiser You're right, the problem you have is that Consul is not HA in this example. You need a 3-node cluster and should restart one Consul node at a time.

I had this problem myself in a non-HA cluster (1 Consul node).
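
To illustrate the sizing advice above: Consul servers elect a leader through Raft consensus, which requires a majority (quorum) of the servers to be reachable. The arithmetic below is a small sketch; the `quorum` and `tolerated` function names are hypothetical helpers, not Consul commands.

```shell
#!/bin/sh
# Majority needed among N servers: floor(N/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

# Server failures tolerated while still keeping a quorum: N - quorum(N)
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

# Print the figures for a few common cluster sizes.
for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

With a single server no failure is tolerated, whereas three servers keep a quorum while one of them is down, which is consistent with the advice to restart one Consul node at a time.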

@jmaitrehenry
Contributor Author

@actraiser check this compose-file for consul: https://gist.github.com/jmaitrehenry/40d8272f622a45ecca53cefa16362fb5

It creates a 3-node cluster, but each Consul instance is restricted to a single node with a local volume.

If you already have a distributed volume driver like rexray, ceph, nfs, or something else, you can change the volume definition and the placement constraint.

@actraiser

@jmaitrehenry Thank you but I switched to etcd where the cluster deployment process into Docker Swarm went out of the box much smoother than using Consul.

@trajano

trajano commented Sep 18, 2018

Can this be updated to 1.7?

@jjsaunier

Why do we still need this traefik_init? We don't want that. Why can't it simply work with the K/V store directly, instead of starting Traefik with a JSON file plus a Traefik "init" that pushes the JSON file into the KV store? It's so complicated. Is there a way to simplify that, binding the K/V store directly to Traefik without the traefik_init voodoo hack? Couldn't Traefik initialize itself when booting with the KV options enabled?

@trajano

trajano commented Sep 18, 2018

I wouldn't mind the hack provided it works in swarm mode. To date I haven't gotten Traefik working in HA mode in Swarm.

10 participants