Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket inconsistency #1858

Closed
jp83 opened this issue Jul 11, 2017 · 11 comments
Closed

websocket inconsistency #1858

jp83 opened this issue Jul 11, 2017 · 11 comments

Comments

@jp83
Copy link

jp83 commented Jul 11, 2017

Do you want to request a feature or report a bug?

bug

What did you do?

I first tried to get support on stackoverflow, but I think there's something else going on.

Post here for ref, but I'll restate below
https://stackoverflow.com/questions/44981861/websockets-in-traefik-opening-handshake-timed-out

I'm trying to get home-assistant in docker running behind traefik and I think I've had similar issues with other apps using websockets in the past (linuxserver/unifi controller and https://github.com/Division42LLC/division42llc-dotnet-webca/issues/2). The front page loads but when I try to login it just says connecting... and keeps spinning.

I open up the debugger in chrome and see this:

WebSocket connection to 'ws://home-assistant.docker.int.xxx.com/api/websocket' failed: WebSocket opening handshake timed out

If I pass the port through directly to the host (i.e. http://docker.int.xxx.com:8123) it works.

What did you expect to see?

websockets pass through successfully for app to work

What did you see instead?

page loads but no connection to backend through websocket

Output of traefik version: (What version of Traefik are you using?)

v1.3.2, v1.3.3

What is your environment & configuration (arguments, toml, provider, platform, ...)?

Docker in a Photon VM on esxi 5.5
Version: 1.13.1 API version: 1.26 (minimum version 1.12) Go version: go1.8.1 Git commit: 092cba3 Built: Fri May 5 02:08:33 2017 OS/Arch: linux/amd64

docker run -d --name="home-assistant" \
  --restart=always \
  -v $PWD/config:/config \
  -v /etc/localtime:/etc/localtime:ro \
  --network=traefik-net \
  -p 8123:8123 \
  --label traefik.port=8123 \
  --label traefik.docker.network=traefik-net \
  homeassistant/home-assistant
debug = true
logLevel = "debug"

defaultEntryPoints = ["http", "https", "ws", "wss"]
[entryPoints]
  [entryPoints.http]
  address = ":80"

  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

InsecureSkipVerify = true

#... acme and some manual backends ...

[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "docker.int.xxx.com"

If applicable, please paste the log output in debug mode (--debug switch)

time="2017-07-08T01:58:18Z" level=debug msg="Writing outgoing Websocket request to target connection: &{Method:GET URL:ws:/api/websocket Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Cache-Control:[no-cache] Connection:[Upgrade] User-Agent:[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36] Accept-Language:[en-US,en;q=0.8] Sec-Websocket-Version:[13] Cookie:[__zlcmid=eqgBjgiVtncb4A; _ga=GA1.2.930888288.1496028630] Sec-Websocket-Extensions:[permessage-deflate; client_max_window_bits] Accept-Encoding:[gzip, deflate] Origin:[http://home-assistant.docker.int.xxx.com] Upgrade:[websocket] Sec-Websocket-Key:[FsZu8u+pr9IQDAc3xZYiVQ==] Pragma:[no-cache]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:home-assistant.docker.int.xxx.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:192.168.2.1:49364 RequestURI:/api/websocket TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc4206ce4c0}"
@jp83
Copy link
Author

jp83 commented Jul 11, 2017

I have also since tried reverting back to traefik:v1.2.1 where it sounds like sockets were supposed to be working and fixed other peoples' problems. I simply stopped, rm'ed, and restarted this image label, otherwise same setup. Home-assistant now magically works. Interestingly portainer (which I've seen others ask about) doesn't work now, but did for me in v1.3.3. This is what the chrome debugger has to say, but didn't see anything in traefik log.

GET http://portainer.docker.int.xxx.com/api/docker/1/version 500 (Internal Server Error)
(anonymous) @ app.48f07597.js:2
o @ app.48f07597.js:2
(anonymous) @ app.48f07597.js:2
(anonymous) @ app.48f07597.js:3
$eval @ app.48f07597.js:3
$digest @ app.48f07597.js:3
$apply @ app.48f07597.js:3
i @ app.48f07597.js:2
u @ app.48f07597.js:2
y.onload @ app.48f07597.js:2

Since there seems to be inconsistencies between the two different apps working I'm suspicious of traefik.

@ViViDboarder
Copy link
Contributor

I'm seeing the exact same behaviour with the following traefik versions:

Version:      v1.3.3
Codename:     raclette
Go version:   go1.8.3
Built:        2017-07-06_05:12:20PM
OS/Arch:      linux/arm
Version:      v1.3.2
Codename:     raclette
Go version:   go1.8.3
Built:        2017-06-29_04:31:40PM
OS/Arch:      linux/arm

However it does work for the following!

Version:      v1.3.1
Codename:     raclette
Go version:   go1.8.3
Built:        2017-06-16_11:00:34AM
OS/Arch:      linux/arm

So if looking to narrow down what is breaking the ws request is somewhere between v1.3.1 and v1.3.2.

Also worth noting that at v1.3.1, both Home Assistant and Portainer work just fine.

@ViViDboarder
Copy link
Contributor

In a quick look at the change log, the only thing that looks like it could be relevant would be #1782, however I have no clue if that's actually the culprit.

Trying to spin up my dev environment again to test...

@ViViDboarder
Copy link
Contributor

Confirmed.

Issue reproduces with commit f99f3b9 (Merge of #1782) but on previous commit, fe4d0e9 (actually the prep commit for the v1.3.1 tag) the websockets work just fine.

So it's definitely something in that patch. I'm not very familiar with websockets myself, so hopefully that tips someone off.

@ViViDboarder
Copy link
Contributor

Logs from my test for the failed version (f99f3b9)

traefik_1  | time="2017-07-12T05:38:31Z" level=debug msg="Writing outgoing Websocket request to target connection: &{Method:GET URL:ws:/api/websocket Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Connection:[keep-alive, Upgrade] Dnt:[1] Pragma:[no-cache] User-Agent:[Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0] Sec-Websocket-Key:[OT+dTXPXMylaXf+sG5l1VQ==] Sec-Websocket-Version:[13] Cache-Control:[no-cache] Accept-Encoding:[gzip, deflate] Origin:[http://hass.composevalidation.docker.local] Upgrade:[websocket] Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8] Accept-Language:[en-US,en;q=0.5] Sec-Websocket-Extensions:[permessage-deflate]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:hass.composevalidation.docker.local Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:172.23.0.1:44102 RequestURI:/api/websocket TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc4205a92c0}" 
hass_1     | 2017-07-11 22:38:31 INFO (MainThread) [homeassistant.components.http] Serving /api/websocket to 172.23.0.2 (auth: True)

And for the working version (fe4d0e9):

hass_1     | 2017-07-11 22:44:22 INFO (MainThread) [homeassistant.components.http] Serving /api/websocket to 172.23.0.2 (auth: True)
traefik_1  | time="2017-07-12T05:44:22Z" level=debug msg="Writing outgoing Websocket request to target connection: &{Method:GET URL:ws:/api/websocket Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[User-Agent:[Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0] Cache-Control:[no-cache] Pragma:[no-cache] Sec-Websocket-Version:[13] Accept-Language:[en-US,en;q=0.5] Upgrade:[websocket] Accept-Encoding:[gzip, deflate] Sec-Websocket-Key:[jYis0ls37xa1XvZgKjxjIg==] Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8] Sec-Websocket-Extensions:[permessage-deflate] Connection:[keep-alive, Upgrade] Origin:[http://hass.composevalidation.docker.local] Dnt:[1]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:hass.composevalidation.docker.local Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:172.23.0.1:45684 RequestURI:/api/websocket TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc42031e340}" 

They look just the same... This is with debug on and logging at debug level. I'm not sure how to view the logs from oxy since it looks like the only change would be if resp.StatusCode == http.StatusSwitchingProtocols was false. I'm seeing any logs from that context in my Docker logs.

@timoreimann
Copy link
Contributor

@ViViDboarder thanks for the hard work in bisecting and narrowing down the cause of the problem. 👏

Likely related to containous/oxy#17 then. /cc @juliens

@ldez ldez added kind/bug/confirmed a confirmed bug (reproducible). priority/P0 needs hot fix and removed status/0-needs-triage labels Jul 12, 2017
@juliens juliens self-assigned this Jul 12, 2017
@Korrd
Copy link

Korrd commented Jul 14, 2017

Any idea when will the next version be released? We had to downgrade to 1.2.3 due to this issue, and that version freezes when our swarm has too many stacks on it, forcing us to restart the service.

@ldez
Copy link
Member

ldez commented Jul 14, 2017

@Korrd before release we need to fix 😉 (a PR coming soon).
This affect 1.3.2 and 1.3.3 you can downgrade to 1.3.1.

Seems you have another problem (freezes with swarm), come explain your problem on the Traefik community Slack channel.

@Korrd
Copy link

Korrd commented Jul 14, 2017

Cool. Thanks! I'll give 1.3.1 a try.

The issue we are having with 1.2.3 occurs when we provision or deprovision many stacks at a time, some of the services from each stack having traefik labels. I'm still doing some research, and will present a reproducible case once I can properly pinpoint it.

@seblucas
Copy link

I can also confirm that using 1.3.1 fixes the problem (no need to go down to 1.2.X)

@ldez
Copy link
Member

ldez commented Jul 28, 2017

closed by #1896

@ldez ldez closed this as completed Jul 28, 2017
@ldez ldez added this to the 1.3 milestone Jul 28, 2017
@ldez ldez removed the priority/P0 needs hot fix label Jul 28, 2017
@traefik traefik locked and limited conversation to collaborators Sep 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants