Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket handshake errors #2078

Closed
eskp opened this issue Sep 7, 2017 · 22 comments
Closed

Websocket handshake errors #2078

eskp opened this issue Sep 7, 2017 · 22 comments
Assignees

Comments

@eskp
Copy link

eskp commented Sep 7, 2017

Do you want to request a feature or report a bug?

Bug.

Possibly related to #1835 and #1900

What did you do?

Upgrade Traefik to 1.3.6 from 1.3.1 and watch websocket connection no longer working.

Our applications POST request times out with 504 on the new version. 1.3.1 works, returning successfully.

What did you expect to see?

No errors.

What did you see instead?

Errors described below.

Output of traefik version: (What version of Traefik are you using?)

1.3.1 upgrading to 1.3.6 and up, including 1.4 RC

What is your environment & configuration (arguments, toml, provider, platform, ...)?

Docker Swarm with Traefik routing to Swarm services from the outside.

[backends]
  [backends.static]
    [backends.static.servers.server1]
    url = "http://frontend:80"

  [backends.backend]
    [backends.backend.servers.server1]
    url = "http://backend:8080"

[frontends]
  [frontends.backend]
  backend = "backend"
  priority = 5
    [frontends.backend.routes.new]
    rule = "PathPrefix:/api,/wss"

  [frontends.static]
  backend = "static"
  priority = 1
    [frontends.static.routes.new]
    rule = "PathPrefix:/"

If applicable, please paste the log output in debug mode (--debug switch)

When upgraded to 1.3.6 stopped seeing the following which I presume is a successful handshake:

level=debug msg="Writing outgoing Websocket request to target connection: &{Method:GET URL:ws:/wss Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[User-Agent:[Go-http-client/1.1] Connection:[Upgrade] Sec-Websocket-Key:[key_here==] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Device-Token:[token_here]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:host.cloud Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:84.1.1.22:34992 RequestURI:/wss TLS:0xc4208096b0 Cancel:<nil> Response:<nil> ctx:0xc4205c5c80}"

Seeing lots of these in both versions:

server.go:2753: http: TLS handshake error from 18.6.6.152:39254: read tcp 172.18.0.3:443->18.6.6.152:39254: read: connection reset by peer
server.go:2753: http: TLS handshake error from 18.6.6.152:39272: EOF
@juliens juliens added area/websocket kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Sep 7, 2017
@OndroNR
Copy link

OndroNR commented Sep 25, 2017

We have similar problem in our setup. Problem occured in version 1.3.4. 1.3.3 works fine.

@juliens
Copy link
Member

juliens commented Sep 25, 2017

@OndroNR Can you test with the latest version (1.3.8) ?

@OndroNR @eskp I really need some more information about your use case. With the information you give, I can't reproduce (I have created a lots of integration tests).

I would really appreciate to find THE thing that make it happen on your configuration.

@OndroNR
Copy link

OndroNR commented Sep 26, 2017

@juliens Just tried with 1.4.0-rc3 and 1.3.8, does not work yet.

Relevant Traefik configuration. HTTPS cert comes via ACME. Other frontend endpoints have just normal HTTP backends.

logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[backends]
  [backends.streaming-api-backend-dev]
    [backends.streaming-api-backend-dev.servers.server1]
    url = "ws://streaming-api.backend-dev:7788"

[frontends]
  [frontends.streaming-api-dev]
  backend = "streaming-api-backend-dev"
    [frontends.streaming-api-dev.routes.test_1]
    rule = "Host:streaming.example.com"

Browser console log:

withSocket.js:21 WebSocket connection to 'wss://username:pass@streaming.example.com/stream?operation_id=123&floor_id=456' failed: Error during WebSocket handshake: Unexpected response code: 500

Traefik log:

[traefik-1]2017-09-26T15:03:55.254374139Z time="2017-09-26T15:03:55Z" level=warning msg="Error dialing `streaming.example.com`: websocket: bad handshake" 

Censored HTTP request:

GET wss://username:pass@streaming.example.com/stream?operation_id=123&floor_id=456 HTTP/1.1
Host: streaming.example.com
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Origin: https://app.example.com
Sec-WebSocket-Version: 13
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept-Language: sk-SK,sk;q=0.8,cs;q=0.6,en-US;q=0.4,en;q=0.2
Cookie: __zlcmid=hFgPsgrERTTisI
Sec-WebSocket-Key: 3Q6DL0gNL3ahBAj24OTQWA==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

HTTP response:

HTTP/1.1 500 Internal Server Error
Date: Tue, 26 Sep 2017 15:13:47 GMT
Content-Length: 21
Content-Type: text/plain; charset=utf-8

@juliens
Copy link
Member

juliens commented Sep 27, 2017

@OndroNR

Traefik log:

[traefik-1]2017-09-26T15:03:55.254374139Z time="2017-09-26T15:03:55Z" level=warning msg="Error dialing `streaming.example.com`: websocket: bad handshake" 

In this log, you should have the backend url, not the frontend, is this the real log, or this is an error when you try to anonymize it ?

@OndroNR
Copy link

OndroNR commented Sep 27, 2017

@juliens checked it again to be sure. I see real frontend hostname in that log line.

@juliens
Copy link
Member

juliens commented Sep 27, 2017

@OndroNR Can you try with this docker image

docker pull juliens/traefik:websocket

I have added more informations in the error log

@OndroNR
Copy link

OndroNR commented Sep 27, 2017

@juliens Thanks for taking time on this.

[traefik-1]2017-09-27T13:08:10.118433871Z time="2017-09-27T13:08:10Z" level=debug msg="Look for provided certificate to validate [streaming.example.com]..." 
[traefik-1]2017-09-27T13:08:10.118455132Z time="2017-09-27T13:08:10Z" level=debug msg="No provided certificate found for domains [streaming.example.com], get ACME certificate." 
[traefik-1]2017-09-27T13:08:10.118461942Z time="2017-09-27T13:08:10Z" level=debug msg="Challenge GetCertificate streaming.example.com" 
[traefik-1]2017-09-27T13:08:10.118740809Z time="2017-09-27T13:08:10Z" level=debug msg="ACME got domain cert streaming.example.com" 
[traefik-1]2017-09-27T13:08:10.503226597Z time="2017-09-27T13:08:10Z" level=warning msg="Error dialing `streaming.example.com`: websocket: bad handshake" 
[traefik-1]2017-09-27T13:08:15.662228176Z time="2017-09-27T13:08:15Z" level=debug msg="Look for provided certificate to validate [streaming.example.com]..." 
[traefik-1]2017-09-27T13:08:15.662276498Z time="2017-09-27T13:08:15Z" level=debug msg="No provided certificate found for domains [streaming.example.com], get ACME certificate." 
[traefik-1]2017-09-27T13:08:15.662282730Z time="2017-09-27T13:08:15Z" level=debug msg="Challenge GetCertificate streaming.example.com" 
[traefik-1]2017-09-27T13:08:15.662316232Z time="2017-09-27T13:08:15Z" level=debug msg="ACME got domain cert streaming.example.com" 
[traefik-1]2017-09-27T13:08:15.725431564Z time="2017-09-27T13:08:15Z" level=warning msg="Error dialing `streaming.example.com`: websocket: bad handshake" 
[traefik-1]2017-09-27T13:08:17.064112223Z 2017-09-27 13:08:17.063952 I | 2017/09/27 13:08:17 h2_bundle.go:4292: http2: server: error reading preface from client 77.247.224.71:61674: timeout waiting for client preface
[traefik-1]2017-09-27T13:08:17.121429869Z 2017-09-27 13:08:17.121347 I | 2017/09/27 13:08:17 h2_bundle.go:4292: http2: server: error reading preface from client 77.247.224.71:61676: timeout waiting for client preface
[traefik-1]2017-09-27T13:08:17.125533895Z 2017-09-27 13:08:17.125477 I | 2017/09/27 13:08:17 h2_bundle.go:4292: http2: server: error reading preface from client 77.247.224.71:61675: timeout waiting for client preface
[traefik-1]2017-09-27T13:08:17.136624104Z 2017-09-27 13:08:17.136562 I | 2017/09/27 13:08:17 h2_bundle.go:4292: http2: server: error reading preface from client 77.247.224.71:61678: timeout waiting for client preface
[traefik-1]2017-09-27T13:08:17.141790428Z 2017-09-27 13:08:17.141737 I | 2017/09/27 13:08:17 h2_bundle.go:4292: http2: server: error reading preface from client 77.247.224.71:61677: timeout waiting for client preface

@juliens
Copy link
Member

juliens commented Sep 27, 2017

@OndroNR Thanks to you for your time to help me debug this !

Can you re-pull the image and try again ?

@OndroNR
Copy link

OndroNR commented Sep 27, 2017

@juliens

[traefik-1]2017-09-27T21:37:26.468005624Z time="2017-09-27T21:37:26Z" level=debug msg="Look for provided certificate to validate [streaming.example.com]..." 
[traefik-1]2017-09-27T21:37:26.468190338Z time="2017-09-27T21:37:26Z" level=debug msg="No provided certificate found for domains [streaming.example.com], get ACME certificate." 
[traefik-1]2017-09-27T21:37:26.468236280Z time="2017-09-27T21:37:26Z" level=debug msg="Challenge GetCertificate streaming.example.com" 
[traefik-1]2017-09-27T21:37:26.468302456Z time="2017-09-27T21:37:26Z" level=debug msg="ACME got domain cert streaming.example.com" 
[traefik-1]2017-09-27T21:37:26.915226282Z time="2017-09-27T21:37:26Z" level=debug msg="Try dialing `streaming-api.backend-dev:7788`" 
[traefik-1]2017-09-27T21:37:26.915263389Z time="2017-09-27T21:37:26Z" level=warning msg="Error dialing `streaming-api.backend-dev:7788`: websocket: bad handshake with resp: &{401 Unauthorized 401 HTTP/1.1 1 1 map[Www-Authenticate:[Basic realm=""] Server:[akka-http/2.3.16] Date:[Wed, 27 Sep 2017 21:37:26 GMT] Content-Type:[text/plain; charset=UTF-8] Content-Length:[77]] {0xc420542450} 77 [] false false map[] 0xc420806200 <nil>}" 
[traefik-1]2017-09-27T21:37:32.116914266Z time="2017-09-27T21:37:32Z" level=debug msg="Look for provided certificate to validate [streaming.example.com]..." 
[traefik-1]2017-09-27T21:37:32.116944842Z time="2017-09-27T21:37:32Z" level=debug msg="No provided certificate found for domains [streaming.example.com], get ACME certificate." 
[traefik-1]2017-09-27T21:37:32.116974729Z time="2017-09-27T21:37:32Z" level=debug msg="Challenge GetCertificate streaming.example.com" 
[traefik-1]2017-09-27T21:37:32.117612246Z time="2017-09-27T21:37:32Z" level=debug msg="ACME got domain cert streaming.example.com" 
[traefik-1]2017-09-27T21:37:32.261379578Z time="2017-09-27T21:37:32Z" level=debug msg="Try dialing `streaming-api.backend-dev:7788`" 
[traefik-1]2017-09-27T21:37:32.261446671Z time="2017-09-27T21:37:32Z" level=warning msg="Error dialing `streaming-api.backend-dev:7788`: websocket: bad handshake with resp: &{401 Unauthorized 401 HTTP/1.1 1 1 map[Content-Type:[text/plain; charset=UTF-8] Content-Length:[77] Www-Authenticate:[Basic realm=""] Server:[akka-http/2.3.16] Date:[Wed, 27 Sep 2017 21:37:31 GMT]] {0xc4205a3fb0} 77 [] false false map[] 0xc4204f1700 <nil>}" 

Maybe Traefik is failing to forward HTTP BASIC authentication credentials?

That corresponds with backend service logs:

2017-09-27 21:37:26,597 c.p.s.SocketServer$: Unauthorized user
2017-09-27 21:37:31,946 c.p.s.SocketServer$: Unauthorized user

@juliens juliens added kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Sep 28, 2017
@juliens juliens self-assigned this Sep 28, 2017
@juliens
Copy link
Member

juliens commented Sep 28, 2017

Fine, you found the problem 😛 , I'll try to fix it quickly.

@OndroNR
Copy link

OndroNR commented Sep 28, 2017

@juliens looking forward to it :) I am just not sure if @eskp's problem is the same. I might have hijacked this issue :/

@juliens
Copy link
Member

juliens commented Sep 28, 2017

@OndroNR you're right, can you create a new issue ?

@OndroNR
Copy link

OndroNR commented Sep 28, 2017

@juliens done, we can continue at #2182

@eskp
Copy link
Author

eskp commented Sep 28, 2017

@juliens I can try with the more verbose container that you provided today and report back.

@juliens
Copy link
Member

juliens commented Sep 28, 2017

@eskp Good news if you can try with this image, moreover, I have just pushed a new version with a fix.

I'm waiting for your feedback

Thx 😛

@eskp
Copy link
Author

eskp commented Sep 29, 2017

No luck here @juliens.

From both custom and 1.3.1 version I see lots of these:

cloud_proxy.1.3afk1mv96t4x@ip-10-1-0-20    | 2017/09/29 08:45:20 server.go:2848: http: TLS handshake error from <redacted>:24980: read tcp 172.18.0.3:443-><redacted>:24980: read: connection reset by peer
cloud_proxy.1.3afk1mv96t4x@ip-10-1-0-20    | 2017/09/29 08:45:20 server.go:2848: http: TLS handshake error from <redacted>:24981: EOF

From the custom version:

cloud_proxy.1.3afk1mv96t4x@ip-10-1-0-20    | time="2017-09-29T08:45:20Z" level=debug msg="Try dialing `cloud_backend:8080`" 
cloud_proxy.1.3afk1mv96t4x@ip-10-1-0-20    | time="2017-09-29T08:45:20Z" level=debug msg="Incoming request: &{Method:GET URL:http://cloud_backend:8080 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[User-Agent:[Go-http-client/1.1] Connection:[Upgrade] Sec-Websocket-Key:[<redacted>==] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Device-Token:[<redacted>]]  Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:<redacted>:36004 RequestURI:/wss TLS:0xc420c84fd0  Cancel:<nil> Response:<nil> ctx:0xc420511aa0}" 
cloud_proxy.1.3afk1mv96t4x@ip-10-1-0-20    | time="2017-09-29T08:45:20Z" level=debug msg="Outgoing request: &{Method:GET URL:ws://cloud_backend:8080/wss Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Host:[domain.com] X-Device-Token:[<redacted>] User-Agent:[Go-http-client/1.1]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:dev.airtame.cloud Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:<redacted>:36004 RequestURI:/wss TLS:0xc420c84fd0 Cancel:<nil> Response:<nil> ctx:0xc420511aa0}"

From 1.3.1:

cloud_proxy.1.qchm290i87ce@ip-10-1-0-20    | time="2017-09-29T08:50:23Z" level=debug msg="Writing outgoing Websocket request to target connection: &{Method:GET URL:ws:/wss Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[User-Agent:[Go-http-client/1.1] Connection:[Upgrade] Sec-Websocket-Key:[<redacted>==] Sec-Websocket-Version:[13] Upgrade:[websocket] X-Device-Token:[<redacted>]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:dev.airtame.cloud Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:<redacted>:36103 RequestURI:/wss TLS:0xc420b03c30 Cancel:<nil> Response:<nil> ctx:0xc4200f6b80}"

I see these in 1.3.1 but not in the custom:

cloud_proxy.1.qchm290i87ce@ip-10-1-0-20    | time="2017-09-29T08:57:27Z" level=debug msg="Round trip: http://cloud_backend:8080, code: 200, duration: 14.189896ms tls:version: 303, tls:resume:false, tls:csuite:c02f, tls:server:<redacted>"

@eskp
Copy link
Author

eskp commented Nov 6, 2017

Hi @juliens, with the new version of Traefik(1.4.1) I now see this warning message in the logs:

level=warning msg="Error dialing "domain.com": websocket: bad handshake with resp: 200 200 OK"

I have tried to test the websocket connection with wscat utility and I saw a somewhat similar response:

wscat -c ws://localhost:8080/wss
error: Error: unexpected server response (200)

Confirming with the developers the application requires a token to be set in the header. So I have tried to do this with wscat and the connection succeeded:

# wscat -c ws://localhost:8080/wss -H "x-token:<token here>"
connected (press CTRL+C to quit)
>

So I'm thinking maybe this token isn't getting passed with Traefik? I have passHostHeader = true set on the configured frontend in Traefik.

@juliens
Copy link
Member

juliens commented Nov 8, 2017

@eskp I made a test to be sure header are forwarded, and infortunately (or not 😉) they are forwarded.

May be you can have more logs from your websocket server ?

@juliens
Copy link
Member

juliens commented Feb 1, 2018

@eskp Can you try again with 1.5.1 ?
And if it continue to failed, can you have more logs from your websocket server ?

@eskp
Copy link
Author

eskp commented Feb 1, 2018

Appreciate you getting back to me. Will try again in the next few days.

@eskp
Copy link
Author

eskp commented Feb 7, 2018

Hi @juliens,
It appears to be working. I can now complete actions done via websockets. However I still see some errors in Traefik logs:
level=error msg="vulcand/oxy/forward/websocket: Error dialing "mydomain.com": websocket: bad handshake with resp: 200 200 OK"

@juliens
Copy link
Member

juliens commented Feb 8, 2018

It appears to be working.
👍

For the error logs, do you thing you can open another issue with more information (how to reproduce) ?

@juliens juliens closed this as completed Feb 8, 2018
@traefik traefik locked and limited conversation to collaborators Sep 1, 2019
@nmengin nmengin added kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/fix a bug fix labels Dec 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants