New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect against cross origin websockets. #980

Merged
merged 10 commits into from May 26, 2014

Conversation

Projects
None yet
3 participants
@rgbkrk
Contributor

rgbkrk commented Jan 24, 2014

Since CORS headers don't exist in WebSockets land, we need to check origin and host. This adds origin and host checking within the WebSocketHandler.

The opinionated piece of this PR is that cross origin is disabled by default.

If a user wants different behavior, they'll need to override check_origin.

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk Jan 24, 2014

Contributor

Since websocket_connect doesn't set origin or host and is used in conjunction with WebSocketHandlers, this is definitely going to fail tests for the moment. I'll have to look deeper in the tests, but would love some feedback on this PR.

Contributor

rgbkrk commented Jan 24, 2014

Since websocket_connect doesn't set origin or host and is used in conjunction with WebSocketHandlers, this is definitely going to fail tests for the moment. I'll have to look deeper in the tests, but would love some feedback on this PR.

@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell Jan 24, 2014

Member

We should definitely make it easy for applications to check the origin header, and it should probably be turned on by default (I'm not 100% convinced here, but I was surprised when I learned that cross-origin websockets were allowed by default so it's probably better to be conservative). A request with no origin header should probably be allowed rather than rejected - browsers always send the origin, so a request without one is coming from a source where the origin header would provide no meaningful authentication anyway.

Because of proxies like nginx and haproxy, a tornado server doesn't necessarily know its correct origin (it may be listening on another port than the one the client connects to). We need some way to tell Tornado of its true origin. This could either be another option at the WebSocketHandler level, or perhaps we could put it on HTTPServer to make it a part of the incoming request (by analogy with the existing protocol option there).

You added the allow_cross_origin argument to _execute, but it's not really feasible for applications to pass in arguments here. This should probably be managed via a new overridable method, similar to allow_draft76. Maybe structure it as check_origin(self, origin) so applications can choose to allow selected non-same origins.

When comparing origins the default ports :80 and :443 are optional and should probably be either added or removed from both sides of the comparison for consistency.

Member

bdarnell commented Jan 24, 2014

We should definitely make it easy for applications to check the origin header, and it should probably be turned on by default (I'm not 100% convinced here, but I was surprised when I learned that cross-origin websockets were allowed by default so it's probably better to be conservative). A request with no origin header should probably be allowed rather than rejected - browsers always send the origin, so a request without one is coming from a source where the origin header would provide no meaningful authentication anyway.

Because of proxies like nginx and haproxy, a tornado server doesn't necessarily know its correct origin (it may be listening on another port than the one the client connects to). We need some way to tell Tornado of its true origin. This could either be another option at the WebSocketHandler level, or perhaps we could put it on HTTPServer to make it a part of the incoming request (by analogy with the existing protocol option there).

You added the allow_cross_origin argument to _execute, but it's not really feasible for applications to pass in arguments here. This should probably be managed via a new overridable method, similar to allow_draft76. Maybe structure it as check_origin(self, origin) so applications can choose to allow selected non-same origins.

When comparing origins the default ports :80 and :443 are optional and should probably be either added or removed from both sides of the comparison for consistency.

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk Jan 24, 2014

Contributor

@bdarnell - Thank you so much for the excellent feedback! I'm never sure how PRs out of nowhere will be received on a project.

A request with no origin header should probably be allowed rather than rejected - browsers always send the origin, so a request without one is coming from a source where the origin header would provide no meaningful authentication anyway.

It is a protection for endusers on a browser, so that's completely sane. I'll change to match that.

Because of proxies like nginx and haproxy, a tornado server doesn't necessarily know its correct origin (it may be listening on another port than the one the client connects to).

The Host header is actually passed by the browser as well. I did some testing with nginx where the host name is different (tornado server running only on localhost, nginx forwarding to another interface). @minrk tested it with SSH tunnels and node-http-proxy, but again only with browsers.

This should probably be managed via a new overridable method, similar to allow_draft76. Maybe structure it as check_origin(self, origin) so applications can choose to allow selected non-same origins.

Just the feedback I needed. Wasn't quite sure how to structure this. That's a really good idea. Disable by default but let users override how the checking is done, to include checking for particular origins?

When comparing origins the default ports :80 and :443 are optional and should probably be either added or removed from both sides of the comparison for consistency.

In the server I'm using, which is a client side application, the ports are actually high ports and do appear both on the origin and the header. I can provide samples of this directly to you if necessary.


I submitted this on the road but I'll be back to it soon, including getting tests to pass (and adding some).

Contributor

rgbkrk commented Jan 24, 2014

@bdarnell - Thank you so much for the excellent feedback! I'm never sure how PRs out of nowhere will be received on a project.

A request with no origin header should probably be allowed rather than rejected - browsers always send the origin, so a request without one is coming from a source where the origin header would provide no meaningful authentication anyway.

It is a protection for endusers on a browser, so that's completely sane. I'll change to match that.

Because of proxies like nginx and haproxy, a tornado server doesn't necessarily know its correct origin (it may be listening on another port than the one the client connects to).

The Host header is actually passed by the browser as well. I did some testing with nginx where the host name is different (tornado server running only on localhost, nginx forwarding to another interface). @minrk tested it with SSH tunnels and node-http-proxy, but again only with browsers.

This should probably be managed via a new overridable method, similar to allow_draft76. Maybe structure it as check_origin(self, origin) so applications can choose to allow selected non-same origins.

Just the feedback I needed. Wasn't quite sure how to structure this. That's a really good idea. Disable by default but let users override how the checking is done, to include checking for particular origins?

When comparing origins the default ports :80 and :443 are optional and should probably be either added or removed from both sides of the comparison for consistency.

In the server I'm using, which is a client side application, the ports are actually high ports and do appear both on the origin and the header. I can provide samples of this directly to you if necessary.


I submitted this on the road but I'll be back to it soon, including getting tests to pass (and adding some).

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk Jan 25, 2014

Contributor

While adding tests, I noticed how terrible of an interface I made by putting this in _execute. If they override it, they'd have to provide a default argument for it to do anything different.

class SomeWebSocketHandler(WebSocketHandler):

  def check_origin(self, allowed_origins=["localhost"]):
      super(SomeWebSocketHandler, self).check_origin(allowed_origins)

That's so ugly.

This is my first time hacking on Tornado, so I'm a bit lost on how to get to certain data and responses within the tests. In particular, I would want to verify the status code that was returned to the WebSocketClientConnection. Can you point me in the right direction so I can make better, more robust tests?

On a side note, I noticed that some of the tests aren't enabled (by way of the gen_test decorator). In particular, test_websocket_callbacks within websocket_test. If enabled, it's a failing test (and completely unrelated to this PR). Is this by design?

Contributor

rgbkrk commented Jan 25, 2014

While adding tests, I noticed how terrible of an interface I made by putting this in _execute. If they override it, they'd have to provide a default argument for it to do anything different.

class SomeWebSocketHandler(WebSocketHandler):

  def check_origin(self, allowed_origins=["localhost"]):
      super(SomeWebSocketHandler, self).check_origin(allowed_origins)

That's so ugly.

This is my first time hacking on Tornado, so I'm a bit lost on how to get to certain data and responses within the tests. In particular, I would want to verify the status code that was returned to the WebSocketClientConnection. Can you point me in the right direction so I can make better, more robust tests?

On a side note, I noticed that some of the tests aren't enabled (by way of the gen_test decorator). In particular, test_websocket_callbacks within websocket_test. If enabled, it's a failing test (and completely unrelated to this PR). Is this by design?

@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell Jan 25, 2014

Member

Good catch on the test_websocket_callbacks bug - that wasn't intentional. I'll fix that and I think I've got a trick to detect and prevent these in the future.

For testing the error codes, you should be able to catch the HTTPError raised by websocket_connect - see test_websocket_http_fail.

It's not always possible for an application to pass in all its allowed origins (unless perhaps we introduce some sort of wildcard matching, but that just invites complexity), so I'd turn the interface around a bit: _execute calls self.check_origin(origin) with the value extracted from the headers (and perhaps normalized? lowercase it, add a port if it's missing, etc), and then it's up to check_origin to decide whether to allow it. The default implementation would compute an expected origin from the request and compare to that, but applications are free to augment or replace that default.

Member

bdarnell commented Jan 25, 2014

Good catch on the test_websocket_callbacks bug - that wasn't intentional. I'll fix that and I think I've got a trick to detect and prevent these in the future.

For testing the error codes, you should be able to catch the HTTPError raised by websocket_connect - see test_websocket_http_fail.

It's not always possible for an application to pass in all its allowed origins (unless perhaps we introduce some sort of wildcard matching, but that just invites complexity), so I'd turn the interface around a bit: _execute calls self.check_origin(origin) with the value extracted from the headers (and perhaps normalized? lowercase it, add a port if it's missing, etc), and then it's up to check_origin to decide whether to allow it. The default implementation would compute an expected origin from the request and compare to that, but applications are free to augment or replace that default.

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk Jan 26, 2014

Contributor

For testing the error codes, you should be able to catch the HTTPError raised by websocket_connect - see test_websocket_http_fail.

Cool. That helped. Also made me realize I had the origin check in the wrong spot. Yay for tests.

...with the value extracted from the headers (and perhaps normalized? lowercase it, add a port if it's missing, etc)

We won't actually know the port since origin is in the context of the browser. I can try to reach a websocket hosted on localhost using my javascript console while on github.com - the origin will show http://github.com.


This is in a happy state now I think. Ready for comments or merging!

Contributor

rgbkrk commented Jan 26, 2014

For testing the error codes, you should be able to catch the HTTPError raised by websocket_connect - see test_websocket_http_fail.

Cool. That helped. Also made me realize I had the origin check in the wrong spot. Yay for tests.

...with the value extracted from the headers (and perhaps normalized? lowercase it, add a port if it's missing, etc)

We won't actually know the port since origin is in the context of the browser. I can try to reach a websocket hosted on localhost using my javascript console while on github.com - the origin will show http://github.com.


This is in a happy state now I think. Ready for comments or merging!

@@ -296,7 +296,7 @@ directory as your Python file, you could render this template with:
self.render("template.html", title="My title", items=items)
Tornado templates support *control statements* and *expressions*.
Control statements are surronded by ``{%`` and ``%}``, e.g.,
Control statements are surrounded by ``{%`` and ``%}``, e.g.,

This comment has been minimized.

@rgbkrk

rgbkrk Jan 26, 2014

Contributor

Whoops, this typo fix was meant to go in a separate PR. I can move this if necessary.

@rgbkrk

rgbkrk Jan 26, 2014

Contributor

Whoops, this typo fix was meant to go in a separate PR. I can move this if necessary.

Show outdated Hide outdated tornado/websocket.py
Show outdated Hide outdated tornado/websocket.py
host = self.request.headers.get("Host")
# Check to see that origin matches host directly, including ports
return origin == host

This comment has been minimized.

@bdarnell

bdarnell Jan 27, 2014

Member

We can't trust the Host header here - the attacker could simply be proxying the TCP traffic from the user's browser, in which case the browser will send a Host header corresponding to the attacker's site (and Tornado's HTTPServer does no validation of the Host header by default). I'm not sure if we can do any origin validation without an explicit specification of what hostname we expect to be (at least not for HTTP. HTTPS mostly prevents the class of attack where the Host header is not what we expect it to be).

@bdarnell

bdarnell Jan 27, 2014

Member

We can't trust the Host header here - the attacker could simply be proxying the TCP traffic from the user's browser, in which case the browser will send a Host header corresponding to the attacker's site (and Tornado's HTTPServer does no validation of the Host header by default). I'm not sure if we can do any origin validation without an explicit specification of what hostname we expect to be (at least not for HTTP. HTTPS mostly prevents the class of attack where the Host header is not what we expect it to be).

This comment has been minimized.

@rgbkrk

rgbkrk Jan 28, 2014

Contributor

Pardon my lack of knowledge here, but how do you spoof the host header clientside?

@rgbkrk

rgbkrk Jan 28, 2014

Contributor

Pardon my lack of knowledge here, but how do you spoof the host header clientside?

This comment has been minimized.

@ellisonbg

ellisonbg Jan 28, 2014

I agree that to do proper origin validation we need to have an explicit list of acceptable origin values.

@ellisonbg

ellisonbg Jan 28, 2014

I agree that to do proper origin validation we need to have an explicit list of acceptable origin values.

This comment has been minimized.

@rgbkrk

rgbkrk Jan 28, 2014

Contributor

Re-reading this again, the idea is that an attacker would be doing a MITM attack between the client and the actual web server?

What prevents them from bypassing this entirely by just not adding the Origin and Host headers?

Clearly websockets should be done over HTTPS and that additional auth on websockets should be required.

@rgbkrk

rgbkrk Jan 28, 2014

Contributor

Re-reading this again, the idea is that an attacker would be doing a MITM attack between the client and the actual web server?

What prevents them from bypassing this entirely by just not adding the Origin and Host headers?

Clearly websockets should be done over HTTPS and that additional auth on websockets should be required.

This comment has been minimized.

@ellisonbg

ellisonbg Jan 28, 2014

My statement was not based on preventing any particular method of attack the HOST header. It more has to do with the general idea that I don't think it is a good idea to trust the incoming request for both pieces of information.

@ellisonbg

ellisonbg Jan 28, 2014

My statement was not based on preventing any particular method of attack the HOST header. It more has to do with the general idea that I don't think it is a good idea to trust the incoming request for both pieces of information.

This comment has been minimized.

@bdarnell

bdarnell Jan 29, 2014

Member

Yes, it's risky on its face to compare two pieces of information from the request, but it kind of makes sense given the weird browser security model. The problem is that in a cross-site websocket scenario (does this have a name yet, like XSS or [CX]SRF?), the browser will automatically send cookies, even though the origin site does not have permission to read them. In a non-browser scenario, the client obviously knows the cookies it is sending (if any), and if the host and origin headers match, the initiating javascript could have read the cookies out of the DOM (modulo the httponly flag. I'm not sure how that interacts with websockets). In the problematic case, when the browser will use the user's cookies on behalf of a site that cannot read them, the host and origin headers will not match.

I think this holds across all browsers that support websockets, so that comparing the host and origin headers from the same request is in fact useful for security, but I haven't done much research on the topic. I think it's a good practice for sites to validate the host header against an explicit whitelist (even though Tornado's defaults implicitly discourage this), but that doesn't mean it's not worthwhile to compare host and origin to provide some baseline level of security without configuration.

@bdarnell

bdarnell Jan 29, 2014

Member

Yes, it's risky on its face to compare two pieces of information from the request, but it kind of makes sense given the weird browser security model. The problem is that in a cross-site websocket scenario (does this have a name yet, like XSS or [CX]SRF?), the browser will automatically send cookies, even though the origin site does not have permission to read them. In a non-browser scenario, the client obviously knows the cookies it is sending (if any), and if the host and origin headers match, the initiating javascript could have read the cookies out of the DOM (modulo the httponly flag. I'm not sure how that interacts with websockets). In the problematic case, when the browser will use the user's cookies on behalf of a site that cannot read them, the host and origin headers will not match.

I think this holds across all browsers that support websockets, so that comparing the host and origin headers from the same request is in fact useful for security, but I haven't done much research on the topic. I think it's a good practice for sites to validate the host header against an explicit whitelist (even though Tornado's defaults implicitly discourage this), but that doesn't mean it's not worthwhile to compare host and origin to provide some baseline level of security without configuration.

Show outdated Hide outdated tornado/websocket.py
Show outdated Hide outdated tornado/websocket.py
url = 'ws://localhost:%d/echo' % port
# subdomains should be invalid by default
headers = {'Origin': 'http://subtenant.somewhereelse.com',
'Host': 'subtenant2.somewhereelse.com'}

This comment has been minimized.

@bdarnell

bdarnell Mar 4, 2014

Member

It's not obvious that this Host header has any effect at all - I initially expected it to be clobbered by the Host header set by SimpleAsyncHTTPClient (although after checking the code I see that it's not). This test would pass even if the Host header were ignored; if you want to test with an explicit Host header you should devise a test where it changes the outcome.

@bdarnell

bdarnell Mar 4, 2014

Member

It's not obvious that this Host header has any effect at all - I initially expected it to be clobbered by the Host header set by SimpleAsyncHTTPClient (although after checking the code I see that it's not). This test would pass even if the Host header were ignored; if you want to test with an explicit Host header you should devise a test where it changes the outcome.

This comment has been minimized.

@rgbkrk

rgbkrk May 8, 2014

Contributor

The origin and host header are both provided by the browser. I'm not sure what you want to test here.

@rgbkrk

rgbkrk May 8, 2014

Contributor

The origin and host header are both provided by the browser. I'm not sure what you want to test here.

This comment has been minimized.

@bdarnell

bdarnell May 9, 2014

Member

There are two hostnames given in this HTTPRequest: "localhost" in the url, and "subtenant2.somewhereelse.com" in the headers dict. It's unclear what an HTTP client is supposed to do with that; I could easily imagine some clients using the header, some using the url, and some throwing an error. As it happens, tornado's simple_httpclient currently uses the header. If it used the url instead, this test would be redundant with the previous one and there would be no test of the handling of subdomains.

I think the cleanest way to ensure that this test actually covers subdomains is to construct a new SimpleAsyncHTTPClient using the hostname_mapping parameters, so you can point both subdomains of somewhereelse.com to 127.0.0.1, and you don't have to use a host header that doesn't match the url.

@bdarnell

bdarnell May 9, 2014

Member

There are two hostnames given in this HTTPRequest: "localhost" in the url, and "subtenant2.somewhereelse.com" in the headers dict. It's unclear what an HTTP client is supposed to do with that; I could easily imagine some clients using the header, some using the url, and some throwing an error. As it happens, tornado's simple_httpclient currently uses the header. If it used the url instead, this test would be redundant with the previous one and there would be no test of the handling of subdomains.

I think the cleanest way to ensure that this test actually covers subdomains is to construct a new SimpleAsyncHTTPClient using the hostname_mapping parameters, so you can point both subdomains of somewhereelse.com to 127.0.0.1, and you don't have to use a host header that doesn't match the url.

port = self.get_http_port()
url = 'ws://localhost:%d/echo' % port
headers = {'Origin': 'http://localhost:%d' % port}

This comment has been minimized.

@bdarnell

bdarnell Mar 4, 2014

Member

We should probably also have a test with a non-empty path in the Origin to show that paths (unlike subdomains) are ignored.

@bdarnell

bdarnell Mar 4, 2014

Member

We should probably also have a test with a non-empty path in the Origin to show that paths (unlike subdomains) are ignored.

Show outdated Hide outdated tornado/websocket.py
@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk Apr 27, 2014

Contributor

@bdarnell, just realized this PR was still out and about. Would you still want to bring this in if I address your last few items?

Contributor

rgbkrk commented Apr 27, 2014

@bdarnell, just realized this PR was still out and about. Would you still want to bring this in if I address your last few items?

@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell Apr 27, 2014

Member

Yes, I think if you address my last few comments it'll be ready to merge.

Member

bdarnell commented Apr 27, 2014

Yes, I think if you address my last few comments it'll be ready to merge.

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk May 8, 2014

Contributor

I've addressed the last few comments and performed a rebase to make this easier to merge.

Contributor

rgbkrk commented May 8, 2014

I've addressed the last few comments and performed a rebase to make this easier to merge.

# If there was an origin header, check to make sure it matches
# according to check_origin. When the origin is None, we assume it
# came from a browser and that it can be passed on.

This comment has been minimized.

@bdarnell

bdarnell May 9, 2014

Member

You mean "we assume it did not come from a browser and therefore we do not need to enforce the same-origin policy", right?

@bdarnell

bdarnell May 9, 2014

Member

You mean "we assume it did not come from a browser and therefore we do not need to enforce the same-origin policy", right?

This comment has been minimized.

@rgbkrk

rgbkrk May 26, 2014

Contributor

Yes, "did not come from a browser".

@rgbkrk

rgbkrk May 26, 2014

Contributor

Yes, "did not come from a browser".

# Due to how stdlib's urlparse is implemented, urls without a //
# are interpreted to be paths (resulting in netloc being None)
if("//" not in origin):

This comment has been minimized.

@bdarnell

bdarnell May 9, 2014

Member

I'm still of the opinion that non-absolute Origins should be rejected, but if this stays in we need a test for it.

@bdarnell

bdarnell May 9, 2014

Member

I'm still of the opinion that non-absolute Origins should be rejected, but if this stays in we need a test for it.

@bdarnell bdarnell merged commit 7fcbe30 into tornadoweb:master May 26, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell May 26, 2014

Member

Merged with the handling of non-absolute origins removed.

Member

bdarnell commented May 26, 2014

Merged with the handling of non-absolute origins removed.

@rgbkrk

This comment has been minimized.

Show comment
Hide comment
@rgbkrk

rgbkrk May 26, 2014

Contributor

Awesome, thanks for working through it with me. Checking out 6403cb9 now.

Contributor

rgbkrk commented May 26, 2014

Awesome, thanks for working through it with me. Checking out 6403cb9 now.

@rgbkrk rgbkrk deleted the rgbkrk:cross_origin branch May 26, 2014

a-pertsev added a commit to a-pertsev/tornado that referenced this pull request May 3, 2018

a-pertsev added a commit to a-pertsev/tornado that referenced this pull request May 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment