Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bogus HTTP 403 errors after hotfix XS82E031 for valid HTTPS requests #4517

Closed
stormi opened this issue Sep 7, 2021 · 1 comment · Fixed by xapi-project/xen-api-libs-transitional#109

Comments

@stormi
Copy link
Contributor

stormi commented Sep 7, 2021

The commits from PR #4422 not only block requests on port 80, but also some requests on port 443. After the attempt at debugging @benjamreis and I did, it looks like this happens for every subsequent request within the same TLS connection.

How to reproduce (1)

From a web browser, load a web page that fetches various resources from the host such as JS or images. This is what XCP-ng's web page does (sources):
image
Traces we added in the code showed most requests for resources were rejected with a 403 error.

This is what the page should look like:
image

How to reproduce (2)

You can also reproduce what is probably another manifestation of the same issue on Citrix Hypervisor 8.2 with XS82E031 applied: just load the web page from the host with Firefox, and refresh, again and again. It should reject one of two requests: the first request succeeds (new TLS connection logged in secure.log). The next one fails (HTTP 403. secure.log shows the second request causes the TLS connection to be closed). Then the next one succeeds (new TLS connection opened in secure.log), the one after that fails, and so on.
One of us reproduced it in Chrome too, but others didn't unless they'd mash F5 fast enough, so that two queries overlap, maybe. I think that Firefox leaves the connection open for subsequent requests more eagerly than chrome does.

What we found out so far

The main test to check for a valid TLS connection in XAPI is

let access_forbidden req s =
   (* Reject external non-TLS requests (depending on config) *)
   !Xapi_globs.website_https_only
   && (not (Context.is_unix_socket s))
   && Context._client_of_rq req = None

What plays a significant role here is Context._client_of_rq req = None, which only lets requests that pass Context._client_of_rq req != None through.

And the definition of Context._client_of_rq is:

let _client_of_rq rq =
 let stunnel_proxy =
   List.assoc_opt "STUNNEL_PROXY" rq.Http.Request.additional_headers
 in
 Option.bind stunnel_proxy (fun proxy ->
     try
       Scanf.sscanf proxy "TCP6 %s %s %d %d" (fun client _ _ _ -> client)
       |> _clean_addr_of_string
     with _ ->
       R.error "Failed to parse STUNNEL_PROXY='%s'" proxy ;
       None)

We have verified that Context._client_of_rq req was usually not None for the first request of a series, but was None for any other request made under the same TLS connection. Thus causing a rejection and the connection to be closed. Then, after the connection has been closed, the next one succeeds because we start again with a fresh connection and Context._client_of_rq req is not None.

We assume that either Context._client_of_rq req is not an appropriate test to check that the request came from a valid TLS connection, or stunnel does not set the appropriate headers (STUNNEL_PROXY?) for any request within a TLS connection that is not the first of the series.

Addendum

I also tested what happens if I set website-https-only to false in xapi.conf. Then the page always loads correctly, as before, but the debug traces I had added showed that only the first request of a series would pass the Context._client_of_rq req != None test. When loading the same file successively as described in "How to reproduce (2)", I would get such behaviour visible in secure.log + xensource.log (output to the latter being my added traces):

secure.log: `Service [xapi] accepted connection from...`
xensource.log: fileserver.send_file: new request. Context._client_of_rq req != None
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
[...]
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
secure.log: `Connection closed`
secure.log: `Service [xapi] accepted connection from...`
xensource.log: fileserver.send_file: new request. Context._client_of_rq req != None
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
xensource.log: fileserver.send_file: new request. Context._client_of_rq req = None
[etc.]

As I tried to illustrate with forged logs (mine are less readable), the first request after establishing the TLS connection passes the Context._client_of_rq req != None test but not any of the subsequent ones... Until the connection finally closes and a new one is established, starting a new similar series.

@robhoes
Copy link
Member

robhoes commented Sep 7, 2021

Hi @stormi, thanks for the report and comprehensive analysis. I think that you are quite right, and I can see why this is happening:

When making a connection to port 443, then stunnel receives it and handles the TLS part, and then passes the connection to xapi. We have configured stunnel to send a special header of TCP parameters (the PROXY header) right at the beginning of the connection to xapi, before sending the decrypted data from the external client. This header contains the client's IP address, which we want to log. We also use the presence of this header to determine whether the connection came from through stunnel, and is therefore a TLS connection.

Now, this header isn't part of the HTTP request. Xapi receives it first, and then receives the HTTP request. If the connection remains open and another request comes in on the same connection, then stunnel won't send the PROXY header again, as stunnel doesn't work at the HTTP layer but just below – after establishing the connection, it just decrypts and forwards whatever bits come next. The HTTP server in xapi, however, looks for the PROXY header and HTTP header in one step. So the first request on the connection gets annotated with this PROXY data (which end up in Context._client_of_rq), and any following requests on the same connection don't. It doesn't remember that it had already seen the PROXY header.

It should be quite easy to fix this in http_svr.ml in xen-api-libs-transitional. I'll take a look tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants