Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate HTTPConnection.auto_open #2775

Closed
sethmlarson opened this issue Nov 8, 2022 · 2 comments · Fixed by #2776
Closed

Investigate HTTPConnection.auto_open #2775

sethmlarson opened this issue Nov 8, 2022 · 2 comments · Fixed by #2776

Comments

@sethmlarson
Copy link
Member

sethmlarson commented Nov 8, 2022

urllib3 makes multiple mentions to a property on http.client.HTTPConnection called auto_open. The property isn't used much in the current stdlib http.client module, but has some implications for urllib3: mainly that connections may not be re-used based on this value?

In urllib3.connection.py:

# Calls self._set_hostport(), so self.host is
# self._tunnel_host below.
self._tunnel()  # type: ignore[attr-defined]
# Mark this connection as not reusable
self.auto_open = 0

In urllib3.connectionpool.py:

conn.close()
if getattr(conn, "auto_open", 1) == 0:
    # This is a proxied connection that has been mutated by
    # http.client._tunnel() and cannot be reused (since it would
    # attempt to bypass the proxy)
    conn = None

I suspect that http.client.HTTPConnection._tunnel() in the past may have rewritten .host and .port properties or something like this and this was our method of "guarding" against that? Currently host and port properties on a connection point to the first origin (read: proxy) on a requests' journey to the destination origin.

  • What is the intent of this property and it's values?
  • Has that intent changed since the original code in urllib3 using the value was written (git blame!)?
  • Can we remove our use/dependence on this value?
  • Should we write some tests for this value? (We currently have none!!! 😱)
  • Can any of these fixes land in 1.26.x?
@pquentin
Copy link
Member

pquentin commented Nov 9, 2022

Your suspicion is correct!

  • What is the intent of this property and it's values?

#369 added auto_open to work around a bug in CPython where host and port were mutated to the target origin, which broke socket reuse. It can be unset, or set to 0.

  • Has that intent changed since the original code in urllib3 using the value was written (git blame!)?

Yes. This CPython bug was then fixed as part of BPO 7776 by python/cpython@9da047b which went to Python 3.4.1 and was backported to Python 2.7.7.

That change caused a regression in urllib3 that was fixed in #385 and 4fb351c. I found this difficult to follow given the two changes, so here is the full diff:

diff --git a/urllib3/connection.py b/urllib3/connection.py
index 45bef659..5feb3322 100644
--- a/urllib3/connection.py
+++ b/urllib3/connection.py
@@ -168,21 +168,25 @@ class VerifiedHTTPSConnection(HTTPSConnection):
         resolved_cert_reqs = resolve_cert_reqs(self.cert_reqs)
         resolved_ssl_version = resolve_ssl_version(self.ssl_version)
 
-        # the _tunnel_host attribute was added in python 2.6.3 (via
-        # http://hg.python.org/cpython/rev/0f57b30a152f) so pythons 2.6(0-2) do
-        # not have them.
+        hostname = self.host
         if getattr(self, '_tunnel_host', None):
+            # _tunnel_host was added in Python 2.6.3
+            # (See: http://hg.python.org/cpython/rev/0f57b30a152f)
+
             self.sock = sock
             # Calls self._set_hostport(), so self.host is
             # self._tunnel_host below.
             self._tunnel()
 
+            # Override the host with the one we're requesting data from.
+            hostname = self._tunnel_host
+
         # Wrap socket using verification with the root certs in
         # trusted_root_certs
         self.sock = ssl_wrap_socket(sock, self.key_file, self.cert_file,
                                     cert_reqs=resolved_cert_reqs,
                                     ca_certs=self.ca_certs,
-                                    server_hostname=self.host,
+                                    server_hostname=hostname,
                                     ssl_version=resolved_ssl_version)
 
         if resolved_cert_reqs != ssl.CERT_NONE:
@@ -191,7 +195,7 @@ class VerifiedHTTPSConnection(HTTPSConnection):
                                    self.assert_fingerprint)
             elif self.assert_hostname is not False:
                 match_hostname(self.sock.getpeercert(),
-                               self.assert_hostname or self.host)
+                               self.assert_hostname or hostname)
 
 
 if ssl:

We used to rely on the mutation because we wanted to know the hostname of the target origin, which is why we needed that fix.

  • Can we remove our use/dependence on this value?

Yes, given we don't support any CPython version with the bug anymore. Pull request incoming.

  • Should we write some tests for this value? (We currently have none!!! 😱)

Actually, we do have a test: #363 added test_connect_reconn. I confirmed that it does cover the if getattr(conn, "auto_open", 1) == 0 condition. Thankfully, we'll be able to remove auto_open but keep the test.

  • Can any of these fixes land in 1.26.x?

No, given we currently support Python < 2.7.9 in 1.26.x.

sethmlarson pushed a commit that referenced this issue Nov 9, 2022
It is no longer needed as HTTPConnection no longer mutates host and
port. See #2775 for full info.
@sethmlarson
Copy link
Member Author

@pquentin Thank you for all the digging, this is excellent! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants