Skip to content

Remove Authorization header when redirecting cross-host #1346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 29, 2018

Conversation

sethmlarson
Copy link
Member

This change makes the default behavior of an HTTP redirect when the Location specifies a different host than the original request to remove the Authentication header if present. This behavior can be turned off by using Retry(forward_auth_headers_across_hosts=True) which defaults to False. Closes #1316.

@codecov-io
Copy link

codecov-io commented Mar 26, 2018

Codecov Report

Merging #1346 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1346   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          21      21           
  Lines        2014    2019    +5     
======================================
+ Hits         2014    2019    +5
Impacted Files Coverage Δ
urllib3/util/retry.py 100% <100%> (ø) ⬆️
urllib3/poolmanager.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4f123d...63948f3. Read the comment docs.

@sethmlarson sethmlarson force-pushed the redirect-with-auth branch 2 times, most recently from 281d675 to 1d2eaed Compare March 26, 2018 02:18
Copy link
Contributor

@haikuginger haikuginger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't believe this to be there quite yet.

To start out with, the header is Authorization, not Authentication, but that's a relatively minor nit.

I would prefer an implementation where the set of headers that are allowed to pass between hosts is explicitly defined - this will allow for cases where different headers are used to pass equally sensitive information - for example, someone might use X-Vender-Api-Key-Token and we should be able to take advantage of the same logic.

I would suggest the following, although @jonparrott may have a different thought and I'd like to defer to him on this:

  • I can pass a keyword argument forward_headers_across_hosts that is a dict to either instantiate or make a request using a PoolManager, with explicitly-passed values in the request overriding the PM's instance state.
  • The dict I pass contains a set of header names mapping to booleans defining whether those headers should be forwarded to any new host, possibly defaulting to universally False, or possibly only defaulting to false for certain well-defined sensitive headers like Authorization.
  • There is an overridable method on the PoolManager, receiving as arguments original_host, redirect_host, and header_name that defines the behavior around whether a given header should be forwarded for a given request/redirect. By default, it implements the above, but can be overridden (for example, a user could set up an overridden method that allows a certain authorization header to be forwarded between different hosts only with certain domain names).

@sethmlarson
Copy link
Member Author

Hah, I should read MDN instead of assuming header names.

And I was definitely thinking about allowing user-defined header stripping. That sounds very useful. I'll add it using whatever logic is decided to be best.

Also quick question: Is PoolManager the only place for this sort of logic or is me adding this to ConnectionPool also necessary?

@haikuginger
Copy link
Contributor

A ConnectionPool should, IIRC, only ever connect to a single host, so it makes sense for this logic to be either in PoolManager or Redirect.

@theacodes
Copy link
Member

This is a good start, but I would ask for a slightly more robust implementation (I'm happy to help with that as well).

  1. PoolManager is the only thing that does cross-host requests, so the logic should live (mostly) there.
  2. Retry configures redirect handling, so the configuration should live there.
  3. Instead of stripping just one hard-coded header, I'd prefer if we had a remove_headers_on_redirect which can be set to a Sequence. By default, this will just be {'Authorization'}.

Probably useful to reference https://curl.haxx.se/libcurl/c/CURLOPT_UNRESTRICTED_AUTH.html as that's where this behavior comes from. It is not formally specified in any RFC.

@sethmlarson
Copy link
Member Author

Thanks for the reference! I wasn't sure about whether cross-host requests would be handled in ConnectionPool as I saw the assert_same_host parameter. I'll update with your specifications, thank you!

@sethmlarson sethmlarson force-pushed the redirect-with-auth branch 2 times, most recently from d6c49b9 to 3a9bb62 Compare March 27, 2018 01:56
@sethmlarson sethmlarson changed the title Remove Authentication header when redirecting cross-host Remove Authorization header when redirecting cross-host Mar 27, 2018
- Default to Authorization header.
- Allow different settings on the Retry object.
- Remove logic from ConnectionPool.
@sethmlarson
Copy link
Member Author

Still trying to shake transient macOS SSL tests failing.

@sethmlarson sethmlarson reopened this Mar 27, 2018
if (retries.remove_headers_on_redirect
and not conn.is_same_host(redirect_location)):
for header in retries.remove_headers_on_redirect:
if header in headers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use headers.pop(header, None)

if 'headers' not in kw:
kw['headers'] = self.headers
kw['headers'] = headers = self.headers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you set kw['headers'] below, do you need to set it here?

Also, you should probably copy if using self.headers so you don't modify that dict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've implemented this.

@theacodes
Copy link
Member

This is mostly looking good, I just have one major concern:

This is a very big behavioral change that's likely to break some users. Some users won't be able to directly control how PoolManager is constructed. Should we consider some sort of environment variable to change this back to the old behavior?

@sethmlarson
Copy link
Member Author

Failing PyOpenSSL test on AppVeyor.

@sethmlarson sethmlarson reopened this Mar 27, 2018
@sethmlarson sethmlarson dismissed haikuginger’s stale review March 27, 2018 18:27

Functionality has been implemented.

@sethmlarson
Copy link
Member Author

I'm +0 on an environment variable to toggle this behavior, others will have a better history of experience with this type of feature. :)

@shazow
Copy link
Member

shazow commented Mar 27, 2018

In my day, we erred on the side of breaking things for the sake of better security defaults. (Seems just about the most justifiable reason for breaking backwards compatibility.)

Do other projects do env vars? I feel like by the time you looked up how to fix it via env var, it would be the same amount of work to just fix the problem properly.

Definitely worth putting this change at the top of the CHANGES list for the next release.

What do @haikuginger @Lukasa think?

CHANGES.rst Outdated
@@ -18,6 +18,10 @@ dev (master)

* Lazily load `uuid` to boost performance on imports (Pull #1270)

* Allow providing a list of headers to strip from requests when redirecting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth adding another entry at the top of the CHANGES list that explicitly mentions the breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add this, I also want to make mention to that this only occurs when the hostname in the redirect is different than the original requests hostname.

@@ -152,7 +156,8 @@ class Retry(object):
def __init__(self, total=10, connect=None, read=None, redirect=None, status=None,
method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None,
backoff_factor=0, raise_on_redirect=True, raise_on_status=True,
history=None, respect_retry_after_header=True):
history=None, respect_retry_after_header=True,
remove_headers_on_redirect=None):
Copy link
Member

@shazow shazow Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer if this were a constant value, like remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST or somesuch.

Otherwise, it's hard to figure out what headers get removed by default without reading the source code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, that would be an easy addition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off, I don't like None meaning default here. I assume you did this to avoid making the default parameter a mutable value - in which case, you can make the constant a frozenset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is flake8 smart enough to know frozenset isn't a mutable value/doesn't report mutable default value for frozenset? I honestly don't know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, duh, it's used above this parameter as a default so of course it works! I'll make this change.

@@ -335,6 +336,12 @@ def urlopen(self, method, url, redirect=True, **kw):
if not isinstance(retries, Retry):
retries = Retry.from_int(retries, redirect=redirect)

# Strip headers marked as unsafe to forward to the redirected location.
if (retries.remove_headers_on_redirect
Copy link
Member

@shazow shazow Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: Would be equivalent to have

        if not conn.is_same_host(redirect_location):
            for header in retries.remove_headers_on_redirect:
                kw['headers'].pop(header, None)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reason I chose to do this is because of the TODO: socket.gethostname() within ConnectionPool.is_same_host() in order to short-circuit a potential future network call if not needed. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. In that case, a comment would be wise so a future contributor doesn't simplify it. :)

@@ -19,6 +19,10 @@

log = logging.getLogger(__name__)


DEFAULT_REDIRECT_HEADERS_BLACKLIST = ['Authorization']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to put this under the Retry object next to RETRY_AFTER_STATUS_CODES etc as a frozenset too?

(Sorry I should have mentioned that before)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for globals and didn't find them! Didn't think to look at the bottom of the module. :)

@theacodes
Copy link
Member

In my day, we erred on the side of breaking things for the sake of better security defaults. (Seems just about the most justifiable reason for breaking backwards compatibility.)

That's reasonable, I just want us to treat it with the weight that it carries. :)

Do other projects do env vars? I feel like by the time you looked up how to fix it via env var, it would be the same amount of work to just fix the problem properly.

The problem is that urllib3 is used a lot indirectly, so users may not be in control of what constructs the PoolManager/Redirect. However, making the default list a module-level constant gives us some recourse for users in that situation as they can monkey patch the list if absolutely necessary.

Definitely worth putting this change at the top of the CHANGES list for the next release.

Agreed.

@sethmlarson
Copy link
Member Author

@jonparrott So once Travis macOS builders catch up this looks good to you in its current state?

@@ -152,7 +156,8 @@ class Retry(object):
def __init__(self, total=10, connect=None, read=None, redirect=None, status=None,
method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None,
backoff_factor=0, raise_on_redirect=True, raise_on_status=True,
history=None, respect_retry_after_header=True):
history=None, respect_retry_after_header=True,
remove_headers_on_redirect=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off, I don't like None meaning default here. I assume you did this to avoid making the default parameter a mutable value - in which case, you can make the constant a frozenset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants