-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Remove Authorization header when redirecting cross-host #1346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1346 +/- ##
======================================
Coverage 100% 100%
======================================
Files 21 21
Lines 2014 2019 +5
======================================
+ Hits 2014 2019 +5
Continue to review full report at Codecov.
|
281d675 to
1d2eaed
Compare
1d2eaed to
5e9c6b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I don't believe this to be there quite yet.
To start out with, the header is Authorization, not Authentication, but that's a relatively minor nit.
I would prefer an implementation where the set of headers that are allowed to pass between hosts is explicitly defined - this will allow for cases where different headers are used to pass equally sensitive information - for example, someone might use X-Vender-Api-Key-Token and we should be able to take advantage of the same logic.
I would suggest the following, although @jonparrott may have a different thought and I'd like to defer to him on this:
- I can pass a keyword argument
forward_headers_across_hoststhat is a dict to either instantiate or make a request using a PoolManager, with explicitly-passed values in the request overriding the PM's instance state. - The dict I pass contains a set of header names mapping to booleans defining whether those headers should be forwarded to any new host, possibly defaulting to universally
False, or possibly only defaulting to false for certain well-defined sensitive headers likeAuthorization. - There is an overridable method on the PoolManager, receiving as arguments
original_host,redirect_host, andheader_namethat defines the behavior around whether a given header should be forwarded for a given request/redirect. By default, it implements the above, but can be overridden (for example, a user could set up an overridden method that allows a certain authorization header to be forwarded between different hosts only with certain domain names).
|
Hah, I should read MDN instead of assuming header names. And I was definitely thinking about allowing user-defined header stripping. That sounds very useful. I'll add it using whatever logic is decided to be best. Also quick question: Is PoolManager the only place for this sort of logic or is me adding this to ConnectionPool also necessary? |
|
A ConnectionPool should, IIRC, only ever connect to a single host, so it makes sense for this logic to be either in PoolManager or Redirect. |
|
This is a good start, but I would ask for a slightly more robust implementation (I'm happy to help with that as well).
Probably useful to reference https://curl.haxx.se/libcurl/c/CURLOPT_UNRESTRICTED_AUTH.html as that's where this behavior comes from. It is not formally specified in any RFC. |
|
Thanks for the reference! I wasn't sure about whether cross-host requests would be handled in ConnectionPool as I saw the |
d6c49b9 to
3a9bb62
Compare
- Default to Authorization header. - Allow different settings on the Retry object. - Remove logic from ConnectionPool.
3a9bb62 to
9c9dd6f
Compare
|
Still trying to shake transient macOS SSL tests failing. |
urllib3/poolmanager.py
Outdated
| if (retries.remove_headers_on_redirect | ||
| and not conn.is_same_host(redirect_location)): | ||
| for header in retries.remove_headers_on_redirect: | ||
| if header in headers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use headers.pop(header, None)
urllib3/poolmanager.py
Outdated
| if 'headers' not in kw: | ||
| kw['headers'] = self.headers | ||
| kw['headers'] = headers = self.headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you set kw['headers'] below, do you need to set it here?
Also, you should probably copy if using self.headers so you don't modify that dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I've implemented this.
|
This is mostly looking good, I just have one major concern: This is a very big behavioral change that's likely to break some users. Some users won't be able to directly control how PoolManager is constructed. Should we consider some sort of environment variable to change this back to the old behavior? |
|
Failing PyOpenSSL test on AppVeyor. |
Functionality has been implemented.
|
I'm +0 on an environment variable to toggle this behavior, others will have a better history of experience with this type of feature. :) |
|
In my day, we erred on the side of breaking things for the sake of better security defaults. (Seems just about the most justifiable reason for breaking backwards compatibility.) Do other projects do env vars? I feel like by the time you looked up how to fix it via env var, it would be the same amount of work to just fix the problem properly. Definitely worth putting this change at the top of the CHANGES list for the next release. What do @haikuginger @Lukasa think? |
CHANGES.rst
Outdated
| @@ -18,6 +18,10 @@ dev (master) | |||
|
|
|||
| * Lazily load `uuid` to boost performance on imports (Pull #1270) | |||
|
|
|||
| * Allow providing a list of headers to strip from requests when redirecting | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth adding another entry at the top of the CHANGES list that explicitly mentions the breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add this, I also want to make mention to that this only occurs when the hostname in the redirect is different than the original requests hostname.
urllib3/util/retry.py
Outdated
| @@ -152,7 +156,8 @@ class Retry(object): | |||
| def __init__(self, total=10, connect=None, read=None, redirect=None, status=None, | |||
| method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None, | |||
| backoff_factor=0, raise_on_redirect=True, raise_on_status=True, | |||
| history=None, respect_retry_after_header=True): | |||
| history=None, respect_retry_after_header=True, | |||
| remove_headers_on_redirect=None): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nicer if this were a constant value, like remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST or somesuch.
Otherwise, it's hard to figure out what headers get removed by default without reading the source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, that would be an easy addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off, I don't like None meaning default here. I assume you did this to avoid making the default parameter a mutable value - in which case, you can make the constant a frozenset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is flake8 smart enough to know frozenset isn't a mutable value/doesn't report mutable default value for frozenset? I honestly don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, duh, it's used above this parameter as a default so of course it works! I'll make this change.
urllib3/poolmanager.py
Outdated
| @@ -335,6 +336,12 @@ def urlopen(self, method, url, redirect=True, **kw): | |||
| if not isinstance(retries, Retry): | |||
| retries = Retry.from_int(retries, redirect=redirect) | |||
|
|
|||
| # Strip headers marked as unsafe to forward to the redirected location. | |||
| if (retries.remove_headers_on_redirect | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: Would be equivalent to have
if not conn.is_same_host(redirect_location):
for header in retries.remove_headers_on_redirect:
kw['headers'].pop(header, None)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only reason I chose to do this is because of the TODO: socket.gethostname() within ConnectionPool.is_same_host() in order to short-circuit a potential future network call if not needed. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. In that case, a comment would be wise so a future contributor doesn't simplify it. :)
urllib3/util/retry.py
Outdated
| @@ -19,6 +19,10 @@ | |||
|
|
|||
| log = logging.getLogger(__name__) | |||
|
|
|||
|
|
|||
| DEFAULT_REDIRECT_HEADERS_BLACKLIST = ['Authorization'] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to put this under the Retry object next to RETRY_AFTER_STATUS_CODES etc as a frozenset too?
(Sorry I should have mentioned that before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for globals and didn't find them! Didn't think to look at the bottom of the module. :)
That's reasonable, I just want us to treat it with the weight that it carries. :)
The problem is that urllib3 is used a lot indirectly, so users may not be in control of what constructs the
Agreed. |
|
@jonparrott So once Travis macOS builders catch up this looks good to you in its current state? |
urllib3/util/retry.py
Outdated
| @@ -152,7 +156,8 @@ class Retry(object): | |||
| def __init__(self, total=10, connect=None, read=None, redirect=None, status=None, | |||
| method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None, | |||
| backoff_factor=0, raise_on_redirect=True, raise_on_status=True, | |||
| history=None, respect_retry_after_header=True): | |||
| history=None, respect_retry_after_header=True, | |||
| remove_headers_on_redirect=None): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off, I don't like None meaning default here. I assume you did this to avoid making the default parameter a mutable value - in which case, you can make the constant a frozenset.
This change makes the default behavior of an HTTP redirect when the
Locationspecifies a different host than the original request to remove theAuthenticationheader if present. This behavior can be turned off by usingRetry(forward_auth_headers_across_hosts=True)which defaults toFalse. Closes #1316.