Add functionality to get the final request URL after redirects, like the function urllib2.urlopen().geturl(). #1272

Closed
bosihu opened this issue Oct 2, 2017 · 8 comments

@bosihu

bosihu commented Oct 2, 2017

No description provided.

@Lukasa
Contributor

Lukasa commented Oct 2, 2017

You cannot: this functionality is not supported at this time. Patches to add it would be welcome. 😄

@crwilcox
Contributor

crwilcox commented Apr 6, 2018

I was investigating this and have a possible solution, but it may make more sense to offer it as a utility than to hang it off urlopen. HTTPResponse, which is returned by urllib3.PoolManager().request and urllib3.PoolManager().urlopen, doesn't seem to record the URL that the data came from. It does carry the retry history, though, and when a redirect occurred the final URL can be recovered from that. A field containing the location could be added, but I am not sure it makes sense to add another field to every HTTPResponse object. That field would be needed to handle the case where the response did not come from a redirect.

If you run

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://github.com/shazow/urllib3')
r.retries.history

You will get the following:

RequestHistory(method='GET', url='http://github.com/shazow/urllib3', error=None, status=301, redirect_location='https://github.com/shazow/urllib3')
RequestHistory(method='GET', url='https://github.com/shazow/urllib3', error=None, status=301, redirect_location='https://github.com/urllib3/urllib3')

The history will be empty if no redirects (or other retries) occurred.

The following code seems to do what you want:

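import urllib3

def geturl(request_type, url):
    # Follow redirects and report where the request ended up.
    http = urllib3.PoolManager()
    response = http.request(request_type, url)
    history = response.retries.history
    if history:
        # The last history entry records the final redirect target.
        return history[-1].redirect_location
    # No redirects: the data came from the URL we asked for.
    return url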
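
One caveat with reading only the last history entry: an entry that records a retry after an error has no redirect_location, and the stored location may be a relative URL. Below is a minimal sketch of a more defensive variant that walks the whole history. The names `HistoryEntry` and `final_url` are hypothetical and not part of urllib3; `HistoryEntry` is a local stand-in with the same fields as the RequestHistory output above, so the example runs without a network connection.

```python
from collections import namedtuple
from urllib.parse import urljoin

# Hypothetical stand-in mirroring the fields of the RequestHistory entries
# printed above; real entries would come from response.retries.history.
HistoryEntry = namedtuple(
    "HistoryEntry", ["method", "url", "error", "status", "redirect_location"]
)


def final_url(requested_url, history):
    """Return the URL the response most likely came from."""
    current = requested_url
    for entry in history:
        # Entries recorded for error retries carry no redirect location.
        if entry.redirect_location:
            # urljoin resolves a relative location against the current URL
            # and returns an absolute location unchanged.
            current = urljoin(current, entry.redirect_location)
    return current


history = [
    HistoryEntry("GET", "http://github.com/shazow/urllib3", None, 301,
                 "https://github.com/shazow/urllib3"),
    HistoryEntry("GET", "https://github.com/shazow/urllib3", None, 301,
                 "https://github.com/urllib3/urllib3"),
]
print(final_url("http://github.com/shazow/urllib3", history))
```

With a real response, the call would presumably be `final_url(url, response.retries.history)`.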

theacodes changed the title from "How can I get the finally url after several times redirect, like the function urllib2.urlopen().geturl()." to "Add functionality to get the final request URL after redirects, like the function urllib2.urlopen().geturl()." on Apr 9, 2018
@sethmlarson
Member

sethmlarson commented Apr 23, 2018

Hey @crwilcox could you turn that into a method of Response? I'd gladly accept that change. :)

@crwilcox
Contributor

@SethMichaelLarson sure. feel free to assign this to me and I can probably do it Friday :)

@sethmlarson
Member

Consider yourself officially assigned! (GitHub isn't letting me assign non-collaborators?)

@crwilcox
Contributor

@SethMichaelLarson,
I looked at this again and recalled why I didn't just write the method and open a PR. From what I could tell earlier, the request URL isn't recorded on the response object, so adding response.geturl() would require storing additional data on the response to cover the case where there was no redirect. Holding this extra data for what is essentially a helper method seemed heavy-handed. Did I perhaps miss a field on the response that would let me retrieve the requested URL?
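
To make the trade-off concrete, here is a toy sketch of what storing the requested URL would look like. `SketchResponse`, `request_url`, and `redirect_locations` are hypothetical names for illustration only; this is not urllib3's HTTPResponse and not necessarily how the eventual change was implemented.

```python
class SketchResponse:
    """Toy stand-in for a response object; not urllib3's HTTPResponse."""

    def __init__(self, request_url, redirect_locations=()):
        # The one extra field under discussion: the URL originally requested.
        self.request_url = request_url
        # Targets of any redirects that were followed, oldest first.
        self.redirect_locations = tuple(redirect_locations)

    def geturl(self):
        # Last redirect target if there was one, else the requested URL.
        if self.redirect_locations:
            return self.redirect_locations[-1]
        return self.request_url


direct = SketchResponse("https://github.com/urllib3/urllib3")
redirected = SketchResponse(
    "http://github.com/shazow/urllib3",
    ["https://github.com/shazow/urllib3", "https://github.com/urllib3/urllib3"],
)
print(direct.geturl())
print(redirected.geturl())
```

The cost is a single string reference per response, which is what makes the no-redirect case answerable at all.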

@sethmlarson
Member

If you've got time I'd still like to see a PR for this functionality even if it adds a field to response objects. It's genuinely useful functionality we're missing compared to httplib.

crwilcox added a commit to crwilcox/urllib3 that referenced this issue Apr 27, 2018
sethmlarson pushed a commit that referenced this issue Apr 30, 2018
Implements #1272 by adding a geturl method to HTTPResponse objects
@sethmlarson
Member

Closed by #1382

Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this issue Mar 16, 2022
Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this issue Mar 16, 2022
Implements urllib3#1272 by adding a geturl method to HTTPResponse objects