[generic] fragment indicators confuse redirect logic #12501

Open
johnhawkinson opened this issue Mar 19, 2017 · 1 comment

@johnhawkinson
Contributor

@johnhawkinson johnhawkinson commented Mar 19, 2017

  • I've verified and I assure that I'm running youtube-dl 2017.03.20

A trailing fragment indicator (#) on a URL makes youtube-dl treat the URL reported by the HEAD response as a different URL, so the redirect logic in extractor/generic.py follows a "redirect" that the server never issued:

pb3:extractor jhawk$ youtube-dl -vs --write-pages 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-vs', u'--write-pages', u'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.03.20
[debug] Python version 2.7.10 - Darwin-14.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg git-2017-02-28-7f62368, ffprobe git-2017-02-28-7f62368, rtmpdump 2.4
[debug] Proxy map: {}
[generic] thewolf: Requesting header
[redirect] Following redirect to http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf
[generic] thewolf: Requesting header
WARNING: Falling back on generic information extractor.
[generic] thewolf: Downloading webpage
WARNING: URL could be a direct video link, returning it as such.

That'd be this check at 1695, I guess:

(Pdb) l
1691 	
1692 	        if head_response is not False:
1693 	            # Check for redirect
1694 	            new_url = head_response.geturl()
1695 	            if url != new_url:
1696 ->	                self.report_following_redirect(new_url)
1697 	                if force_videoid:
1698 	                    new_url = smuggle_url(
1699 	                        new_url, {'force_videoid': force_videoid})
1700 	                return self.url_result(new_url)
1701 	
(Pdb) p url
u'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#'
(Pdb) p new_url
u'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf'
(Pdb) 

And just for the record, no Location: header here:

pb3:extractor jhawk$ youtube-dl -s 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#' --print-traffic
[generic] thewolf: Requesting header
send: u'HEAD /us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf HTTP/1.1\r\nHost: www8.hp.com\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache
header: Last-Modified: Sun, 19 Mar 2017 15:25:54 GMT
header: X-Powered-By: Servlet/2.5 JSP/2.1
header: Content-Encoding: gzip
header: Content-Type: text/html; charset=UTF-8
header: Content-Length: 11065
header: Date: Sun, 19 Mar 2017 23:46:09 GMT
header: Connection: close
header: Vary: Accept-Encoding
[redirect] Following redirect to http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf
...
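
For what it's worth, the HEAD request line above carries no '#' because the fragment is never part of what gets sent to the server. A minimal Python 3 sketch of that, using only the standard library; `request_target` is a made-up helper for illustration, not anything in youtube-dl:

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path-plus-query part that goes on the HTTP request line.

    Hypothetical helper for illustration only. The fragment is dropped
    because it is handled client-side and is not sent to the server.
    """
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return target


url = 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#'
print(request_target(url))
```

So the server only ever sees the URL without the '#', and any URL reconstructed from the request side can differ from the one typed on the command line by exactly that character.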

(Apparently this doesn't have much to do with why this video doesn't download, but that'd be another issue.)

@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Mar 24, 2017

It appears that 54b960f gives another instance of this problem, since it compares URLs with string equality:

+                if new_url != url:
+                    self.report_following_redirect(new_url)

and, generally speaking, two URLs can differ only in their #fragment, so they fail a string comparison even though no redirect took place. I haven't actually verified this problem in yesterday's commit; I'm concluding it from inspection.
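
One possible way to make such a check fragment-insensitive is to strip the fragment from both URLs before comparing. This is only a sketch of the idea, not a proposed patch: `is_real_redirect` is a hypothetical name, and it is written for Python 3's `urllib.parse` (on Python 2.7, as in the report above, `urldefrag` lives in the `urlparse` module instead).

```python
from urllib.parse import urldefrag


def is_real_redirect(url, new_url):
    """Return True only if the two URLs differ in something other than the fragment.

    Hypothetical helper, not part of youtube-dl. urldefrag() returns a
    (url_without_fragment, fragment) pair; only the first item is compared.
    """
    return urldefrag(url)[0] != urldefrag(new_url)[0]


requested = 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#'
reported = 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf'

print(requested != reported)                    # plain string comparison
print(is_real_redirect(requested, reported))    # fragment-insensitive comparison
```

With the two URLs from the pdb session, the plain comparison says they differ while the fragment-insensitive one does not. Whether the fragment should then be carried over to the URL that is actually used is a separate question that this sketch doesn't try to answer.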

But the original problem I raised in this issue is certainly still present, as shown by python -m youtube_dl -vs 'http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf#'
