Support non-ASCII URLs (was: IDN with proxy) #8890

Closed
nlevitt opened this issue Mar 18, 2016 · 5 comments
@nlevitt nlevitt commented Mar 18, 2016

$ youtube-dl --proxy=wbgrp-svc035:8000 'http://þ.com/'
[generic] þ: Requesting header
Traceback (most recent call last):
  File "/1/brzl/brozzler-ve34/bin/youtube-dl", line 9, in <module>
    load_entry_point('youtube-dl==2015.09.22', 'console_scripts', 'youtube-dl')()
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/__init__.py", line 411, in main
    _real_main(argv)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/__init__.py", line 401, in _real_main
    retcode = ydl.download(all_urls)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1659, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 661, in extract_info
    ie_result = ie.extract(url)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/extractor/common.py", line 287, in extract
    return self._real_extract(url)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/extractor/generic.py", line 1161, in _real_extract
    fatal=False)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/extractor/common.py", line 326, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1871, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/1/brzl/brozzler-ve34/lib/python3.4/site-packages/youtube_dl/utils.py", line 669, in http_open
    req)
  File "/usr/lib/python3.4/urllib/request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfe' in position 12: ordinal not in range(128)

I suppose this happens because, with a proxy, the full URL goes on the request line. Maybe the hostname needs to be Punycode-encoded here. It works fine without a proxy.
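
To illustrate the hypothesis above, here is a minimal sketch of the two request lines. The `request_line` helper is hypothetical and written only for this example; it is not youtube-dl or `http.client` code. It assumes the "Requesting header" step issues a `HEAD` request.

```python
from urllib.parse import urlsplit


def request_line(method, url, via_proxy):
    """Build an HTTP/1.1 request line (hypothetical helper, for illustration)."""
    if via_proxy:
        target = url  # a proxy receives the absolute URL, hostname included
    else:
        parts = urlsplit(url)
        target = parts.path or '/'
        if parts.query:
            target += '?' + parts.query
    return '%s %s HTTP/1.1' % (method, target)


direct = request_line('HEAD', 'http://þ.com/', via_proxy=False)
proxied = request_line('HEAD', 'http://þ.com/', via_proxy=True)

direct.encode('ascii')  # succeeds: the hostname is not part of this line

try:
    proxied.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc  # same error class as in the traceback above
    print(failure)
```

Under that `HEAD` assumption, the non-ASCII character sits at index 12 of the proxied request line, which is the position reported in the traceback.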

Collaborator

@dstftw dstftw commented Mar 18, 2016

Post the full output of youtube-dl when run with -v: add the -v flag to your command line, copy the whole output, and post it in the issue body wrapped in ``` for better formatting. It should look similar to this:

$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...

Do not post screenshots of the verbose log; only plain text is acceptable.

The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.

@dstftw dstftw closed this Mar 18, 2016
Author

@nlevitt nlevitt commented Mar 19, 2016

$ youtube-dl -v --proxy=wbgrp-svc035:8000 'http://þ.com/'
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '--proxy=wbgrp-svc035:8000', 'http://þ.com/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.03.18
[debug] Git HEAD: a43b501
[debug] Python version 3.5.0 - Darwin-15.3.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {'http': 'wbgrp-svc035:8000', 'https': 'wbgrp-svc035:8000'}
[generic] þ: Requesting header
Traceback (most recent call last):
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/__init__.py", line 412, in main
    _real_main(argv)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/__init__.py", line 402, in _real_main
    retcode = ydl.download(all_urls)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 1719, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 668, in extract_info
    ie_result = ie.extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 320, in extract
    return self._real_extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 1226, in _real_extract
    fatal=False)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 365, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 1929, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 465, in open
    response = self._open(req, data)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 483, in _open
    '_open', req)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 443, in _call_chain
    result = func(*args)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/utils.py", line 746, in http_open
    req)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1118, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 960, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfe' in position 12: ordinal not in range(128)
Collaborator

@yan12125 yan12125 commented Mar 19, 2016

Non-ASCII domain names are invalid. See http://bugs.python.org/issue17214.

Author

@nlevitt nlevitt commented Mar 20, 2016

youtube-dl doesn't have this problem when not using a proxy.

$ youtube-dl -v 'http://þ.com/'
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', 'http://þ.com/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.03.18
[debug] Git HEAD: a43b501
[debug] Python version 3.5.0 - Darwin-15.3.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[generic] þ: Requesting header
WARNING: Falling back on generic information extractor.
[generic] þ: Downloading webpage
[generic] þ: Extracting information
ERROR: Unsupported URL: http://þ.com/
Traceback (most recent call last):
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 668, in extract_info
    ie_result = ie.extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 320, in extract
    return self._real_extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 1961, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://þ.com/

I guess it needs to punycode the hostname before issuing the request to the proxy.
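
A rough sketch of that idea, using only the standard library. The function name `idna_encode_host` is made up for this example, and this is not necessarily how youtube-dl ended up fixing it. It relies on Python's built-in `idna` codec and only touches the hostname; a non-ASCII path or query would still need percent-encoding separately.

```python
from urllib.parse import urlsplit, urlunsplit


def idna_encode_host(url):
    """Return url with its hostname IDNA-encoded (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    # Leave URLs without a host, and IPv6 literals, untouched.
    if not host or ':' in host:
        return url
    ascii_host = host.encode('idna').decode('ascii')
    # Preserve any "user:password@" prefix and explicit port.
    userinfo, at, _ = parts.netloc.rpartition('@')
    netloc = userinfo + at + ascii_host
    if parts.port is not None:
        netloc += ':%d' % parts.port
    return urlunsplit(parts._replace(netloc=netloc))


safe_url = idna_encode_host('http://þ.com/')
# The result is pure ASCII, so it can go on a proxy request line.
safe_url.encode('ascii')
print(safe_url)
```

Note that `urlsplit` lowercases the hostname, which is harmless for DNS names.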

@yan12125 yan12125 reopened this Mar 20, 2016
@yan12125 yan12125 added the request label Mar 20, 2016
@yan12125 yan12125 changed the title from "IDN with proxy" to "Support non-ASCII URLs (was: IDN with proxy)" Mar 20, 2016
@yan12125 yan12125 closed this in efbed08 Mar 23, 2016
Collaborator

@yan12125 yan12125 commented Mar 23, 2016

Thanks for the report. IDN with proxies will work in the next version.
