Don't crash when website sends wrong charset information #2721

Closed
anisse opened this issue Apr 7, 2014 · 2 comments

Comments

@anisse
Contributor

@anisse anisse commented Apr 7, 2014

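Some (badly written) webpages send wrong encoding information, for example the header `Content-Type: text/html; charset=ISO-8859`. ISO-8859 names a family of character sets rather than a single one, so Python does not recognize it as an encoding.
This produces the following traceback: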

```
Traceback (most recent call last):
  File "./bin/youtube-dl", line 6, in <module>
    youtube_dl.main()
  File "/home/data/dev/youtube-dl/youtube_dl/__init__.py", line 837, in main
    _real_main(argv)
  File "/home/data/dev/youtube-dl/youtube_dl/__init__.py", line 827, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/data/dev/youtube-dl/youtube_dl/YoutubeDL.py", line 1033, in download
    self.extract_info(url)
  File "/home/data/dev/youtube-dl/youtube_dl/YoutubeDL.py", line 511, in extract_info
    ie_result = ie.extract(url)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 161, in extract
    return self._real_extract(url)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/generic.py", line 377, in _real_extract
    webpage = self._download_webpage(url, video_id)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 270, in _download_webpage
    res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 254, in _download_webpage_handle
    content = webpage_bytes.decode(encoding, 'replace')
LookupError: unknown encoding: ISO-8859
```
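The failure can be reproduced offline. This is a minimal sketch, assuming the charset is pulled out of the `Content-Type` value with a simple regex; the `charset_from_content_type` helper and its pattern are hypothetical, not youtube-dl's actual code. It shows why the `'replace'` error handler in the failing line does not help: it only covers undecodable bytes, while an unknown codec name is rejected before decoding starts.

```python
import re


def charset_from_content_type(content_type):
    # Hypothetical helper: extract the charset parameter from a Content-Type value
    m = re.search(r'charset=([^\s;]+)', content_type, re.IGNORECASE)
    return m.group(1) if m else None


webpage_bytes = b'<html>caf\xe9</html>'
encoding = charset_from_content_type('text/html; charset=ISO-8859')  # 'ISO-8859'

error = None
try:
    webpage_bytes.decode(encoding, 'replace')
except LookupError as err:
    # 'replace' only substitutes invalid bytes; an unknown codec name
    # raises LookupError during codec lookup, before any decoding
    error = err
    print(err)
```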

This could be fixed by falling back to ISO-8859-1 (the HTTP/1.1 default charset for text content) or, better in my opinion, to UTF-8.
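A sketch of the proposed fallback, under the assumption that the check happens right before the `decode` call shown in the traceback. The `decode_webpage` name and its `fallback` parameter are hypothetical; UTF-8 is the default here because that is the option preferred above, and `'iso-8859-1'` can be passed instead for the HTTP/1.1 behaviour.

```python
import codecs


def decode_webpage(webpage_bytes, encoding, fallback='utf-8'):
    """Decode page bytes, falling back when the declared charset is unknown.

    Hypothetical helper, not youtube-dl's actual implementation.
    """
    if encoding is None:
        encoding = fallback
    try:
        # Raises LookupError for names Python does not know, e.g. 'ISO-8859'
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # 'replace' still substitutes U+FFFD for bytes invalid in the chosen codec
    return webpage_bytes.decode(encoding, 'replace')
```

For example, `decode_webpage(b'caf\xc3\xa9', 'ISO-8859')` decodes as UTF-8 and returns `'café'`, while a Latin-1 page (`b'caf\xe9'`) only decodes cleanly with `fallback='iso-8859-1'`; under the UTF-8 fallback the stray byte becomes U+FFFD. Checking the name with `codecs.lookup` up front keeps the "unknown charset" case separate from ordinary decoding errors.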

@phihag
Contributor

@phihag phihag commented Apr 7, 2014

Can you name an example URL? That would allow us to test this very easily.

@anisse
Contributor Author

@anisse anisse commented Apr 7, 2014

Yup. Look at http://www.mangaxd.ws/streaming-Animes-2979-nisekoi-001-VOSTFR.html

I'm preparing a patch for this.
