Don't crash when website sends wrong charset information #2721
Comments
Can you name an example URL? That would allow us to test this very easily.
Yup. Look at http://www.mangaxd.ws/streaming-Animes-2979-nisekoi-001-VOSTFR.html I'm preparing a patch for this.
Some (badly written) webpages might send wrong encoding information, for example with the header Content-Type: text/html; charset=ISO-8859
This generates the following traceback:
Traceback (most recent call last):
  File "./bin/youtube-dl", line 6, in <module>
    youtube_dl.main()
  File "/home/data/dev/youtube-dl/youtube_dl/__init__.py", line 837, in main
    _real_main(argv)
  File "/home/data/dev/youtube-dl/youtube_dl/__init__.py", line 827, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/data/dev/youtube-dl/youtube_dl/YoutubeDL.py", line 1033, in download
    self.extract_info(url)
  File "/home/data/dev/youtube-dl/youtube_dl/YoutubeDL.py", line 511, in extract_info
    ie_result = ie.extract(url)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 161, in extract
    return self._real_extract(url)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/generic.py", line 377, in _real_extract
    webpage = self._download_webpage(url, video_id)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 270, in _download_webpage
    res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
  File "/home/data/dev/youtube-dl/youtube_dl/extractor/common.py", line 254, in _download_webpage_handle
    content = webpage_bytes.decode(encoding, 'replace')
LookupError: unknown encoding: ISO-8859
This could be fixed by falling back to ISO-8859-1 (the default that HTTP/1.1 mandates for text content), or (better IMO) to UTF-8.
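
A minimal sketch of the fallback idea, not the actual patch: the helper name decode_with_fallback and its fallback parameter are made up for illustration and are not part of youtube-dl. It assumes the charset taken from the Content-Type header is passed in as a string (or None when absent) and only retries when Python does not recognise that name.

```python
def decode_with_fallback(data, declared_encoding, fallback='utf-8'):
    """Decode bytes with the declared charset, or with `fallback` if the
    declared name is unknown to Python (hypothetical helper)."""
    try:
        # bytes.decode raises LookupError for unknown codec names such as
        # 'ISO-8859', and TypeError when the encoding is None.
        return data.decode(declared_encoding, 'replace')
    except (LookupError, TypeError):
        # The 'replace' error handler keeps undecodable bytes from raising.
        return data.decode(fallback, 'replace')


# The header from this report declares a charset Python does not know,
# so the UTF-8 fallback is used instead of crashing.
page = decode_with_fallback(b'caf\xc3\xa9', 'ISO-8859')
```

Passing fallback='iso-8859-1' instead would give the other behaviour suggested above; which default is preferable depends on what misconfigured sites actually serve.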