UTF8 decode issue for title #734

Closed
maximeg opened this issue Mar 8, 2013 · 4 comments

@maximeg maximeg commented Mar 8, 2013

Hi there,

I noticed that when a webpage has non-ASCII characters in <title>, youtube-dl fails with "ERROR: invalid system charset or erroneous output template" (GenericIE).

I'm not a Python developer, but I investigated. This is what I found:

  • In prepare_filename, you could split the exception handler from:
        except (ValueError, KeyError) as err:
            self.trouble(u'ERROR: invalid system charset or erroneous output template')
            return None

to

        except (KeyError) as err:
            self.trouble(u'ERROR: erroneous output template')
            return None
        except (ValueError) as err:
            self.trouble(u'ERROR: invalid system charset') # or maybe a clearer msg
            return None

as a small improvement (it will help with debugging in the future).

  • I mention the first point because, while looking up ValueError and KeyError in Python, I discovered UnicodeError. I added this as a debugging helper:
        except (UnicodeError) as err:
            self.trouble(u'ERROR: unicode encoding error for "' + unicode(err.object, errors='replace') + u'": ' + err.reason )
            return None

which gave me: ERROR: unicode encoding error for "R��gis blabla": ordinal not in range(128).

My Python skills end here. Since it is a Unicode conversion issue, I'll let you figure out what to do :-)
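
The handler split and the UnicodeError helper described in the two points above can be sketched together. This is only an illustration, not the actual youtube-dl code: the thread uses Python 2.7, while this sketch is written for Python 3, and the `prepare_filename` signature, the `format_decode_error` helper, the `charset` parameter, and the `(filename, error)` return value are all hypothetical. One detail worth noting is that UnicodeDecodeError is a subclass of UnicodeError, which is itself a subclass of ValueError, so the Unicode handler has to come before the ValueError one or it would never run.

```python
def format_decode_error(err):
    """Build a readable message from a UnicodeDecodeError (hypothetical helper)."""
    # err.object holds the original bytes; each undecodable byte is shown as U+FFFD.
    shown = err.object.decode("ascii", errors="replace")
    # The wording mirrors the debug message quoted in the report.
    return 'unicode encoding error for "%s": %s' % (shown, err.reason)


def prepare_filename(template, raw_title, charset="ascii"):
    """Hypothetical stand-in for the real method; returns (filename, error)."""
    try:
        title = raw_title.decode(charset)
        return template % {"title": title, "ext": "mp4"}, None
    except UnicodeDecodeError as err:
        # Must precede ValueError: UnicodeDecodeError is a ValueError subclass.
        return None, "ERROR: " + format_decode_error(err)
    except KeyError as err:
        # The template names a field that is not in the info dictionary.
        return None, "ERROR: erroneous output template (unknown field %s)" % err
    except ValueError as err:
        # The template itself is malformed, e.g. an unterminated "%(title".
        return None, "ERROR: invalid output template (%s)" % err


if __name__ == "__main__":
    # UTF-8 bytes for "Régis blabla", decoded with the ASCII codec on purpose.
    print(prepare_filename("%(title)s.%(ext)s", b"R\xc3\xa9gis blabla"))
    # The same bytes decode cleanly when the charset is UTF-8.
    print(prepare_filename("%(title)s.%(ext)s", b"R\xc3\xa9gis blabla", charset="utf-8"))
```

With the ASCII codec, the two bytes that encode "é" in UTF-8 are each replaced by U+FFFD in the message, which matches the two replacement characters in the output quoted above.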

Contributor

@phihag phihag commented Mar 8, 2013

Sorry for the problem, but could you include the full command-line log of a problematic session, run with the -v option? Like this:

$ youtube-dl -tv http://www.youtube.com/watch?v=BaW_jenozKc
[debug] youtube-dl version 2013.02.25
[debug] Git HEAD: c2e21f2
[debug] Python version 2.7.3 - Linux-3.4-trunk-amd64-x86_64-with-debian-7.0
[debug] Proxy map: {}
[youtube] Setting language
[youtube] BaW_jenozKc: Downloading video webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[download] Destination: youtube-dl test video ''_ä↭-BaW_jenozKc.mp4
[download] 100.0% of 1.90M at    2.13M/s ETA 00:00

(Note: ironically, GitHub's handling of non-BMP characters seems to be broken, but your issue should still be reproducible with ä, shouldn't it?)

Also, what filesystem are you using?

phihag added a commit that referenced this issue Mar 8, 2013
Author

@maximeg maximeg commented Mar 8, 2013

Here is a URL that I use to test this issue:

$ youtube-dl -o "video.%(ext)s" "http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html" -v
[debug] youtube-dl version 2013.02.25
[debug] Python version 2.7.3 - Linux-3.2.0-38-generic-x86_64-with-Ubuntu-12.04-precise
[debug] Proxy map: {}
WARNING: Falling back on generic information extractor.
[generic] regis-plante-sa-jeep.html: Downloading webpage
[generic] regis-plante-sa-jeep.html: Extracting information
ERROR: invalid system charset or erroneous output template
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/maxime/bin/youtube-dl/__main__.py", line 17, in <module>
    youtube_dl.main()
  File "/home/maxime/bin/youtube-dl/youtube_dl/__init__.py", line 516, in main
    _real_main()
  File "/home/maxime/bin/youtube-dl/youtube_dl/__init__.py", line 500, in _real_main
    retcode = fd.download(all_urls)
  File "/home/maxime/bin/youtube-dl/youtube_dl/FileDownloader.py", line 525, in download
    self.process_info(video)
  File "/home/maxime/bin/youtube-dl/youtube_dl/FileDownloader.py", line 400, in process_info
    filename = self.prepare_filename(info_dict)
  File "/home/maxime/bin/youtube-dl/youtube_dl/FileDownloader.py", line 364, in prepare_filename
    self.trouble(u'ERROR: invalid system charset or erroneous output template')
  File "/home/maxime/bin/youtube-dl/youtube_dl/FileDownloader.py", line 230, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())

With the error split from e5edd51 applied, I get: ERROR: Insufficient system charset 'UTF-8'.

Also, the issue doesn't occur with YouTube (https://www.youtube.com/watch?v=3Pg1T66t68w). So far, I have only seen the error with GenericIE.

@phihag phihag closed this in 3d34235 Mar 8, 2013
Contributor

@phihag phihag commented Mar 8, 2013

Thanks for the report, fixed.

Author

@maximeg maximeg commented Mar 8, 2013

Thanks :)
