"TypeError: must be unicode, not str" when using --write-description #7178

Closed
sebma opened this issue Oct 14, 2015 · 10 comments
Labels
bug

Comments

@sebma sebma commented Oct 14, 2015

Hi, I get the following error:

"TypeError: must be unicode, not str"

The command I typed is the following:

youtube-dl --verbose --ignore-config --write-description http://www.veoh.com/watch/v46093745wbEGkakh
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'--ignore-config', u'--write-description', u'http://www.veoh.com/watch/v46093745wbEGkakh']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.10.13
[debug] Python version 2.7.8 - Linux-3.16.0-44-generic-x86_64-with-Ubuntu-14.10-utopic
[debug] exe versions: avconv 11.2-6, avprobe 11.2-6, ffmpeg 2.6.2, ffprobe 2.6.2, rtmpdump 2.4
[debug] Proxy map: {}
[Veoh] v46093745wbEGkakh: Downloading video XML
[info] Writing video description to: Pirates Of Silicon Valley-46093745.description
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/__init__.py", line 410, in main
    _real_main(argv)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/__init__.py", line 400, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1665, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 671, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 717, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1335, in process_video_result
    self.process_info(new_info)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1465, in process_info
    descfile.write(info_dict['description'])
TypeError: must be unicode, not str

Any ideas?

Seb.

@gaming-hacker gaming-hacker commented Oct 25, 2015

There are probably some strange non-ASCII characters in the string that are not being recognized.

@yan12125 yan12125 added the bug label Oct 25, 2015

@sebma sebma commented Oct 25, 2015

OK,

Is there an option to strip or ignore these non-ASCII characters?

@jaimeMF jaimeMF commented Oct 25, 2015

The problem is that on Python 2.x, xml.etree.ElementTree uses str instead of unicode (which Python 3.x uses) for the attribute values of XML nodes. The issue is also present in other extractors (like niconico), and other fields (username, title, ...) are also not unicode. Probably the simplest fix is to make --write-description accept non-unicode values, but ideally we should make sure that all fields in the info_dict are unicode.
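
To illustrate the "simplest fix" mentioned above, here is a minimal sketch of a description writer that accepts byte strings. It is not the actual youtube-dl code: `to_text` and `write_description` are hypothetical names, and the use of `io.open` in text mode is an assumption suggested by the error message in the traceback.

```python
import io
import os
import tempfile


def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) string, decoding byte strings."""
    # On Python 2, `bytes` is an alias of `str`, so this catches plain str.
    if isinstance(value, bytes):
        return value.decode(encoding, 'replace')
    return value


def write_description(path, description):
    """Write a description that may be either bytes or text (hypothetical helper)."""
    # A text-mode file from io.open only accepts unicode in write(); passing
    # a Python 2 str is what raises "must be unicode, not str".
    with io.open(path, 'w', encoding='utf-8') as descfile:
        descfile.write(to_text(description))


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'example.description')
    write_description(path, b'Pirates Of Silicon Valley')
    with io.open(path, encoding='utf-8') as f:
        print(f.read())
```

This only patches one call site; other fields that end up as byte strings would still need the same treatment wherever they are written.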

@dstftw dstftw commented Oct 25, 2015

What if we just add a recursive auto decoding for all bytestrings in info_dict in YoutubeDL.process_video_result?
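
A rough sketch of what such a recursive decoding pass could look like. The function name `decode_bytestrings` is hypothetical, UTF-8 is an assumed encoding, and this is not code from youtube-dl itself.

```python
def decode_bytestrings(obj, encoding='utf-8'):
    """Recursively decode every byte string found in nested dicts, lists and tuples."""
    if isinstance(obj, bytes):  # plain `str` on Python 2
        return obj.decode(encoding, 'replace')
    if isinstance(obj, dict):
        return dict(
            (decode_bytestrings(key, encoding), decode_bytestrings(value, encoding))
            for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with decoded items.
        return type(obj)(decode_bytestrings(item, encoding) for item in obj)
    # Numbers, None, and text strings pass through unchanged.
    return obj


if __name__ == '__main__':
    info_dict = {
        'title': b'Pirates Of Silicon Valley',
        'formats': [{'url': b'http://example.com/video.mp4'}],
        'duration': 5700,
    }
    print(decode_bytestrings(info_dict))
```

The trade-off is that a blanket pass like this has to guess the encoding, whereas fixing the source of the byte strings keeps that knowledge in the extractor.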

@yan12125 yan12125 commented Oct 25, 2015

I prefer what @jaimeMF said, that is, requiring all string fields in info_dict to be unicode. Changing things in extractors is simpler and less prone to bugs. I guess XML issues can be resolved in compat.py or utils.py.

By the way, for VeohIE, seems v.* videos are handled via XML and yapi-.* videos are delegated to YoutubeIE. The JSON part is never reached, is it?

@sebma For now, the simplest workaround is to run youtube-dl with Python 3. For example:

python3 /path/to/youtube-dl --verbose --ignore-config --write-description "http://www.veoh.com/watch/v46093745wbEGkakh"

@dstftw dstftw commented Oct 25, 2015

Conventionally we require it, but don't check it anywhere. Fixing this in extractors will involve fixing almost every extractor that uses xml.etree.ElementTree directly.

@jaimeMF jaimeMF commented Oct 25, 2015

In #7296 I have wrapped every call to xml.etree.ElementTree.fromstring with compat_etree_fromstring, which converts the attributes to unicode objects.

@jaimeMF jaimeMF commented Oct 25, 2015

> By the way, for VeohIE, seems v.* videos are handled via XML and yapi-.* videos are delegated to YoutubeIE. The JSON part is never reached, is it?

I don't know if there are more video types. Since @dstftw made b540697, he may know better.

@gaming-hacker gaming-hacker commented Oct 26, 2015

Why can't you force UTF-8/16 when grabbing the file? This is what I use with wget to get rid of non-ISO-8859-1 characters.

@yan12125 yan12125 commented Oct 31, 2015

Closing since #7296 landed. This functionality will work on both Python 2 and Python 3 in the next version.

@yan12125 yan12125 closed this Oct 31, 2015