character encoding #11436

Closed
xczheng opened this issue Dec 13, 2016 · 2 comments

@xczheng xczheng commented Dec 13, 2016

This is not a bug report, but can somebody help me understand the encoding difference shown below?
First I extract the info:

info = ydl.extract_info('https://www.youtube.com/watch?t=4&v=BaW_jenozKc', download=True)
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading MPD manifest
WARNING: "id" field is not a string - forcing string conversion
[download] youtube-dl test video ''_ä↭𝕐-BaW_jenozKc.mp4 has already been downloaded
[download] 100% of 1.74MiB

info['title']
u'youtube-dl test video "'/\\xe4\u21ad\U0001d550'
encodeFilename(info['title'])
'youtube-dl test video "'/\\xc3\xa4\xe2\x86\xad\xf0\x9d\x95\x90'

The file downloaded to disk has this name:

os.listdir('.')
["youtube-dl test video ''_\xc3\xa4\xe2\x86\xad\xf0\x9d\x95\x90-BaW_jenozKc.mp4"]

Why do we have the encoding difference here? I am asking because I cannot find the downloaded file using the filename built from '%(title)s-%(id)s.%(ext)s' % info:

'%(title)s-%(id)s.%(ext)s' % info
u'youtube-dl test video "'/\\xe4\u21ad\U0001d550-BaW_jenozKc.mp4'
os.path.getsize('%(title)s-%(id)s.%(ext)s' % info)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home2/goyoutub/python27/lib/python2.7/genericpath.py", line 57, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: 'youtube-dl test video "'/\\xc3\xa4\xe2\x86\xad\xf0\x9d\x95\x90-BaW_jenozKc.mp4'
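
For readers comparing the reprs above: the escaped bytes in the `encodeFilename` and `os.listdir` output appear to be simply the UTF-8 encoding of the code points shown in the unicode repr. A minimal sketch checking this for the three non-ASCII characters, written for Python 3 as an assumption (the session above is Python 2.7):

```python
# The three non-ASCII characters from the title, written as code point escapes.
tail = "\xe4\u21ad\U0001d550"  # ä, ↭, 𝕐

# Encoding to UTF-8 yields the byte sequence seen in the os.listdir() output.
encoded = tail.encode("utf-8")
print(encoded)  # b'\xc3\xa4\xe2\x86\xad\xf0\x9d\x95\x90'

# Decoding those bytes gives back the original text, so the encoding itself
# is not what makes the two names differ.
assert encoded.decode("utf-8") == tail
```

If that holds, the mismatch has to come from elsewhere: the `"` and `/` in the title show up as `'` and `_` in the name on disk.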

@yan12125 yan12125 (Collaborator) commented Dec 14, 2016

Try adding this line at the very top of your source file:

from __future__ import unicode_literals

@xczheng xczheng (Author) commented Dec 14, 2016

Thanks. However, both strings are already unicode; they simply contain different characters.
Looking into the code, I found that the filename youtube-dl creates comes from the prepare_filename call here:
ydl = youtube_dl.YoutubeDL(ydl_opts)
info = ydl.extract_info(special_url, download=True)
filename = ydl.prepare_filename(info)
prepare_filename replaces some characters in the title. With it, I can now get the name of the downloaded file even when the title contains special characters.
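
To illustrate why the raw template misses the file, here is a self-contained sketch. `sanitize_like_observed` is a hypothetical stand-in, not youtube-dl's actual code; its substitutions are inferred only from the output earlier in this thread (`"` became `'`, and the slash/backslash run became a single `_`). The title literal is likewise an assumption, reconstructed from the reprs shown. Real code should rely on `ydl.prepare_filename(info)` rather than on anything like this.

```python
import re


def sanitize_like_observed(name):
    """Hypothetical stand-in mimicking the substitutions seen in this thread."""
    name = name.replace('"', "'")        # double quote -> single quote
    name = re.sub(r"[/\\]", "_", name)   # slash and backslash -> underscore
    return re.sub(r"_+", "_", name)      # collapse runs of underscores


# Assumed title, reconstructed from the reprs shown in the thread.
info = {
    "title": 'youtube-dl test video "\'/\\\xe4\u21ad\U0001d550',
    "id": "BaW_jenozKc",
    "ext": "mp4",
}

template = "%(title)s-%(id)s.%(ext)s"

# Name built directly from the info dict, as in the failing os.path.getsize call.
raw_name = template % info

# Name built after applying the stand-in substitutions to the title.
disk_name = template % dict(info, title=sanitize_like_observed(info["title"]))

print(raw_name == disk_name)  # False: the raw template does not match the file
```

Under these assumptions `disk_name` matches the entry returned by `os.listdir('.')`, while `raw_name` still contains the `"` and `/` that never reach the disk.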

@xczheng xczheng closed this Dec 14, 2016