[Request] Add support for accented characters in output settings #7661

Closed
HASJ opened this issue Nov 27, 2015 · 12 comments

Comments

@HASJ HASJ commented Nov 27, 2015

Accented characters aren't supported in the output settings. Could you add them, please?
You know, characters like these:

á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü
Á É Í Ó Ú Â Ê Î Ô Û Ã Õ À È Ì Ò Ù Ä Ë Ï Ö Ü

They are pretty important in some Latin-based languages.
This is mostly because my videos folder is named Vídeos (as the word is spelled in my language). In Y-DL it becomes Ví-deos, so I then have to make a junction link between the two folders and hide Vídeos.

Contributor

@phihag phihag commented Nov 27, 2015

Can you post the output you get when you run youtube-dl with the -v option? Non-ASCII characters work fine for me, so it may be a system-specific problem:

$ youtube-dl -o 'Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s' test:youtube -v
[debug] System config: []
...
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
*Snip for length: This part is the output we need from you to diagnose the problem!*
...
Deleting original file Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f141.m4a (pass -k to keep)
$ ls Vídeos/
á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.mp4
...
Contributor

@phihag phihag commented Nov 27, 2015

Oh, I missed the Junction link part. We'll have to test on Windows with your settings. These settings are output by -v.

Author

@HASJ HASJ commented Nov 27, 2015

youtube-dl -o 'Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s' test:youtube -v
[debug] System config: []
[debug] User config: [u'-f', u'298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0', u'-o', u'D:\\V\xc3\xaddeos\\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s']
**[debug] Command-line args: [u'-o', u"'V\xeddeos/\xe1", u'\xe9', u'\xed', u'\xf3', u'\xfa', u'\xe2', u'\xea', u'\xee', u'\xf4', u'\xfb', u'\xe3', u'\xf5', u'\xe0', u'\xe8', u'\xec', u'\xf2', u'\xf9', u'\xe4', u'\xeb', u'\xef', u'\xf6', u"\xfc.%(ext)s'", u'test:youtube', u'-v']**
**[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252**
[debug] youtube-dl version 2015.11.24
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-75499-g7179add, ffprobe N-73696-g8250943, rtmpdump 2.4
[debug] Proxy map: {}
ERROR: fixed output name but more than one file to download

I think something very dumb is happening and it's because of Windows 10... Not sure if the same thing happened with Win8, but I don't think so.
I've never even seen characters written like that: "\xed", "\xe9"? Wtf?
EDIT: The same thing happened with the python version.

Contributor

@phihag phihag commented Nov 27, 2015

Don't be afraid of this debug output. It shows characters in a strange fashion, but that's ok and not a problem in itself. On Python 3, the output would look nicer, but that's just debugging information. It is also very unlikely that Windows 8 vs Windows 10 is the problem. Much more likely is that the strange filesystem encoding is to blame.
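
For reference, the `\xed`-style sequences in the debug output are only the escaped form of the same characters. A minimal Python 3 sketch (the variable names are just for illustration) shows that the escaped and the literal spelling are the same string:

```python
# The debug line prints the repr of each argument, and Python 2 escapes
# non-ASCII characters there. ascii() reproduces that style on Python 3.
folder = "Vídeos"

escaped = ascii(folder)  # repr with every non-ASCII character escaped
print(escaped)

# The escape \xed is simply the code point U+00ED, i.e. the letter "í".
assert "V\xeddeos" == folder
assert ord("í") == 0xED
```

So `u'V\xeddeos'` in the log is the correctly decoded folder name, only printed in an ASCII-safe way.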

In this example, youtube-dl rightfully exited with an error though, since on Windows you must quote the -o value with double quotes (") instead of single quotes. Can you post the output you get when you correctly quote the -o value? What locale of Windows is this? cp1252 and cp850 make me think it is somewhere in Western Europe.
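
As a rough illustration of the quoting problem: the Windows command prompt does not treat `'` as a quote character, so the single-quoted value is split at every space. The sketch below only approximates that with a plain `split` on a shortened copy of the command (it is not how cmd.exe or youtube-dl actually parse arguments):

```python
# Approximation only: split on spaces, leaving the single quotes in the text,
# which mirrors the "Command-line args" debug line above.
command_tail = "'Vídeos/á é í.%(ext)s'"  # shortened from the real command

args = command_tail.split(" ")

# The first piece keeps the opening quote and the last keeps the closing one.
assert args[0].startswith("'")
assert args[-1].endswith("'")
print(len(args), "separate arguments instead of one")
```

Only the first piece ends up as the -o value. The remaining pieces are presumably treated as additional URLs, which would be consistent with the "fixed output name but more than one file to download" error.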

Contributor

@phihag phihag commented Nov 27, 2015

One more query: Can you reproduce the problem when you run youtube-dl with --encoding cp1252, --encoding UTF-8 or --encoding mbcs? That should force one encoding to be used.

Author

@HASJ HASJ commented Nov 27, 2015

Fixed quotations

youtube-dl -o "Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s" test:youtube -v
[debug] System config: []
[debug] User config: [u'-f', u'298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0', u'-o', u'D:\\V\xc3\xaddeos\\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s']
[debug] Command-line args: [u'-o', u'V\xeddeos/\xe1 \xe9 \xed \xf3 \xfa \xe2 \xea \xee \xf4 \xfb \xe3 \xf5 \xe0 \xe8 \xec \xf2 \xf9 \xe4 \xeb \xef \xf6 \xfc.%(ext)s', u'test:youtube', u'-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2015.11.24
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-75499-g7179add, ffprobe N-73696-g8250943, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading DASH manifest
[youtube] BaW_jenozKc: Downloading DASH manifest
[debug] Invoking downloader on u'https://r5---sn-bg07dnl7.googlevideo.com/videoplayback?id=05a5bf8de9e8cca7&itag=136&source=youtube&requiressl=yes&ms=au&mv=m&mn=sn-bg07dnl7&mm=31&pl=23&nh=IgpwcjAyLmdydTA2KgkxMjcuMC4wLjE&ratebypass=yes&mime=video/mp4&gir=yes&clen=1673012&lmt=1387961826998447&dur=9.800&mt=1448631481&upn=zqgTRjvs4rE&sver=3&key=dg_yt0&signature=1963C411840C9B0B96A303A97941D2E85423B289.970FC777D94DBD71326FF1973E3A19529F23E170&fexp=9407118,9408710,9413140,9414803,9415983,9416126,9416916,9417683,9418144,9418203,9419451,9420452,9422596,9422618,9422970,9423419,9423662,9424217,9424964&ip=179.110.227.229&ipbits=0&expire=1448653216&sparams=ip,ipbits,expire,id,itag,source,requiressl,ms,mv,mn,mm,pl,nh,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f136.mp4
[download] 100% of 1.60MiB in 00:05
[debug] Invoking downloader on u'https://r5---sn-bg07dnl7.googlevideo.com/videoplayback?id=05a5bf8de9e8cca7&itag=141&source=youtube&requiressl=yes&ms=au&mv=m&mn=sn-bg07dnl7&mm=31&pl=23&nh=IgpwcjAyLmdydTA2KgkxMjcuMC4wLjE&ratebypass=yes&mime=audio/mp4&gir=yes&clen=315992&lmt=1387961817988214&dur=9.891&mt=1448631481&upn=zqgTRjvs4rE&sver=3&key=dg_yt0&signature=0DA116E90078E0EBFF866FF759BF7848D3F5FFCA.91EA7CB6F14058E3B84325C8DA39831DCFB6C46E&fexp=9407118,9408710,9413140,9414803,9415983,9416126,9416916,9417683,9418144,9418203,9419451,9420452,9422596,9422618,9422970,9423419,9423662,9424217,9424964&ip=179.110.227.229&ipbits=0&expire=1448653216&sparams=ip,ipbits,expire,id,itag,source,requiressl,ms,mv,mn,mm,pl,nh,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f141.m4a
[download] 100% of 308.59KiB in 00:00
[ffmpeg] Merging formats into "Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.mp4"
[debug] ffmpeg command line: ffmpeg -y -i 'file:Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f136.mp4' -i 'file:Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f141.m4a' -c copy -map 0:v:0 -map 1:a:0 'file:Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.temp.mp4'
Deleting original file Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f136.mp4 (pass -k to keep)
Deleting original file Vídeos\á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.f141.m4a (pass -k to keep)

--encoding UTF-8

youtube-dl --encoding UTF-8 -o "Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s" test:youtube -v
[debug] System config: []
[debug] User config: [u'-f', u'298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0', u'-o', u'D:\\V\xc3\xaddeos\\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s']
[debug] Command-line args: [u'--encoding', u'UTF-8', u'-o', u'V\xeddeos/\xe1 \xe9 \xed \xf3 \xfa \xe2 \xea \xee \xf4 \xfb \xe3 \xf5 \xe0 \xe8 \xec \xf2 \xf9 \xe4 \xeb \xef \xf6 \xfc.%(ext)s', u'test:youtube', u'-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref UTF-8
[debug] youtube-dl version 2015.11.24
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-75499-g7179add, ffprobe N-73696-g8250943, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading DASH manifest
[youtube] BaW_jenozKc: Downloading DASH manifest
[download] V├¡deos\├í ├® ├¡ ├│ ├║ ├ó ├¬ ├« ├┤ ├╗ ├ú ├Á ├á ├¿ ├¼ ├▓ ├╣ ├ñ ├½ ├» ├ ├╝.mp4 has already been downloaded and merged

--encoding cp1252


youtube-dl --encoding cp1252 -o "Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s" test:youtube -v
[debug] System config: []
[debug] User config: [u'-f', u'298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0', u'-o', u'D:\\V\xc3\xaddeos\\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s']
[debug] Command-line args: [u'--encoding', u'cp1252', u'-o', u'V\xeddeos/\xe1 \xe9 \xed \xf3 \xfa \xe2 \xea \xee \xf4 \xfb \xe3 \xf5 \xe0 \xe8 \xec \xf2 \xf9 \xe4 \xeb \xef \xf6 \xfc.%(ext)s', u'test:youtube', u'-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2015.11.24
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-75499-g7179add, ffprobe N-73696-g8250943, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading DASH manifest
[youtube] BaW_jenozKc: Downloading DASH manifest
[download] VÝdeos\ß Ú Ý ¾ · Ô Û ¯ ¶ ¹ Ò § Ó Þ ý ‗ ¨ õ Ù ´ ÷ ³.mp4 has already been downloaded and merged

--encoding mbcs


youtube-dl --encoding mbcs -o "Vídeos/á é í ó ú â ê î ô û ã õ à è ì ò ù ä ë ï ö ü.%(ext)s" test:youtube -v
[debug] System config: []
[debug] User config: [u'-f', u'298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0', u'-o', u'D:\\V\xc3\xaddeos\\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s']
[debug] Command-line args: [u'--encoding', u'mbcs', u'-o', u'V\xeddeos/\xe1 \xe9 \xed \xf3 \xfa \xe2 \xea \xee \xf4 \xfb \xe3 \xf5 \xe0 \xe8 \xec \xf2 \xf9 \xe4 \xeb \xef \xf6 \xfc.%(ext)s', u'test:youtube', u'-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref mbcs
[debug] youtube-dl version 2015.11.24
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-75499-g7179add, ffprobe N-73696-g8250943, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading DASH manifest
[youtube] BaW_jenozKc: Downloading DASH manifest
[download] VÝdeos\ß Ú Ý ¾ · Ô Û ¯ ¶ ¹ Ò § Ó Þ ý ‗ ¨ õ Ù ´ ÷ ³.mp4 has already been downloaded and merged

I won't even pretend to understand what is going on, lol.

Locale

Brazil, pt-BR (Brazilian Portuguese). That's what I don't get. It should be using UTF-8, shouldn't it?

Contributor

@phihag phihag commented Nov 27, 2015

No, you are not using UTF-8, but a multitude of different encodings - that is the problem.
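
The garbled "has already been downloaded" lines above are consistent with text being encoded in one code page and displayed by a console that uses another (cp850, the "out" encoding in the debug output). A minimal Python 3 sketch, assuming that `mbcs` resolves to cp1252 on this system:

```python
name = "Vídeos/á é"

# --encoding UTF-8: UTF-8 bytes rendered by a cp850 console.
utf8_on_cp850 = name.encode("utf-8").decode("cp850")

# --encoding cp1252 (or mbcs, assumed to be cp1252 here):
# cp1252 bytes rendered by a cp850 console.
cp1252_on_cp850 = name.encode("cp1252").decode("cp850")

# Both results reproduce the garbled prefixes seen in the logs above.
assert utf8_on_cp850.startswith("V├¡deos")
assert cp1252_on_cp850.startswith("VÝdeos")
```

In both cases the file name itself is intact. Only the way the console displays it is wrong.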

Author

@HASJ HASJ commented Nov 27, 2015

So, how do I fix it...? I never messed with these settings.

Contributor

@phihag phihag commented Nov 27, 2015

We need to detect the correct encoding and convert to it. Most likely, we are using the encoding your system uses for output or display instead of the encoding for the file system.
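
To see which encodings Python itself reports on a given machine, something like the following sketch can be run. The function name `report_encodings` is made up for this example, and the dictionary keys only loosely mirror the labels in the "[debug] Encodings" line:

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for this process."""
    return {
        # Encoding the locale prefers for text data.
        "locale": locale.getpreferredencoding(),
        # Encoding used to convert file names to and from bytes.
        "fs": sys.getfilesystemencoding(),
        # Encoding of the console or pipe behind stdout (may be missing).
        "out": getattr(sys.stdout, "encoding", None) or "unknown",
    }


print(report_encodings())
```

If the three values differ, as they do in the logs above, text that is correct in one place can look garbled in another.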

Author

@HASJ HASJ commented Nov 27, 2015

Man, I give up. This seems to be happening exclusively here so it's not Y-DL's thing. Sorry for the wasted time.
W10 sucks more each day :|

Author

@HASJ HASJ commented Nov 30, 2015

@phihag Despite the CLI showing all kinds of messed-up characters, the output directory and file had the correct characters. Any idea why they're being displayed so weirdly, or was that expected?
What I'm not getting is why setting the config.txt to

-f 298+bestaudio/136+bestaudio/best/135+bestaudio/134+bestaudio/133+bestaudio/160+bestaudio/0
-o "D:\Vídeos\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s"
#--external-downloader aria2c
#--external-downloader-args "--file-allocation=falloc --min-split-size=1M --split=16 --max-connection-per-server=16"

is giving me the Ví-deos output folder.

When I set '-o "D:\Vídeos\[%(extractor)s] %(uploader)s-%(upload_date)s-%(playlist)s #%(playlist_index)s - %(title)s [%(id)s].%(ext)s"' directly through the CLI, Vídeos is used correctly.

Author

@HASJ HASJ commented Nov 30, 2015

Wow... I discovered the problem.
I had to save my config.txt as ANSI.
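
A plausible explanation, sketched in Python 3 under the assumption that config.txt had been saved as UTF-8 and was read using the cp1252 locale encoding: the two UTF-8 bytes of "í" get decoded as two separate characters, which matches the `V\xc3\xaddeos` in the "User config" debug lines above.

```python
path = "D:\\Vídeos"

# Bytes a UTF-8 editor would write to config.txt.
utf8_bytes = path.encode("utf-8")

# Reading those bytes as cp1252 turns "í" (bytes C3 AD) into two characters:
# "Ã" (U+00C3) followed by a soft hyphen (U+00AD).
misread = utf8_bytes.decode("cp1252")
assert misread == "D:\\V\xc3\xaddeos"

# Saved as ANSI (cp1252 on this system) and read as cp1252, the path survives.
assert path.encode("cp1252").decode("cp1252") == path
```

The stray soft hyphen may also be where the hyphen in the unwanted folder name came from.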

@HASJ HASJ closed this May 1, 2016