
Encoding UTF-8 #11650

Closed
linglung opened this issue Jan 9, 2017 · 17 comments

@linglung commented Jan 9, 2017

I've read much of the documentation here about UTF-8, and I still face this issue.
youtube-dl --version
2017.01.08

Running youtube-dl.exe in the native Windows CMD:

youtube-dl -e --encoding utf-8 "https://www.youtube.com/watch?v=E_JXrNAxGzM" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-e', '--encoding', 'utf-8', 'https://www.youtube.com/watch?v=E_JXrNAxGzM', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref utf8
[debug] youtube-dl version 2017.01.08
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-82966-g6993bb4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
27/12/2016 晚間新聞 楊家駿直播睇手機

It gives strange, wrong output like this:
27/12/2016 晚間新聞 楊家駿直播睇手機

The original title is:
27/12/2016 晚間新聞 楊家駿直播睇手機

In Git Bash, it gives a different wrong output:
27/12/2016 æéæ°è æ¥å®¶é§¿ç´æ­çææ©

youtube-dl -e --encoding utf-8 "https://www.youtube.com/watch?v=E_JXrNAxGzM" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-e', '--encoding', 'utf-8', 'https://www.youtube.com/watch?v=E_JXrNAxGzM', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref utf-8
[debug] youtube-dl version 2017.01.08
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-82966-g6993bb4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
27/12/2016 æéæ°è æ¥å®¶é§¿ç´æ­çææ©

To fix this in Git Bash, I have to set the locale to UTF-8 first (Options > Text > Locale), but I have no equivalent fix for Windows CMD.
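The Git Bash output above is classic mojibake: the UTF-8 bytes of the title rendered one byte per character. A minimal Python sketch that reproduces it and shows the damage is reversible, assuming the terminal decoded the bytes as ISO-8859-1 (Latin-1). Under that assumption, bytes 0x80 to 0x9F become invisible control characters, which would explain why some bytes seem to be missing from the pasted output. This is an illustration, not youtube-dl code.

```python
# Illustration only: reproduce the Git Bash mojibake from the issue.
title = "27/12/2016 晚間新聞 楊家駿直播睇手機"

# youtube-dl was run with --encoding utf-8, so the title is written as UTF-8 bytes.
raw = title.encode("utf-8")

# Assumption: the terminal decodes each byte as one Latin-1 character.
garbled = raw.decode("latin-1")

# Latin-1 maps bytes 0x80-0x9F to C1 control characters, which do not
# render, so drop them to approximate what was visible on screen.
visible = "".join(ch for ch in garbled if not "\x80" <= ch <= "\x9f")
# visible == "27/12/2016 æéæ°è æ¥å®¶é§¿ç´æ\xadçææ©"

# No byte was actually lost, so the original text can be recovered.
restored = garbled.encode("latin-1").decode("utf-8")
assert restored == title
```

This is consistent with the Git Bash fix described above: once the terminal's locale is UTF-8, it decodes the same bytes correctly.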

So how can I solve this on a generic Windows installation, without the Git Bash window or without changing Git Bash's settings?

@yan12125 (Collaborator) commented Jan 9, 2017

Does this work? youtube-dl -e "https://www.youtube.com/watch?v=E_JXrNAxGzM" -v

@linglung (Author) commented Jan 9, 2017

@yan12125 Nope, I already tried that. All the Unicode characters in the title were stripped.
Here is the output from the Git Bash window:
youtube-dl -e "https://www.youtube.com/watch?v=E_JXrNAxGzM" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-e', 'https://www.youtube.com/watch?v=E_JXrNAxGzM', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252
[debug] youtube-dl version 2017.01.08
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-82966-g6993bb4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
27/12/2016

[Screenshot: git output log]
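In this run the debug line shows `pref cp1252` and the title shrinks to just the date. A small sketch showing that the Chinese characters have no representation in cp1252 (nor in cp437, which CMD reported as `out`). That youtube-dl silently drops unencodable characters is an assumption here, mimicked with `errors="ignore"`; it is not taken from youtube-dl's source.

```python
# Why the title can shrink to just the date: the Chinese characters do not
# exist in the legacy Windows code pages seen in the debug logs.
title = "27/12/2016 晚間新聞 楊家駿直播睇手機"

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for enc in ("cp437", "cp1252", "utf-8"):
    print(enc, can_encode(title, enc))   # cp437 False, cp1252 False, utf-8 True

# Assumption: unencodable characters are dropped (errors="ignore");
# only the ASCII date and the two spaces survive.
stripped = title.encode("cp1252", errors="ignore").decode("cp1252")
print(repr(stripped))                    # '27/12/2016  '
```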

@linglung (Author) commented Jan 9, 2017

Actually, it seems the output in Windows CMD is correct. I tried changing the CMD window's font to Lucida Console; it just couldn't display the actual characters.

[Screenshot: correct output]

But when the log is copied to the clipboard and pasted here, it's correct.
So is the problem my CMD window?

youtube-dl -e "https://www.youtube.com/watch?v=E_JXrNAxGzM" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-e', 'https://www.youtube.com/watch?v=E_JXrNAxGzM', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2017.01.08
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-82966-g6993bb4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
27/12/2016 晚間新聞 楊家駿直播睇手機

@yan12125 (Collaborator) commented Jan 9, 2017

Can you use Chinese fonts in CMD?

Forget Git Bash; such UNIX emulators never work perfectly on Windows.

@yan12125 (Collaborator) commented Jan 9, 2017

By the way, could you also try Python 3.6? Its multi-language support is better than that of youtube-dl's built-in version (3.4.4).

@linglung (Author) commented Jan 9, 2017

I already have Python 3.6 installed, but I don't know why the youtube-dl log says 3.4.4.
[Screenshot: python version]

Can you use Chinese fonts in CMD?

Nope. It shows only square boxes. I tried chcp 65001 and chcp 1252, with the same result.

@linglung (Author) commented Jan 9, 2017

How can I use my installed Python rather than youtube-dl's built-in one?

@yan12125 (Collaborator) commented Jan 9, 2017

Nope. It shows only box square character. Tried with chcp 6500/1, 1252. Same result

I mean the font dialog in CMD properties. For example, on my machine I can use MingLiU(細明體) to display Chinese characters.

How to use my python installed rather than youtube-dl built in?

Download the UNIX version from https://github.com/rg3/youtube-dl/releases/download/2017.01.08/youtube-dl and run:

C:\>python youtube-dl
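The standalone youtube-dl.exe ships with its own interpreter, which is why the log reports 3.4.4 regardless of the Python installed on the system. When running the UNIX version with `python youtube-dl` as above, a quick way to confirm which interpreter is actually in use is a sketch like the following; the field names are illustrative, and youtube-dl's own `[debug] Python version` line may be formatted differently.

```python
import platform
import sys

def interpreter_info() -> dict:
    """Describe the running interpreter, similar in spirit to youtube-dl's
    '[debug] Python version ...' line (exact format there may differ)."""
    return {
        "version": platform.python_version(),  # e.g. "3.6.0"
        "executable": sys.executable,          # path of the python binary in use
        "platform": platform.platform(),       # e.g. "Windows-10-10.0.14393"
    }

if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```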
@linglung (Author) commented Jan 9, 2017

I tried it with the UNIX version:
[Screenshot: unix version]

@yan12125 (Collaborator) commented Jan 9, 2017

out utf-8

This is the encoding used for printing to the terminal. Since it's already UTF-8, the problem is apparently in CMD.
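Three of the four values in the `[debug] Encodings:` line can be approximated with Python's standard library. This sketch assumes `locale` corresponds to `locale.getpreferredencoding()`, `fs` to `sys.getfilesystemencoding()` and `out` to the encoding of standard output; youtube-dl's exact lookup, and how it derives `pref` from `--encoding`, may differ.

```python
import locale
import sys

def encoding_report() -> dict:
    """Approximate the 'locale', 'fs' and 'out' fields of youtube-dl's
    '[debug] Encodings:' line (assumed mapping, not youtube-dl's code)."""
    return {
        "locale": locale.getpreferredencoding(),
        "fs": sys.getfilesystemencoding(),
        # stdout may be replaced by an object without .encoding, hence getattr.
        "out": getattr(sys.stdout, "encoding", None),
    }

if __name__ == "__main__":
    for name, enc in encoding_report().items():
        print(f"{name}: {enc}")
```

If `out` is already UTF-8 and characters still show up as boxes, the text is reaching the console intact and the console font becomes the main suspect, which is where this thread ends up.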

@yan12125 (Collaborator) commented Jan 9, 2017

By "font dialog", I meant this dialog. Do you have choices other than Lucida Console? IIRC it's not designed for Chinese, so characters may be missing.
[Screenshot: font dialog]

@linglung (Author) commented Jan 9, 2017

Sadly there is no font like yours.
[Screenshot: font list, no such font available]

By the way, thank you so much @yan12125 for your great help. I'll explore more possibilities based on your suggestions.

@yan12125 (Collaborator) commented Jan 9, 2017

Some people say that Consolas can display Chinese. If it's still broken, you may want to look up how to add fonts to CMD.

@linglung (Author) commented Jan 9, 2017

Yes, great idea. I've already added more fonts to CMD; my screenshot above shows it with Courier New, which wasn't available before.
I did this after reading this part of your comment:

I mean the font dialog in CMD properties. For example, on my machine I can use MingLiU(細明體) to display Chinese characters.

Now I'll try it with a Chinese font.

@linglung (Author) commented Jan 9, 2017

Again, thank you very much @yan12125! After adding a Chinese font, it works like a charm!
It also works with youtube-dl's built-in Python.

[Screenshot: working output]

@yan12125 (Collaborator) commented Jan 9, 2017

Glad to see that works.

Windows sucks as usual, and this time you overcame it :)

@yan12125 closed this Jan 9, 2017

@Hrxn commented Jan 9, 2017

Windows CMD normally uses code page 437 or 850, depending on your locale.

You can switch it to Unicode (UTF-8) with chcp 65001, or start CMD with CMD /U.

For reference: http://ss64.com/nt/chcp.html
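The code page that `chcp` sets is the console's code page, which is presumably also where the `out cp437` in the CMD debug log comes from. A minimal sketch for reading the current output code page from Python through the Win32 `GetConsoleOutputCP` function via `ctypes`; it is Windows-only and simply returns None elsewhere.

```python
import sys
from typing import Optional

def console_output_codepage() -> Optional[int]:
    """Return the Windows console output code page (e.g. 437, 850, 65001),
    or None on non-Windows platforms."""
    if sys.platform != "win32":
        return None
    import ctypes  # ctypes.windll only exists on Windows
    return ctypes.windll.kernel32.GetConsoleOutputCP()

if __name__ == "__main__":
    cp = console_output_codepage()
    if cp is None:
        print("not running on Windows")
    elif cp == 65001:
        print("console code page is UTF-8 (chcp 65001)")
    else:
        print(f"console code page is {cp}")
```

Note that, as this thread shows, a UTF-8 code page alone is not enough: the console font must also contain the glyphs.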
