CJK characters are missing when using youtube-dl -e to get the title #2046
Comments
|
Thank you very much for the extremely detailed bug report. As I don't own a Mac, I cannot confirm the issue, and it works fine on all my Linux boxes. What we changed in that commit is that we now always encode output strings ourselves instead of letting some pass through Python's default stdout (that often broke the experience, particularly for Windows users). Can you update to 2013.12.26 and post the output of |
|
Thank you phi, I found the root cause. In Terminal the locale is UTF-8, but inside my program the locale is ASCII because LANG is not set. After I set LANG to a UTF-8 locale in my program, everything works fine. |
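The fix described above can be sketched from the caller's side. This is a minimal Python sketch, not the reporter's NSTask code: it copies the parent environment, sets `LANG` to a UTF-8 locale, and hands that environment to the child process. The helper names `utf8_env` and `child_lang` and the value `en_US.UTF-8` are assumptions for illustration, and the child here is only a Python one-liner that echoes back the `LANG` it sees.

```python
import os
import subprocess
import sys


def utf8_env(base=None, lang="en_US.UTF-8"):
    """Return a copy of the environment with LANG set to a UTF-8 locale.

    `lang` is an assumed value; pick a locale that exists on the target system.
    """
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    return env


def child_lang(env):
    """Run a child Python process with `env` and return the LANG it sees."""
    code = "import os, sys; sys.stdout.write(os.environ.get('LANG', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


if __name__ == "__main__":
    # The child inherits the modified environment, not the parent's LANG.
    print(child_lang(utf8_env()))
```

In an NSTask-based caller, the analogous step would be setting the task's `environment` before launching it.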
|
Reopening, we should be able to at least detect this. |
|
I have an idea: the incoming data's charset is known, as long as the webpage declares it (nowadays it is usually UTF-8). If we detect that the locale does not match the incoming data's charset, the characters may get garbled. However, if the locale's encoding is not a superset of the incoming data's charset, characters could still go missing. |
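One way such a detection could look, as a rough sketch only and not youtube-dl's actual code: try a strict encode of the text in the locale's encoding and report which characters would be lost. The function names `can_encode` and `lossy_characters` are hypothetical; `locale.getpreferredencoding` is the standard-library call for the locale's encoding.

```python
import locale

# Three CJK characters, written as escapes so the source itself stays ASCII.
CJK = "\u65e5\u672c\u8a9e"


def can_encode(text, encoding=None):
    """Return True if `text` survives a strict encode to `encoding`.

    Defaults to the locale's preferred encoding when none is given.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        text.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


def lossy_characters(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    return [ch for ch in text if not can_encode(ch, encoding)]


if __name__ == "__main__":
    title = "abc " + CJK
    print(can_encode(title, "ascii"))   # False
    print(can_encode(title, "utf-8"))   # True
    print(len(lossy_characters(title, "ascii")))  # 3
```

A caller could use a check like this to print a warning on stderr when the title cannot be represented, rather than silently dropping characters.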
|
@niltsh Internally, we deal with characters. This allows the end-user to not have to care about which encoding the webpage happens to use. Can you post the precise output you get for |
|
the output in my App before modification:
[debug] User config: []
the output in my App after modification:
[debug] User config: []
the output in Terminal:
[debug] System config: [] |
Version: 2013.12.09.1 ~ 2013.12.23.4
Platform: Mac OS X 10.9
pre-condition:
0. website: youtube.com
1. command is ./youtube-dl -e xxxx
2. NOT in terminal but programmatically using NSTask/NSPipe
I made an application that calls youtube-dl and captures the result using NSTask/NSPipe.
I found that if the video title contains CJK characters, they are simply missing from the output; only the letters and digits are printed.
With the same command and options, running youtube-dl in Terminal works: no CJK characters are missing, and I confirmed that all characters come from stdout, not stderr.
So the problem only occurs when calling it through NSTask/NSPipe, which sounds like a Mac-specific issue.
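The symptom above (letters and digits survive, CJK characters vanish) is what an ASCII encode produces when unencodable characters are skipped. The sketch below reproduces that effect in plain Python; it assumes an `"ignore"` error handler and a hypothetical helper `encode_for_output`, and is not taken from the youtube-dl source.

```python
# Three CJK characters, written as escapes so the source itself stays ASCII.
CJK = "\u65e5\u672c\u8a9e"


def encode_for_output(text, encoding, errors="ignore"):
    """Encode `text` for writing to a byte stream.

    With errors="ignore", characters the encoding cannot represent are dropped.
    """
    return text.encode(encoding, errors)


if __name__ == "__main__":
    title = "Title " + CJK + " 123"
    # ASCII cannot represent the CJK characters, so they are silently dropped.
    print(encode_for_output(title, "ascii"))  # b'Title  123'
    # UTF-8 represents every character, so the title round-trips intact.
    print(encode_for_output(title, "utf-8").decode("utf-8") == title)  # True
```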
However, I did a binary search over the releases to find out which one introduced this problem.
The last GOOD version is 2013.12.09; the issue appears from 2013.12.09.1 onward.
Luckily, there is only one commit between these two releases:
Add a workaround for terminals without bidi support (Fixes #1912)
0783b09
I wish I could debug further, but sorry, I am not a Python person.
Since the commit is related to character handling, could you please confirm it?
BRs,
Zongyao Qu