Character corruption #10927
Comments
Not sure what you want. For me Korean characters are correctly parsed when
The problem isn't fixed in youtube-dl (2016.10.16), which is downloaded by youtube-dl -U.
Is there a reason not to use common JSON tools like
Thanks for your reply, and thank you for reading. I wish youtube-dl continued progress.
First, I'd like to clarify that those characters are not "corrupted". Instead,
Character corruption occurs in the JSON output of youtube-dl.
The cause is that Python does not handle non-English text well when only .encode(utf-8) is used, and parts of the Python code still need improvement.
A concrete example
Namely, import codecs is needed, codecs.open( has to be used instead of open(, and ensure_ascii=False has to be added as a parameter. This last change has to be applied not only to json.dumps(foo).encode(utf-8) but to every json.dumps(foo) call.
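As a minimal sketch of the changes described above (not youtube-dl's actual code), the snippet below compares the default json.dumps output with the ensure_ascii=False output and writes the latter with codecs.open. The dictionary contents and the file name are hypothetical, chosen only for illustration.

```python
import codecs
import json
import os
import tempfile

# Hypothetical sample metadata; not taken from youtube-dl.
info = {"title": "한국어 제목"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(info)

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(info, ensure_ascii=False)

# Both forms are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(readable) == info

# Write the readable form with an explicit UTF-8 encoding.
path = os.path.join(tempfile.mkdtemp(), "sample.info.json")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(readable)

# Read it back to confirm the round trip.
with codecs.open(path, "r", encoding="utf-8") as f:
    restored = json.load(f)
```

Note that the escaped form is still valid JSON that any JSON parser decodes to the same text; ensure_ascii=False only changes how the file looks when opened in a text editor or viewer.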
I have made these changes at every occurrence in all the files found with
ensure_ascii=False alone may be enough. I don't know whether making these changes at every occurrence in every file is appropriate, because I haven't read all of the youtube-dl code.
But I get good results with the command below
Also, I recommend using explicitly
because character corruption occurs when LANG isn't set to foo_bar.UTF-8.
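The exact call recommended here did not survive in this copy of the thread, so the following is only an assumption about the general idea: when a text file is opened without an encoding, Python falls back to a locale-dependent default, whereas passing encoding="utf-8" explicitly gives the same bytes regardless of LANG. The sample text and file name are hypothetical.

```python
import io
import locale
import os
import tempfile

text = u"한국어"  # hypothetical sample text

# The encoding Python would use for text files when none is given;
# it depends on the locale (for example LANG), so it varies by system.
default_encoding = locale.getpreferredencoding(False)

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Explicit encoding: the result no longer depends on the locale.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to check what was actually written.
with io.open(path, "rb") as f:
    raw = f.read()
```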
Thank you for reading.