Retrieve JSON data in unicode (Encoding UTF-8) #11696
Comments
|
Well, this is the second time people have asked for unescaped strings (#10927). It might be worth an option. Here's a quick hack:
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 5d654f55f..d7374e820 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1535,7 +1535,7 @@ class YoutubeDL(object):
if self.params.get('forceformat', False):
self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
- self.to_stdout(json.dumps(info_dict))
+ self.to_stdout(json.dumps(info_dict, ensure_ascii=False))
# Do nothing else if in simulate mode
if self.params.get('simulate', False): |
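For readers unfamiliar with the flag, here is a minimal sketch of what `ensure_ascii` changes in Python's standard `json` module. The `info` dict is a made-up stand-in for youtube-dl's `info_dict`, not real output.

```python
import json

# Hypothetical sample standing in for youtube-dl's info_dict.
info = {"title": "日本語のタイトル", "tags": ["café"]}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(info)

# With ensure_ascii=False the characters are emitted as-is.
unescaped = json.dumps(info, ensure_ascii=False)

print(escaped)
print(unescaped)

# Both forms are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(unescaped)
```

Note that the escaped form loses nothing: any JSON parser turns the `\uXXXX` sequences back into the original characters.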
|
Using Git Shell, I got this:
I tried to configure it manually. Edit: Same result. |
|
Well, --write-info-json uses a different function.
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 12863e74a..6ded34832 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -231,7 +231,7 @@ def write_json_file(obj, fn):
try:
with tf:
- json.dump(obj, tf)
+ json.dump(obj, tf, ensure_ascii=False)
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError.
On Linux/Mac/... you can use |
|
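A minimal sketch of the same idea for files, assuming Python 3. The helpers `write_json_utf8` and `read_json_utf8` are hypothetical names for illustration, not youtube-dl's `write_json_file`. The point is that unescaped output only round-trips safely when the file is opened with an explicit UTF-8 encoding.

```python
import io
import json
import os
import tempfile


def write_json_utf8(obj, path):
    # Hypothetical helper: open the file as UTF-8 text so that
    # unescaped characters do not depend on the locale's default encoding.
    with io.open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


def read_json_utf8(path):
    # Read the file back with the same explicit encoding.
    with io.open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Usage: write a sample dict to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "info.json")
write_json_utf8({"title": "Ünïcödé"}, path)
print(read_json_utf8(path))
```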
Great! It works as expected.
|
|
Sadly, when I used your first approach with --dump-json, I got these logs:
|
|
Most likely there are tabs - replace them all with spaces. |
|
@yan12125 Perfect. It's fixed now. Thank you so much.
|
|
youtube-dl --encoding utf-8 --write-info-json https://www.youtube.com/watch?v=VA0rAN0GRY4 |
|
Why isn't this applied as the default setting in every released version of youtube-dl? |
|
Actually, @yan12125, you can apply the patch on Windows if you use Git for Windows (Git Bash). Well, at least I can. Also, to do it on Windows, I am afraid you have to write the diffs to a file. |
|
To @linglung: It may sound silly, but not all environments support raw (unescaped) UTF-8. youtube-dl aims to keep compatibility with most systems, so it can't be the default. |
|
hmm you could in this case use |
|
Running on Linux does not imply full UTF-8 support. If one uses LC_ALL=C or LC_ALL=POSIX, UTF-8 strings can break the console. Such a setting is common in containers like Docker. (http://bugs.python.org/issue28180) On the other hand, since Python 3.6 UTF-8 support seems quite fine on Windows. (PEP 528, PEP 529) The logic for determining UTF-8 support can be rather complicated. |
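One possible way to approach that detection is to look at the encoding the output stream reports instead of the platform. This is only a sketch under that assumption; `stream_is_utf8` and `dumps_for_stream` are hypothetical names, not part of youtube-dl, and a stream can still misreport what the terminal really supports.

```python
import codecs
import io
import json
import sys


def stream_is_utf8(stream):
    # Hypothetical check: trust the encoding the stream reports.
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        # codecs.lookup normalises aliases such as "UTF8" or "utf_8".
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


def dumps_for_stream(obj, stream):
    # Escape non-ASCII characters unless the stream reports UTF-8.
    return json.dumps(obj, ensure_ascii=not stream_is_utf8(stream))


# Simulate a LC_ALL=C style console and a UTF-8 console.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print(dumps_for_stream({"title": "café"}, ascii_stream))
print(dumps_for_stream({"title": "café"}, utf8_stream))
print(stream_is_utf8(sys.stdout))
```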
|
Which is why you could have it like this in both diffs.
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 5d654f55f..d7374e820 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1535,7 +1535,7 @@ class YoutubeDL(object):
if self.params.get('forceformat', False):
self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
- self.to_stdout(json.dumps(info_dict))
+ if sys.platform == 'win32':
+ self.to_stdout(json.dumps(info_dict, ensure_ascii=False))
+ else:
+ self.to_stdout(json.dumps(info_dict))
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 12863e74a..6ded34832 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -231,7 +231,7 @@ def write_json_file(obj, fn):
try:
with tf:
- json.dump(obj, tf)
+ if sys.platform == 'win32':
+ json.dump(obj, tf, ensure_ascii=False)
+ else:
+ json.dump(obj, tf)
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError. |
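The platform rule from the two diffs above can also be written once and shared by both call sites, which avoids duplicating the `json.dumps` / `json.dump` calls. A rough sketch, where `json_ensure_ascii` and `dump_info` are hypothetical names that do not exist in youtube-dl:

```python
import json
import sys


def json_ensure_ascii(platform=None):
    # Same rule as the diffs above: unescaped output only on win32.
    # The platform argument exists only to make the rule easy to test.
    platform = sys.platform if platform is None else platform
    return platform != 'win32'


def dump_info(info_dict, platform=None):
    # Single call site; the flag is computed by the shared helper.
    return json.dumps(info_dict, ensure_ascii=json_ensure_ascii(platform))


print(dump_info({"title": "é"}, platform='win32'))
print(dump_info({"title": "é"}, platform='linux'))
```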
===================================================
I need JSON data containing Unicode (UTF-8) from youtube-dl. Sadly, it couldn't retrieve JSON data from a YouTube video in UTF-8 (?).
I tried printing the JSON info with -j, --dump-json, -J, --dump-single-json, or --print-json, and writing it directly into a JSON file with --write-info-json. All results were printed as non-Unicode data strings, unlike the original text of the video source.
The parameters used, with and without --encoding utf-8:
youtube-dl --write-info-json --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl -j --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl -J --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl --print-json --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
The log output:
Below is a log of the JSON data. This is only part of the full log, since the JSON data contains a huge amount of string data, but it shows the essence of this issue.
For example, the title, tags, and description: