--write-comments doesn't support unicode characters #2139

Closed
6 tasks done
Galgamins opened this issue Dec 27, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@Galgamins

Galgamins commented Dec 27, 2021

Checklist

  • I'm reporting a bug unrelated to a specific site
  • I've verified that I'm running yt-dlp version 2021.12.27. (update instructions)
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I've read the guidelines for opening an issue

Description

The --write-comments option does not write Unicode characters literally, so non-ASCII comments are written as Unicode escape sequences (e.g. "text": "\u3070\u3076\u3061\u3001\u3053\u3093\u306a\u306b\u304b\u308f\u3044\u304b\u3063\u305f\u3093\u3060\u3001\u597d\u304d")

This adds an extra step for anyone wanting to grab video comments in languages that are not ASCII-based.

Many other options already support Unicode, so I imagine the only change needed would be to take whatever method of writing Unicode is used elsewhere and use it in --write-comments as well.
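For readers who need the literal characters in the meantime, the extra step can be sketched in Python as below. This is only an illustration, not part of yt-dlp: the helper name `rewrite_unescaped` and the file paths are made up for the example, and it assumes the info.json file is valid JSON stored as UTF-8.

```python
import json
from pathlib import Path


def rewrite_unescaped(src: Path, dst: Path) -> None:
    """Re-serialize a JSON file so non-ASCII text is written literally.

    Hypothetical helper for illustration; not a yt-dlp function.
    """
    data = json.loads(src.read_text(encoding="utf-8"))
    # ensure_ascii=False keeps characters such as "ば" as-is
    # instead of emitting the escape sequence \u3070.
    text = json.dumps(data, ensure_ascii=False, indent=2)
    # Pass the encoding explicitly: the locale default may not be UTF-8
    # (the verbose log in this issue reports cp1252).
    dst.write_text(text, encoding="utf-8")


# Example call (placeholder paths):
# rewrite_unescaped(Path("video.info.json"), Path("video.readable.json"))
```

The parsed data is identical before and after; only the on-disk representation of the strings changes.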

Attachment (info.json output, with .txt appended for upload to GitHub): 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].info.json.txt

Verbose log

G:\VTuber\Tools\yt-dlp
λ yt-dlp -Uv --write-comments https://www.youtube.com/watch?v=wJF-Lr-vNV0
[debug] Command-line config: ['-Uv', '--write-comments', 'https://www.youtube.com/watch?v=wJF-Lr-vNV0']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, err utf-8, pref cp1252
[debug] yt-dlp version 2021.12.27 [6223f67] (win_exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] exe versions: ffmpeg 2021-10-07-git-b6aeee2d8b-full_build-www.gyan.dev (setts), ffprobe 2021-10-07-git-b6aeee2d8b-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2021.12.27, Current version: 2021.12.27
yt-dlp is up to date (2021.12.27)
[debug] [youtube] Extracting URL: https://www.youtube.com/watch?v=wJF-Lr-vNV0
[youtube] wJF-Lr-vNV0: Downloading webpage
[youtube] wJF-Lr-vNV0: Downloading android player API JSON
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, codec:vp9.2, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), acodec, lang, proto, filesize, fs_approx, tbr, vbr, abr, asr, vext, aext, hasaud, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] wJF-Lr-vNV0: Downloading 1 format(s): 248+251
[youtube] Downloading comment section API JSON
[youtube] Downloading ~857 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/857)
[youtube] Downloading comment API JSON page 2 (20/857)
[youtube] Downloading comment API JSON page 3 (40/857)
[youtube]     Downloading comment API JSON reply thread 1 (57/857)
[youtube] Downloading comment API JSON page 4 (61/857)
[youtube] Downloading comment API JSON page 5 (81/857)
[youtube]     Downloading comment API JSON reply thread 1 (90/857)
[youtube]     Downloading comment API JSON reply thread 2 (96/857)
[youtube] Downloading comment API JSON page 6 (105/857)
[youtube] Downloading comment API JSON page 7 (125/857)
[youtube] Downloading comment API JSON page 8 (145/857)
[youtube]     Downloading comment API JSON reply thread 1 (154/857)
[youtube] Downloading comment API JSON page 9 (169/857)
[youtube]     Downloading comment API JSON reply thread 1 (174/857)
[youtube] Downloading comment API JSON page 10 (190/857)
[youtube] Downloading comment API JSON page 11 (210/857)
[youtube] Downloading comment API JSON page 12 (230/857)
[youtube]     Downloading comment API JSON reply thread 1 (234/857)
[youtube] Downloading comment API JSON page 13 (253/857)
[youtube] Downloading comment API JSON page 14 (273/857)
[youtube] Downloading comment API JSON page 15 (293/857)
[youtube]     Downloading comment API JSON reply thread 1 (296/857)
[youtube] Downloading comment API JSON page 16 (315/857)
[youtube] Downloading comment API JSON page 17 (335/857)
[youtube]     Downloading comment API JSON reply thread 1 (354/857)
[youtube] Downloading comment API JSON page 18 (356/857)
[youtube] Downloading comment API JSON page 19 (376/857)
[youtube]     Downloading comment API JSON reply thread 1 (390/857)
[youtube] Downloading comment API JSON page 20 (397/857)
[youtube] Downloading comment API JSON page 21 (417/857)
[youtube] Downloading comment API JSON page 22 (437/857)
[youtube] Downloading comment API JSON page 23 (457/857)
[youtube] Downloading comment API JSON page 24 (477/857)
[youtube]     Downloading comment API JSON reply thread 1 (490/857)
[youtube] Downloading comment API JSON page 25 (499/857)
[youtube] Downloading comment API JSON page 26 (519/857)
[youtube] Downloading comment API JSON page 27 (539/857)
[youtube]     Downloading comment API JSON reply thread 1 (542/857)
[youtube] Downloading comment API JSON page 28 (560/857)
[youtube] Downloading comment API JSON page 29 (580/857)
[youtube] Downloading comment API JSON page 30 (600/857)
[youtube] Downloading comment API JSON page 31 (620/857)
[youtube] Downloading comment API JSON page 32 (640/857)
[youtube] Downloading comment API JSON page 33 (660/857)
[youtube] Downloading comment API JSON page 34 (680/857)
[youtube]     Downloading comment API JSON reply thread 1 (682/857)
[youtube]     Downloading comment API JSON reply thread 2 (685/857)
[youtube] Downloading comment API JSON page 35 (702/857)
[youtube] Downloading comment API JSON page 36 (722/857)
[youtube] Downloading comment API JSON page 37 (742/857)
[youtube] Downloading comment API JSON page 38 (762/857)
[youtube]     Downloading comment API JSON reply thread 1 (782/857)
[youtube] Downloading comment API JSON page 39 (783/857)
[youtube] Downloading comment API JSON page 40 (803/857)
[youtube] Downloading comment API JSON page 41 (823/857)
[youtube] Downloading comment API JSON page 42 (843/857)
[youtube] Extracted 857 comments
[info] Writing video metadata as JSON to: 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].info.json
[debug] Invoking downloader on "https://rr1---sn-ab5l6nzr.googlevideo.com/videoplayback?expire=1640663540&ei=lDXKYYiHAsn08gSBp6_4DQ&ip=8.21.13.13&id=o-ANbRmKnHcZ7DfK94SY1-xperSS45ZjXwO0yMjCbqzJCu&itag=248&source=youtube&requiressl=yes&mh=6m&mm=31%2C26&mn=sn-ab5l6nzr%2Csn-vgqsrne6&ms=au%2Conr&mv=m&mvi=1&pl=24&initcwndbps=2197500&vprv=1&mime=video%2Fwebm&gir=yes&clen=22977258&dur=269.936&lmt=1633754162081500&mt=1640641755&fvip=1&keepalive=yes&fexp=24001373%2C24007246&c=ANDROID&txp=5432432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIgRm--Fa41XsrASESoNQtsn8jSxIL7DDJKdRcGIyUfZzMCIQCp97i7pXYLXDQXUZmiHdpfGFshNicqGms0uAitG79okw%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIhALjyelM9Kd_bMbIAJUbz2fRY1WG3g8zMesVb_0qwqDPYAiBWUFcixRx_39v1UejcGzNhaJ5iCEr6fOZSLmemLefZJw%3D%3D"
[download] Destination: 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f248.webm
[download] 100% of 21.91MiB in 00:01
[debug] Invoking downloader on "https://rr1---sn-ab5l6nzr.googlevideo.com/videoplayback?expire=1640663540&ei=lDXKYYiHAsn08gSBp6_4DQ&ip=8.21.13.13&id=o-ANbRmKnHcZ7DfK94SY1-xperSS45ZjXwO0yMjCbqzJCu&itag=251&source=youtube&requiressl=yes&mh=6m&mm=31%2C26&mn=sn-ab5l6nzr%2Csn-vgqsrne6&ms=au%2Conr&mv=m&mvi=1&pl=24&initcwndbps=2197500&vprv=1&mime=audio%2Fwebm&gir=yes&clen=4592164&dur=269.961&lmt=1633753662243778&mt=1640641755&fvip=1&keepalive=yes&fexp=24001373%2C24007246&c=ANDROID&txp=5431432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRAIgJItgGy-KvQYfjZphtgR7sE6qSX6hRTVHJFT5kEXAcR8CIAuN2MEypD-sDaJyRz5_cTB2Z8ekCqsq4ej6-AMARs_7&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIhALjyelM9Kd_bMbIAJUbz2fRY1WG3g8zMesVb_0qwqDPYAiBWUFcixRx_39v1UejcGzNhaJ5iCEr6fOZSLmemLefZJw%3D%3D"
[download] Destination: 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f251.webm
[download] 100% of 4.38MiB in 00:00
[Merger] Merging formats into "初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].webm"
[debug] ffmpeg command line: ffmpeg -y -loglevel "repeat+info" -i "file:初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f248.webm" -i "file:初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f251.webm" -c copy -map "0:v:0" -map "1:a:0" -movflags "+faststart" "file:初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].temp.webm"
Deleting original file 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f248.webm (pass -k to keep)
Deleting original file 初動画はこのくらいシンプルなほうがいいんじゃないかと思って [wJF-Lr-vNV0].f251.webm (pass -k to keep)
Galgamins added the "bug" (Bug that is not site-specific) and "triage" (Untriaged issue) labels Dec 27, 2021
pukkandan added the "enhancement" (New feature or request) label and removed the "bug" and "triage" labels Dec 27, 2021
@pukkandan
Member

This is the correct encoding for JSON files. https://docs.python.org/3/library/json.html#character-encodings

That said, I think it makes sense for us to write the characters unencoded.
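To make the point concrete, here is a small sketch using Python's standard `json` module. It is not the code yt-dlp uses. It shows that the escaped and literal forms are both valid JSON and decode to the same value, so the difference is only in readability. The sample dictionary is made up, reusing the last two characters from the report.

```python
import json

# Sample comment; "好き" is the tail of the text quoted in the report.
comment = {"text": "好き"}

# Default behaviour (ensure_ascii=True): non-ASCII becomes \uXXXX escapes.
escaped = json.dumps(comment)
# escaped == '{"text": "\\u597d\\u304d"}'

# With ensure_ascii=False the characters are emitted as-is.
literal = json.dumps(comment, ensure_ascii=False)
# literal == '{"text": "好き"}'

# Any conforming JSON parser reads both forms back to the same object.
assert json.loads(escaped) == json.loads(literal) == comment
```

When the literal form is written to disk, the file should be opened with an explicit UTF-8 encoding, since the platform default may differ.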

@Galgamins
Author

Interesting, I never knew that!

Ashish0804 added a commit to Ashish0804/yt-dlp that referenced this issue Jan 1, 2022
commit 8efffaf
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Sat Jan 1 13:12:33 2022 +0700

    [XVideos] Check HLS formats (yt-dlp#2193)

    Closes yt-dlp#1823
    Authored by; MinePlayersPE

commit 26f2aa3
Author: Ashish Gupta <39122144+Ashish0804@users.noreply.github.com>
Date:   Sat Jan 1 02:32:23 2022 +0530

    [hotstar] Add extractor args to ignore tags (yt-dlp#2116)

    Authored by: Ashish0804

commit 3464a27
Author: pgaig <87302379+pgaig@users.noreply.github.com>
Date:   Fri Dec 31 21:58:23 2021 +0100

    [VrtNU] Handle empty title (yt-dlp#2147)

    Closes yt-dlp#2146
    Authored by: pgaig

commit 497d77e
Author: Ashish Gupta <Ashish08@protonmail.com>
Date:   Fri Dec 31 10:41:42 2021 +0530

    [KelbyOne] Add extractor (yt-dlp#2181)

    Closes yt-dlp#2170
    Authored by: Ashish0804

commit 9040e2d
Author: LE <llacb47@users.noreply.github.com>
Date:   Fri Dec 31 15:11:35 2021 -0500

    [mixcloud] Detect restrictions (yt-dlp#2169)

    Authored by; llacb47

commit 6134fbe
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Sat Jan 1 03:10:46 2022 +0700

    [TikTok] Pass cookies to formats (yt-dlp#2171)

    Closes yt-dlp#2166
    Authored by: MinePlayersPE

commit cfcf60e
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Sat Jan 1 03:09:30 2022 +0700

    [BiliIntl] Add login (yt-dlp#2172)

    and misc improvements

    Authored by: MinePlayersPE

commit 4afa3ec
Author: Felix S <felix.von.s@posteo.de>
Date:   Fri Dec 31 20:06:45 2021 +0000

    [extractor] Detect more subtitle codecs in MPD manifests (yt-dlp#2174)

    Authored by: fstirlitz

commit 11aa91a
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Thu Dec 30 11:20:17 2021 +0700

    [TikTok] Fix extraction for sigi-based webpages (yt-dlp#2164)

    Fixes: yt-dlp#2133
    Authored by: MinePlayersPE

commit abbeeeb
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Thu Dec 30 08:43:40 2021 +0530

    [outtmpl] Alternate form for `D` and fix suffix's case

    Fixes: yt-dlp#2085 (comment), https://github.com/yt-dlp/yt-dlp/pull/2132/files#r775729811

commit 2c539d4
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Thu Dec 30 08:15:48 2021 +0530

    [cookies] Fix bug when keyring is unspecified

    Closes yt-dlp#2167

commit 042931a
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Thu Dec 30 08:15:07 2021 +0530

    Allow escaped `,` in `--extractor-args`

    Closes yt-dlp#2152

commit 96f13f0
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Thu Dec 30 05:00:44 2021 +0700

    [TikTok] Change app version (yt-dlp#2161)

    Closes yt-dlp#2133, yt-dlp#2135
    Authored by: MinePlayersPE, llacb47

commit 4b93532
Author: u-spec-png <54671367+u-spec-png@users.noreply.github.com>
Date:   Tue Dec 28 20:42:14 2021 +0000

    [Drooble] Add extractor (yt-dlp#1547)

    Closes yt-dlp#1527
    Authored by: u-spec-png

commit dd5e60b
Author: u-spec-png <54671367+u-spec-png@users.noreply.github.com>
Date:   Tue Dec 28 18:58:06 2021 +0000

    [Instagram] Add story/highlight extractor (yt-dlp#2006)

    Fixes ytdl-org/youtube-dl#25575
    Authored by: u-spec-png

commit e540c56
Author: MinePlayersPE <mineplayerspealt@gmail.com>
Date:   Tue Dec 28 09:38:23 2021 +0700

    [TikTok] Fallback to feed API endpoint (yt-dlp#2142)

    Authored by: MinePlayersPE
    Workaround for yt-dlp#2133

commit 45d86ab
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Tue Dec 28 04:21:13 2021 +0530

    Allow unicode characters in `info.json`

    Closes yt-dlp#2139

commit f02d24d
Author: Pierre Mdawar <pierre@mdawar.dev>
Date:   Tue Dec 28 03:38:31 2021 +0530

    [utils] Fix `format_bytes` output for Bytes (yt-dlp#2132)

    Authored by: pukkandan, mdawar

commit ceb9832
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Tue Dec 28 02:52:11 2021 +0530

    Don't treat empty containers as `None` in `sanitize_info`

commit 7537e35
Author: pukkandan <pukkandan.ytdlp@gmail.com>
Date:   Tue Dec 28 02:49:02 2021 +0530

    [gfycat] Fix `uploader`