Incorrect Metadata for youtube music upload #9913

Open
9 of 11 tasks
Trickfilm400 opened this issue May 13, 2024 · 2 comments
Labels
patch-available There is patch available that should fix this issue. Someone needs to make a PR with it site-bug Issue with a specific website

Comments

@Trickfilm400

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

Germany

Provide a description that is worded well enough to be understood

Issue Description

Example URL: https://www.youtube.com/watch?v=NUocE858BEc

If you download this song, yt-dlp lists only "Papi Pepe" as the artist. Other sources, and even the video description, also credit "le Shuuk", but that name is missing from every output: the JSON file, the file metadata, and the file name all contain only one artist.

In the example output, the file name should list both artists, as it does for the other video linked below (https://youtu.be/UcCmJba6ERA).

How it may be fixed

I don't know where the metadata is gathered from, but the video description may be parsed incorrectly, because it contains these lines:

Artist, Producer: le Shuuk
Artist: Papi Pepe

The parser may be confused by the "Producer" credit. (Then again, the description of another video, https://youtu.be/UcCmJba6ERA, has no "Artist" line at all, so this may not be the cause.)

Proposed solution

The expected fix would extract all artists of this song (both "le Shuuk" and "Papi Pepe") and save them in the JSON file (--write-info-json), in the file metadata if enabled, and in the file name when an output template is used.

Feel free to ask further questions

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', '--output', '%(artist)s %(album)s %(release_year)s', 'https://www.youtube.com/watch?v=NUocE858BEc']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version master@2024.05.12.230604 from yt-dlp/yt-dlp-master-builds [85ec2a337] (zip)
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.8.0-76060800daily20240311-generic-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: certifi-2020.06.20, requests-2.25.1, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-1.26.5
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1803 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-master-builds/releases/latest
Latest version: master@2024.05.12.230604 from yt-dlp/yt-dlp-master-builds
yt-dlp is up to date (master@2024.05.12.230604 from yt-dlp/yt-dlp-master-builds)
[youtube] Extracting URL: https://www.youtube.com/watch?v=NUocE858BEc
[youtube] NUocE858BEc: Downloading webpage
[youtube] NUocE858BEc: Downloading ios player API JSON
[youtube] NUocE858BEc: Downloading android player API JSON
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "NUocE858BEc")
[debug] [youtube] Extracting signature function js_17fd9675_109
[debug] Loading youtube-sigfuncs.js_17fd9675_109 from cache
[debug] Loading youtube-nsig.17fd9675 from cache
[debug] [youtube] Decrypted nsig SCjGFsJITxOQo5 => wfappVn6fSKj9g
[debug] Loading youtube-nsig.17fd9675 from cache
[debug] [youtube] Decrypted nsig 6ZRXz1tLbs1LiY => dM-d_WL0a3h-kQ
[youtube] NUocE858BEc: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] NUocE858BEc: Downloading 1 format(s): 248+251
[debug] Invoking http downloader on "https://rr4---sn-4g5lzney.googlevideo.com/videoplayback?expire=1715598881&ei=waFBZqCBA6jRi9oPoPiu2AM&ip=2a01%3A599%3A40b%3A5826%3A9823%3A3434%3Ab678%3A2340&id=o-ACZyO-dor4ZaQGx7zUfB9AnOpCjdnve_4dXTLs5yMVwI&itag=248&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=rX&mm=31%2C29&mn=sn-4g5lzney%2Csn-4g5ednds&ms=au%2Crdu&mv=m&mvi=4&pl=39&gcr=de&initcwndbps=1897500&vprv=1&svpuc=1&mime=video%2Fwebm&rqh=1&gir=yes&clen=2551381&dur=129.440&lmt=1629302040573525&mt=1715576922&fvip=1&keepalive=yes&c=IOS&txp=2316222&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cgcr%2Cvprv%2Csvpuc%2Cmime%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRAIgBhjwBufutfyFBWLd6x6yth8xdnIPy6SAneVBlqC3A9ICIDlio7wX16sYtz-foEn8gCA9IjO2Xr9XaEXitO-abi8m&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AHWaYeowRAIgP9s4nN31KKqx7P8C02yY5-x4o1qMIDsFnAX6RVrh6g4CIBDLdV0z-99_yhwWmbdRHSw_A0yvXKSRaPzW2_TYmnWo"
[download] Destination: Papi Pepe Macarena NA.f248.webm
[download] 100% of    2.43MiB in 00:00:00 at 3.04MiB/s
[debug] Invoking http downloader on "https://rr1---sn-4g5ednds.googlevideo.com/videoplayback?expire=1715598880&ei=wKFBZvrZBuLi6dsPpLqm8Ac&ip=2a01%3A599%3A40b%3A5826%3A9823%3A3434%3Ab678%3A2340&id=o-AHCUW7182ABC53AZVFVUd0FDFXuXVWfcBEadN_xHb2-1&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=rX&mm=31%2C26&mn=sn-4g5ednds%2Csn-5hnednss&ms=au%2Conr&mv=m&mvi=1&pl=39&gcr=de&initcwndbps=1897500&bui=AWRWj2RDlyvwy0ANcgW0DQrBMbgLAfSgsTiyqqS1gJKAaimw0GzPl9WuBGLI88Yn8033V34sHVWBjgnt&spc=UWF9f7YsPqr9CoA9_GwxAIFrkjc6CIrsUH4e44xREEzzmfAbSlzttq8&vprv=1&svpuc=1&mime=audio%2Fwebm&ns=kAFY4z3bvswxd6-oNx2dZGkQ&rqh=1&gir=yes&clen=2240909&dur=129.501&lmt=1714612367744390&mt=1715576922&fvip=5&keepalive=yes&c=WEB&sefc=1&txp=2318224&n=dM-d_WL0a3h-kQ&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cgcr%2Cbui%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AHWaYeowRQIgNC4FARtlH66nIE0yMVZ48ukj3CU6cEjUFVeFJ8mv-6MCIQC7f772EeNlYHzd0wKWRJIn5CjQIr48ri4lINE9BaFELg%3D%3D&sig=AJfQdSswRQIgYfBesSaAVly2A69wYdOq7aZIuQ8rxc7fp4JURo9IpfICIQC5xICMNc_Hy_rcbPan1WUi4m2TkCvJqWz_zJsedXNJQg%3D%3D"
[download] Destination: Papi Pepe Macarena NA.f251.webm
[download] 100% of    2.14MiB in 00:00:00 at 3.70MiB/s
[Merger] Merging formats into "Papi Pepe Macarena NA.webm"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:Papi Pepe Macarena NA.f248.webm' -i 'file:Papi Pepe Macarena NA.f251.webm' -c copy -map 0:v:0 -map 1:a:0 -movflags +faststart 'file:Papi Pepe Macarena NA.temp.webm'
Deleting original file Papi Pepe Macarena NA.f248.webm (pass -k to keep)
Deleting original file Papi Pepe Macarena NA.f251.webm (pass -k to keep)
Trickfilm400 added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels May 13, 2024
@bashonly
Member

bashonly commented May 13, 2024

The description parsing code currently only looks for one "Artist:" line, and if it doesn't find one, it falls back to looking for multiple artists following the track name separated by · (which is why your 2nd link has both of its artists extracted despite its description not having any "Artist:" lines).

Something like the patch below might work? (It does work for this particular case.) Someone who's more familiar with Youtube descriptions/metadata should weigh in on this though.

diff --git a/yt_dlp/extractor/youtube.py b/yt_dlp/extractor/youtube.py
index a5fe179c2..643f9dd41 100644
--- a/yt_dlp/extractor/youtube.py
+++ b/yt_dlp/extractor/youtube.py
@@ -4390,9 +4390,7 @@ def process_language(container, base_url, lang_code, sub_name, query):
                     (?=(?P<album>[^\n]+))(?P=album)\n
                     (?:.+?℗\s*(?P<release_year>\d{4})(?!\d))?
                     (?:.+?Released on\s*:\s*(?P<release_date>\d{4}-\d{2}-\d{2}))?
-                    (.+?\nArtist\s*:\s*
-                        (?=(?P<clean_artist>[^\n]+))(?P=clean_artist)\n
-                    )?.+\nAuto-generated\ by\ YouTube\.\s*$
+                    .+\nAuto-generated\ by\ YouTube\.\s*$
                 ''', video_description)
             if mobj:
                 release_year = mobj.group('release_year')
@@ -4403,8 +4401,8 @@ def process_language(container, base_url, lang_code, sub_name, query):
                         release_year = release_date[:4]
                 info.update({
                     'album': mobj.group('album'.strip()),
-                    'artists': ([a] if (a := mobj.group('clean_artist'))
-                                else [a.strip() for a in mobj.group('artist').split('·')]),
+                    'artists': (re.findall(r'\nArtist[^:]*:\s*([^\n]+)', mobj.group(0))
+                                or [a.strip() for a in mobj.group('artist').split('·')]),
                     'track': mobj.group('track').strip(),
                     'release_date': release_date,
                     'release_year': int_or_none(release_year),
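
The artist-selection logic in the patch above can be tried in isolation. The following is only a minimal sketch, not the extractor's actual code: the function name `extract_artists`, its parameters, and the sample description strings are hypothetical. Only the `re.findall` pattern, the `·` fallback, and the two "Artist" credit lines come from this thread.

```python
import re


def extract_artists(description, fallback_artist_field):
    """Hypothetical helper mirroring the 'artists' expression in the patch.

    description: text of an auto-generated YouTube description.
    fallback_artist_field: the text that follows the track name,
        with artists separated by '·' (the existing fallback).
    """
    # Collect every line that starts with "Artist" and has a colon,
    # e.g. "Artist: X" and "Artist, Producer: Y".
    credited = re.findall(r'\nArtist[^:]*:\s*([^\n]+)', description)
    if credited:
        return [a.strip() for a in credited]
    # No "Artist" lines at all: split the '·'-separated field instead.
    return [a.strip() for a in fallback_artist_field.split('·')]


# Assumed sample text; only the two credit lines are taken from the issue.
sample = (
    'Macarena\n\n'
    'Artist, Producer: le Shuuk\n'
    'Artist: Papi Pepe\n\n'
    'Auto-generated by YouTube.'
)
artists = extract_artists(sample, 'Papi Pepe')
print(artists)
```

One caveat with this pattern: `[^:]*` also matches newlines, so a line that merely begins with "Artist" and has no colon could pair with a colon on a later line. Whether that occurs in real descriptions is something a person more familiar with them would need to confirm.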

bashonly removed the triage (Untriaged issue) label May 13, 2024
bashonly added the patch-available label (There is a patch available that should fix this issue. Someone needs to make a PR with it) Jul 30, 2024
@bashonly
Member

bashonly commented Aug 8, 2024

Related: #10247, #10576
