Use Youtube title for video metadata instead of the Content ID track title #241

Closed
anonamouslyginger opened this issue May 3, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@anonamouslyginger
Contributor

TL;DR

Note that this is a known issue on the yt-dlp repository. See yt-dlp/yt-dlp#3146.

The suggested solution is to add --parse-metadata "title:%(meta_title)s" to the yt-dlp command.
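
For anyone trying the workaround outside Tube Archivist, here is a minimal sketch of how that invocation could be assembled from Python. The helper name `build_command` is hypothetical, and pairing the option with `--embed-metadata` is an assumption (the suggestion above only names `--parse-metadata`).

```python
import shlex


def build_command(url: str) -> list[str]:
    """Assemble a yt-dlp argv list (hypothetical helper, not Tube Archivist code)."""
    return [
        "yt-dlp",
        "--embed-metadata",  # assumption: tags are being embedded into the file
        "--parse-metadata",
        "title:%(meta_title)s",  # the argument suggested in yt-dlp/yt-dlp#3146
        url,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the example video of this issue.
    print(shlex.join(build_command("https://www.youtube.com/watch?v=NbBoN7J4n48")))
```

Passing the arguments as a list avoids shell quoting problems with the `%(...)s` placeholders; `shlex.join` is only used here to display the command.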

Issue Explanation

When a downloaded YouTube video contains music that YouTube has detected and linked (see the description of the video), the metadata embedded in the downloaded file describes the music track instead of the archived video.

This issue is only visible when the "metadata embed" configuration option in the UI is set to True.
The data shown in the UI and stored in Elasticsearch is correct for these videos. However, when the videos are imported into another player (e.g. Plex), that player displays the incorrect metadata that was embedded in the file.

Issue Example

Archived/Downloaded YouTube Video: Animation Supercut - Neebs Gaming
Contains Music Video: Call to Adventure - Comedy

As seen below, Movie name, Album and Performer are incorrect.

$ mediainfo "/raid-pool/yt/Neebs Gaming/20220325_NbBoN7J4n48_Animation Supercut - Neebs Gaming.mp4"
General
Complete name                            : /raid-pool/yt/Neebs Gaming/20220325_NbBoN7J4n48_Animation Supercut - Neebs Gaming.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 292 MiB
Duration                                 : 57 min 45 s
Overall bit rate                         : 706 kb/s
Movie name                               : Call to Adventure - Comedy
Album                                    : Call to Adventure - Comedy
Performer                                : Kevin Macleod
Description                              : Neebs Gaming Animation Supercut.  / ► SUBSCRIBE -------  http://bit.ly/1NOKqlU /  / ►SUPPORT us on PATREON / https://www.patreon.com/neebsgaming /  / ► Neebs Gaming is powered by Xidax PCs, check them out here! /      http://mbsy.co/gFZJH /  / -----------Follow us on Social Media-------------------------- / ► https://www.Facebook.com/NeebsGaming / ► https://Twitter.com/Neebsofficial / ► http://www.neebsgaming.net / ------------------------------------------------------------------------------- / Looking for more videos from NEEBS?  Check out these playlist! / CARTOONS ----- https://bit.ly/2RASc99 / 7Days to Die ----- https://bit.ly/32Hb9gV / Scrap Mechanic ----- https://bit.ly/2RDhdQZ / Minecraft ----- https://bit.ly/35R51ol / GTA V ----- https://bit.ly/33CxwmP / Subnautica ----- https://bit.ly/2E9g0xJ
Recorded date                            : 20220325
Writing application                      : Lavf59.22.100
Comment                                  : https://www.youtube.com/watch?v=NbBoN7J4n48
LongDescription                          : Neebs Gaming Animation Supercut.  / ► SUBSCRIBE -------  http://bit.ly/1NOKqlU /  / ►SUPPORT us on PATREON / https://www.patreon.com/neebsgaming /  / ► Neebs Gaming is powered by Xidax PCs, check them out here! /      http://mbsy.co/gFZJH /  / -----------Follow us on Social Media-------------------------- / ► https://www.Facebook.com/NeebsGaming / ► https://Twitter.com/Neebsofficial / ► http://www.neebsgaming.net / ------------------------------------------------------------------------------- / Looking for more videos from NEEBS?  Check out these playlist! / CARTOONS ----- https://bit.ly/2RASc99 / 7Days to Die ----- https://bit.ly/32Hb9gV / Scrap Mechanic ----- https://bit.ly/2RDhdQZ / Minecraft ----- https://bit.ly/35R51ol / GTA V ----- https://bit.ly/33CxwmP / Subnautica ----- https://bit.ly/2E9g0xJ
...
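
To check these three fields without reading the whole dump, a small parser can help. This is a sketch that assumes mediainfo's default `Key : Value` text layout as shown above; the function name `parse_mediainfo` is made up for illustration.

```python
def parse_mediainfo(text: str) -> dict[str, str]:
    """Parse `Key : Value` lines from mediainfo's default text output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            # Keep the first occurrence of a key; section headers such as
            # "General" contain no " : " separator and are skipped.
            fields.setdefault(key.strip(), value.strip())
    return fields


# Excerpt of the output shown in this issue.
SAMPLE = """\
General
Format                                   : MPEG-4
Movie name                               : Call to Adventure - Comedy
Album                                    : Call to Adventure - Comedy
Performer                                : Kevin Macleod
"""

if __name__ == "__main__":
    info = parse_mediainfo(SAMPLE)
    for key in ("Movie name", "Album", "Performer"):
        print(f"{key}: {info.get(key)}")
```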

Desired Behavior

Since the music plays for only a small portion of the video I am downloading, I would like the metadata to reflect the downloaded video. I expect some other users would prefer the opposite behavior, but I would argue that because this tool archives the target video, the metadata should reflect that video.

If a change to the default behavior is not acceptable, then a boolean configuration option or a free-form 'extra yt-dlp' configuration option would be fine.

If no change is desired at all, then a warning in the UI would be great.
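
As a rough illustration of the boolean option, the override could be gated like this. Everything here is a sketch: the function and parameter names are hypothetical, the `MetadataFromField` entry mirrors the one proposed later in this thread, and the `FFmpegMetadata` entry is an assumption about how embedding is switched on.

```python
def build_postprocessors(embed_metadata: bool, prefer_video_title: bool) -> list[dict]:
    """Return a yt-dlp postprocessor list (hypothetical helper and flag names)."""
    postprocessors: list[dict] = []
    if embed_metadata and prefer_video_title:
        # Only rewrite the title when the user opted in.
        postprocessors.append(
            {
                "key": "MetadataFromField",
                "formats": ["%(title)s:%(meta_title)s"],
                "when": "pre_process",
            }
        )
    if embed_metadata:
        # Assumption: this postprocessor writes the tags into the file.
        postprocessors.append({"key": "FFmpegMetadata"})
    return postprocessors


if __name__ == "__main__":
    for entry in build_postprocessors(embed_metadata=True, prefer_video_title=True):
        print(entry["key"])
```

With the flag off, the list is the same as it would be without the override, so existing behavior stays untouched for users who prefer the Content ID metadata.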

@anonamouslyginger
Contributor Author

I have found an additional issue, yt-dlp/yt-dlp#904, which describes the problem better than I have.

The suggested solution is to add the following arguments to the yt-dlp CLI:

--parse-metadata "%(title)s:%(meta_title)s"
--parse-metadata "%(uploader)s:%(meta_artist)s"

Which, based on my reading, is equivalent to adding the following here:

            postprocessors.append(
                {
                    "key": "MetadataFromField",
                    "formats": [
                        "%(title)s:%(meta_title)s",
                        "%(uploader)s:%(meta_artist)s",
                    ],
                    "when": "pre_process",
                }
            )

I have tested this locally with success and I can create an MR if you agree with the overrides.
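
To make the `FROM:TO` syntax less opaque, here is a simplified model of how I understand these rules to be applied: `FROM` is rendered against the video's info, `TO` is turned into a regex with one named group per `%(field)s`, and each captured group is written back as a field. This is a sketch for illustration, not yt-dlp's implementation; `to_regex` and `apply_format` are made-up names, and escaped colons and the richer template syntax are ignored.

```python
import re


def to_regex(fmt: str) -> str:
    """Turn `%(name)s` placeholders into named groups; leave plain regexes alone."""
    if not re.search(r"%\(\w+\)s", fmt):
        return fmt
    parts, last = [], 0
    for m in re.finditer(r"%\((\w+)\)s", fmt):
        parts.append(re.escape(fmt[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(fmt[last:]))
    return "".join(parts)


def apply_format(info: dict, rule: str) -> None:
    """Apply one FROM:TO rule to `info` in place (simplified model)."""
    source, _, target = rule.partition(":")
    text = source % info  # render FROM, e.g. "%(title)s" -> the video title
    match = re.search(to_regex(target), text, flags=re.DOTALL)
    if match:
        info.update(match.groupdict())


if __name__ == "__main__":
    info = {"title": "Animation Supercut - Neebs Gaming", "uploader": "Neebs Gaming"}
    for rule in ("%(title)s:%(meta_title)s", "%(uploader)s:%(meta_artist)s"):
        apply_format(info, rule)
    print(info["meta_title"])   # Animation Supercut - Neebs Gaming
    print(info["meta_artist"])  # Neebs Gaming
```

As far as I can tell from the yt-dlp documentation, fields prefixed with `meta_` take precedence over the default values when tags are embedded, which is why setting `meta_title` and `meta_artist` replaces the Content ID track title and artist.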

However, I do not yet have a solution for the Content ID album appearing. Something like --parse-metadata "%(playlist_title|Unknown)s:%(meta_album)s" might work. I'm still testing that locally.

@anonamouslyginger changed the title from "Wrong YouTube video embedded metadata fields" to "Use Youtube title for video metadata instead of the Content ID track title" May 5, 2022
@anonamouslyginger
Contributor Author

Success, with help from yt-dlp/yt-dlp#1350 (comment)! To remove the Album tag, which comes from the Content ID media (and causes Plex to auto-create a Collection), you need one more --parse-metadata option:

--parse-metadata ":(?P<album>)"

or all together:

            postprocessors.append(
                {
                    "key": "MetadataFromField",
                    "formats": [
                        "%(title)s:%(meta_title)s",
                        "%(uploader)s:%(meta_artist)s",
                        ":(?P<album>)",
                    ],
                    "when": "pre_process",
                }
            )

As before, I have tested this locally with success and I can create an MR if you agree with the overrides.
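
The `":(?P<album>)"` rule looks odd, so here is a small sketch of why it blanks the album, under the same simplified reading of `FROM:TO` (this is a model, not yt-dlp's code, and `apply_rule` is a made-up name). The `FROM` part is empty, and the `TO` part is a regex whose only named group matches the empty string.

```python
import re


def apply_rule(info: dict, rule: str) -> dict:
    """Simplified model of one FROM:TO rule whose TO part is a plain regex."""
    source, _, pattern = rule.partition(":")  # ":(?P<album>)" -> "", "(?P<album>)"
    match = re.search(pattern, source)
    if match:
        # Every named group becomes a field; an empty group yields "".
        info.update(match.groupdict())
    return info


if __name__ == "__main__":
    info = {"album": "Call to Adventure - Comedy"}
    print(apply_rule(info, ":(?P<album>)"))  # {'album': ''}
```

That an empty `album` is then left out of the embedded tags is inferred from the test result reported in this thread rather than from reading yt-dlp's source.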

@bbilly1
Member

bbilly1 commented May 12, 2022

Sorry, for some reason, your issue only just now showed up in the feed. No idea, did GitHub have a hiccup?

I think I understand the problem. How does that affect videos without music IDs? Is it going to change anything for those?

@bbilly1 added the question (Further information is requested) label May 12, 2022
@anonamouslyginger
Contributor Author

Apologies @bbilly1, my account was automatically (and incorrectly) flagged as 'spammy' when I raised this issue, so you only saw it once a GitHub admin unblocked it.
I have been automatically flagged again, so you won't see this response until I am unblocked again.

I took some random YouTube videos:

https://www.youtube.com/watch?v=wP9TzdeLHvc
https://www.youtube.com/watch?v=69DZ8Dsm2vA
https://www.youtube.com/watch?v=TyahPGDr4-E
https://www.youtube.com/watch?v=0Kl8NjfrZ8I
https://www.youtube.com/watch?v=Vr2NXHf3P3A
https://www.youtube.com/watch?v=KB7VLV_w9Zc
https://www.youtube.com/watch?v=XeyI4y7zn7k
https://www.youtube.com/watch?v=wvfR3XLXPvw
https://www.youtube.com/watch?v=DAqGcy4kmtQ
https://www.youtube.com/watch?v=oCNkJEAMkXM
https://www.youtube.com/watch?v=-ZmPjb5-u6Y
https://www.youtube.com/watch?v=ThNrHfWKQvs
https://www.youtube.com/watch?v=vWAC8Wkt9ok
https://www.youtube.com/watch?v=Wqi0NtRLQxY
https://www.youtube.com/watch?v=hQA8kfY88fc
https://www.youtube.com/watch?v=28JJ4FmVPkA
https://www.youtube.com/watch?v=-YrK6IY-4-o
https://www.youtube.com/watch?v=7yYacZ9BXWk

and downloaded them with the current latest Docker image (bbilly1/tubearchivist:latest 9c839a0562f5), resulting in 15 (unsure where 3 went) videos downloaded:

'youtube/Bachelor Nation/20201028_0Kl8NjfrZ8I_Yosefs Outburst At Clare Over Red Flags The Bachelorette.mp4'
'youtube/CL Official Channel/20201029_wP9TzdeLHvc_CL HA Dance Performance Video.mp4'
'youtube/ESPN MMA/20201024_XeyI4y7zn7k_Georges St-Pierre reacts to Khabib Nurmagomedovs retirement at UFC 254 Ariel Helwanis MMA Show.mp4'
'youtube/GrantTheGoat/20201028_-YrK6IY-4-o_I Quit Fortnite to do this....mp4'
'youtube/JustJordan33/20201026_KB7VLV_w9Zc_Bringing Home My New Puppy Puppy Reveal.mp4'
'youtube/PBS Eons/20201028_wvfR3XLXPvw_Why Do Things Keep Evolving Into Crabs.mp4'
'youtube/STUDIO CHOOM /20201027_vWAC8Wkt9ok_BE ORIGINAL TWICE() I CANT STOP ME (4K).mp4'
'youtube/TrippieReddVEVO/20201030_hQA8kfY88fc_Trippie Redd - Weeeeee (Lyric Video).mp4'
'youtube/TWICE/20201027_-ZmPjb5-u6Y_TWICE Special Live Replay I CANT STOP ME.mp4'
'youtube/Ubisoft North America/20201026_7yYacZ9BXWk_Rainbow Six Siege Sugar Fright Event Trailer Ubisoft NA.mp4'
'youtube/UCDVYQ4Zhbm3S2dlz7P1GBDg/20201025_oCNkJEAMkXM_Buccaneers vs. Raiders Week 7 Highlights NFL 2020.mp4'
'youtube/UCYBetkejPxK8t_41zueazcA/20201026_TyahPGDr4-E_15 Among Us SECRETS You MISSED.mp4'
'youtube/Vanity Fair/20201024_69DZ8Dsm2vA_Billie Eilish Same Interview The Fourth Year (Teaser 1) Vanity Fair.mp4'
'youtube/Xbox/20201026_Wqi0NtRLQxY_Xbox Series XS Official Next-Gen Walkthrough Full Demo 4K.mp4'
'youtube/Yoatzi/20201027_Vr2NXHf3P3A_THE NEWS I WANTED TO SHARE... Yoatzi.mp4'

Then (in a new cloned repo) I changed the docker-compose.yml to build a local image with the following Dockerfile:

FROM bbilly1/tubearchivist:latest
COPY yt_dlp_handler_custom.py /app/home/src/download/yt_dlp_handler.py

Where yt_dlp_handler_custom.py is created by running the following on the running container (to ensure the file is otherwise the same):

docker cp tubearchivist:/app/home/src/download/yt_dlp_handler.py ./yt_dlp_handler_custom.py

then adding in the lines:

            postprocessors.append(
                {
                    "key": "MetadataFromField",
                    "formats": [
                        "%(title)s:%(meta_title)s",
                        "%(uploader)s:%(meta_artist)s",
                        ":(?P<album>)",
                    ],
                    "when": "pre_process",
                }
            )

Once again I downloaded the list of YouTube videos above resulting in 15 (I guess it's consistently missing 3) videos downloaded.

Then I ran the following in both cloned repos:

for f in ./youtube/*/*.mp4; do mediainfo "$f" > "$f.mediainfo"; done

And ran a diff:

$ diff before/youtube after/youtube/ -r

Binary files before/youtube/CL Official Channel/20201029_wP9TzdeLHvc_CL HA Dance Performance Video.mp4 and after/youtube/CL Official Channel/20201029_wP9TzdeLHvc_CL HA Dance Performance Video.mp4 differ
diff -r "before/youtube/CL Official Channel/20201029_wP9TzdeLHvc_CL HA Dance Performance Video.mp4.mediainfo" "after/youtube/CL Official Channel/20201029_wP9TzdeLHvc_CL HA Dance Performance Video.mp4.mediainfo"
10,12c10,11
< Movie name                               : +H₩A+
< Album                                    : +H₩A+
< Performer                                : CL
---
> Movie name                               : CL +H₩A Dance Performance Video+
> Performer                                : CL Official Channel
Binary files before/youtube/TrippieReddVEVO/20201030_hQA8kfY88fc_Trippie Redd - Weeeeee (Lyric Video).mp4 and after/youtube/TrippieReddVEVO/20201030_hQA8kfY88fc_Trippie Redd - Weeeeee (Lyric Video).mp4 differ
diff -r "before/youtube/TrippieReddVEVO/20201030_hQA8kfY88fc_Trippie Redd - Weeeeee (Lyric Video).mp4.mediainfo" "after/youtube/TrippieReddVEVO/20201030_hQA8kfY88fc_Trippie Redd - Weeeeee (Lyric Video).mp4.mediainfo"
10,12c10,11
< Movie name                               : Weeeeee
< Album                                    : Weeeeee
< Performer                                : Trippie Redd
---
> Movie name                               : Trippie Redd - Weeeeee (Lyric Video)
> Performer                                : TrippieReddVEVO
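
The same comparison can be made per field instead of per line. A minimal sketch, assuming the mediainfo output has already been parsed into dictionaries; `changed_fields` is a hypothetical helper, and the values are taken from the TrippieReddVEVO diff above.

```python
def changed_fields(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Map each differing field to its (before, after) pair; None marks absence."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


BEFORE = {
    "Format": "MPEG-4",  # assumed unchanged field, included for illustration
    "Movie name": "Weeeeee",
    "Album": "Weeeeee",
    "Performer": "Trippie Redd",
}
AFTER = {
    "Format": "MPEG-4",
    "Movie name": "Trippie Redd - Weeeeee (Lyric Video)",
    "Performer": "TrippieReddVEVO",
}

if __name__ == "__main__":
    for field, (old, new) in changed_fields(BEFORE, AFTER).items():
        print(f"{field}: {old!r} -> {new!r}")
```

Files whose tags come back with an empty result here correspond to the videos that do not appear in the diff at all.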

TL;DR

Of the 18 YouTube links:

@anonamouslyginger
Contributor Author

Ah, this issue was marked as “spammy” so each time I commented I was blocked. It should be visible again @bbilly1

@bbilly1
Member

bbilly1 commented May 19, 2022

Thank you for looking into this. Sounds good, please make a PR with your changes.

> resulting in 15 (unsure where 3 went) videos downloaded

Tube Archivist skips any video that yt-dlp can't read. From your list, that would be:

That's why you only see the 15 out of 18 in your queue.

anonamouslyginger added a commit to anonamouslyginger/tubearchivist that referenced this issue May 21, 2022
@bbilly1 added the enhancement (New feature or request) label and removed the question (Further information is requested) label Jun 4, 2022