Question: Is there a way to ignore YT shorts? #262

Closed
lue30499 opened this issue Jun 18, 2022 · 4 comments
Labels
duplicate This issue or pull request already exists

Comments

@lue30499

As per the title.

@PhuriousGeorge
Contributor

PhuriousGeorge commented Jun 18, 2022

Adding the filter --match-filter 'original_url!*=/shorts/' is supposed to ignore shorts in base yt-dlp. I don't have a test environment set up, but I don't think adding it to the default download format entry would work.

EDIT: Appears this may or may not work... yt-dlp/yt-dlp#3165
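
For reference, here is a minimal Python sketch of what that filter expresses. In yt-dlp's filter syntax `*=` means "contains" and the leading `!` negates it, so an entry passes only if its `original_url` does not contain `/shorts/`. The function name `passes_shorts_filter` and the placeholder video IDs are made up for illustration, and keeping entries that have no `original_url` is an assumption of this sketch, not necessarily what yt-dlp does.

```python
def passes_shorts_filter(info: dict) -> bool:
    """Return True if the entry should be kept, i.e. its original_url
    does not contain '/shorts/'.

    Assumption of this sketch: an entry without original_url is kept.
    """
    original_url = info.get("original_url") or ""
    return "/shorts/" not in original_url


# Placeholder entries (hypothetical IDs), shaped like yt-dlp info dicts.
entries = [
    {"original_url": "https://www.youtube.com/watch?v=VIDEO_ID"},
    {"original_url": "https://www.youtube.com/shorts/SHORT_ID"},
]

# Only the regular watch URL survives the filter.
kept = [e["original_url"] for e in entries if passes_shorts_filter(e)]
print(kept)
```

This only helps if yt-dlp actually reports the `/shorts/` link in `original_url`, which is the open question in the linked issue.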

@bbilly1
Member

bbilly1 commented Jun 18, 2022

It's tricky. Check out our previous discussion on the topic: #163. Basically, yt-dlp doesn't detect that. The only feasible approach would be to ignore by keyword, which will catch some if not most of the shorts. It's on the roadmap with:

Auto ignore videos by keyword

@bbilly1 bbilly1 added the duplicate label Jun 18, 2022
@PhuriousGeorge
Contributor

> The only feasible approach would be ignore by keyword

The only issue with a keyword-only approach is when the word "short" or "#short" isn't used in the tags, title, or description. I'm hoping yt-dlp comes up with something.
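
A rough Python sketch of what a keyword check could look like, and where it falls short. This is not Tube Archivist's implementation: the function name `looks_like_short` and the keyword list are hypothetical. The `title`, `description`, and `tags` keys follow yt-dlp's info dict naming.

```python
import re

# Hypothetical keyword list; the thread only mentions "short" and "#short".
SHORTS_KEYWORDS = ("short", "shorts")


def looks_like_short(info: dict, keywords=SHORTS_KEYWORDS) -> bool:
    """Keyword heuristic: True if any keyword appears as a whole word,
    optionally prefixed with '#', in the title, description, or tags."""
    text = " ".join(
        [
            info.get("title") or "",
            info.get("description") or "",
            " ".join(info.get("tags") or []),
        ]
    ).lower()
    alternatives = "|".join(re.escape(k) for k in keywords)
    # (?<!\w) and (?!\w) keep "shortcuts" from matching "short".
    pattern = r"(?<!\w)#?(?:" + alternatives + r")(?!\w)"
    return re.search(pattern, text) is not None


print(looks_like_short({"title": "Quick tip #shorts"}))             # True
print(looks_like_short({"title": "Keyboard shortcuts explained"}))  # False
print(looks_like_short({"title": "Untitled clip"}))                 # False
```

The last call is the gap described above: a short whose uploader never used the keyword is not caught. The reverse also happens, since a regular video titled "A short film" would be flagged.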

Looking further into the linked yt-dlp issue, it appears they added checks so that shorts default to the /shorts/ URL (yt-dlp/yt-dlp#3168), so the --match-filter 'original_url!*=/shorts/' filter should work?

@bbilly1
Member

bbilly1 commented Jun 20, 2022

Thanks for linking that. Looks like that was a recent change in yt-dlp. I've tested it with some videos, and it looks like original_url is always set to the shorts link if you get a list of videos, for example:

yt-dlp 'https://www.youtube.com/c/RealEngineering' --playlist-end 2 --dump-json

This returns two videos with original_url:

"original_url": "https://www.youtube.com/watch?v=8Oi8ZO-2Kvc"
...
"original_url": "https://www.youtube.com/shorts/br1p4fcqa4s"

But this doesn't work if you request the page with watch?v=, e.g. this:

yt-dlp 'https://www.youtube.com/watch?v=br1p4fcqa4s' --dump-json

will return

"original_url": "https://www.youtube.com/watch?v=br1p4fcqa4s"

even though it is a shorts video. So yt-dlp can't detect it in this case. When Tube Archivist extracts channel videos, for example, it extracts only the IDs first. Then, as a second step, if a video needs to be added to the queue, Tube Archivist will download the full metadata. This significantly improves the rescan speed and reduces the requests to YouTube, so that's not something we should change just to be able to detect shorts.

TLDR: As long as yt-dlp can't detect shorts in the watch?v= url, the only approach I see would be keyword based.
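
To make the limitation concrete, this small Python sketch applies a URL check to the two `original_url` values quoted above for the same video. The JSON lines are trimmed down to a single field for illustration (real `--dump-json` output has many more), and the function name `detected_as_short` is hypothetical.

```python
import json

# Same video, two ways of requesting it. original_url values are the ones
# quoted in this thread; everything else is trimmed away for the example.
channel_listing = '{"original_url": "https://www.youtube.com/shorts/br1p4fcqa4s"}'
direct_request = '{"original_url": "https://www.youtube.com/watch?v=br1p4fcqa4s"}'


def detected_as_short(json_line: str) -> bool:
    """Parse one line of JSON and check original_url for '/shorts/'."""
    info = json.loads(json_line)
    return "/shorts/" in (info.get("original_url") or "")


print(detected_as_short(channel_listing))  # True
print(detected_as_short(direct_request))   # False
```

Because the full metadata is fetched through the watch?v= URL in the second step, a check like this would see the second case and miss the short.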

@bbilly1 bbilly1 closed this as completed Jul 23, 2022