YouTube pagination limit and metadata churn? #22650
Comments
So… no answer as to the first part, regarding downloading unnecessary playlist pages? In my archival case, the first page is truly the only one that needs to be requested. 10-100 (averaging 24) extra pages, times 157 channels, comes to a little shy of 4,000 extra HTTP requests on each pass, plus all the comparisons against the archive of IDs for videos guaranteed to already be there. That adds up, in both time and request limits.
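For concreteness, the back-of-the-envelope arithmetic behind that figure, using the numbers above:

# ~24 extra pages per channel, across 157 channels, on every pass:
echo $(( 157 * 24 ))   # prints 3768, i.e. just shy of 4,000 extra page requests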
Ah, #3794 from 2014, which does replicate the title of this request, has nothing to do with an actual limit on the number of pages being requested; it concerns a bug with an apparent upper bound on the total number of videos collected. (A "limitation" in "YouTube channel pagination", not a "channel pagination limit". ;)
Checklist
Question
Is it possible to limit the scope of the youtube backend's paged search for new videos? The sheer number of paged requests is quite substantial compared to the number of new videos discovered on each run (1-4), which are always present on the first page. I ask not because this is causing an actual problem (though I do have rate-limit concerns), but because youtube-dl is actually spending more time fetching pages than fetching video content.
If not, could this be added? --max-pages or similar? When archiving a still-living YouTube channel, I'd like to keep the amount of churn to a minimum. (Why pull in 14 pages, when 1 will do? ;)
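In the absence of such a flag, one possible stopgap (a sketch only, not something the docs promise) is the existing --playlist-end option, which caps how many playlist entries are considered. Whether the later pages are actually skipped depends on how lazily the extractor pages through results, and the assumption here is that a channel's uploads listing holds roughly 100 entries per page:

# Hedged workaround sketch: cap consideration at roughly one page's worth of entries.
# $CHANNEL_URL is a placeholder for one of the channel URLs being archived.
youtube-dl --playlist-end 100 "$CHANNEL_URL"

(--playlist-items 1-100 would express the same cap; neither is a true --max-pages.)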
Question
Is it normal to spend large amounts of time re-writing metadata, thumbnails, and subtitles on already-downloaded videos?
It seems that every discovered video is re-written on disk to re-apply metadata and subtitles, even when the metadata and subtitles are already present. Orders of magnitude more time is spent re-writing already-tagged media than on page fetching and media fetching combined. You can see this for yourself by running the example invocation below over any channel or playlist with more than one page.
I am re-testing with the --download-archive option, to see if this alters the rewriting behavior in any way. (Maybe actual tracking is needed, as it isn't detecting metadata presence?)
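For reference, the file that --download-archive maintains is plain text, one "extractor id" pair per line (for this backend, lines of the form "youtube <video-id>"). A sketch, assuming the output template from the invocation below (video id sitting between the first and second "--" separators in each filename), for pre-seeding the archive from files already on disk so that already-archived videos are skipped outright rather than re-touched:

# Sketch only: derive "youtube <id>" lines from existing filenames.
# Adjust the awk field if your -o template differs.
for f in */*--*--*--*.mp4; do
  echo "youtube $(basename "$f" | awk -F'--' '{print $2}')"
done | sort -u > archive.txt

Subsequent runs with --download-archive archive.txt should then skip those entries before any metadata, thumbnail, or subtitle post-processing is attempted.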
Example Invocation
I'm using the following invocation for the purpose of local archiving:
youtube-dl --no-call-home --ignore-errors --restrict-filenames \
  --no-mark-watched --yes-playlist \
  --continue --no-overwrites \
  --write-description --write-info-json --write-thumbnail --write-sub \
  --add-metadata --embed-thumbnail --embed-subs \
  --merge-output-format mp4 --sub-format best --youtube-skip-dash-manifest \
  --format 137+140/bestvideo[ext=mp4]+bestaudio[ext=m4a] \
  -o "%(playlist)s/%(upload_date)s--%(id)s--%(title)s--%(resolution)s.%(ext)s" \
  $@
Example Log