Option to determine extractors ahead of time #30081
Sometimes I just want to know whether youtube-dl can be expected to handle a given URL. This option accomplishes that quickly. Previously, I would run youtube-dl with --simulate or --skip-download, but these would take 5+ seconds on my system before returning. The --determine-extractors option of this PR, however, takes only 1.3 to 2.6 seconds. You can use it, for example, to handle arbitrary URLs intelligently: if --determine-extractors indicates success, run youtube-dl (or mpv or whatever); otherwise, run $BROWSER. Thank you.
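The dispatch described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `--determine-extractors` is the option proposed in this PR (it is not part of released youtube-dl), the convention that it exits with status 0 when an extractor matches is assumed rather than confirmed, and the helper names `is_supported` and `choose_command` are hypothetical.

```python
import os
import subprocess


def is_supported(url, probe=("youtube-dl", "--determine-extractors")):
    """Run the probe command and treat exit status 0 as "an extractor matched".

    ASSUMPTION: --determine-extractors is the option proposed in this PR and
    exits non-zero when no extractor matches. Neither is confirmed behaviour
    of any released youtube-dl.
    """
    try:
        result = subprocess.run(
            list(probe) + [url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The probe command is missing or not executable: treat as unsupported.
        return False
    return result.returncode == 0


def choose_command(url, supported, player="mpv", browser=None):
    """Return the argv to run for url: the player if supported, else a browser."""
    if supported:
        return [player, url]
    # Fall back to $BROWSER; "xdg-open" is an arbitrary default for this sketch.
    browser = browser or os.environ.get("BROWSER", "xdg-open")
    return [browser, url]
```

A caller would then run something like `subprocess.call(choose_command(url, is_supported(url)))`.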
It should be possible to tweak YoutubeDL.extract_info to recognise your proposed option. Then you can be sure that the same IE selection algorithm is being used, and avoid extra code that might get out of sync. Eg:
+determine_extractors = self.params.get('determine_extractors')
 if not ie_key and force_generic_extractor:
     ie_key = 'Generic'
 if ie_key:
     ies = [self.get_info_extractor(ie_key)]
 else:
     ies = self._ies
 for ie in ies:
     if not ie.suitable(url):
         continue
     ie = self.get_info_extractor(ie.ie_key())
     if not ie.working():
+        if determine_extractors:
+            continue
         self.report_warning('The program functionality for this site has been marked as broken, '
                             'and will probably not work.')
+    if determine_extractors:
+        self.to_stdout('%s %s\n' % (ie.IE_NAME, url))
+        return
     return self.__extract_info(url, ie, download, extra_info, process)
 else:
     self.report_error('no suitable InfoExtractor for URL %s' % url)
The option could be named |
Thank you for the helpful feedback. I will look into your suggestions and resubmit. However, I am a little confused by your idea of falling back to the generic extractor with |
That's just the existing logic in the method, and I didn't think it worth substituting. If you ask whether the URL can be handled by yt-dl and force the generic extractor, the answer is that yt-dl can handle the URL using the generic extractor ("if you say so"), but I don't expect that anyone would ever ask that.
Something to consider when trying to implement such a feature: https://github.com/ytdl-org/youtube-dl#how-can-i-detect-whether-a-given-url-is-supported-by-youtube-dl
As usual, a very good point. So the |