Option to determine extractors ahead of time #30081

Open
wants to merge 1 commit into
base: master

Conversation

@NoSuck NoSuck commented Oct 11, 2021

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Sometimes I just want to know whether youtube-dl can be expected to handle a given URL. This option accomplishes that quickly. Previously, I would run youtube-dl with --simulate or --skip-download, but these would take 5+ seconds on my system before returning. The --determine-extractors option of this PR, however, only takes 1.3 to 2.6 seconds. You can use it, for example, to handle arbitrary URLs intelligently: If --determine-extractors indicates success, run youtube-dl (or mpv or whatever). Otherwise, run $BROWSER.

Thank you.
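
The dispatch idea described above could be sketched roughly as follows. This is only an illustration: `choose_command`, the `is_supported` callback, and the `mpv` / `xdg-open` defaults are hypothetical names chosen for the example, not part of youtube-dl or of this PR.

```python
import os


def choose_command(url, is_supported, player="mpv", browser=None):
    """Return the argv list to run for `url` (hypothetical helper).

    `is_supported` is any callable reporting whether youtube-dl is
    expected to handle the URL, e.g. a wrapper around the proposed option.
    """
    if browser is None:
        # Assumption: fall back to xdg-open when $BROWSER is unset.
        browser = os.environ.get("BROWSER", "xdg-open")
    program = player if is_supported(url) else browser
    return [program, url]


# Stand-in predicate for demonstration only; a real one would ask youtube-dl.
def looks_like_video(url):
    return "youtube.com/watch" in url


print(choose_command("https://www.youtube.com/watch?v=abc", looks_like_video))
print(choose_command("https://example.com/", looks_like_video, browser="firefox"))
```

Returning the command instead of running it keeps the decision separate from process spawning, so the caller can pass the result to `subprocess.run` or simply log it.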

Contributor

@dirkf dirkf left a comment

It should be possible to tweak YoutubeDL.extract_info to recognise your proposed option. Then you can be sure that the same IE selection algorithm is being used, and avoid extra code that might get out of sync. Eg:

+        determine_extractors = self.params.get('determine_extractors')
         if not ie_key and force_generic_extractor:
             ie_key = 'Generic'
 
         if ie_key:
             ies = [self.get_info_extractor(ie_key)]
         else:
             ies = self._ies
 
         for ie in ies:
             if not ie.suitable(url):
                 continue
 
             ie = self.get_info_extractor(ie.ie_key())
             if not ie.working():
+                if determine_extractors:
+                    continue
                 self.report_warning('The program functionality for this site has been marked as broken, '
                                     'and will probably not work.')
 
+            if determine_extractors:
+                self.to_stdout('%s %s\n' % (ie.IE_NAME, url))
+                return
             return self.__extract_info(url, ie, download, extra_info, process)
         else:
             self.report_error('no suitable InfoExtractor for URL %s' % url)
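
The selection behaviour that the suggested change would have in "determine" mode can be modelled outside youtube-dl as below. This is a minimal sketch under stated assumptions: `StubIE` and `first_working_extractor` are hypothetical stand-ins that only imitate the `suitable()`, `working()` and `IE_NAME` members used in the diff above.

```python
class StubIE:
    """Minimal stand-in for an InfoExtractor (illustrative only)."""

    def __init__(self, name, prefix, working=True):
        self.IE_NAME = name
        self._prefix = prefix
        self._working = working

    def suitable(self, url):
        return url.startswith(self._prefix)

    def working(self):
        return self._working


def first_working_extractor(url, ies):
    """Return the name of the first suitable, working extractor, or None."""
    for ie in ies:
        if not ie.suitable(url):
            continue
        if not ie.working():
            # Mirrors the suggested `continue`: broken extractors are skipped.
            continue
        return ie.IE_NAME
    return None


ies = [
    StubIE("broken:site", "https://video.example/", working=False),
    StubIE("site", "https://video.example/"),
    StubIE("generic", ""),  # empty prefix matches every URL, like a catch-all
]
print(first_working_extractor("https://video.example/v/1", ies))
print(first_working_extractor("https://other.example/", ies))
```

Because the list is scanned in order and the first match wins, the catch-all entry has to come last, which is also why it answers for any URL that no earlier entry claims.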

@dirkf
Contributor

dirkf commented Oct 11, 2021

The option could be named --get-extractor rather than --determine-extractors, which would align it with the other simulation options, provided its result matches theirs: each --get-key_name option behaves like --get-filename --output '%(key_name)s'. So if this option behaved like --get-filename --output '%(extractor)s' (I guess), it could be renamed, but the output would then have to be just the extractor name.
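
For readers unfamiliar with the template syntax mentioned above: it follows Python's %-style formatting with mapping keys. The sketch below only illustrates that expansion. The `info` values and the `render` helper are made up for the example, and the real youtube-dl implementation does more (such as filename sanitization).

```python
# Illustrative info dict; the field values are made up for this example.
info = {"extractor": "youtube", "id": "abc123", "title": "Example"}


def render(template, info):
    """Expand a %-style template against an info dict (hypothetical helper)."""
    return template % info


print(render("%(extractor)s", info))
print(render("%(extractor)s %(id)s", info))
```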

@NoSuck
Author

NoSuck commented Oct 11, 2021

Thank you for the helpful feedback. I will look into your suggestions and resubmit. However, I am a little confused by your idea of falling back to the generic extractor with --force-generic-extractor. Isn't the generic extractor's check guaranteed to succeed here?

@dirkf
Contributor

dirkf commented Oct 11, 2021

That's just the existing logic in the method, and I didn't think it worth substituting. If you ask whether yt-dl can handle the URL and force the generic extractor, the answer is that yt-dl can handle it using the generic extractor ("if you say so"), but I don't expect that anyone would ever ask that.

@pukkandan
Contributor

Something to consider when trying to implement such a feature: https://github.com/ytdl-org/youtube-dl#how-can-i-detect-whether-a-given-url-is-supported-by-youtube-dl

@dirkf
Contributor

dirkf commented Oct 13, 2021

As usual, a very good point. So the report_error() line is never reached as long as the generic extractor matches every URL, and therefore it's only possible to determine whether a specific extractor matched rather than the generic one.
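
The distinction drawn here can be illustrated with a small sketch. `PatternIE`, `classify_url` and the example patterns are hypothetical. The only thing borrowed from youtube-dl is the idea of a catch-all extractor named "generic" that is checked last.

```python
import re


class PatternIE:
    """Stand-in extractor that matches URLs by regular expression."""

    def __init__(self, name, pattern):
        self.IE_NAME = name
        self._valid_url = re.compile(pattern)

    def suitable(self, url):
        return self._valid_url.match(url) is not None


def classify_url(url, ies, generic_name="generic"):
    """Return ("specific", name), ("generic", name) or ("none", None)."""
    for ie in ies:
        if ie.suitable(url):
            kind = "generic" if ie.IE_NAME == generic_name else "specific"
            return kind, ie.IE_NAME
    # Only reachable when no catch-all extractor is in the list.
    return "none", None


ies = [
    PatternIE("site", r"https://video\.example/"),
    PatternIE("generic", r".*"),  # matches every URL
]
print(classify_url("https://video.example/v/1", ies))
print(classify_url("https://other.example/", ies))
print(classify_url("https://other.example/", ies[:1]))
```

A "generic" result therefore says only that no specific extractor claimed the URL, not that extraction is expected to succeed.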
