Option to determine extractors ahead of time #30081

NoSuck · 2021-10-11T02:51:00Z

Please follow the guide below

You will be asked some questions, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your pull request (like that [x])
Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Sometimes I just want to know whether youtube-dl can be expected to handle a given URL. This option accomplishes that quickly. Previously, I would run youtube-dl with --simulate or --skip-download, but these would take 5+ seconds on my system before returning. The --determine-extractors option of this PR, however, only takes 1.3 to 2.6 seconds. You can use it, for example, to handle arbitrary URLs intelligently: If --determine-extractors indicates success, run youtube-dl (or mpv or whatever). Otherwise, run $BROWSER.
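
A minimal sketch of that dispatch in Python, assuming the probe reports a supported URL through a zero exit status (the description above only says it "indicates success"). The helper name `choose_handler`, the injectable `run` parameter, and the `xdg-open` fallback are illustrative assumptions; `--determine-extractors` is the option proposed in this PR, not a flag in released youtube-dl.

```python
import os
import subprocess


def choose_handler(url, probe_cmd, player="mpv", browser=None, run=subprocess.call):
    """Return the argv to launch for `url`, decided by the probe's exit status.

    `probe_cmd` is the probe command without the URL, for example
    ["youtube-dl", "--determine-extractors"] (the option proposed in this PR).
    `run` is injectable so the decision can be exercised without spawning
    a process.
    """
    if browser is None:
        # Assumption: fall back to xdg-open when $BROWSER is unset.
        browser = os.environ.get("BROWSER", "xdg-open")
    status = run(list(probe_cmd) + [url])
    # Assumption: exit status 0 means a suitable extractor was found.
    return [player, url] if status == 0 else [browser, url]
```

Passing the returned list to `subprocess.call` would then start either the player or the browser.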

Thank you.


dirkf

It should be possible to tweak YoutubeDL.extract_info to recognise your proposed option. Then you can be sure that the same IE selection algorithm is being used, and avoid extra code that might get out of sync. E.g.:

+        determine_extractors = self.params.get('determine_extractors')
         if not ie_key and force_generic_extractor:
             ie_key = 'Generic'
 
         if ie_key:
             ies = [self.get_info_extractor(ie_key)]
         else:
             ies = self._ies
 
         for ie in ies:
             if not ie.suitable(url):
                 continue
 
             ie = self.get_info_extractor(ie.ie_key())
             if not ie.working():
+                if determine_extractors:
+                    continue
                 self.report_warning('The program functionality for this site has been marked as broken, '
                                     'and will probably not work.')
 
+            if determine_extractors:
+                self.to_stdout('%s %s\n' % (ie.IE_NAME, url))
+                return
             return self.__extract_info(url, ie, download, extra_info, process)
         else:
             self.report_error('no suitable InfoExtractor for URL %s' % url)
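
The selection loop in the diff above can also be exercised in isolation. This is a standalone sketch, not youtube-dl code: `FakeIE` and `determine_extractor` are hypothetical stand-ins that only mimic the `suitable()`, `working()` and `IE_NAME` members used in the diff.

```python
import re


class FakeIE:
    """Hypothetical stand-in for an InfoExtractor."""

    def __init__(self, name, pattern, is_working=True):
        self.IE_NAME = name
        self._pattern = re.compile(pattern)
        self._is_working = is_working

    def suitable(self, url):
        return self._pattern.match(url) is not None

    def working(self):
        return self._is_working


def determine_extractor(ies, url):
    """Return the name of the first suitable, working extractor, or None."""
    for ie in ies:
        if not ie.suitable(url):
            continue
        if not ie.working():
            # Mirrors the diff: in determine mode a broken extractor is skipped.
            continue
        return ie.IE_NAME
    return None


ies = [
    FakeIE("broken-site", r"https?://broken\.example/", is_working=False),
    FakeIE("video-site", r"https?://video\.example/"),
]
```

With this list, a URL on `video.example` resolves to `video-site`, while a URL that only the broken extractor matches yields `None`.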

dirkf · 2021-10-11T17:02:13Z

The option could be named --get-extractor rather than --determine-extractors, which would align it with the other simulation options, but only if its result matches theirs: each --get-key_name option works like --get-filename --output '%(key_name)s' for the corresponding key_name. So if this option behaved like --get-filename --output '%(extractor)s' (I guess), it could be renamed; the output would have to be just the extractor name.
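
To illustrate what "just the extractor name" means, here is a simplified sketch of output-template expansion using plain Python `%` formatting. The `info` values and the `render` helper are made up for illustration; the real youtube-dl template handling does more (sanitising values, substituting placeholders for missing fields).

```python
# Hypothetical info dict; real ones carry many more fields.
info = {"id": "abc123", "title": "Example", "extractor": "youtube"}


def render(template, info):
    """Expand %(key)s placeholders from the info dict."""
    return template % info
```

Here `render("%(extractor)s", info)` yields only the extractor name, with no URL or other text attached.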

NoSuck · 2021-10-11T18:26:13Z

Thank you for the helpful feedback. I will look into your suggestions and resubmit. However, I am a little confused by your idea of falling back to the generic extractor with --force-generic-extractor. Isn't the generic extractor's check guaranteed to succeed here?

dirkf · 2021-10-11T18:49:29Z

That's just the existing logic in the method, and I didn't think it worth substituting. If you ask whether the URL can be handled by yt-dl and force the generic extractor, the answer is that yt-dl can handle the URL using the generic extractor ("if you say so"), but I don't expect that anyone would ever ask that.

pukkandan · 2021-10-13T02:32:46Z

Something to consider when trying to implement such a feature: https://github.com/ytdl-org/youtube-dl#how-can-i-detect-whether-a-given-url-is-supported-by-youtube-dl

dirkf · 2021-10-13T22:53:13Z

As usual, a very good point. So the report_error() line is never reached as long as the generic extractor matches every URL, and therefore it's only possible to determine whether a specific extractor matches or only the generic extractor does.
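
A small sketch of why that is so, using a made-up extractor table rather than youtube-dl's real one. The names, patterns and helper functions are assumptions for illustration; the only property borrowed from the discussion is that a catch-all generic entry comes last and matches every URL.

```python
import re

# Hypothetical extractor table: (name, URL pattern), generic catch-all last.
EXTRACTORS = [
    ("youtube", r"https?://(?:www\.)?youtube\.com/watch\?v=[\w-]+"),
    ("vimeo", r"https?://(?:www\.)?vimeo\.com/\d+"),
    ("generic", r".*"),
]


def first_match(url):
    """Return the name of the first extractor whose pattern matches."""
    for name, pattern in EXTRACTORS:
        if re.match(pattern, url):
            return name
    return None  # never reached while the catch-all entry is present


def has_specific_extractor(url):
    """True only if something other than the generic fallback matched."""
    return first_match(url) != "generic"
```

Because the last pattern matches any string, `first_match` never returns `None`; the only useful question is whether the match was a specific extractor or the generic fallback.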

dirkf reviewed Oct 11, 2021


dirkf force-pushed the master branch from 01bf89e to 4c6fba3 · August 26, 2022 07:51
