Having youtube-dl Grab From Source Instead of URL #10511
Comments
Duplicate of #5768.
I went through #5768 and what @jaimeMF wrote doesn't completely make sense, and I'll explain why.

Reason 1) "It would unnecessarily complicate the code; we would need to modify the extractors so that they work both by giving a URL and by giving the webpage. (I guess that you are probably only interested in YouTube, but we support more sites.)"
Answer 1) Not really. It would be extremely simple to add, as it would only be a matter of adding an if-statement checking whether the user supplied a page source. The main page source is only downloaded once, so this if-statement would also only be needed once. Most hosts also don't restrict access to the direct video link, so youtube-dl would have no issue carrying on with normal requests after that point.

Reason 2) "What happens if we decide to use another page for getting the info? You would get error messages and we would need to explain what has changed, and this is internal behaviour so we shouldn't have to."
Answer 2) As I explained in my previous answer, nothing would change other than adding that if-statement, so no additional errors would have to be caught. Everything would proceed exactly like a normal request, except that the user was allowed to supply the source.

Reason 3) "You would need to specify an extractor, which is another source of bug reports, because people would use the wrong extractor and we sometimes split extractors or add a special one for handling some URLs."
Answer 3) There's a point here, but it would be very easy to solve as well. The user would simply pass the URL along with the parameter, just like any other command; see the sketch below.
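To show roughly what is being proposed, here is a minimal sketch in extractor terms, assuming a hypothetical `page_source` parameter wired through from a new command-line option; neither the option nor the parameter exists in youtube-dl today, and ExampleIE is made up:

```python
# Hypothetical sketch only: the 'page_source' parameter and the command-line
# option that would set it do NOT exist in youtube-dl. ExampleIE and its regex
# are made up; this just shows where the proposed if-statement would sit.
import io

from .common import InfoExtractor


class ExampleIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?example\.com/videos/(?P<id>\d+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        page_source_path = self._downloader.params.get('page_source')  # hypothetical
        if page_source_path:
            # The user saved the page while logged in; read it from disk ...
            with io.open(page_source_path, encoding='utf-8') as f:
                webpage = f.read()
        else:
            # ... otherwise fall back to the normal download.
            webpage = self._download_webpage(url, video_id)

        # Everything below stays exactly the same as today.
        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'url': self._html_search_regex(
                r'data-video-url="([^"]+)"', webpage, 'video URL'),
        }
```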
You're not understanding how this would work and once again I'll explain why.
I think we can agree that currently youtube-dl usually works something like this:

User provides a URL -> youtube-dl -> downloads the URL's page source

With the new page-source parameter you would only change the "downloads the URL's page source" part, which leaves the rest of the process untouched. It would look like this:

User provides a URL -> youtube-dl -> page source PROVIDED THROUGH THE PARAMETER

Now, as I said in my PREVIOUS answer, youtube-dl would in most cases get the direct link RIGHT AWAY from the page source (this is the case for Facebook, Instagram and many more, as most pages have the video URL right on the page) WITHOUT having to download any additional assets, or it would download a few assets first. And as I said, those assets are usually NOT restricted to the user session, as they sit on static file servers that in very rare cases just check the IP.
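A sketch of the claim that only the "downloads the page source" step would change, again assuming the same hypothetical `page_source` option; the wrapper below is not part of youtube-dl, and it deliberately ignores the multi-request extractors discussed later in this thread:

```python
# Hypothetical sketch: replace the shared page-download step in one place so
# individual extractors stay untouched. The 'page_source' param does not exist
# in youtube-dl; this also ignores extractors that need more than one page.
import io

from youtube_dl.extractor.common import InfoExtractor

_original_download_webpage = InfoExtractor._download_webpage


def _download_webpage_or_local(self, url_or_request, video_id, *args, **kwargs):
    local_source = self._downloader.params.get('page_source')  # hypothetical
    if local_source and not getattr(self, '_used_local_source', False):
        # Only the first page fetch is replaced by the user-supplied file;
        # any later requests behave exactly as before.
        self._used_local_source = True
        with io.open(local_source, encoding='utf-8') as f:
            return f.read()
    return _original_download_webpage(self, url_or_request, video_id, *args, **kwargs)


InfoExtractor._download_webpage = _download_webpage_or_local
```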
We don't always request the URL passed by the user; sometimes we use other sources to extract information. These are some examples of extractors that do this: CBSIE, CWTVIE, CTVIE, TouTvIE, WatIE...
@remitamine: I was expecting this to be mentioned sooner or later. My answer is that this is very, very rare and almost never happens. Take a look at the size of the sites you mentioned, for example; besides, one extra request for these rare cases won't really cause any performance issues.
Did you even read my post? The core does not and should not be aware of which URL or set of URLs an extractor needs, which actions that extractor should take to obtain them, which headers and cookies to provide, and so on.
No it's not. And I did not say that.
No, it's not. Even if this subset of dumb extractors that pull only the original URL, without any HTTP request tweaking, custom headers or cookies, is generalized, there are still extractors that do not follow this pattern and can't be generalized this way.
No, not necessarily.
How come? What if somebody wants to provide the first two pages? The solution should be generic.
Fetching the page source sets cookies required for further extraction that are not available from a bare file => extraction won't work (see the sketch below).
No, it's not. Even assuming this to be true for a second, what about other extractors, say, those that don't download the source URL at all? The way you suggest implementing it does not fit them at all, so this feature won't be available for them, which leads to even more confusion.
Everything else may require it. So this is not worth the effort.
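To make the cookie point above concrete, here is an illustrative pattern with a made-up SomeSiteIE and API endpoint: the page fetch populates the shared cookiejar as a side effect, and the follow-up JSON request silently depends on those cookies, which a bare saved HTML file cannot provide:

```python
# Illustrative only: SomeSiteIE and its endpoint are made up. The pattern shown
# (the page fetch sets session cookies that a later request relies on) is what
# breaks if the page is supplied as a local file instead of being fetched.
from .common import InfoExtractor


class SomeSiteIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?somesite\.example/watch/(?P<id>\d+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        # Side effect: this request stores session cookies in the cookiejar.
        webpage = self._download_webpage(url, video_id)

        # This API call only succeeds because those cookies are sent with it.
        info = self._download_json(
            'https://api.somesite.example/player/%s' % video_id, video_id)

        return {
            'id': video_id,
            'title': info.get('title') or self._og_search_title(webpage),
            'url': info['stream_url'],
        }
```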
@dstftw has explained things well. Here are just some interesting statistics:
@yan12125: no reason to continue discussing this then, as editing 900 files would take way too much time.
Well, the point is not that we have lots of extractors, but that your solution applies to few websites. If you want some website to support logging in, open a new issue for each one.
This is still completely doable, and the solution does actually apply to the 475 websites/extractors you mentioned, but then again you'd have to add the if-statement to every single one of those extractors.
I counted extractors that download the web page, not extractors that only download the web page. I didn't find a quick way to count the latter, so I skipped it. It should be much less than 475.
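For what it's worth, a rough count of extractors that call `_download_webpage` can be made with a few lines of Python run from the root of a youtube-dl checkout; this is only a textual approximation and not necessarily how the figure above was produced:

```python
# Rough, purely textual count of extractor modules that call _download_webpage,
# run from the root of a youtube-dl checkout. It does not distinguish extractors
# that *only* download the web page from those that make further requests.
import glob
import io

count = 0
for path in glob.glob('youtube_dl/extractor/*.py'):
    with io.open(path, encoding='utf-8') as f:
        if '_download_webpage(' in f.read():
            count += 1
print('%d extractor files call _download_webpage' % count)
```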
@dstftw: now over to your reply:
Your post mentioned the core AFTER I wrote that, but it doesn't matter much, as the solution could be applied to each extractor individually, although that would be quite some amount of work.
The solution is generic; the only difference is that it would only work with a single URL.
I've explained this part twice now. There is no reason for the page source to contain an expired hash, as the source would be freshly taken.

`Everything else may require it. So this is not worth the effort.`

Again, this is completely doable; the only thing in the way is that the if-statement would have to be added to every single extractor that downloads the page source, and as @yan12125 pointed out, that's a few hundred files.
This is the key point of whether this is doable or not. Please give one example URL for which you think this is doable.
Also, it's your job to give examples to prove that your proposal works, not ours.
It is indeed my duty to provide examples, but I didn't bother, as I don't think anyone is interested in editing hundreds of files just to add this parameter (although it'd be extremely useful to have).
I pointed out in the very first reply that the extraction process is an implementation detail that should be known only to the extractor itself. You are just speculating with terms in order to continue stubbornly arguing against the facts.
A generic solution offers arbitrary control. By definition. With a generic solution one should be able to control any depth of the extraction process, not only the special case of N=1.
"almost always" and "usually" does not mean always. These are concrete extraction scenarios of concrete extractors in youtube-dl codebase.
No, you can't make such assumptions. You are also not allowed to make any assumptions about the IP or the way the page was obtained, which again makes this impossible in the case of an IP-bound source page.
The point is not adding this parameter, but adding support for these unsupported URLs (which you haven't even provided so far) directly, adding authentication support for extractors that do not have it, adding support for password-protected videos, and so on.
Description of your issue, suggested solution and other information
This might be the second or third time that I've requested this feature, because it would be so extremely useful to have.
There are tons of reasons why this would be useful to have, but the main one is that this way I could have youtube-dl extract video URLs from member-restricted pages, such as Facebook videos set to private.
YouTube videos are fine, as I can use the username and password parameters to log in, but for other sites (like Facebook and Instagram, among others) I'm shit out of luck, so I have to resort to installing software to sniff the URLs.
If there were just a simple parameter like `--get-from-source C:\location\to\website_source.html` which then dumped the video links, I might be free from bundled adware at the end of the day.