[youtube:user] make user/c regex a separate info IE #10126

Closed
yuri-sevatz opened this issue Jul 19, 2016 · 5 comments


@yuri-sevatz yuri-sevatz commented Jul 19, 2016

Just out of curiosity... can we make the recent user/c patch a separate IE? (Or would there be any objection if I created a pull request to do this?) I noticed a build breakage after this commit:

9558dce

I have some code that's been leveraging the 1:1 relation between an IE and its _VALID_URL and _TEMPLATE_URL to create some quick-and-dirty automated scrapers for a bunch of IEs. The best part is that the "unique identity" of a video can then be determined offline for a lot of IEs, and you can reverse the "unique identity" back into a usable URL when traversing.
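A minimal sketch of that round trip, assuming a simplified stand-in extractor. `SketchUserIE`, its pattern, its template, and both helper functions are hypothetical illustrations, not youtube-dl code:

```python
import re


class SketchUserIE:
    """Hypothetical, simplified stand-in for an extractor (not youtube-dl's class)."""
    IE_NAME = 'sketch:user'
    # Assumed pattern: exactly one named group, 'id'.
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/user/(?P<id>[A-Za-z0-9_-]+)'
    # Assumed template: exactly one slot, filled with that same 'id'.
    _TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'


def extract_identity(ie, url):
    """Derive an (extractor name, id) pair from a URL without any network access."""
    match = re.match(ie._VALID_URL, url)
    if match is None:
        return None
    return (ie.IE_NAME, match.group('id'))


def rebuild_url(ie, identity):
    """Turn a stored identity back into a URL the same extractor accepts."""
    name, item_id = identity
    if name != ie.IE_NAME:
        raise ValueError('identity belongs to a different extractor: %r' % name)
    return ie._TEMPLATE_URL % item_id


identity = extract_identity(
    SketchUserIE, 'https://www.youtube.com/user/12minuteathlete/videos')
url = rebuild_url(SketchUserIE, identity)
```

The round trip only works while one pattern corresponds to one template with the same number of slots, which is the 1:1 relation mentioned above.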

Here's the project and the way we're using these for your consideration:

https://github.com/yuri-sevatz/youtube-sync/blob/master/youtube_sync/__init__.py#L420

I know I should have tried to merge some of this logic to youtube-dl a while ago, but I've been too lazy and shy :)

Author

@yuri-sevatz yuri-sevatz commented Jul 19, 2016

Alternatively, I suppose I could make the "unique identity" of a video take an array of arguments (JSON, etc.), but that would not stay compatible between youtube-dl upgrades unless we start to version them.
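A rough sketch of what a versioned identity could look like, assuming a JSON encoding. The field names `v`, `ie`, and `args`, the version constant, and both functions are hypothetical:

```python
import json

# Hypothetical format version; it would be bumped whenever the meaning
# of the stored arguments changes between releases.
IDENTITY_VERSION = 1


def encode_identity(ie_name, args):
    """Serialize an extractor name plus its arguments, tagged with a version."""
    return json.dumps(
        {'v': IDENTITY_VERSION, 'ie': ie_name, 'args': list(args)},
        sort_keys=True)


def decode_identity(blob):
    """Parse a stored identity, rejecting versions this code does not understand."""
    data = json.loads(blob)
    if data.get('v') != IDENTITY_VERSION:
        raise ValueError('unsupported identity version: %r' % data.get('v'))
    return data['ie'], data['args']


blob = encode_identity('youtube:user', ['c', '12minuteathlete'])
ie_name, args = decode_identity(blob)
```

The version tag lets a newer release detect identities written by an older one instead of silently misreading them.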

Collaborator

@dstftw dstftw commented Jul 19, 2016

In 3rd party code you should rely on neither _VALID_URL nor _TEMPLATE_URL, nor should you make any assumptions about a particular extractor's implementation details. I don't see much sense in applying such changes just to make some 3rd party code that relies on them work, since other code may require the opposite.

@dstftw dstftw closed this Jul 19, 2016
Author

@yuri-sevatz yuri-sevatz commented Jul 19, 2016

So I understand the argument and I agree if we don't go into any detail about what these could do.

What I'm saying here is that "some 3rd party code" happens to handle the notion of video identity a little better than youtube-dl, gets Google accounts banned far less often than youtube-dl, can analyze offline what a playlist (or any source, for that matter) has and doesn't have, and can cross-reference any updates with what you want -- without re-running the IEs and harassing the remote servers for everything else that you already have.

Because that's what gets accounts banned!

... and it does this better than youtube-dl, using mostly the information you already have inside your IEs. If the only thing you logically need API-wise is the capacity for video identity to be both:

  1. Extracted from an IE.
  2. Re-inserted into an IE.

-- then why not aim for it, since it's low-hanging fruit given what you've already got in the IE definitions?

I can do it myself, I really just wanted to see what you guys thought.

Author

@yuri-sevatz yuri-sevatz commented Jul 19, 2016

Even if you don't want to go into any detail on this, there's a simple reason why this looks like a contradiction to me:

+        # Only available via https://www.youtube.com/c/12minuteathlete/videos
+        # but not https://www.youtube.com/user/12minuteathlete/videos
+        'url': 'https://www.youtube.com/c/12minuteathlete/videos',
+        'playlist_mincount': 249,
+        'info_dict': {
+            'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
+            'title': 'Uploads from 12 Minute Athlete',
+        }
+    }, {

So, by this logic, a www.youtube.com/channel/* could very well go into the YoutubeUserIE too -- even though the set of videos it returns is mutually exclusive to both www.youtube.com/user/* and www.youtube.com/c/*. What is the meaning of the YoutubeUserIE? What is the meaning of the YoutubeChannelIE?

If they're just vague concepts that happen to be convenient to the maintainers of the regexes, and they mean nothing to userspace, then why expose them to userspace at all? Why does the user-level API even have different IEs if there's no distinguishing between the things they're allowed to refer to?

Author

@yuri-sevatz yuri-sevatz commented Jul 19, 2016

In the example they give:

c/12minuteathlete and user/12minuteathlete are not interchangeable, which raises the question: why are they in the same IE?

The underlying user seems to be user/the12minuteathlete when I click on the title on the page for c/12minuteathlete, so whatever c/12minuteathlete is, it's certainly not a user!
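To make the mismatch concrete, here is a small sketch of what happens when one pattern accepts both prefixes but the identity keeps only the name. The pattern and both templates are hypothetical simplifications, not the ones in youtube-dl:

```python
import re

# Hypothetical combined pattern that accepts both /user/ and /c/ URLs.
COMBINED_VALID_URL = (
    r'https?://(?:www\.)?youtube\.com/(?P<kind>user|c)/(?P<id>[A-Za-z0-9_-]+)')

# Hypothetical templates: one keeps only the name, the other keeps the prefix too.
SINGLE_SLOT_TEMPLATE = 'https://www.youtube.com/user/%s/videos'
TWO_SLOT_TEMPLATE = 'https://www.youtube.com/%s/%s/videos'


def round_trip_lossy(url):
    """Rebuild a URL from the name alone; the user/c distinction is dropped."""
    match = re.match(COMBINED_VALID_URL, url)
    if match is None:
        raise ValueError('unsupported URL: %r' % url)
    return SINGLE_SLOT_TEMPLATE % match.group('id')


def round_trip_exact(url):
    """Rebuild a URL from both the prefix and the name."""
    match = re.match(COMBINED_VALID_URL, url)
    if match is None:
        raise ValueError('unsupported URL: %r' % url)
    return TWO_SLOT_TEMPLATE % (match.group('kind'), match.group('id'))


original = 'https://www.youtube.com/c/12minuteathlete/videos'
lossy = round_trip_lossy(original)
exact = round_trip_exact(original)
```

With the single-slot template, the c/ URL comes back as a user/ URL, which per the test comment quoted earlier does not serve the same videos. Keeping the prefix in the identity, or splitting the two prefixes into separate IEs, avoids that.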
