Some more url formats would be nice #40
Comments
I was under the impression that using the user name like that is not going to be unique, as a user can have multiple channels. Or am I mistaken there? At the moment I detect the type of link from the length of the ID, as a way to separate video, channel and playlist. veritasium is a good example: that is 11 characters, same as a video ID. I'm not saying these couldn't be told apart, but I think it would only be more confusing. Additionally, since you can also just add the video ID alone without the whole URL, this would make things more complicated and potentially ambiguous. As for the playlist link, yes, I chose that by design: when you open that page, you land on the video, not on the playlist. I don't think it would make sense to download the whole playlist when given a URL to a video, while giving the link to the playlist will download the whole playlist. Btw: I have started writing these things down in the wiki. |
The channel name URL is unique: https://support.google.com/youtube/answer/2657968 Finding the parts of the URL can be done by checking for the query parameters and the start of the URL: Shared links have the ID directly after the domain name: A channel URL could be identified by the first part of the URL: Links like this could be difficult (but they are nowhere linked on YouTube, so I think they are not needed but nice to have): Formatting links in markdown is a little bit tricky but I hope you can see what I mean :) |
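A minimal sketch of the approach described above, using only the standard library. The function name `classify_url` and the `"channel_name"` type label are hypothetical, and the sketch assumes every URL includes its scheme. A watch link that also carries a `list` parameter is treated as a single video, matching the design choice mentioned earlier in the thread.

```python
from urllib.parse import urlparse, parse_qs


def classify_url(url):
    """Return (identifier, type) for a YouTube URL (hypothetical helper)."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    path_parts = [part for part in parsed.path.split("/") if part]

    # Shared links: the video ID comes directly after the domain name.
    if parsed.netloc in ("youtu.be", "www.youtu.be") and path_parts:
        return path_parts[0], "video"
    # Watch links carry the video ID in the "v" query parameter.
    if "v" in query:
        return query["v"][0], "video"
    # Playlist pages carry the playlist ID in the "list" query parameter.
    if "list" in query:
        return query["list"][0], "playlist"
    # Channel links are identified by the first part of the path.
    if len(path_parts) >= 2 and path_parts[0] == "channel":
        return path_parts[1], "channel"
    if len(path_parts) >= 2 and path_parts[0] in ("c", "user"):
        # A name, not an ID: it still has to be resolved to a channel ID.
        return path_parts[1], "channel_name"
    raise ValueError(f"unrecognized url: {url}")
```

Checking the query parameters before the path means the position of `v` in the query string does not matter.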
Ah I see, I was under the impression that's the username and not the channel name, my mistake. So as I understand it, even though the channel name is unique, it can change. So I think the solution here would be to convert the channel name to the channel ID at that point in time and keep using the channel ID for everything thereafter. In addition to the link formats you have mentioned, I have also found: I think this is the old format, as some of the channels I'm subscribed to have been around for some time, like it's: Strangely A little bit confusing, but the solution above still stands. So yes, I think your way of identifying that is better than my current solution. This will require a rewrite of process_url_list. Any chance you want to take a look at it? |
I have a rough outline of how this could work using only the modules that are already included. I will get a pull request in either later today or tomorrow. Outline of tasks: I'll reference this when I get the pull request. |
ChannelId can also be found in a meta field: <meta itemprop="channelId" content="UCHnyfMqiRRG1u-2MsSQLbXA"> |
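A minimal sketch of reading that meta field with the standard library's `html.parser`, assuming the page HTML has already been fetched. The names `ChannelIdParser` and `find_channel_id` are hypothetical, and the sketch only illustrates the tag lookup, not how the project actually resolves channels.

```python
from html.parser import HTMLParser


class ChannelIdParser(HTMLParser):
    """Collect the content of <meta itemprop="channelId" ...>."""

    def __init__(self):
        super().__init__()
        self.channel_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("itemprop") == "channelId":
            self.channel_id = attributes.get("content")


def find_channel_id(html):
    """Return the channel ID from the page HTML, or None if it is absent."""
    parser = ChannelIdParser()
    parser.feed(html)
    return parser.channel_id
```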
I don't have much experience with Python, but wouldn't it be easier to parse the links with RegEx? You could define patterns for videos, playlists and channels and extract the relevant parts from the URL. Maybe I have time next weekend and can try to implement something like that. Quick prototype: (https?:\/\/)(www\.)?(youtube|youtu)\.([\w]{2,3})\/(watch\?v=)?(?<videoId>[\w\d]{11}) Group "videoId" would hold watch IDs of these urls:
The RegEx would have to be adapted further so that other possible cases are also covered. For example, this regex does not yet cover links that have the v parameter in the first position. |
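For reference, a Python variant of the prototype could look like the sketch below. Python spells named groups `(?P<name>...)`, and video IDs can contain `-`, which `\w` does not match, so the character class is widened to `[\w-]`. The pattern also accepts a `v` parameter that is not in the first position. The names `VIDEO_URL` and `extract_video_id` are hypothetical, and the pattern is not meant to cover every link format.

```python
import re

# Matches shared links (youtu.be/<id>) and watch links (watch?...v=<id>).
VIDEO_URL = re.compile(
    r"(?:https?://)?(?:www\.)?"
    r"(?:youtu\.be/|youtube\.com/watch\?(?:[^#\s]*&)?v=)"
    r"(?P<video_id>[\w-]{11})(?![\w-])"
)


def extract_video_id(url):
    """Return the 11-character video ID, or None if the URL is not a video link."""
    match = VIDEO_URL.search(url)
    return match.group("video_id") if match else None
```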
Based on my understanding of the code, the specific function provides the following: My proposed solution attempts to get the channel data from YouTube directly (it could also call the youtubedl function, but that wasn't already imported within this script). I could adjust it to use the meta tag; I found the tag via my own search for the ID, and saw that the page also provides it via a link tag with the full canonical URL, which can then be passed to the function recursively to give the same sanitized results. There are cleaner ways to do it, and you're welcome to submit your own PR that does it differently. This was my quick way to get it to work so we could show that it functions; we can enhance it afterward. Options include importing a parser like Beautiful Soup, sending a JavaScript request as part of the Requests call to get the specific tag, or using the youtubedl function to maintain consistency. |
Thanks to everybody for looking into it! I do think that should be passed to yt-dlp to extract instead. I mean, all of the solutions above will work, but from a future maintainability standpoint, if yt-dlp already provides the needed information, that's going to make things easier going forward. I have no qualms importing yt-dlp there, that's a good reason. So the example using yt-dlp to extract the channel_id for "/c/" or "/user/" urls could look something like this:

obs = {
    "default_search": "ytsearch",
    "quiet": True,  # suppress console output
    "skip_download": True,  # metadata only
    "extract_flat": True,
    "playlistend": 0,
}
chan = youtube_dl.YoutubeDL(obs).extract_info(url, download=False)
channel_id = chan["channel_id"]

Maybe that's where this process_url_list function should be converted to a class. "If it doesn't fit in a screen, it's too big" or something, I don't know, I don't always follow these rules as well... :-) So pseudo code:

class UrlListParser:
    """take a multi line string and detect valid youtube ids"""

    def __init__(self, url_str):
        self.url_list = url_str.splitlines()  # split the string by newline

    def process_list(self):
        """loop through the list"""
        youtube_ids = []
        for url in self.url_list:
            if "/c/" in url or "/user/" in url:
                # detect /c/ and /user/
                youtube_id = self.extract_channel_name(url)
                youtube_ids.append({"url": youtube_id, "type": "channel"})
            else:
                # detect the rest
                youtube_id, id_type = self.find_valid_id(url)
                youtube_ids.append({"url": youtube_id, "type": id_type})
        return youtube_ids

    def extract_channel_name(self, url):
        """extract channel_id from channel name using yt-dlp"""
        # do the extraction
        return youtube_id

    def find_valid_id(self, url):
        """extract the id and detect the type"""
        # do the extraction
        # id_type can be channel, video or playlist
        return youtube_id, id_type

Then the class can be called like that from anywhere needed:

youtube_ids = UrlListParser(url_str).process_list()

@lamusmaser Do you want to flesh it out? It still needs to raise a ValueError if extraction fails... Then I'll change it everywhere in the project where it needs to be called and we have a nice reusable and extendable class there! :-) |
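One way the outline could be fleshed out is sketched below; it is not the code that was eventually merged. To keep it runnable without yt-dlp or network access, the channel lookup is passed in as a callable (`channel_resolver`, a hypothetical parameter that in the project would wrap the yt-dlp call). The length table for bare IDs (11 for videos, 24 for channels, 34 for playlists) is an assumption based on the lengths mentioned in this thread and is only a fallback when no query parameter identifies the type.

```python
from urllib.parse import urlparse, parse_qs


class UrlListParser:
    """Take a multi-line string and detect valid YouTube IDs."""

    # Assumed ID lengths, used only for bare IDs and plain path segments.
    ID_LENGTHS = {11: "video", 24: "channel", 34: "playlist"}

    def __init__(self, url_str, channel_resolver):
        self.url_list = [line.strip() for line in url_str.splitlines() if line.strip()]
        self.channel_resolver = channel_resolver

    def process_list(self):
        """Loop through the list and collect one dict per entry."""
        youtube_ids = []
        for url in self.url_list:
            if "/c/" in url or "/user/" in url:
                youtube_id = self.extract_channel_name(url)
                youtube_ids.append({"url": youtube_id, "type": "channel"})
            else:
                youtube_id, id_type = self.find_valid_id(url)
                youtube_ids.append({"url": youtube_id, "type": id_type})
        return youtube_ids

    def extract_channel_name(self, url):
        """Resolve a channel name URL to a channel ID via the resolver."""
        channel_id = self.channel_resolver(url)
        if not channel_id:
            raise ValueError(f"failed to extract channel id from: {url}")
        return channel_id

    def find_valid_id(self, url):
        """Extract the ID and detect whether it is a channel, video or playlist."""
        parsed = urlparse(url)
        query = parse_qs(parsed.query)
        if "v" in query:
            return query["v"][0], "video"
        if "list" in query:
            return query["list"][0], "playlist"
        # Bare ID or ID as the last path segment: fall back to its length.
        candidate = parsed.path.rstrip("/").split("/")[-1]
        id_type = self.ID_LENGTHS.get(len(candidate))
        if id_type is None:
            raise ValueError(f"not a valid url or id: {url}")
        return candidate, id_type
```

Injecting the resolver keeps the parsing logic testable on its own; a caller would pass a function that runs the yt-dlp extraction and returns `chan["channel_id"]`.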
@bbilly1 I will take a swing at updating it and any of the references based on pseudo above. I'll see if I can get more in this weekend, otherwise it will be next weekend. |
Nice! I've planned to make a new release later today to get some of the new UI and some other changes out. But no pressure, it's important to take the time needed, and this can also go in the release after. |
Today I tried to include some playlists in my download queue, and it looks like not every playlist ID is 34 characters long. Can this also be fixed by using yt-dlp? As an example, this URL has a playlist ID with 18 characters: |
This has not been forgotten, I just haven't had the time to devote to recoding this section. If someone else gets to it first, please say so here. It is in my to-do list, but I have a few other items ahead of it (unrelated to this project). |
OK, I took a swing at that, I think I could get all of the requirements working as expected, even www.youtube.com/veritasium works. :-) Look forward to the improvements in the next release! |
OK, this is now merged in v0.0.7. Thanks to everybody for looking into it! Closing this for now. |
I'm on v0.0.7 and I can't add YouTube Video URLs like these https://youtu.be/2tdiKTSdE9Y to the download queue (that url is taken directly from the wiki). If I try adding it, it tells me:
|
Oh man, of course you are right... Thanks for taking the time, that's very unfortunate... |
Channel links with the channel name don't work:
As a workaround I currently copy the link from the channel name when watching a video. This link has the needed channel ID.
Playlist links in this format: https://www.youtube.com/watch?v=aFPJf-wKTd0&list=UUHnyfMqiRRG1u-2MsSQLbXA&index=2 are parsed as a single video, not as a playlist.