Request to add playlist downloads for ytsearch query. #16627

Open
TrevorLChaney opened this issue Jun 3, 2018 · 6 comments

@TrevorLChaney TrevorLChaney commented Jun 3, 2018

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.06.02. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • [x] I've verified and I assure that I'm running youtube-dl 2018.06.02

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

When I query as follows:
$ youtube-dl ytsearch:"funny dogs, playlist"
all results of the query are playlists, but youtube-dl downloads only the first video of the playlist and skips the others. I would like an option to download the entire playlist. youtube-dl already implements playlist downloads, so all it would need is an 'if' statement that checks whether the URL of the first result belongs to a playlist and, if so, extracts the playlist link from that URL with a regex.

Example:
The first result for the query "funny dogs, playlist" is this URL:
https://www.youtube.com/watch?v=wRqft4Rb4UU&list=PLGTScjq3z9bjBXGEUADkq27kmtfO-s25G
And the playlist link is:
https://www.youtube.com/playlist?list=PLGTScjq3z9bjBXGEUADkq27kmtfO-s25G
To get it, just regex out the video ID (the v= parameter) and put 'playlist' where 'watch' is.
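
As a rough illustration of that conversion, here is a minimal Python 3 sketch. It uses the standard library's URL parser rather than a regex, and the helper name playlist_url_from_watch_url is hypothetical; it is not part of youtube-dl.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def playlist_url_from_watch_url(watch_url):
    """Return the playlist URL for a watch URL that has a list= parameter, else None.

    Hypothetical helper for illustration only; not a youtube-dl function.
    """
    parts = urlparse(watch_url)
    list_ids = parse_qs(parts.query).get('list')
    if not list_ids:
        # Plain video URL without a playlist reference
        return None
    # Swap the path to /playlist and keep only the list= parameter
    return urlunparse(parts._replace(
        path='/playlist', query=urlencode({'list': list_ids[0]})))


print(playlist_url_from_watch_url(
    'https://www.youtube.com/watch?v=wRqft4Rb4UU&list=PLGTScjq3z9bjBXGEUADkq27kmtfO-s25G'))
```

For the example URL above this prints the playlist link shown earlier; for a watch URL without list= it returns None.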

So if a search result is a playlist, youtube-dl could prompt before downloading it, ytsearch could take a flag to download playlists, or youtube-dl could simply print the URL and say that it is a playlist. Any of these would be helpful.

I have looked at the code and seen that the pieces are already there. They just need to be linked.

Respectfully,
Trevor

@Crypto90 Crypto90 commented Jun 13, 2020

Any plans to get this done? I need this too.
This feature request has been open since 3 Jun 2018.

I will also look into the code to see what needs to be changed.

@TrevorLChaney TrevorLChaney commented Jun 13, 2020

Go to the YouTube extractor (youtube_dl/extractor/youtube.py). In the class that uses 'ytsearch' as its _SEARCH_KEY, you would have to run a regex over the URL passed to it and, if it is a playlist, either call the code that downloads playlists or copy that functionality over. I was hoping someone with knowledge of the code could do this, and then I forgot about it.

Hope this was helpful

Respectfully,
Trevor

@Crypto90 Crypto90 commented Jun 13, 2020

Thanks, this was helpful. I will look at the code, get it changed, and post a patch for it.
For my part, I just want to run ytsearch and list the results (a list of playlists) with --dump-json. In my case it also prints only the video ID, so it is basically the same issue and we have to fix it in the right place.
I think this is a major feature that is missing or not working correctly.

@Crypto90 Crypto90 commented Jun 14, 2020

As far as I can see, this regular expression:

class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'

in youtube.py which gets used here:

class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
    def _process_page(self, content):
        for video_id, video_title in self.extract_videos_from_page(content):
            yield self.url_result(video_id, 'Youtube', video_id, video_title)

    def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
        for mobj in re.finditer(video_re, page):
            # The link with index 0 is not the first video of the playlist (not sure if still actual)
            if 'index' in mobj.groupdict() and mobj.group('id') == '0':
                continue
            video_id = mobj.group('id')
            video_title = unescapeHTML(
                mobj.group('title')) if 'title' in mobj.groupdict() else None
            if video_title:
                video_title = video_title.strip()
            if video_title == '► Play all':
                video_title = None
            try:
                idx = ids_in_page.index(video_id)
                if video_title and not titles_in_page[idx]:
                    titles_in_page[idx] = video_title
            except ValueError:
                ids_in_page.append(video_id)
                titles_in_page.append(video_title)

    def extract_videos_from_page(self, page):
        ids_in_page = []
        titles_in_page = []
        self.extract_videos_from_page_impl(
            self._VIDEO_RE, page, ids_in_page, titles_in_page)
        return zip(ids_in_page, titles_in_page)

in the extract_videos_from_page function.

This regular expression matches only the /watch?v=XXXXXXXXXXX part and ignores the &list=YYYYYYYYYYYYYYYYYYYY part of the href URLs.
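
A small standalone check can make this visible. The regex below is copied from the snippet above; the anchor tag is a made-up sample, since the real search page markup is not shown in this thread.

```python
import re

# Copied from YoutubeSearchBaseInfoExtractor._VIDEO_RE above
VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'

# Hypothetical anchor tag; only the href/title layout matters here
sample = ('<a href="/watch?v=wRqft4Rb4UU&amp;list=PLGTScjq3z9bjBXGEUADkq27kmtfO-s25G" '
          'class="x" title="Funny dogs">')

match = re.search(VIDEO_RE, sample)
# Only 'id' and 'title' are captured; the list= value is consumed by [^"]* and lost
print(match.groupdict())
```

The playlist ID is present in the href but never ends up in a named group, so the caller only ever sees the 11-character video ID.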

I'm not familiar enough with the whole youtube-dl code base to change something without possibly breaking another function, so maybe someone who knows the code better should look at this too.

@Crypto90 Crypto90 commented Jun 30, 2020

Got it working:

class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(&amp;list=(?P<plid>[0-9A-Za-z_-]+))?(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'

    # Start of the method in YoutubePlaylistBaseInfoExtractor; only the changed part is shown
    def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
        for mobj in re.finditer(video_re, page):
            # The link with index 0 is not the first video of the playlist (not sure if still actual)
            if 'index' in mobj.groupdict() and mobj.group('id') == '0':
                continue
            video_id = mobj.group('id')
            playlist_id = mobj.group('plid') if 'plid' in mobj.groupdict() else None
            if playlist_id is not None:
                # Report the playlist ID instead of the ID of its first video
                video_id = playlist_id

Result:

python youtube-dl -4 --no-warnings --no-check-certificate "ytsearch10:nightcore,playlist" --dump-json --playlist-start 1 --playlist-end 10 --flat-playlist
{"url": "PL0wqt_um4x0bsdViTJBmnl6KGMoSqxfZy", "_type": "url", "ie_key": "Youtube", "id": "PL0wqt_um4x0bsdViTJBmnl6KGMoSqxfZy", "title": "Most Viewed Nightcore"}
{"url": "PLckeMyCaCCIN_JU1V4oADW50DlGOREoLj", "_type": "url", "ie_key": "Youtube", "id": "PLckeMyCaCCIN_JU1V4oADW50DlGOREoLj", "title": "Ultimatum [Nightcore playlist]"}
{"url": "PLT5Nlf1NmV4Mqfha4j1YuCTk0GCL20bdN", "_type": "url", "ie_key": "Youtube", "id": "PLT5Nlf1NmV4Mqfha4j1YuCTk0GCL20bdN", "title": "BoxBox Nightcore COMPLETE"}
{"url": "PLT2qINU1sbUh0ECl94rEWddz9dLA10t6H", "_type": "url", "ie_key": "Youtube", "id": "PLT2qINU1sbUh0ECl94rEWddz9dLA10t6H", "title": "Nightcore Playlist"}
{"url": "PLyTi20i8cwsPyiwvjei-K6F04Rzm0uRpT", "_type": "url", "ie_key": "Youtube", "id": "PLyTi20i8cwsPyiwvjei-K6F04Rzm0uRpT", "title": "Old School Nightcore"}
{"url": "PLOTApxgUwuOKLxLhKAs8IVY7gPO7jBzD2", "_type": "url", "ie_key": "Youtube", "id": "PLOTApxgUwuOKLxLhKAs8IVY7gPO7jBzD2", "title": "Nightcore - Gacha Songs"}
{"url": "PL_2-M0cHcLO4PYOoiyq0PxOeHqGlhlnmQ", "_type": "url", "ie_key": "Youtube", "id": "PL_2-M0cHcLO4PYOoiyq0PxOeHqGlhlnmQ", "title": "The Ultimate Nightcore Playlist"}
{"url": "PLNNA4xoj8rtiyGCKKW-ZJAgE07puDN9wb", "_type": "url", "ie_key": "Youtube", "id": "PLNNA4xoj8rtiyGCKKW-ZJAgE07puDN9wb", "title": "Upbeat Nightcore"}
{"url": "PLac4ZIsVrXZpTiOg-aZTOv7cNupZRYwM6", "_type": "url", "ie_key": "Youtube", "id": "PLac4ZIsVrXZpTiOg-aZTOv7cNupZRYwM6", "title": "A sad nightcore songs"}
{"url": "PLW04mEa616kVRXV41POavEB7WXrcsrK4g", "_type": "url", "ie_key": "Youtube", "id": "PLW04mEa616kVRXV41POavEB7WXrcsrK4g", "title": "Nightcore \u27ff Why Don't We"}

I changed the regex to also capture the list=XXXXXXX value. If the href also has a list= parameter, video_id gets replaced with the playlist ID that was found.
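
To see the effect outside of youtube-dl, the changed regex can be tried on sample markup. This is only a sketch: the two anchor tags are made up, and result_id is a hypothetical helper that mirrors the replacement step described above.

```python
import re

# Changed regex from the snippet above, with the extra optional 'plid' group
SEARCH_VIDEO_RE = (
    r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})'
    r'(&amp;list=(?P<plid>[0-9A-Za-z_-]+))?'
    r'(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?')


def result_id(mobj):
    """Prefer the playlist ID when the href carries list=, else the video ID."""
    return mobj.group('plid') or mobj.group('id')


# Hypothetical anchor tags: one playlist result, one plain video result
samples = [
    '<a href="/watch?v=wRqft4Rb4UU&amp;list=PLGTScjq3z9bjBXGEUADkq27kmtfO-s25G" title="Funny dogs">',
    '<a href="/watch?v=fSlOHZXhcMk" title="Nightcore - Dracula - (Lyrics)">',
]

ids = [result_id(re.search(SEARCH_VIDEO_RE, html)) for html in samples]
print(ids)
```

The first entry comes back as the playlist ID and the second as the plain 11-character video ID.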

I'm currently working on the regex to fetch more information, such as the video durations and, when a playlist is found, its video count, which is requested here (and which I also need):
#25720

As far as I can see, this also fixes the issue of titles not always being returned for playlists.

@Crypto90 Crypto90 commented Jun 30, 2020

Update:
Some improvements to how the results are returned. The code now checks whether the ID is 11 characters long; if so, it returns the result in the YouTube video format.
If the ID is longer than 11 characters, it returns the result in the YouTube playlist format.

Video duration extraction is already implemented too, but the

yield self.url_result

part needs a new argument to pass the duration over.

CURRENT CODE (youtube.py):

class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(&amp;list=(?P<plid>[0-9A-Za-z_-]+))?(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?(.*Duration:\s*(?P<duration>([0-1]?[0-9]|2[0-3]):[0-5][0-9]))?'

class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
    def _process_page(self, content):
        for video_id, video_title, video_duration in self.extract_videos_from_page(content):
            if len(video_id) == 11:
                # YouTube video ID found
                yield self.url_result(video_id, 'Youtube', video_id, video_title)
            elif len(video_id) > 11:
                # YouTube playlist ID found
                yield self.url_result('https://www.youtube.com/playlist?list=%s' % video_id, 'YoutubePlaylist', video_id, video_title)

    def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page, durations_in_page):
        for mobj in re.finditer(video_re, page):
            # The link with index 0 is not the first video of the playlist (not sure if still actual)
            if 'index' in mobj.groupdict() and mobj.group('id') == '0':
                continue
            video_id = mobj.group('id')
            playlist_id = mobj.group('plid') if 'plid' in mobj.groupdict() else None
            if playlist_id is not None:
                # Report the playlist ID instead of the ID of its first video
                video_id = playlist_id
            video_title = unescapeHTML(mobj.group('title')) if 'title' in mobj.groupdict() else None
            if video_title:
                video_title = video_title.strip()
            if video_title == '► Play all':
                video_title = None
            video_duration = mobj.group('duration') if 'duration' in mobj.groupdict() else None
            if video_duration:
                video_duration = video_duration.strip()
            try:
                # ID already seen on this page: only fill in missing title/duration
                idx = ids_in_page.index(video_id)
                if video_title and not titles_in_page[idx]:
                    titles_in_page[idx] = video_title
                if video_duration and not durations_in_page[idx]:
                    durations_in_page[idx] = video_duration
            except ValueError:
                ids_in_page.append(video_id)
                titles_in_page.append(video_title)
                durations_in_page.append(video_duration)

    # Not shown in the original post: extract_videos_from_page presumably has to
    # collect and zip the durations too, so that _process_page can unpack three values.
    def extract_videos_from_page(self, page):
        ids_in_page = []
        titles_in_page = []
        durations_in_page = []
        self.extract_videos_from_page_impl(
            self._VIDEO_RE, page, ids_in_page, titles_in_page, durations_in_page)
        return zip(ids_in_page, titles_in_page, durations_in_page)


Current results for PLAYLISTS:

python youtube-dl -4 --no-warnings --no-check-certificate "ytsearch10:nightcore,playlist" --dump-json --playlist-start 1 --playlist-end 10 --flat-playlist
{"url": "https://www.youtube.com/playlist?list=PL0wqt_um4x0bsdViTJBmnl6KGMoSqxfZy", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PL0wqt_um4x0bsdViTJBmnl6KGMoSqxfZy", "title": "Most Viewed Nightcore"}
{"url": "https://www.youtube.com/playlist?list=PLckeMyCaCCIN_JU1V4oADW50DlGOREoLj", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLckeMyCaCCIN_JU1V4oADW50DlGOREoLj", "title": "Ultimatum [Nightcore playlist]"}
{"url": "https://www.youtube.com/playlist?list=PLyTi20i8cwsPyiwvjei-K6F04Rzm0uRpT", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLyTi20i8cwsPyiwvjei-K6F04Rzm0uRpT", "title": "Old School Nightcore"}
{"url": "https://www.youtube.com/playlist?list=PLT5Nlf1NmV4Mqfha4j1YuCTk0GCL20bdN", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLT5Nlf1NmV4Mqfha4j1YuCTk0GCL20bdN", "title": "BoxBox Nightcore COMPLETE"}
{"url": "https://www.youtube.com/playlist?list=PLOTApxgUwuOKLxLhKAs8IVY7gPO7jBzD2", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLOTApxgUwuOKLxLhKAs8IVY7gPO7jBzD2", "title": "Nightcore - Gacha Songs"}
{"url": "https://www.youtube.com/playlist?list=PL_2-M0cHcLO4PYOoiyq0PxOeHqGlhlnmQ", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PL_2-M0cHcLO4PYOoiyq0PxOeHqGlhlnmQ", "title": "The Ultimate Nightcore Playlist"}
{"url": "https://www.youtube.com/playlist?list=PLT2qINU1sbUh0ECl94rEWddz9dLA10t6H", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLT2qINU1sbUh0ECl94rEWddz9dLA10t6H", "title": "Nightcore Playlist"}
{"url": "https://www.youtube.com/playlist?list=PLNNA4xoj8rtiyGCKKW-ZJAgE07puDN9wb", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLNNA4xoj8rtiyGCKKW-ZJAgE07puDN9wb", "title": "Upbeat Nightcore"}
{"url": "https://www.youtube.com/playlist?list=PLac4ZIsVrXZpTiOg-aZTOv7cNupZRYwM6", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLac4ZIsVrXZpTiOg-aZTOv7cNupZRYwM6", "title": "A sad nightcore songs"}
{"url": "https://www.youtube.com/playlist?list=PLEYzYW48QAoi63Z8vC2CfBjh4rOm2O_ny", "_type": "url", "ie_key": "YoutubePlaylist", "id": "PLEYzYW48QAoi63Z8vC2CfBjh4rOm2O_ny", "title": "Nightcore"}

Current results for NORMAL VIDEOS:

python youtube-dl -4 --no-warnings --no-check-certificate "ytsearch10:nightcore" --dump-json --playlist-start 1 --playlist-end 10 --flat-playlist
{"url": "https://www.youtube.com/playlist?list=RDQMHxp3ukc1BX4", "_type": "url", "ie_key": "YoutubePlaylist", "id": "RDQMHxp3ukc1BX4", "title": "Nightcore"}
{"url": "fSlOHZXhcMk", "_type": "url", "ie_key": "Youtube", "id": "fSlOHZXhcMk", "title": "Nightcore - Dracula - (Lyrics)"}
{"url": "z6lJlIp9Y50", "_type": "url", "ie_key": "Youtube", "id": "z6lJlIp9Y50", "title": "Nightcore - Tick Tick Tick - (Lyrics)"}
{"url": "SyqWF7nDS48", "_type": "url", "ie_key": "Youtube", "id": "SyqWF7nDS48", "title": "Nightcore - Salty & Sweet - (Lyrics)"}
{"url": "nuTsXUYUavw", "_type": "url", "ie_key": "Youtube", "id": "nuTsXUYUavw", "title": "Nightcore - What You Made Me - (Lyrics)"}
{"url": "hjGZLnja1o8", "_type": "url", "ie_key": "Youtube", "id": "hjGZLnja1o8", "title": "Nightcore - Rockefeller Street"}
{"url": "9xG5aPvrS-k", "_type": "url", "ie_key": "Youtube", "id": "9xG5aPvrS-k", "title": "Nightcore - No Friends (Lyrics)"}
{"url": "E3FfwK81OsU", "_type": "url", "ie_key": "Youtube", "id": "E3FfwK81OsU", "title": "Nightcore - Savage Love - (Lyrics)"}
{"url": "p-HtwjeFoZA", "_type": "url", "ie_key": "Youtube", "id": "p-HtwjeFoZA", "title": "Nightcore - Out of Time - (Lyrics)"}
{"url": "ceZSMQ8LFKQ", "_type": "url", "ie_key": "Youtube", "id": "ceZSMQ8LFKQ", "title": "Nightcore - Common Sense - (Lyrics)"}

Pending todo:

  • Finish passing the duration through to url_result(....); I still need to figure out where the function arguments are defined.
  • Fetch the YouTube playlist video count
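
For the first todo item, one possible approach is to leave the url_result arguments alone and attach the duration to the dict it returns. The sketch below is self-contained and makes several assumptions: url_result here is a stand-in that only mimics the dict shape seen in the JSON output above, duration_to_seconds and entry_with_duration are hypothetical helpers, and the captured text is assumed to be minutes:seconds.

```python
def url_result(url, ie=None, video_id=None, video_title=None):
    """Stand-in for the extractor helper; mimics the dict shape of the JSON output above."""
    info = {'_type': 'url', 'url': url, 'ie_key': ie}
    if video_id is not None:
        info['id'] = video_id
    if video_title is not None:
        info['title'] = video_title
    return info


def duration_to_seconds(text):
    """Convert an 'M:SS' string to seconds (assumed format); None if no duration was captured."""
    if not text:
        return None
    minutes, seconds = text.strip().split(':')
    return int(minutes) * 60 + int(seconds)


def entry_with_duration(video_id, video_title, duration_text):
    """Build a video entry and add a 'duration' key only when one was found."""
    entry = url_result(video_id, 'Youtube', video_id, video_title)
    seconds = duration_to_seconds(duration_text)
    if seconds is not None:
        entry['duration'] = seconds
    return entry


print(entry_with_duration('fSlOHZXhcMk', 'Nightcore - Dracula - (Lyrics)', '3:45'))
```

Whether an extra key like this survives all the way to the --dump-json output with --flat-playlist has not been checked here.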