Feature request: Redesigned search #19650

Closed
forthrin opened this issue Feb 24, 2019 · 5 comments

forthrin commented Feb 24, 2019

  • Feature request (request for new functionality)

Referring to #19311: if an interactive mode, no matter how useful, is considered out of scope, it's entirely possible to write a wrapper around youtube-dl that provides the suggested functionality (which is more or less what I've already done). Maybe I'll turn this into an open source project.

Now, it would be nice to rely on youtube-dl for searching (multiple) sites. However, the current search functionality is slow to the point of being useless, because it visits the video page for every search hit: e.g. 30 search hits means 30 page loads, roughly 30 seconds, which is time no one is willing to spend waiting.
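Put as a rough formula: a full search costs one request per results page plus one per hit, while a flat listing needs only the results pages. A small sketch of that model, assuming (hypothetically) a fixed number of hits per results page and one HTTP request per page load:

```python
import math


def requests_needed(n_hits: int, hits_per_page: int, flat: bool) -> int:
    """Estimate HTTP requests for a search returning n_hits.

    Hypothetical model: each results page lists `hits_per_page` entries,
    and full (non-flat) extraction loads one extra page per hit.
    """
    results_pages = math.ceil(n_hits / hits_per_page)
    if flat:
        return results_pages
    return results_pages + n_hits


if __name__ == "__main__":
    # 30 hits, assuming 20 hits per results page (an illustrative guess)
    print(requests_needed(30, 20, flat=False))  # 32
    print(requests_needed(30, 20, flat=True))   # 2
```

With those assumed numbers the flat listing needs 2 requests instead of 32, which is the whole argument for not visiting every video page.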

I'd like to propose that searches in youtube-dl return hits straight from the search results page, with whatever information is immediately available there, as a JSON object (faux example below). Syntax: youtube-dl search-engine:search-phrase[:page-number]

$ youtube-dl ytsearch:madonna:1
[
  {"url": "https://youtube.com/<id>", "title": "Madonna - True Blue", "time": "3:50"},
  {"url": ...},
  ...
]
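The proposed engine:phrase[:page] syntax can be parsed unambiguously as long as a trailing all-digit segment is always read as the page number. A minimal sketch of such a parser; the function and type names and the default of page 1 are assumptions for illustration, not youtube-dl code:

```python
import re
from typing import NamedTuple


class SearchRequest(NamedTuple):
    engine: str
    phrase: str
    page: int  # 1-based page number


def parse_search(spec: str) -> SearchRequest:
    """Parse 'engine:phrase[:page]' (the proposed, hypothetical syntax).

    The phrase may contain colons; only a final all-digit segment is
    treated as the page number. Without one, page 1 is assumed.
    """
    engine, sep, rest = spec.partition(":")
    if not sep or not engine or not rest:
        raise ValueError(f"expected engine:phrase[:page], got {spec!r}")
    phrase, sep, last = rest.rpartition(":")
    if sep and phrase and re.fullmatch(r"[0-9]+", last):
        page = int(last)
        if page < 1:
            raise ValueError("page numbers start at 1")
        return SearchRequest(engine, phrase, page)
    return SearchRequest(engine, rest, 1)


if __name__ == "__main__":
    # SearchRequest(engine='ytsearch', phrase='madonna', page=2)
    print(parse_search("ytsearch:madonna:2"))
```

One caveat of this syntax: a phrase that itself ends in a colon followed by digits would be read as a page number.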

I also noticed that the supported search engines don't seem to be documented in the manual pages, and there seem to be very few of them in general (maybe only three?). There should be more, and this code is very easy to write. I'd be happy to contribute a couple of sites.

dstftw (Collaborator) commented Feb 24, 2019

$ youtube-dl "ytsearch<N>:madonna" --dump-json --flat-playlist --playlist-start Y --playlist-end X
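With `--dump-json --flat-playlist`, youtube-dl prints one JSON object per search hit, one per line, so the output can be consumed as JSON Lines. A small sketch of reading it in Python; the sample entry is the one quoted later in this thread, and the function name is illustrative:

```python
import json
from typing import Iterable, Iterator


def read_flat_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of --dump-json output."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# One entry as quoted in this thread (a flat entry, so "url" is just the ID).
SAMPLE = (
    '{"url": "GuJQSAiODqI", "_type": "url", "ie_key": "Youtube", '
    '"id": "GuJQSAiODqI", "title": "Madonna - Vogue (Official Music Video)"}\n'
)

if __name__ == "__main__":
    for entry in read_flat_entries(SAMPLE.splitlines()):
        print(entry["id"], entry.get("title", ""))
```

To process live output instead of the sample, pass `sys.stdin` to `read_flat_entries` and pipe the youtube-dl command above into the script.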

dstftw closed this Feb 24, 2019
forthrin (Author) commented Feb 24, 2019

@dstftw: Ah! So --flat-playlist does this. However, I still think that, for optimal performance, the search function should be paginated, e.g. ytsearch:madonna should return all hits the site returns on its first page. Then something like ytsearch:madonna:2 should return the next page, and so on.

You don't know how many hits the site returns per page, so if you guess blindly and use ytsearch30, youtube-dl might fetch two pages, i.e. twice the wait. And if you use ytsearch10, the same page has to be fetched again when you want hits 10-19, since nothing is cached.
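To make the caching point concrete, here is a sketch of a page-aware search with a cache. It assumes a hypothetical `fetch_page(n)` callable that returns the hits on results page `n` (0-based) and an empty list past the last page, and it assumes the site has numbered pages at all. The page size is never guessed; it is learned from each page's length, and each page is loaded at most once. This only illustrates the proposal; it is not youtube-dl code.

```python
from typing import Any, Callable, Dict, List


class CachedSearch:
    """Serve ranges of search hits, loading each results page at most once.

    `fetch_page` is a hypothetical callable: fetch_page(n) returns the hits
    on results page n (0-based), or an empty list past the last page.
    The page size is never assumed; it is learned from each page's length.
    """

    def __init__(self, fetch_page: Callable[[int], List[Any]]):
        self._fetch_page = fetch_page
        self._pages: Dict[int, List[Any]] = {}  # page number -> cached hits

    def _page(self, n: int) -> List[Any]:
        if n not in self._pages:
            self._pages[n] = self._fetch_page(n)
        return self._pages[n]

    def hits(self, start: int, end: int) -> List[Any]:
        """Return hits with overall index in [start, end)."""
        result: List[Any] = []
        offset = 0  # overall index of the first hit on page n
        n = 0
        while offset < end:
            page = self._page(n)
            if not page:
                break  # past the last page
            lo = max(start - offset, 0)
            hi = min(end - offset, len(page))
            if lo < hi:
                result.extend(page[lo:hi])
            offset += len(page)
            n += 1
        return result


if __name__ == "__main__":
    loaded = []

    def fake_fetch(n: int) -> List[Any]:
        # Stand-in for a site with 3 pages of 20 hits each.
        loaded.append(n)
        return [f"hit{n * 20 + i}" for i in range(20)] if n < 3 else []

    search = CachedSearch(fake_fetch)
    search.hits(0, 10)   # loads page 0
    search.hits(10, 20)  # same page, served from the cache
    print(loaded)        # [0]
```

A trade-off of not knowing the page size: asking for a late range still loads each earlier page once to learn the offsets, but later requests reuse them.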

The available search engines should also be documented in the manual pages. Also, let me know if you welcome more search engines or if this is not really a priority. I was surprised there seem to be only three (yt, gv and yb). Is there a particular reason there are so few?

dstftw (Collaborator) commented Feb 24, 2019

Pagination is not technically possible in all cases, because not all services have the notion of a page.

> let me know if you welcome more search engines or if this is not really a priority

They are treated as regular extractors, no more, no less.

There are more search extractors; search the source for SearchInfoExtractor.
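As far as I can tell from youtube-dl's source, each `SearchInfoExtractor` subclass sets a `_SEARCH_KEY` (such as `ytsearch`) and a `_MAX_RESULTS` cap, and accepts `<key>:<query>`, `<key><N>:<query>` or `<key>all:<query>`. The standalone sketch below mimics that prefix handling without importing youtube-dl; the function name and the cap of 50 are illustrative, and the real code in `youtube_dl/extractor/common.py` is authoritative. Note there is a result count but no page offset, which is why `--playlist-start`/`--playlist-end` are needed.

```python
import re
from typing import Tuple


def parse_search_key(url: str, search_key: str, max_results: int) -> Tuple[str, int]:
    """Return (query, number_of_results) for '<key>[N|all]:<query>'.

    Mimics the prefix rules: no prefix means 1 result, 'all' means the
    extractor's cap, and a larger N is clamped to the cap. N=0 is rejected
    by the pattern itself.
    """
    pattern = r"%s(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\s\S]+)" % re.escape(search_key)
    m = re.match(pattern, url)
    if m is None:
        raise ValueError("Invalid search query %r" % url)
    prefix, query = m.group("prefix"), m.group("query")
    if prefix == "":
        return query, 1
    if prefix == "all":
        return query, max_results
    return query, min(int(prefix), max_results)


if __name__ == "__main__":
    for q in ("ytsearch:madonna", "ytsearch10:madonna", "ytsearchall:madonna"):
        print(q, parse_search_key(q, "ytsearch", 50))
```

If the real pattern matches this one, ytsearch:madonna:1 is already a valid query today: a single-result search for the phrase madonna:1.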

forthrin (Author) commented Feb 24, 2019

OK. I understand why you've chosen such a generic solution.

It seems there is a SoundCloud search too, but that's all I could find: four sites in total.

PS: I noticed there's an error in the JSON data:

{"url": "GuJQSAiODqI", "_type": "url", "ie_key": "Youtube", "id": "GuJQSAiODqI", "title": "Madonna - Vogue (Official Music Video)"}

The url value should obviously be a full URL. Also, this data would benefit from a time value, which could be captured by adding something like (.*: (?P<time>(\d{1,2}:)?\d{1,2}:\d{2})\.)? to _VIDEO_RE.

{"url": "https://www.youtube.com/watch?v=GuJQSAiODqI", "time": "3:50", ...}
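A quick way to try the time fragment suggested above uses Python's `re`, where a named group is written `(?P<time>...)`. The sample strings are invented for illustration; the actual search-page text that `_VIDEO_RE` runs against is not shown in this thread, so the surrounding pattern may need adjusting.

```python
import re
from typing import Optional

# Time fragment from the suggestion. The whole group is optional, so a hit
# without a duration still matches and simply yields time=None.
TIME_RE = re.compile(r"(.*: (?P<time>(\d{1,2}:)?\d{1,2}:\d{2})\.)?")


def extract_time(text: str) -> Optional[str]:
    """Return 'M:SS' or 'H:MM:SS' if the text carries a duration label."""
    return TIME_RE.match(text).group("time")


if __name__ == "__main__":
    # Hypothetical label texts; real markup may differ.
    for text in (
        "Madonna - Vogue (Official Music Video) - Duration: 4:59.",
        "Some long concert - Duration: 1:02:03.",
        "A hit without a duration label",
    ):
        print(extract_time(text))
```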

dstftw (Collaborator) commented Feb 24, 2019

No error here: url is allowed to be either a URL or a shortcut that matches the extractor's _VALID_URL.
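So a consumer that wants full URLs can build them itself, while youtube-dl itself accepts the shortcut as-is. A minimal sketch, assuming only two cases: a value that is already an absolute URL is kept, and a bare YouTube ID (11 characters, the form in the entry quoted above) is expanded to a watch URL. The ID pattern and the Youtube-only mapping are simplifications for illustration; for any other `ie_key` the shortcut is returned unchanged.

```python
import re
from typing import Dict

# Assumed shape of a YouTube video ID: 11 characters from [0-9A-Za-z_-].
YOUTUBE_ID_RE = re.compile(r"[0-9A-Za-z_-]{11}")


def full_url(entry: Dict[str, str]) -> str:
    """Return an absolute URL for a flat-playlist entry when possible."""
    url = entry["url"]
    if re.match(r"https?://", url):
        return url  # already a full URL
    if entry.get("ie_key") == "Youtube" and YOUTUBE_ID_RE.fullmatch(url):
        return "https://www.youtube.com/watch?v=" + url
    return url  # unknown shortcut: leave it for youtube-dl to resolve


if __name__ == "__main__":
    entry = {"url": "GuJQSAiODqI", "_type": "url", "ie_key": "Youtube",
             "id": "GuJQSAiODqI", "title": "Madonna - Vogue (Official Music Video)"}
    print(full_url(entry))  # https://www.youtube.com/watch?v=GuJQSAiODqI
```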
