[youtube] Fix extracting YouTube search URLs and feeds #25734
Conversation
I also have a similar problem when extracting the videos from the subscriptions page. After inspecting the page I think the
That's weird, I have it working here, with cookies. When I run with the subscriptions URL and use
This moves feed extraction from using HTML content to JSON metadata. However, loading additional pages no longer works. The `_extract_video_info` function also returns a continuation object that contains some metadata which, together with an API key that is in the page source, might be used to request the next page.
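For what it's worth, here is a sketch of what such a continuation request might look like. The endpoint, payload shape, and client version are assumptions about YouTube's internal API, not what this PR implements:

```python
import json
import urllib.request

def fetch_next_page(api_key, continuation_token):
    # Hypothetical next-page request: combine the continuation token from
    # the feed metadata with the API key found in the page source.
    url = 'https://www.youtube.com/youtubei/v1/browse?key=' + api_key
    payload = json.dumps({
        'context': {'client': {'clientName': 'WEB',
                               'clientVersion': '2.20200626.03.00'}},
        'continuation': continuation_token,
    }).encode('utf-8')
    req = urllib.request.Request(
        url, data=payload, headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))
```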
When I tried extracting cookies using another browser, it worked and logged me in correctly, so there's no issue in that part (probably some issue when using Firefox containers with cookie extraction extensions?). I fixed the extraction for these feeds (history, subscriptions, recommended), but it will only download the first page for now.
If you want to try out the current version (and you installed it via pip):
Could be, I extracted mine using Chrome
Awesome! It works now, thank you so much! 💪 At least for me the first page is enough. Don't forget to update the description on the PR to mention this limitation.
If an object looks like a video (it has a `videoId` key), assume that it is.
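A minimal sketch of that recursive check (the helper name is mine, not the commit's):

```python
def find_videos(obj):
    # Walk parsed ytInitialData JSON and yield every dict that "looks like
    # a video", i.e. carries a 'videoId' key. Matched dicts are not
    # descended into, which avoids double-counting nested watch endpoints.
    if isinstance(obj, dict):
        if 'videoId' in obj:
            yield obj
            return
        for value in obj.values():
            for video in find_videos(value):
                yield video
    elif isinstance(obj, list):
        for item in obj:
            for video in find_videos(item):
                yield video
```

Deduplicating then becomes `{v['videoId'] for v in find_videos(data)}`.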
Downloading YouTube subscriptions worked with this yesterday, but today something seems to have broken:
[debug] System config: []
…change was reverted. The old code now works again, but it downloads without limit, which is why a limit of 1000 videos is added. It can be overridden with the `--max-downloads` option; that way, only as many IDs will be extracted as videos are downloaded.
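Capping the extraction could be as simple as slicing the id generator; a sketch with names of my own choosing, not the commit's:

```python
from itertools import islice

DEFAULT_FEED_LIMIT = 1000  # the fallback cap described above

def capped_entries(video_ids, max_downloads=None):
    # --max-downloads (when given) overrides the hard-coded feed limit, so
    # only as many ids are extracted as videos will actually be downloaded.
    return islice(video_ids, max_downloads or DEFAULT_FEED_LIMIT)
```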
Yeah, it seems like they changed the format to the old one again. With the old code the extraction will work, but it will extract a seemingly infinite number of video pages instead of downloading any videos.
This problem has continued for almost two weeks now. Is there any alternative method to get a video ID from an arbitrary search query with various parameters? I am beginning to fear that the search function will not be fixed. I heard that it is possible to load a page into a browser object and receive an already-processed version of the page with all the necessary links. Can this option be used, or would it be too cumbersome?
Yes, commit 7a74fed works fine, please roll back.
Not here - "Downloading 19 videos". I can append
Yes, it definitely works for me. But why are there no changes in the main program? 20 days have passed :( I hope work on this application has not been abandoned.
So, a quick update here: from my end the current code seems to work fine both for subscriptions and search queries. It looks like introducing the limit for feeds was unnecessary, as they do end at some point (for me it downloaded about 170 subscription pages), so I removed it again.
Seems like this is still not fixed in the main release of YTDL? Today I had to update it because YouTube video extraction broke in general, but now I have lost the fix from that special branch with fixes for /results?search_query=... When will it get merged? Or how can I combine both for now?
So I guess I'll sit tight and wait for the extraction to be fixed elsewhere, as it surely can't just be me with this issue.
If an object looks like a video (it has a `videoId` key), assume that it is.
In order to extract videos from further pages, we need to get various variables that are in an argument to the `ytcfg.set` call in a script on the feed page.
If the markup of the page changes in the future, it might be possible that `_FEED_DATA` still works but the other regex does not. Since it is not necessary for the first page of videos, we make sure the program doesn't exit before extracting them. TL;DR: extract the first video page even if there are problems.
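A sketch of that two-regex idea: the initial data lookup must succeed, while the `ytcfg.set` lookup is allowed to fail so the first page still extracts. Both patterns and the function name are assumptions based on the commit messages, not the PR's exact code:

```python
import json
import re

def extract_feed_page(webpage):
    # The _FEED_DATA-style lookup: the initial JSON blob containing the
    # first page of videos. This one is required.
    feed_m = re.search(r'ytInitialData\s*=\s*({.+?})\s*;', webpage, re.DOTALL)
    if not feed_m:
        raise ValueError('could not find initial feed data')
    feed_data = json.loads(feed_m.group(1))

    # The ytcfg.set lookup: variables needed for continuation requests
    # (API key, client version, ...). Allowed to fail; then further pages
    # cannot be loaded, but the first page still works.
    cfg_m = re.search(r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, re.DOTALL)
    ytcfg = json.loads(cfg_m.group(1)) if cfg_m else {}

    return feed_data, ytcfg
```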
Seems like this attribute is moved every few weeks, so we just extract both and use the one that is present.
This now supports declarations like `window["ytInitialData"] = ...` and `var ytInitialData = ...`
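A single pattern covering both declaration styles might look like this (the exact regex in the commit may differ):

```python
import re

_YT_INITIAL_DATA_RE = re.compile(
    r'(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|var\s+ytInitialData)'
    r'\s*=\s*({.+?})\s*;', re.DOTALL)

def extract_initial_data(webpage):
    m = _YT_INITIAL_DATA_RE.search(webpage)
    return m.group(1) if m else None
```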
…into fixYTSearch
Just to confirm: the problem was caused by a stale cookie in my case. The code in this PR holds.
I think you can use the `-u` and `-p` command-line flags to log in with your username and password, though I didn't test whether that works.
Force-pushed from 5e26784 to da2069f
Closing this pull request because this has already been resolved for some time.
Not working on git-latest b8b622f.
Ah, I didn't really check the search URL extractor when closing this. I'm not entirely sure what to do because my fork got taken down and I can't seem to get it back, so I can't make any edits to this pull request, and in its current state it's not mergeable at all. Should I create a new fork & pull request and add both, or only the search URL extractor?
Thanks! IMHO new pulls for each. |
Same problem, the search URL is still not working:
Yes, it still doesn't work. I tried integrating the changes from this pull request into the current code, but I failed at doing so. It seems like no matter what I try, the YouTube Tab extractor is always preferred before the SearchURL extractor is called, and the tab extractor fails. I did uncomment this line and tried a lot of other stuff, such as this `suitable` override:

```python
@classmethod
def suitable(cls, url):
    if YoutubeIE.suitable(url) or YoutubeSearchURLIE.suitable(url):
        return False
    return super(YoutubeTabIE, cls).suitable(url)
```

But it didn't seem to work and I gave up, since I currently don't have much time. If someone else finds out how to work around the tab extractor, it would be nice to know. The changes I made are here.
Description
URLs like `https://www.youtube.com/results?search_query=test` have been broken for a few days since the data is now in a JSON object instead of being embedded into the HTML of the search page. Now the `window["ytInitialData"]` variable is extracted and then searched recursively for any object containing the key `videoId`. These objects are all over the place in the JSON document.

The change also affects feed URLs like `https://www.youtube.com/feed/subscriptions`, `https://www.youtube.com/feed/history` etc. For those, the same logic is implemented. Subscriptions additionally contain a `nextContinuationData` object that is then used (in combination with data from a `ytcfg.set` call from a script in the page) to make a request to fetch the next pages.

TODO / Requests for input
Please feel free to point out anything that doesn't seem right
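Tying the description together, a hypothetical driver composing the helpers sketched in the comments above (`extract_feed_page`, `find_videos`, `fetch_next_page`); `find_continuation` and the key names are likewise assumptions, not code from this PR:

```python
def iterate_feed(webpage):
    # First page: required JSON blob plus optional ytcfg variables.
    feed_data, ytcfg = extract_feed_page(webpage)
    for video in find_videos(feed_data):
        yield video['videoId']

    # Further pages: follow nextContinuationData while both the token and
    # the API key from ytcfg are available.
    api_key = ytcfg.get('INNERTUBE_API_KEY')
    continuation = find_continuation(feed_data)  # hypothetical lookup
    while api_key and continuation:
        page = fetch_next_page(api_key, continuation.get('continuation'))
        for video in find_videos(page):
            yield video['videoId']
        continuation = find_continuation(page)
```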