Difficult to debug extractors #6701
It's not too clear, but extractors only work if they have a YoutubeDL instance attached:
from youtube_dl import YoutubeDL
from youtube_dl.extractor import YoutubeUserIE
ydl = YoutubeDL()
ie = YoutubeUserIE(ydl)
info = ie.extract("https://www.youtube.com/user/rhettandlink2")
But in general you shouldn't use the extractors directly:
from youtube_dl import YoutubeDL
ydl = YoutubeDL()
# this resolves redirects and extracts info from playlist entries
info = ydl.extract_info("https://www.youtube.com/user/rhettandlink2", download=False)
Instead of writing Python, the method I use (and probably other developers do the same) is to call the program with the correct parameters (…).
About the problem with all the function calls: most functions do a relatively simple thing (download a webpage, extract some value with a regex, ...), which simplifies the process, and others are used to reduce the complexity of some extractors (in the case of …).
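To illustrate what "a relatively simple thing" can look like, here is a minimal sketch of a regex helper of the kind described above. The names `search_regex` and `ExtractionError` and the sample page are made up for this illustration; they are stand-ins, not youtube-dl's actual helpers.

```python
import re


class ExtractionError(Exception):
    """Hypothetical error type: raised when an expected value is missing."""


def search_regex(pattern, text, name):
    """Return the first capture group of pattern in text.

    Raises ExtractionError (naming the missing field) when nothing matches.
    """
    match = re.search(pattern, text)
    if match is None:
        raise ExtractionError("Unable to extract %s" % name)
    return match.group(1)


# Made-up page fragment, only for demonstration.
page = '<meta property="og:title" content="Example video">'
title = search_regex(r'property="og:title" content="([^"]+)"', page, "title")
```

Keeping each helper this small means a failing extractor can usually be narrowed down to one pattern and one piece of page text.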
Thanks for your kind answer. That helps me understand it a lot better. I'll see what I can do. If I may, I suggest that some of this info be added to the …
Thanks to your help here, I was able to fix the bug!
I know a little bit of Python, so I decided to try to debug the bug I just filed, issue #6699. I cloned the repo and did the following in a terminal (which, by the way, was very non-obvious, because doing what CONTRIBUTING.md suggested (python -m youtube_dl) loaded my distro's out-of-date module, not the one in the current directory):
The output I get is this:
This doesn't seem to make sense, because the docstring says this:
It would seem that the only thing this method should do is to return a list. But instead, it also generates output to the screen, and fails if not run from...the executable script, I guess?
So, since I had run youtube-dl with --dump-pages earlier, I loaded one of the pages (which was a JSON playlist segment) into Python and tried to extract directly from the file:
This makes no sense, because, having looked at YoutubeChannelIE.extract_videos_from_page, it looks like it should parse out the videos from testpage, which looks like this:
Now I see that there are double-escaped quotes in there, which will mess with the regexp in YoutubeChannelIE.extract_videos_from_page. So I try to follow the chain of functions that download pages and parse them and decode them to find out how the corrected HTML gets to extract_videos_from_page()... but I am lost in a maze of functions calling functions calling functions, from one file to another, across directories...
I will try to summarize:
- The instructions in CONTRIBUTING.md are not helpful for trying to debug extractors from a current, cloned repo.
- The extract() method should do only what it says (return a list), not also output to the screen, which fails if not correctly initialized (for which there is no documentation).
If these issues could be addressed, I would imagine that more people would be able to contribute by fixing the inevitable broken extractors that happen when sites change.
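On the problem of the distro's module being loaded instead of the cloned one: a possible way to check which copy Python picks up is to put the checkout first on sys.path and look at the imported module's __file__. This is only a sketch; `import_from_checkout` is a hypothetical helper name, and the checkout path is whatever directory the repo was cloned into.

```python
import importlib
import sys


def import_from_checkout(repo_dir, module_name):
    """Import module_name, preferring the copy found inside repo_dir."""
    # Put the checkout first so it shadows any system-wide installation.
    sys.path.insert(0, repo_dir)
    # Drop already-imported copies (and their submodules) so the new
    # search path is actually consulted.
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

From the root of the cloned repo, `import_from_checkout(".", "youtube_dl").__file__` should then point inside the checkout rather than at the distro's package directory.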
Thanks for any help and for making youtube-dl. I don't mean this to be rude or harsh criticism; I'm just trying to document how I tried to debug it and got stuck so that perhaps the process can be improved.
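The escaped-quotes problem mentioned above can be reproduced in isolation. The sketch below uses a made-up JSON segment and a hypothetical pattern (not the actual regexp in YoutubeChannelIE.extract_videos_from_page, and with only one level of escaping rather than the double escaping seen in the dump) to show why matching against the raw dumped text finds nothing, while matching after JSON decoding works.

```python
import json
import re

# Hypothetical pattern for illustration; not the regex youtube-dl uses.
VIDEO_RE = re.compile(r'href="/watch\?v=([0-9A-Za-z_-]{11})')

# Made-up JSON playlist segment whose HTML payload has escaped quotes.
raw_page = '{"content_html": "<a href=\\"/watch?v=abcdefghijk\\">title</a>"}'


def find_video_ids(html):
    """Return the video IDs matched by VIDEO_RE, in order of appearance."""
    return VIDEO_RE.findall(html)


# Against the raw JSON text the backslashes before the quotes block the match.
ids_from_raw = find_video_ids(raw_page)

# Decoding the JSON first restores the plain quotes the pattern expects.
ids_from_decoded = find_video_ids(json.loads(raw_page)["content_html"])
```

This suggests that a page saved with --dump-pages may need the same decoding step the extractor applies before it can be fed to a parsing method by hand; where exactly youtube-dl does that decoding is not shown here.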