Doing a post request #874
Comments
|
Is urllib.urlopen(url, data) the best way to do it, or is there another way?
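For reference, a minimal sketch of what makes this call a POST (shown with Python 3's urllib.request; the Python 2 urllib.urlopen(url, data) discussed in the thread behaves the same way, and the URL and parameters here are only illustrative):

```python
from urllib.request import Request
from urllib.parse import urlencode

# Attaching a body to the request is what switches it from GET to POST;
# no explicit method needs to be set.
data = urlencode({'as3': '1', 'vid': '03fbd68d4e'}).encode('ascii')
req = Request('http://vbox7.com/play/magare.do', data)

print(req.get_method())  # → POST
```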
|
Okay, I have got it.
Now here's another question. The main URL for vbox7's video is "http://vbox7.com/play:03fbd68d4e", but when we open it, it first redirects us to "http://vbox7.com/show:misscookie?back_to=%2Fplay%3A249bb972c2", after that it redirects us again to "http://vbox7.com/show:missjavascript?back_to=%2Fplay%3A249bb972c2", and then finally it redirects us to the video page "http://vbox7.com/play:03fbd68d4e". How can I simulate these redirects? My main IE code is :
Can you guys suggest a solution?
|
Our HTTP handler should already follow HTTP redirects by default. But looking at the URLs, it seems like we may have to jump through hoops to get the cookie and simulate active JavaScript.
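A quick way to check that claim (a sketch in Python 3's urllib.request terms, whose default opener mirrors the urllib2 one youtube-dl wraps): the stock opener already chains an HTTPRedirectHandler, so plain 301/302 redirects are followed without any extra code; JavaScript redirects like the one above are a different matter.

```python
import urllib.request

# build_opener() with no arguments installs the default handler chain,
# which already contains an HTTPRedirectHandler for plain HTTP redirects.
opener = urllib.request.build_opener()
handler_names = [type(h).__name__ for h in opener.handlers]

print('HTTPRedirectHandler' in handler_names)  # → True
```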
|
If I do this using the Python interpreter then I have to do this, and it works :
I am trying to achieve this result. You can use this code and do a quick check.
|
I would pay a lot to be able to use My bet is on cookies, let me have a try.
|
Hmm, let's see what you can come up with. BTW, I think we should stay away from BeautifulSoup for as long as possible, because it will greatly slow down youtube-dl. On Fri, Jun 7, 2013 at 11:46 PM, Filippo Valsorda
|
Here is the code without requests, which means it should be compatible with youtube-dl:
|
Ok, the cookie protection is easily bypassed with a
|
(Shouldn't our opener handle cookies transparently?)
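It does not, out of the box: a sketch (again in Python 3's urllib.request terms) showing that cookie handling only kicks in once an HTTPCookieProcessor is added to the opener, which may be exactly the hoop the misscookie redirect is testing:

```python
import http.cookiejar
import urllib.request

# The default opener has no cookie support at all ...
default_names = [type(h).__name__ for h in urllib.request.build_opener().handlers]

# ... but adding an HTTPCookieProcessor backed by a CookieJar makes the
# opener store and resend cookies across requests transparently.
jar = http.cookiejar.CookieJar()
cookie_opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
cookie_names = [type(h).__name__ for h in cookie_opener.handlers]

print('HTTPCookieProcessor' in default_names)  # → False
print('HTTPCookieProcessor' in cookie_names)   # → True
```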
|
Hey, why are we thinking about the cookies? Is there no way to bypass the redirection check in youtube-dl and simply pass the URL to the info extractors? I have made a repository with working code to download from vbox7. The code can be run with Python 2.6 up to 3.3 without any dependency. In my code I haven't given any importance to cookies or JS. Sometimes there are small solutions to big problems :) Repository link: https://github.com/yasoob/Vbox7-dl
|
@yasoob What's your exact problem? It downloads the video for me. If it works without the need to care about cookies or JavaScript, go ahead and implement it in youtube-dl.
|
The problem is that when I run the code I mentioned above, it gives me the following output.
|
That means either you have added
|
Oh, thanks :p It was a logic problem. I forgot to add VBox7IE() to the list of IEs. Sorry :(
|
Yeah, my usual workflow, once I have an overall idea of how to extract the video, is to start by making sure I match the URL and then do the real work. Don't worry ;)
|
Great, don't worry! (Still, I wonder how it can work; the missjavascript redirection is done with
|
Hey, another little problem. In the following code,
check the info_response variable. If I use urllib.urlopen(info_url, data) then everything works like a charm, but if I use compat_urllib_request.urlopen(info_url, data) then I do not receive the correct response, and so the IE breaks. What's the matter? Basically I am doing a POST request here.
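One difference worth checking between the two code paths is headers. A Request object lets you pin the Content-Type explicitly, rather than relying on whatever default the opener applies at send time; pinning it removes one variable when the two paths behave differently. A sketch (URL and parameters are only illustrative):

```python
from urllib.request import Request
from urllib.parse import urlencode

data = urlencode({'as3': '1', 'vid': '03fbd68d4e'}).encode('ascii')
req = Request('http://vbox7.com/play/magare.do', data)

# Set the form-POST content type explicitly on the request object,
# so the server sees it regardless of which opener sends the request.
req.add_header('Content-Type', 'application/x-www-form-urlencoded')

print(req.get_header('Content-type'))  # → application/x-www-form-urlencoded
```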
|
Umh, the Fix:

    redirect_page, urlh = self._download_webpage_handle(url, video_id)
    redirect_url = urlh.geturl() + re.search(r'window\.location = \'(.*)\';', redirect_page).group(1)
    webpage = self._download_webpage(redirect_url, video_id, u'Downloading redirect page')
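The regex step in isolation, against a hypothetical snippet of the kind of page that triggers the missjavascript redirect (the page content and base URL here are made up for illustration):

```python
import re

# A made-up redirect page: the real target only appears inside an
# inline script, so the HTTP-level redirect handler never sees it.
redirect_page = "<html><script>window.location = '/play:03fbd68d4e';</script></html>"
base_url = 'http://vbox7.com'

# The same pattern as in the fix, pulling the path out of the script.
path = re.search(r'window\.location = \'(.*)\';', redirect_page).group(1)
redirect_url = base_url + path

print(redirect_url)  # → http://vbox7.com/play:03fbd68d4e
```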
|
Looking into the
|
Uff, made it! urllib2 didn't set the

    class Vbox7IE(InfoExtractor):
        """Information Extractor for hypem"""
        _VALID_URL = r'(?:http://)?(?:www\.)?vbox7\.com/play:([^/]+)'

        def _real_extract(self, url):
            mobj = re.match(self._VALID_URL, url)
            if mobj is None:
                raise ExtractorError(u'Invalid URL: %s' % url)
            video_id = mobj.group(1)

            redirect_page, urlh = self._download_webpage_handle(url, video_id)
            redirect_url = urlh.geturl() + re.search(r'window\.location = \'(.*)\';', redirect_page).group(1)
            webpage = self._download_webpage(redirect_url, video_id, u'Downloading redirect page')

            title = re.search(r'<title>(.*)</title>', webpage)
            title = (title.group(1)).split('/')[0].strip()
            ext = "flv"

            info_url = "http://vbox7.com/play/magare.do"
            data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id})
            info_request = compat_urllib_request.Request(info_url, data)
            info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
            info_response = self._download_webpage(info_request, video_id, u'Downloading info webpage')
            if info_response is None:
                raise ExtractorError(u'Unable to extract the media url')

            final_url = (info_response.split('&')[0]).split('=')[1]

            return [{
                'id': video_id,
                'url': final_url,
                'ext': ext,
                'title': title,
            }]
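The final_url line relies on the response being a bare query string with the media URL as the first key=value pair. A sketch against a made-up response (the real format of magare.do's reply may differ), alongside a slightly more robust alternative using the standard library's query-string parser:

```python
from urllib.parse import parse_qs

# Hypothetical response in the shape the extractor assumes.
info_response = 'url=http://media.vbox7.com/video.flv&duration=120'

# The split-based one-liner from the extractor:
final_url = (info_response.split('&')[0]).split('=')[1]

# parse_qs does not depend on key order and decodes percent-encoding.
final_url_qs = parse_qs(info_response)['url'][0]

print(final_url == final_url_qs)  # → True
```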
|
Oh, so finally with a combined effort we managed to make an IE for vbox7.com. I'll do a pull request with the tests and the IE shortly. After that you can close this issue as well as #284.
|
Great. Ah, change the docstring:
|
Ah okay, I'll do it ;)
|
I have done the pull request: #878
Hi, I am making an IE for http://vbox7.com/. It is near completion. I wanted to know how to do a POST request in InfoExtractors, using _download_webpage or _download_webpage_handle. Can anyone tell me how to do a POST request?