Web page 404 #77
Comments
This is because of #8 (urllib3 sends the entire URL in the GET line instead of just the path). It seems the servers running waptt.com don't like that. For now, I would suggest using https://github.com/kennethreitz/requests, which is built on top of urllib3 and does a lot of extra things for you, such as stripping the scheme/host from the GET line before sending it.
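To make the difference concrete, here is a minimal sketch of the two request lines being discussed: the full URL (the form normally sent to proxies) versus the path only. The helper name `request_lines` is hypothetical and not part of urllib3 or Requests; it only uses the standard library and assumes Python 3.

```python
from urllib.parse import urlsplit


def request_lines(url, method="GET"):
    """Return (full_url_line, path_only_line) for the given URL.

    Hypothetical helper for illustration; not a urllib3 API.
    """
    parts = urlsplit(url)
    # An empty path means the root resource.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    full_url_line = "%s %s HTTP/1.1" % (method, url)
    path_only_line = "%s %s HTTP/1.1" % (method, target)
    return full_url_line, path_only_line


full_url_line, path_only_line = request_lines("http://waptt.com/")
print(full_url_line)   # GET http://waptt.com/ HTTP/1.1
print(path_only_line)  # GET / HTTP/1.1
```

The first line is what the comment above describes urllib3 as sending; the second is what curl sends when talking directly to the server.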
I benchmarked grequests (Requests + gevent, https://github.com/kennethreitz/grequests) against urllib3 and found that gevent + urllib3 performs much better than grequests, so I am using gevent + urllib3.
Interesting. Could you share your benchmark methodology and numbers? I'm curious to see where Requests is slow; I'm sure we can speed it up. My other suggestion, for now, would be to implement your own PoolManager that removes the scheme+host from the request URL before passing it on.
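One way the suggested workaround might look is sketched below. It assumes the manager object exposes `connection_from_url(url)` and that the returned pool exposes `urlopen(method, url, **kw)`, as urllib3's `PoolManager` and `HTTPConnectionPool` do. The names `strip_scheme_and_host` and `request_path_only` are hypothetical, and the sketch assumes Python 3.

```python
from urllib.parse import urlsplit


def strip_scheme_and_host(url):
    """Reduce 'http://host/path?query' to '/path?query' (hypothetical helper)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def request_path_only(manager, method, url, **kw):
    """Pick the pool from the full URL, but send only the path on the wire.

    `manager` is assumed to behave like urllib3.PoolManager.
    """
    # The full URL is still needed here so the right host/port pool is chosen.
    pool = manager.connection_from_url(url)
    # The pool then receives only the path and query string.
    return pool.urlopen(method, strip_scheme_and_host(url), **kw)
```

Usage would be along the lines of `request_path_only(urllib3.PoolManager(), 'GET', 'http://waptt.com/')`. Because the request goes straight to a single host's pool, redirects to a different host are not handled by this sketch.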
Requests is slower and produces more errors.
python gtest.py
Test code:
import sys

import gevent
from gevent import monkey
# Patch the standard library before importing the HTTP libraries.
gevent.monkey.patch_all(thread=False)

import grequests
import urllib3

http = urllib3.PoolManager()

def call_back(resp):
    # Read the body so the full response is downloaded.
    content = resp.content

def worker(url, use_urllib2=False):
    if use_urllib2:
        # urllib3 path: `url` is a single URL string.
        content = http.request('GET', url)
    else:
        # grequests path: `url` is a list of URLs.
        rs = [grequests.get(u) for u in url]
        resps = grequests.map(rs)
        for resp in resps:
            call_back(resp)

urls = ['http://www.baidu.com/'] * 50

def by_requests():
    worker(urls)

def by_urllib2():
    # Despite the name, this path uses urllib3: one greenlet per URL.
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print('by requests: %s seconds' % t.timeit(number=3))
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print('by urllib3: %s seconds' % t.timeit(number=3))
@kennethreitz, thoughts?
urllib3 cannot access this web page:
import urllib3
http = urllib3.PoolManager()
url = 'http://waptt.com/'
r = http.request('GET', url, retries=5)
print(r.status)
404
But curl gets a 200 status:
curl -I http://waptt.com
HTTP/1.1 200 OK