Urllib3 HTTPHeaderDict and proxy_from_url problem #632

Closed
anatasiajp opened this issue May 28, 2015 · 14 comments

Comments

@anatasiajp

It would be a little redundant to explain this problem again, so here is my post about it, which includes the code of the script: http://www.prxbx.com/forums/showthread.php?tid=2188&pid=18453#pid18453

The main issue is that whenever I use proxy_from_url or ProxyManager, this error appears: http://pastebin.com/g2mK1CW4
If I use HTTPHeaderDict, I cannot use a proxy. If I use self.headers from http.server, I CAN use a proxy, but ALL pages requested without the proxy fail to load.
The best workaround I have found so far is to use self.headers from http.server for proxied requests and HTTPHeaderDict for normal requests:

            if "ghacks" in self.host:
                self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/')
                headers = self.headers

Code: http://www.prxbx.com/forums/showthread.php?tid=2188&pid=17911#pid17911

Need:
pip install colorama
pip install pyOpenSSL
pip install urllib3

and Python 3.4.2
urllib3 1.10.4

@anatasiajp
Author

This is how the author patched the problem, but it is only a patch, and the headers handling behaves strangely afterwards:

        # Couldn't figure out why, but using urllib3._collections.HTTPHeaderDict and applying a proxy causes the error below:
        #   File "C:\Python34\lib\http\client.py", line 1067, in putheader
        #     value = b'\r\n\t'.join(values)
        #   TypeError: sequence item 0: expected a bytes-like object, tuple found

        # urllib3._collections.HTTPHeaderDict.extend() method may miss some values
        # email.message.Message.keys() vs email.message.Message.items()

        # Instead:
        # headers = urllib3._collections.HTTPHeaderDict(self.headers)

        # Use:
        # headers = urllib3._collections.HTTPHeaderDict()
        # for key, value in self.headers.items():
        #     headers.add(key, value)
        # self.headers, headers = headers, self.headers

        # The code below, from connectionpool.py, expects the headers to have copy() and update() methods.
        # That's why we can't use self.headers directly when calling pool.urlopen()
        #
        # Merge the proxy headers. Only do this in HTTP. We have to copy the
        # headers dict so we can safely change it without those changes being
        # reflected in anyone else's copy.
        # if self.scheme == 'http':
        #     headers = headers.copy()
        #     headers.update(self.proxy_headers)
        #

        # This is an ugly hack
        def copy():
            return self.headers
        self.headers.copy = copy
        self.headers.update = urllib3._collections.HTTPHeaderDict.update
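
The "keys() vs items()" remark in the comments above can be illustrated with the standard library alone. This is a minimal sketch, assuming the incoming headers behave like an email.message.Message (the type http.server's header object derives from). The header values are made up for illustration, and the sketch does not show what urllib3 does internally:

```python
from email.message import Message

# Hypothetical request headers with a repeated field, for illustration only.
message = Message()
message["Accept-Encoding"] = "gzip"
message["Cookie"] = "a=1"
message["Cookie"] = "b=2"

# keys() lists the repeated name twice, but indexing returns the first value only.
via_keys = [(name, message[name]) for name in message.keys()]

# items() yields every (name, value) pair, including the second Cookie value.
via_items = message.items()

print(via_keys)
print(via_items)
```

Any copy that walks keys() and indexes the message would therefore drop "b=2", whereas a loop over items() keeps both values.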

@shazow
Member

shazow commented May 28, 2015

It is a little bit redundant if I reexplain this problem, here is my post about that problem include code of the script: http://www.prxbx.com/forums/showthread.php?tid=2188&pid=18453#pid18453

Hey, thanks for giving us a pointer, but I can't figure out what the problem is. Could you explain what's broken in urllib3 and how you suggest fixing it?

@anatasiajp
Author

The code is quite long, so I'm not sure of the best way to show you why this happens, but I will try my best.

As I understand it, the headers= argument of urllib3.PoolManager.urlopen must be in the format of urllib3._collections.HTTPHeaderDict, something like { blah blah }.
The problem appears when I use "proxy_from_url", like this:

self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/')
r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=headers, retries=1, redirect=False, preload_content=False, decode_content=False)

The error appears: http://pastebin.com/g2mK1CW4

I tried to fix it by replacing headers=headers with headers=self.headers. With that change, self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/') works, but the problem gets even bigger: no page requested without the proxy can load (maybe because the headers= argument of urllib3.PoolManager.urlopen must be in the urllib3._collections.HTTPHeaderDict format, something like { blah blah }). self.headers, however, is in the http.server header format, something like:

Accept-Encoding: gzip
Referer: qqqqqqqqqqq

But why does self.headers work with self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/') in my case? Maybe something is missing in ProxyManager, so that it cannot use urllib3._collections.HTTPHeaderDict's header format and only accepts the plain format of self.headers from http.server. To summarize: PoolManager (a pool created without proxy_from_url) works only with the HTTPHeaderDict format and not the plain format, while ProxyManager (a pool created with self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/')) works with the plain format but not the HTTPHeaderDict format.

  File "D:\Downloads\Compressed\AFProxy_py 0.4\AFProxy.py", line 242, in do_METHOD
    retries=1, redirect=False, preload_content=False, decode_content=False)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\poolmanager.py", line 276, in urlopen
    return super(ProxyManager, self).urlopen(method, url, redirect=redirect, **kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\poolmanager.py", line 159, in urlopen
    response = conn.urlopen(method, url, **kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\connectionpool.py", line 349, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1121, in _send_request
    self.putheader(hdr, value)
  File "C:\Python34\lib\http\client.py", line 1067, in putheader
    value = b'\r\n\t'.join(values)
TypeError: sequence item 0: expected a bytes-like object, tuple found

I came up with a workaround like this. It solves the problem, but it is weird:

# headers starts out in urllib3's HTTPHeaderDict format
if "ghacks" in self.host:
    self.pool = urllib3.proxy_from_url('http://127.0.0.1:7777/')
    headers = self.headers  # switch to the http.server format for proxied requests
# Normal (non-proxy) requests still use urllib3's HTTPHeaderDict format.
r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=headers, retries=1, redirect=False, preload_content=False, decode_content=False)

@shazow
Member

shazow commented May 28, 2015

So you're saying PoolManager's header argument does not accept a urllib3._collections.HTTPHeaderDict object? What version of urllib3 are you using?

import urllib3
http = urllib3.PoolManager()
headers = urllib3._collections.HTTPHeaderDict()
headers.add('Foo', 'bar')
r = http.request('GET', 'http://httpbin.org/headers', headers=headers)
print(r.data)
""" ->
{
  "headers": {
    "Accept-Encoding": "identity",
    "Foo": "bar",
    "Host": "httpbin.org"
  }
}
"""

Can you reproduce whatever bug you're experiencing with just urllib3 like above?

@anatasiajp
Author

Yes. With your code plus proxy_from_url, using the newest urllib3 version (1.10.4) and Python 3.4.3, the error is the same:

import urllib3
http = urllib3.PoolManager()
http = urllib3.proxy_from_url('http://127.0.0.1:7777/')
headers = urllib3._collections.HTTPHeaderDict()
headers.add('Foo', 'bar')
r = http.request('GET', 'http://httpbin.org/headers', headers=headers)
print (r.data)

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
Traceback (most recent call last):
  File "C:/Python34/Lib/site-packages/urllib3/bugfix.py", line 6, in <module>
    r = http.request('GET', 'http://httpbin.org/headers', headers=headers)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\request.py", line 68, in request
    **urlopen_kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\request.py", line 81, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\poolmanager.py", line 276, in urlopen
    return super(ProxyManager, self).urlopen(method, url, redirect=redirect, **kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\poolmanager.py", line 159, in urlopen
    response = conn.urlopen(method, url, **kw)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\connectionpool.py", line 349, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1121, in _send_request
    self.putheader(hdr, value)
  File "C:\Python34\lib\http\client.py", line 1067, in putheader
    value = b'\r\n\t'.join(values)
TypeError: sequence item 0: expected a bytes-like object, tuple found
>>> 
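
The last frame of this traceback can be reproduced in isolation. This is a minimal sketch, assuming only that a header value reached the join as a tuple rather than as bytes. How the tuple gets there inside urllib3 is not shown, and join_header_values is a hypothetical helper name:

```python
def join_header_values(values):
    """Join header values the same way as the failing line in the traceback."""
    return b"\r\n\t".join(values)


# Values that are already bytes join cleanly.
ok = join_header_values([b"bar", b"baz"])

# A tuple in place of a bytes value raises the TypeError from the report.
try:
    join_header_values([("Foo", "bar")])
    error = None
except TypeError as exc:
    error = exc

print(ok)
print(type(error).__name__)
```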

@shazow
Member

shazow commented May 28, 2015

Ah perfect, thank you. :) That is a bug indeed, then.

@anatasiajp
Author

Thank you, I'm waiting for the next release :)

shazow added a commit that referenced this issue May 28, 2015
@shazow
Member

shazow commented May 28, 2015

@cattleyavns I wrote a test for this in bc61ee4 which seems to pass. Are you sure it's not a problem with your proxy returning invalid values/encoding?

@shazow
Member

shazow commented May 28, 2015

Actually nevermind, it does seem to encode it incorrectly. Investigating.

@shazow
Member

shazow commented May 28, 2015

Right, looks like we're only using HTTPHeaderDict for returning responses—we're not using it for encoding requests. No great reason for this yet.

For now, you can pass in headers as dicts or lists of tuples.
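
For the proxy script described earlier in the thread, that workaround could look roughly like the sketch below. It assumes the incoming headers are an email.message.Message (the type http.server's header object derives from). The to_plain_dict helper and the sample header values are hypothetical, and joining repeated fields with ", " is a simplification that does not suit every header:

```python
from email.message import Message


def to_plain_dict(message):
    """Flatten a Message into a plain dict, merging repeated fields with ', '."""
    merged = {}  # lower-cased field name -> combined value
    names = {}   # lower-cased field name -> first spelling seen
    for name, value in message.items():
        key = name.lower()
        if key in merged:
            merged[key] = merged[key] + ", " + value
        else:
            names[key] = name
            merged[key] = value
    return {names[key]: value for key, value in merged.items()}


# Hypothetical incoming headers, for illustration only.
incoming = Message()
incoming["Accept-Encoding"] = "gzip"
incoming["Via"] = "1.0 a"
incoming["via"] = "1.1 b"

plain = to_plain_dict(incoming)
print(plain)
```

The resulting dict has its own copy() and update() methods, so it could be passed as headers= without patching those onto the original object.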

@anatasiajp
Author

I still get the same problem. I tried this commit: 838d23a
https://raw.githubusercontent.com/shazow/urllib3/838d23aff90a933cc9e6891dfdce6666f417bed4/urllib3/_collections.py

import urllib3
http = urllib3.PoolManager()
http = urllib3.proxy_from_url('http://127.0.0.1:7777/')
headers = urllib3._collections.HTTPHeaderDict()
headers.add('Foo', 'bar')
r = http.request('GET', 'http://httpbin.org/headers', headers=headers)
print (r.data)

I copied all of its content into the _collections.py in my Python folder on drive C.

But I still get:

TypeError: sequence item 0: expected a bytes-like object, tuple found

Is the way I updated my _collections.py wrong?

@shazow
Member

shazow commented May 28, 2015

@cattleyavns See latest comments since then, there's no fix yet for allowing HTTPHeaderDict in requests. You'll need to pass in headers as a dict or list of tuples as a work-around for now.

@shazow
Member

shazow commented Jul 21, 2015

Should be fixed by #679, please re-open if not. :)

@shazow shazow closed this as completed Jul 21, 2015
@anatasiajp
Author

Thanks, I will re-open if this problem still exists :)
