Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keepalive.HTTPResponse.read decodes response body as UTF-8 #2

Closed
uholzer opened this issue Nov 22, 2015 · 13 comments
Closed

keepalive.HTTPResponse.read decodes response body as UTF-8 #2

uholzer opened this issue Nov 22, 2015 · 13 comments
Assignees
Labels

Comments

@uholzer
Copy link

uholzer commented Nov 22, 2015

I would expect that the keepalive.HTTPResponse.read (overriding httplib.HTTPResponse.read) method returns str objects. Instead, it returns unicode objects. In fact, it decodes the body as UTF-8 in all cases. It shouldn't do that, because the encoding of the body may be different or the body may not even be text.

@bgw
Copy link

bgw commented Nov 22, 2015

Good point. However, it shouldn't return a str on Python 3. it should return a bytes object, which is an alias for str on Python 2.

@wikier
Copy link
Owner

wikier commented Nov 23, 2015

That's exactly the background of the issue, see commit d202ed8

@wikier
Copy link
Owner

wikier commented Nov 23, 2015

I'd prefer not to have version-specific code there; so any idea, @uholzer or @bgw?

@wikier wikier added the bug label Nov 23, 2015
@wikier wikier self-assigned this Nov 23, 2015
@joernhees
Copy link

as hmm, can't we simply revert d202ed8 and replace all self._buf = '' with self._buf = b'' (similar with data = "" below d202ed8)?

@uholzer
Copy link
Author

uholzer commented Nov 23, 2015

Python 2.7 does know the b prefix for string literals. Python 2.6 does not seem to do so.

@uholzer
Copy link
Author

uholzer commented Nov 23, 2015

If you are very clever, you probably can do without any literals:

        s = self._raw_read(amt) if self._rbuf is None else self._rbuf + self._raw_read(amt)
        self._rbuf = None

Once you do have a s of the correct type, you can also do magic like type(s)().

Any more concrete ideas?

@uholzer
Copy link
Author

uholzer commented Nov 23, 2015

Or you can put some version-specific code at the top to get the right types:

bytes_type = bytes if PYTHON3 else str
unicode_type = str if PYTHON3 else bytes

where PYTHON3 is something Idon't know.

Then use them in the code:

        s = self._rbuf + self._raw_read(amt)
        self._rbuf = bytes_type()

@wikier
Copy link
Owner

wikier commented Nov 24, 2015

@uholzer, personally I'd prefer to avoid such version specific code there. So I like more @joernhees' s proposal of using self._buf = b'' and raising to 2.7 the minimum version. I'm not aware of many people using SPARQLWrapper in 2.6 anyway; @joernhees would be that a problem for RDFLib?

wikier added a commit that referenced this issue Nov 24, 2015
@wikier
Copy link
Owner

wikier commented Nov 24, 2015

Patch available as 0.5.dev0; could you please make your tests, @uholzer and @joernhees? In the meantime I'll see what we can do in RDFLib/sparqlwrapper#70.

@joernhees
Copy link

python 2.6 supports the b"bla" strings: https://docs.python.org/2/whatsnew/2.6.html#pep-3112-byte-literals

Python 2.6.9 (unknown, Jul 14 2015, 19:46:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b'foobar'
'foobar'

so that will only restrict to >py2.5 and still supporting 2.5 seems like madness anyhow...

that said i updated RDFLib/rdflib#550 to use the most recent keepalive package from github: https://travis-ci.org/RDFLib/rdflib/builds/92929930 (fixes py2 but still breaks something on py3)

@joernhees
Copy link

i think it fails on py3 as headers etc. inserted shouldn't be unicode strings either: e569c31#diff-f1079ce199a23a37972b8b644516f133R311

wikier added a commit that referenced this issue Nov 25, 2015
@wikier
Copy link
Owner

wikier commented Nov 25, 2015

@joernhees you test faced just by luck another issue (see commit 018f118 for further context) with POST requests 😞

Since it fetches the HEAD from github, I restarted the build https://travis-ci.org/RDFLib/rdflib/builds/92929930 and now it looks to work for all python versions.

So it looks to be ready. But I would prefer to have @uholzer also verifying the fix before getting out a new release.

@joernhees
Copy link

👍 thanks for digging through all this :-/

wikier added a commit that referenced this issue Nov 27, 2015
@wikier wikier closed this as completed Nov 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants