The Hypothesis strategies now shipping with Hyperlink are producing this error occasionally in Klein:
Traceback (most recent call last):
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/klein/test/test_request_compat.py", line 74, in test_uri
    def test_uri(self, url: DecodedURL) -> None:
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hypothesis/core.py", line 1163, in wrapped_test
    raise the_error_hypothesis_found
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hyperlink/hypothesis.py", line 321, in decoded_urls
    return DecodedURL(draw(encoded_urls()))
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hyperlink/_url.py", line 2179, in path
    for p in self._url.path
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hyperlink/_url.py", line 2179, in <listcomp>
    for p in self._url.path
  File "/home/runner/work/klein/klein/.tox/coverage-py37-tw192/lib/python3.7/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
builtins.UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

klein.test.test_request_compat.HTTPRequestWrappingIRequestTests.test_uri
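The last frame of the traceback can probably be reproduced without Hypothesis or Hyperlink. The sketch below uses only the standard library and assumes the failing path segment contains the escape `%80`; it is not Hyperlink's own `_percent_decode`, just the same two steps (unquote to bytes, then decode as UTF-8).

```python
from urllib.parse import unquote_to_bytes

# "%80" unquotes to the single byte 0x80, which cannot start a UTF-8 sequence.
raw = unquote_to_bytes("%80")

message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    # Keep the message so it can be compared with the CI traceback.
    message = str(e)

print(message)
```

The printed message should match the final line of the traceback, apart from the `builtins.UnicodeDecodeError:` prefix.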
wsanchez commented on Jan 25, 2021
It would be helpful to catch this error and print the URL that produced it, so one might see what data is tripping us up.
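One way this could look, as a rough sketch: wrap the decode step and re-raise with the offending URL text in the message. The helper name `decode_segment_with_context` and its parameters are hypothetical and not part of Hyperlink or Klein; the decoding itself uses only the standard library.

```python
from urllib.parse import unquote_to_bytes


def decode_segment_with_context(segment: str, url_text: str) -> str:
    """Percent-decode one path segment as UTF-8 (hypothetical helper).

    On failure, raise a ValueError that names the segment and the full URL,
    so the data that tripped the decoder shows up in the test output.
    """
    try:
        return unquote_to_bytes(segment).decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(
            f"cannot decode segment {segment!r} of URL {url_text!r}: {e}"
        ) from e


# A well-formed segment decodes normally.
print(decode_segment_with_context("caf%C3%A9", "https://example.com/caf%C3%A9"))

# A malformed segment reports the URL that produced the error.
try:
    decode_segment_with_context("%80", "https://example.com/%80")
except ValueError as e:
    print(e)
```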
wsanchez commented on Jan 29, 2021
Here are some failing examples:
wsanchez commented on Jan 29, 2021
…which one can reproduce in the REPL:
wsanchez commented on Jan 29, 2021
@glyph @mahmoud I'm curious if you think this may suggest a bug in Hyperlink… that we have allowed the creation of an EncodedURL which cannot be decoded…?
glyph commented on Jan 29, 2021
@wsanchez Yes.
glyph commented on Jan 29, 2021
I think DecodedURL maybe has a bit of leeway with a URL like this to mangle it or make it not completely round-trip-able through every API. Browsers have to cope with this kind of a mess, and they definitely do some mangling. For example, if you try pasting https://example.com/%80é into Safari or Chrome, you get https://example.com/%80%C3%A9. Now, granted, that's a bit more like an EncodedURL, but you can deliver the percent-encoded text directly to the application in that case. Because if you manually delete the %80, you'll notice that you get https://example.com/é back again, visually.
glyph commented on Jan 29, 2021
If you were to manipulate a busted URL like this, or manually create a copy via moving strings with DecodedURL, you'd get %2580%25C3%25A9, but I think that's fine. Maybe there should be a switch about whether to raise or mangle on encoding errors when you create the object?
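A minimal sketch of what such a switch might look like, using only the standard library. The function `percent_decode` and its `strict` flag are hypothetical names for illustration, not an existing Hyperlink API, and "mangling" here is assumed to mean keeping the still-encoded text as literal characters.

```python
from urllib.parse import quote, unquote_to_bytes


def percent_decode(text: str, strict: bool = True) -> str:
    """Percent-decode text as UTF-8 (hypothetical helper).

    strict=True  -> raise UnicodeDecodeError on undecodable escapes.
    strict=False -> mangle: return the encoded text unchanged, as literals.
    """
    try:
        return unquote_to_bytes(text).decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise
        return text


# Browser-style normalisation: existing escapes are kept, "é" is encoded.
browser_path = quote("/%80é", safe="/%")

# Lenient decoding leaves the busted segment as literal text...
mangled = percent_decode("%80%C3%A9", strict=False)

# ...so re-encoding it escapes every "%" as "%25".
reencoded = quote(mangled, safe="")

print(browser_path, mangled, reencoded)
```

With `strict=True` the same input raises `UnicodeDecodeError`, which is the behaviour seen in the traceback; with `strict=False` the re-encoded result is the doubly escaped form mentioned above.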