-
Notifications
You must be signed in to change notification settings - Fork 32
Add support for IRIs and RFC 3987 #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -162,6 +162,6 @@ def encode_component(uri_component, encoding): | |||
or (byte_ord < 128 and byte.decode() in misc.NON_PCT_ENCODED)): | |||
encoded_uri.extend(byte) | |||
continue | |||
encoded_uri.extend('%{0:02x}'.format(byte_ord).encode()) | |||
encoded_uri.extend('%{0:02x}'.format(byte_ord).encode().upper()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that percent-encoded bytes are normalized by default. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding Validations, I think we want a URI
as we can make assertions about that. In that sense I would guess someone instantiates an IRI
, converts it to a URI
and then performs validation.
src/rfc3986/uri.py
Outdated
ref.encoding = encoding | ||
return ref | ||
|
||
class URIMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple questions:
- Should this be in its own "internal" (e.g.,
_mixin
) namespace to avoid folks trying to use this somehow? - Do we want an old-style class on Python 2? If not should this be
class URIMixin(object):
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the internal namespace, I'll implement that. And I can change this to a new-style class.
encoding, | ||
) | ||
|
||
def encode(self, idna_encoder=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, I like the idea of users being able to provide their own encoder, but I wonder what the value of that is. There's really only one package I'm aware that does idna correctly (https://github.com/kjd/idna) which also handles IDNA 2008.
I admit, it's dreadfully slow to import. But do we really want to start with 2003 and work our way forwards by building our own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the idna module as an optional dependency because urllib3 doesn't directly depend on it, urllib3 only needs to be able to parse IRIs, it doesn't do the encoding itself.
Maybe we can try to import it and if it's not there and we need it to encode the host correctly we error out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
try:
import idna
except ImportError:
idna = None
if idna_encoder is None and idna is None and host_needs_encoding:
raise MissingDependencyError("Need idna installed in order to encode {}".format(host))
I could live with. It's not great but it could work. Also we could fallback on to IDNA 2003 on Py3 if we wanted to. I'd also like to see an extra added for that though named something like with_idna2008
.
if self.port is not None: | ||
authority += ":" + str(self.port) | ||
|
||
return uri.URIReference(self.scheme, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to return this unconditionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What conditions would we be unable to encode an IRI into a URI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no encoder is provided? I think I've also seen idna.encode
fail but I could be wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, was wondering if you wanted it to fail if we fail to produce a valid URI but that's not necessary as the user can validate after we give the user a URI.
So I think we're just waiting on the reoganization, right? I kind of want |
Yeah, I can try to get to your review comments today. What are your thoughts on putting everything to do with IRIs into a private module until promotion ( |
Eh, I don't care about hiding the module, just adding warning signs of sorts |
Would adding some warnings to users in documentation and the changelog be good enough here? |
Definitely |
|
Yeah this commit is definitely not ready, wish there was a way to put the PR in a state that it wouldn't alert reviewers until you're actually ready. :) I pushed so i could access these changes on a different machine. |
I think draft PRs do that but I don't think you from PR -> Draft PR |
I didn't know about draft PRs, thanks! |
Codecov Report
@@ Coverage Diff @@
## master #50 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 11 13 +2
Lines 711 797 +86
=====================================
+ Hits 711 797 +86
Continue to review full report at Codecov.
|
This looks good to me, @sethmlarson! Feel free to merge/release when you're ready |
Thanks @sigmavirus24! Did you want me to do the release (would need PyPI maintainer) or are you willing to? I'll update the release date to today in the docs before merge. I really appreciate you doing these reviews despite being away from OSS, by the way. |
I'll add you to the package @sethmlarson :) And I really want to see urllib3 succeed so I'm happy to make tiny exceptions |
@sigmavirus24 Thanks much! I'll make a release today. :) |
Closes #21, closes #31.
Adds a separate class
IRIReference
. What's left to be decided:Validator
and normalizers withIRIReference
instances?idna
module?