Skip to content

Add support for IRIs and RFC 3987 #50

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 20, 2019
Merged

Add support for IRIs and RFC 3987 #50

merged 7 commits into from
Apr 20, 2019

Conversation

sethmlarson
Copy link
Member

Closes #21, closes #31.

Adds a separate class IRIReference. What's left to be decided:

  • What is the best interface for using Validator and normalizers with IRIReference instances?
  • Should we include a default IDNA 2003 encoder or should we rely on users providing their own via the idna module?

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@@ -162,6 +162,6 @@ def encode_component(uri_component, encoding):
or (byte_ord < 128 and byte.decode() in misc.NON_PCT_ENCODED)):
encoded_uri.extend(byte)
continue
encoded_uri.extend('%{0:02x}'.format(byte_ord).encode())
encoded_uri.extend('%{0:02x}'.format(byte_ord).encode().upper())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that percent-encoded bytes are normalized by default. :)

Copy link
Collaborator

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding Validations, I think we want a URI as we can make assertions about that. In that sense I would guess someone instantiates an IRI, converts it to a URI and then performs validation.

ref.encoding = encoding
return ref

class URIMixin:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions:

  1. Should this be in its own "internal" (e.g., _mixin) namespace to avoid folks trying to use this somehow?
  2. Do we want an old-style class on Python 2? If not should this be class URIMixin(object):?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the internal namespace, I'll implement that. And I can change this to a new-style class.

encoding,
)

def encode(self, idna_encoder=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, I like the idea of users being able to provide their own encoder, but I wonder what the value of that is. There's really only one package I'm aware that does idna correctly (https://github.com/kjd/idna) which also handles IDNA 2008.

I admit, it's dreadfully slow to import. But do we really want to start with 2003 and work our way forwards by building our own?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the idna module as an optional dependency because urllib3 doesn't directly depend on it, urllib3 only needs to be able to parse IRIs, it doesn't do the encoding itself.

Maybe we can try to import it and if it's not there and we need it to encode the host correctly we error out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

try:
    import idna
except ImportError:
    idna = None


if idna_encoder is None and idna is None and host_needs_encoding:
     raise MissingDependencyError("Need idna installed in order to encode {}".format(host))

I could live with. It's not great but it could work. Also we could fallback on to IDNA 2003 on Py3 if we wanted to. I'd also like to see an extra added for that though named something like with_idna2008.

if self.port is not None:
authority += ":" + str(self.port)

return uri.URIReference(self.scheme,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to return this unconditionally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What conditions would we be unable to encode an IRI into a URI?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no encoder is provided? I think I've also seen idna.encode fail but I could be wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, was wondering if you wanted it to fail if we fail to produce a valid URI but that's not necessary as the user can validate after we give the user a URI.

@sigmavirus24
Copy link
Collaborator

So I think we're just waiting on the reoganization, right? I kind of want IRI support to be a "beta" feature to allow us to play with the API as we find necessary but 🤷‍♂️ I don't know how to do that appropriately

@sethmlarson
Copy link
Member Author

Yeah, I can try to get to your review comments today. What are your thoughts on putting everything to do with IRIs into a private module until promotion (_iri, _beta?) or doing a pre-release with a beta flag? Either way urllib3 can still bundle and we can make changes to the IRI interface if we find any problems down the line.

@sigmavirus24
Copy link
Collaborator

Eh, I don't care about hiding the module, just adding warning signs of sorts

@sethmlarson
Copy link
Member Author

Would adding some warnings to users in documentation and the changelog be good enough here?

@sigmavirus24
Copy link
Collaborator

Definitely

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@sigmavirus24
Copy link
Collaborator

@sethmlarson
Copy link
Member Author

Yeah this commit is definitely not ready, wish there was a way to put the PR in a state that it wouldn't alert reviewers until you're actually ready. :)

I pushed so i could access these changes on a different machine.

@sigmavirus24
Copy link
Collaborator

I think draft PRs do that but I don't think you from PR -> Draft PR

@sethmlarson
Copy link
Member Author

I didn't know about draft PRs, thanks!

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@codecov-io
Copy link

codecov-io commented Apr 20, 2019

Codecov Report

Merging #50 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #50   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          11     13    +2     
  Lines         711    797   +86     
=====================================
+ Hits          711    797   +86
Impacted Files Coverage Δ
src/rfc3986/normalizers.py 100% <100%> (ø) ⬆️
src/rfc3986/iri.py 100% <100%> (ø)
src/rfc3986/exceptions.py 100% <100%> (ø) ⬆️
src/rfc3986/abnf_regexp.py 100% <100%> (ø) ⬆️
src/rfc3986/__init__.py 100% <100%> (ø) ⬆️
src/rfc3986/uri.py 100% <100%> (ø) ⬆️
src/rfc3986/misc.py 100% <100%> (ø) ⬆️
src/rfc3986/api.py 100% <100%> (ø) ⬆️
src/rfc3986/_mixin.py 100% <100%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a87fd6...f10931d. Read the comment docs.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@sigmavirus24
Copy link
Collaborator

This looks good to me, @sethmlarson! Feel free to merge/release when you're ready

@sethmlarson
Copy link
Member Author

sethmlarson commented Apr 20, 2019

Thanks @sigmavirus24! Did you want me to do the release (would need PyPI maintainer) or are you willing to? I'll update the release date to today in the docs before merge.

I really appreciate you doing these reviews despite being away from OSS, by the way.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@sethmlarson sethmlarson merged commit df21c4b into python-hyper:master Apr 20, 2019
@sethmlarson sethmlarson deleted the iri branch April 20, 2019 14:48
@sigmavirus24
Copy link
Collaborator

I'll add you to the package @sethmlarson :) And I really want to see urllib3 succeed so I'm happy to make tiny exceptions

@sethmlarson
Copy link
Member Author

@sigmavirus24 Thanks much! I'll make a release today. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support idna :) New coverage release breaks the tests on Python 3.2
3 participants