
Fix #4562: add support for internationalized email addresses #5799

Open: wants to merge 21 commits into main

aphillips (Contributor) commented Aug 7, 2020

Changes the ABNF and surrounding text to match what current browser implementations do, which includes support for non-ASCII characters on both the left and right side of the email address. Discussion in #4562 includes the genesis of these changes.

  • At least two implementers are interested (and none opposed):
  • Tests are written and can be reviewed and commented upon at:
  • Implementation bugs are filed:
    • Chrome: …
    • Firefox: …
    • Safari: …

(See WHATWG Working Mode: Changes for more details.)


💥 Error: Wattsi server error 💥

PR Preview failed to build. (Last tried on Jan 15, 2021, 8:00 AM UTC).


aphillips and others added 16 commits May 12, 2020 07:43
…Infra

Fix #5067

Adds the (new) definition of string equality using the terms 'is' or 'identical to' from Infra, replacing the term 'case-sensitive'. The phrase 'case-sensitive' is retained where it provides clarity (particularly in sections with names like "case-sensitivity of XXX").

I may need to go back and define an inverse term: in a number of places "case-sensitive" is used together with "unordered set of unique space-separated tokens". A number of these remain in this commit.
- Restore the case-sensitive head
- Add a note about string comparison/case-sensitivity
- Add a reference to charmod-norm
- Fix references to Infra string-is
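To illustrate the distinction these commits rely on: Infra's "is"/"identical to" compares strings exactly (no case folding, no Unicode normalization), while an ASCII case-insensitive match folds only A-Z. The sketch below is a minimal Python approximation under that reading; the function names are hypothetical, and Python compares code points rather than UTF-16 code units, which gives the same answer for well-formed strings.

```python
import unicodedata

def is_identical(a: str, b: str) -> bool:
    # Infra-style "is": same sequence of code points, no case folding, no normalization.
    return a == b

def ascii_lowercase(s: str) -> str:
    # Lowercase only A-Z; every other code point is left untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)

precomposed = "caf\u00e9"   # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(is_identical("Email", "email"))                            # False
print(ascii_case_insensitive_match("Email", "EMAIL"))            # True
print(ascii_case_insensitive_match("\u00c9", "\u00e9"))          # False: É/é are not ASCII
print(is_identical(precomposed, decomposed))                     # False: no normalization
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

The last two lines show why the charmod-norm reference matters: canonically equivalent strings are not "identical" unless something normalizes them first.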
…t type=email mirroring what browsers currently do.

Note that this change removes the reference regular expression for valid email address, since it would be difficult to write such a regex and have it approximate correctness.
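For reference, the regular expression that the published spec gives for a valid email address (transcribed here from the spec as it stood during this discussion; check the current spec before relying on it) accepts only ASCII on both sides of the @. The Python sketch below wraps it in a hypothetical helper to show which inputs it rejects.

```python
import re

# ASCII-only pattern from the published HTML spec's "valid email address" definition.
# fullmatch() is used instead of ^...$ because Python's $ also matches before a trailing newline.
VALID_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

def is_valid_ascii_email(s: str) -> bool:
    return VALID_EMAIL.fullmatch(s) is not None

print(is_valid_ascii_email("user@example.com"))            # True
print(is_valid_ascii_email("用户@example.com"))             # False: non-ASCII local part
print(is_valid_ascii_email("user@bücher.example"))         # False: U-label domain
print(is_valid_ascii_email("user@xn--bcher-kva.example"))  # True: A-label domain
```

A U-label domain only passes after being converted to its A-label form, which is why the domain-side behavior depends on whether a browser runs ToASCII before validating.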
aphillips (Contributor, Author) commented:

Crumpets. I forgot to rebase first.

domenic added labels Aug 10, 2020: addition/proposal (new features or enhancements), i18n-tracker (group bringing to attention of Internationalization, or tracked by i18n but not needing response), needs implementer interest (moving the issue forward requires implementers to express interest), topic: forms
aphillips (Contributor, Author) commented:

@domenic, does this really need the "needs implementer interest" label if it documents what browsers actually do today? I see that the issue has this label, but I don't think it's accurate.

domenic (Member) commented Aug 11, 2020

I wasn't under the impression that this was what browsers do. For example it doesn't seem to match https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/core/html/forms/email_input_type.cc;l=47-53;drc=1dea8d7ba92cfec5e553d3a4699115b1dc8dd707 . Web platform tests would certainly help, in any case.

annevk (Member) commented Aug 14, 2020

As I documented in the issue, browsers currently have somewhat different behavior, but none seem to allow non-ASCII before the @, so that alone would be a significant change that is subject to https://whatwg.org/working-mode#changes. I do think it's a change worth making, however.

aphillips (Contributor, Author) commented:

@domenic, @annevk You're right. I was having a brain cramp: I spent time testing the right-hand side and neglected the left-hand side (which is what we are changing here).

Would it make sense to break this into two PRs? One to fix the current misalignment between the spec and IDN support, and one to fix #4562?

domenic closed this Aug 15, 2020

domenic (Member) commented Aug 15, 2020

(Clicked the close button by mistake.) I think that would be good, although I'd suggest writing web platform tests before writing any spec text.

domenic reopened this Aug 15, 2020
"+" / "-" / "/" / "=" / "?" / "^" /
"_" / "`" / "{" / "|" / "}" / "~" /
%x80-D7FF / %xE000-10FFFF ; or any non-ASCII Unicode
domain = < a "valid host string", see URL section 3.4 >
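The rule above extends the ASCII atext set with every non-ASCII code point except surrogates. The Python sketch below checks a local part against it under two stated assumptions: the elided start of the rule is the RFC 5322 atext set (ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / ...), and the local part keeps the published HTML spec's shape `1*( atext / "." )`. The helper names are hypothetical.

```python
# Assumption: the elided part of the rule is RFC 5322 atext.
ASCII_ATEXT = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    "!#$%&'*+-/=?^_`{|}~"
)

def is_atext(ch: str) -> bool:
    if ch in ASCII_ATEXT:
        return True
    cp = ord(ch)
    # %x80-D7FF / %xE000-10FFFF: any non-ASCII code point except surrogates.
    return 0x80 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

def is_valid_local_part(local: str) -> bool:
    # Assumed shape, as in the published HTML spec: 1*( atext / "." )
    return len(local) > 0 and all(c == "." or is_atext(c) for c in local)
```

Under this sketch `is_valid_local_part("用户")` is true, while an empty string, a space, an @, or a lone surrogate is rejected.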

The valid host string rule is not compatible with RFC 5321, section 4.1.2. In SMTP, IPv4 addresses must be wrapped in square brackets, e.g. mailbox@[10.0.0.1].

I just verified that Postfix enforces this rule.
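To make the incompatibility concrete, the Python sketch below uses a hypothetical helper to sort a domain part into the forms RFC 5321 distinguishes: a bracketed address literal (IPv4, or IPv6 with the `IPv6:` tag), a bare IPv4 address (which a URL "valid host string" allows but SMTP does not), or anything else. It only distinguishes IP forms; it does not validate domain names, and it ignores RFC 5321's General-address-literal.

```python
import ipaddress

def classify_domain(domain: str) -> str:
    """Hypothetical helper: classify an email domain part by its RFC 5321 form."""
    if domain.startswith("[") and domain.endswith("]"):
        inner = domain[1:-1]
        try:
            if inner[:5].lower() == "ipv6:":
                ipaddress.IPv6Address(inner[5:])  # IPv6-address-literal
            else:
                ipaddress.IPv4Address(inner)      # IPv4-address-literal
        except ValueError:
            return "invalid"
        return "address-literal"
    try:
        ipaddress.IPv4Address(domain)
    except ValueError:
        return "domain"
    return "bare-ipv4"  # valid as a URL host, not as an SMTP domain

print(classify_domain("[10.0.0.1]"))   # address-literal
print(classify_domain("10.0.0.1"))     # bare-ipv4
print(classify_domain("[IPv6:::1]"))   # address-literal
print(classify_domain("[::1]"))        # invalid: URL-style IPv6 host lacks the "IPv6:" tag
print(classify_domain("example.com"))  # domain
```

The `[::1]` case shows a second mismatch: the URL Standard's bracketed IPv6 host is not an SMTP address literal either.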


The currently published spec forbids the use of IP addresses in the domain part anyway. We could just keep recommending that. If we do, then this is sufficient, I think:

Suggested change:
- domain = < a "valid host string", see URL section 3.4 >
+ domain = < a "valid domain string", see URL section 3.4 >

https://url.spec.whatwg.org/#valid-domain (which is correct for IDNA, but also see whatwg/url#245).

hsivonen (Member) commented May 6, 2024

Currently, the spec text evidently isn't clear enough for the domain part, since Blink, Gecko, and WebKit all do different things. My reading of the current spec text is that it best supports what Blink does. However, https://www.ietf.org/archive/id/draft-ietf-emailcore-as-09.html#name-use-and-validation-in-html- ends up presenting what appears to be only the WebKit interpretation as a general HTML problem.

The current PR does not make things as clear as I'd like from an implementer's perspective.

I think it would be good as a basis for discussion if the upcoming PR update took a position on each of the questions I listed on the issue. (As noted there, I think specifying Blink's general approach of replacing the domain with the ToASCII form when the ToASCII operation does not raise errors is the least risky way forward. I do think the Blink specifics need tweaks. In particular, I believe UTS 46 processing needs to be run in non-transitional mode, as Blink already does for URLs.)
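The "replace the domain with its ToASCII form if ToASCII succeeds" approach can be sketched in a few lines. This is an illustration only: the helper name is hypothetical, and it uses Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490), not the UTS 46 non-transitional processing suggested above. That difference is visible with ß: the built-in codec maps `faß.de` to `fass.de` (transitional-style), whereas non-transitional UTS 46 keeps ß and yields `xn--fa-hia.de`.

```python
from typing import Optional

def to_ascii_email(address: str) -> Optional[str]:
    """Hypothetical sketch: replace the domain with its ToASCII form, or fail.

    Stand-in only: Python's "idna" codec is IDNA 2003, not UTS 46 non-transitional.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Conversion failed (e.g. an empty or over-long label): treat as invalid.
        return None
    return f"{local}@{ascii_domain}"

print(to_ascii_email("user@bücher.example"))  # user@xn--bcher-kva.example
print(to_ascii_email("user@faß.de"))          # user@fass.de (IDNA 2003 behavior)
```

A UTS 46 implementation run in non-transitional mode would be substituted for `domain.encode("idna")` in a real implementation; the surrounding split-convert-reassemble structure stays the same.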
