Do not lowercase value of lang attribute #579

Closed
jmooring opened this issue May 22, 2023 · 6 comments · Fixed by #580

Comments

@jmooring
Contributor

TLDR: Do not lowercase the value of the lang attribute.


The value of the lang attribute is currently converted to lowercase, which is acceptable per RFC 5646 § 2.1.1:

At all times, language tags and their subtags, including private use and extensions, are to be treated as case insensitive

The HTML spec for the lang attribute states that the value must conform to RFC 5646, but does not provide guidance regarding case sensitivity. So we refer back to RFC 5646 (lowercase is acceptable).

Then we get to the MDN documentation for the lang attribute, which also refers to RFC 5646, but the examples are mixed case, which leads people to believe that the lang attribute is case sensitive.

And, unfortunately, RFC 5646 § 2.1.1 includes this recommendation:

The format of subtags in the registry is RECOMMENDED as the form to use in language tags. This format generally corresponds to the common conventions for the various ISO standards from which the subtags are derived.

These conventions include:

o ISO639-1 recommends that language codes be written in lowercase ('mn' Mongolian).

o ISO15924 recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).

o ISO3166-1 recommends that country codes be capitalized ('MN' Mongolia).

So the RFC 5646 requirement and recommendation are different. Apparently there is at least one JS library that performs a case-sensitive comparison; it follows the recommendation, not the requirement.
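The casing conventions quoted from RFC 5646 § 2.1.1 can be sketched in Go as follows. This is a minimal illustration, not code from this project or from any library mentioned in the thread: `canonicalCase` is a hypothetical helper, it assumes ASCII tags, and it only changes letter case without validating the tag. It follows the fuller rule in the same RFC section: every subtag is lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither the first subtag nor after a singleton such as `x`.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalCase applies the RFC 5646 § 2.1.1 casing conventions to a
// language tag. It assumes ASCII input, changes only letter case, and
// does not check that the tag is valid.
func canonicalCase(tag string) string {
	subtags := strings.Split(tag, "-")
	afterSingleton := false
	for i, s := range subtags {
		lower := strings.ToLower(s)
		switch {
		case i == 0 || afterSingleton:
			// The first subtag, and every subtag after a singleton
			// (e.g. "x" for private use), stays lowercase.
			subtags[i] = lower
		case len(s) == 2:
			// Two-letter subtags, such as regions ("MN"), are uppercase.
			subtags[i] = strings.ToUpper(s)
		case len(s) == 4:
			// Four-letter subtags, such as scripts ("Cyrl"), are titlecase.
			subtags[i] = strings.ToUpper(lower[:1]) + lower[1:]
		default:
			// Everything else, including singletons themselves, is lowercase.
			subtags[i] = lower
		}
		if len(s) == 1 {
			afterSingleton = true
		}
	}
	return strings.Join(subtags, "-")
}

func main() {
	fmt.Println(canonicalCase("mn-cyrl-mn"))     // mn-Cyrl-MN
	fmt.Println(canonicalCase("EN-ca-X-CA"))     // en-CA-x-ca
	fmt.Println(canonicalCase("az-latn-x-latn")) // az-Latn-x-latn
	fmt.Println(canonicalCase("SGN-be-fr"))      // sgn-BE-FR
}
```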

@tdewolff
Owner

I understand. My point of view is that the recommendation is to use the ISO formatting (with uppercase), but that the value of the attribute is case-insensitive. In other words, a value being case-insensitive doesn't mean it has to be all lowercase; you could recommend any uppercase or mixed-case form, since it doesn't matter. Are you sure this shouldn't be fixed in the library consuming the HTML? To me it sounds like it doesn't adhere to the HTML spec...

@jmooring
Contributor Author

The HTML spec refers to RFC 5646, which recommends mixed case.

"So you're telling me that we have to modify our JS library because we're following the RFC 5646 recommendations?"

That's an argument that neither of us will win.

@tdewolff
Owner

tdewolff commented May 23, 2023

Yes, I agree that the RFC should be followed, but the JS library does not follow the HTML specification (the RFC governs what goes WITHIN the attribute value, but the case information is lost because of the HTML context in which it is used). You can't compare the value of a case-insensitive attribute case-sensitively! Such a library is not adhering to the HTML specification, since it explicitly states that the value is case-insensitive, i.e. lang=nl-NL is exactly the same as lang=NL-NL and should be treated as equal. You can write it with the correct case according to the RFC, but as soon as you use it as the value of a lang attribute, the case information is lost.
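A minimal Go sketch of the comparison point above: a consumer that compares lang values with `strings.EqualFold` treats `nl-NL`, `NL-NL`, and a lowercased `nl-nl` as the same tag, while a plain `==` comparison does not. `sameLangTag` is a hypothetical helper for illustration only, not an API of any library in this thread. `strings.EqualFold` performs Unicode case folding, which for ASCII language tags amounts to an ASCII case-insensitive comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// sameLangTag reports whether two lang attribute values name the same
// language tag. RFC 5646 § 2.1.1 says tags are case insensitive, so the
// comparison ignores letter case.
func sameLangTag(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(sameLangTag("nl-nl", "nl-NL")) // true: case is ignored
	fmt.Println("nl-nl" == "nl-NL")            // false: case-sensitive check
	fmt.Println(sameLangTag("nl-NL", "nl-BE")) // false: different regions
}
```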

I mean, you can't say that #FF0000 is not red because the recommendation is to use lowercase #ff0000; the value is case-insensitive, so both are exactly equal.

From a more pragmatic viewpoint, we can get rid of the lowercasing, since it has little benefit, but I feel like I keep losing battles because other libraries don't feel like following the HTML specification... :-(

@jmooring
Contributor Author

I hear you, and I don't have a strong opinion either way. Do what you think is best.

@ZhangChengLin

Don't fight! Wouldn't it be a better option to hand these things over to users and let them choose? There are HTML specifications, and there are also de facto standards. Why do those exist? Because there is real demand for them. Tools are designed to meet needs, so shouldn't a good tool cover a variety of needs? Specifications are good, and we are willing to follow them so that everyone behaves consistently and we all make progress together, rather than clinging to a position and making sacrifices. Whether the future is uppercase or lowercase is the job of the people who write the specifications. We only need to consider whether our tools are easy to use, whether their features are complete, and whether they meet actual needs. Isn't that right?

@tdewolff
Owner

The thing is that specifications are made precisely to make things easier for everybody. But I think pragmatism beats correctness, so I've merged the PR! Thanks for the effort, guys!
