[questions/qa-what-is-encoding] Font and character sets #523

Closed
xfq opened this issue Aug 30, 2023 · 5 comments
Comments

@xfq
Member

xfq commented Aug 30, 2023

https://www.w3.org/International/questions/qa-what-is-encoding#fonts

> A given font will usually cover a single character set, or in the case of a large character set like Unicode, just a subset of all the characters in the set.

Most fonts are now designed for (a subset of) Unicode rather than covering a full character set, so this sentence could be updated (maybe only the second half of the sentence is enough?).

@andjc

andjc commented Aug 31, 2023

That said, "designed for Unicode" could mean following the Unicode specification, using the PUA for unencoded scripts, overlaying an existing block for unencoded scripts, or changing character assignments within a block. The last two are examples of pseudo-Unicode solutions.

Also, content in some languages is still predominantly in legacy encodings.
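
To make the PUA case concrete, here is a minimal Python sketch that flags private-use characters in a string by checking for the Unicode general category `Co`. The function name `private_use_chars` and the sample string are illustrative assumptions, not anything from the article. Note that this only catches the PUA approach: a pseudo-Unicode font that overlays or reassigns an existing block uses ordinary code points, so a check like this cannot detect it.

```python
import unicodedata


def private_use_chars(text):
    """Return the characters in `text` whose general category is Co (private use).

    Hypothetical helper for illustration only.
    """
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


# Sample text: three Latin letters, one BMP private-use code point (U+E000),
# and one Malayalam letter (U+0D2E), which is a regular encoded character.
sample = "abc\ue000\u0d2e"
flagged = private_use_chars(sample)
print([f"U+{ord(ch):04X}" for ch in flagged])
```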

@xfq
Member Author

xfq commented Sep 19, 2023

> That said, "designed for Unicode" could mean following the Unicode specification, using the PUA for unencoded scripts, overlaying an existing block for unencoded scripts, or changing character assignments within a block. The last two are examples of pseudo-Unicode solutions.
>
> Also, content in some languages is still predominantly in legacy encodings.

Thanks @andjc. I think this situation is not common for most users, and given that this article is aimed at beginners and this sentence says "usually" (instead of "always"), I think it's OK not to mention it here.

@andjc

andjc commented Sep 19, 2023

@xfq that is true. Following that user scenario, and considering that browsers assume all fonts are Unicode fonts, it would probably be better to rewrite it simply as:

A given font will cover a subset of Unicode characters.

Although, considering that most web fonts are subsetted anyway, it's probably also true that the font may not include all the characters in a specific script, nor all the glyphs needed for all the languages written in that script.

I know that on government sites here it is common to see ransom-note effects (text rendered in a mix of mismatched fonts because of fallback) for community languages written in the Latin script, and there are likely only three, maybe four, Latin script fonts available that are capable of near-full support for the Latin script.
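
As a rough illustration of how subsetting leads to the ransom-note effect, the sketch below models per-character font fallback in Python. Everything in it is an assumption made for the example: the font names, the coverage sets, and the helpers `pick_font` and `font_runs` are hypothetical, and real browsers use more involved fallback logic than this per-character lookup.

```python
import string
from itertools import groupby

# Hypothetical font stack: (name, set of characters the font covers).
# The first entry stands in for a web font subsetted to basic Latin letters;
# the second stands in for a system font with a few extra Latin letters.
BASIC_LATIN = set(string.ascii_letters + " ")
FONT_STACK = [
    ("WebFontSubset", BASIC_LATIN),
    ("SystemFallback", BASIC_LATIN | set("ŋɔɛŊƆƐ")),
]


def pick_font(ch, stack=FONT_STACK):
    """Return the name of the first font in the stack that covers `ch`, or None."""
    for name, coverage in stack:
        if ch in coverage:
            return name
    return None


def font_runs(text, stack=FONT_STACK):
    """Split `text` into consecutive runs rendered by the same font."""
    return [
        (font, "".join(chars))
        for font, chars in groupby(text, key=lambda ch: pick_font(ch, stack))
    ]


# A word that needs one letter outside the subsetted web font.
for font, run in font_runs("kɔfi"):
    print(font, repr(run))
```

A single word that ends up split across more than one font is the situation described above: each run may be drawn with a different design, weight, or x-height.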

@r12a
Contributor

r12a commented Nov 29, 2023

When that text refers to a 'character set', the article is NOT referring to a 'charset', a 'code page', etc. It refers to a group of characters used for a particular purpose (as defined in the article), such as all the characters needed to write Malayalam. The Unicode character set referred to is the Unicode Standard repertoire as a whole, and fonts usually address the needs of only a subset of Unicode at a time. A typical font will, however, support all the characters needed to write, say, Malayalam, i.e. the character set required to write that language.

@xfq
Member Author

xfq commented Nov 30, 2023

Fair enough. Closing. Thank you!

xfq closed this as not planned on Nov 30, 2023