make a list of invisible characters to support html 5121 discussion #73

Closed
ghurlbot opened this issue Feb 15, 2024 · 2 comments
Labels: action (An action item); DONE (Action that has been completed but not yet reviewed in telecon)

Comments

@ghurlbot
Collaborator

Opened by @aphillips via IRC channel #i18n on irc.w3.org

Due: 2024-02-22 (Thursday 22 February)

ghurlbot added the "action" label on Feb 15, 2024
@r12a
Collaborator

r12a commented Feb 20, 2024

Priority items

For me, of those that are missing, these are the highest priority. I suggest possible named entities, derived from the standard Unicode abbreviations.

  • U+061C ARABIC LETTER MARK &alm;
  • U+180E MONGOLIAN VOWEL SEPARATOR &mvs;
  • U+202D LEFT-TO-RIGHT OVERRIDE &lro;
  • U+202E RIGHT-TO-LEFT OVERRIDE &rlo;
  • U+2066 LEFT-TO-RIGHT ISOLATE &lri;
  • U+2067 RIGHT-TO-LEFT ISOLATE &rli;
  • U+2068 FIRST STRONG ISOLATE &fsi;
  • U+2069 POP DIRECTIONAL ISOLATE &pdi;
  • U+202F NARROW NO-BREAK SPACE &nnbsp;
  • U+034F COMBINING GRAPHEME JOINER &cgj;
  • U+180B MONGOLIAN FREE VARIATION SELECTOR ONE &fvs1;
  • U+180C MONGOLIAN FREE VARIATION SELECTOR TWO &fvs2;
  • U+180D MONGOLIAN FREE VARIATION SELECTOR THREE &fvs3;
  • U+180F MONGOLIAN FREE VARIATION SELECTOR FOUR &fvs4;

I'd also like to have &zwsp; in addition to &ZeroWidthSpace; for U+200B
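To make the proposal above concrete, here is a minimal Python sketch that maps the suggested names to their code points and expands them in a string. The names are only the suggestions from this comment, not entities defined by HTML, and `PROPOSED_ENTITIES` and `expand_proposed` are hypothetical names chosen for illustration.

```python
import re

# Proposed (not standardized) entity names from the list above,
# mapped to the characters they would stand for.
PROPOSED_ENTITIES = {
    "alm": "\u061C",    # ARABIC LETTER MARK
    "mvs": "\u180E",    # MONGOLIAN VOWEL SEPARATOR
    "lro": "\u202D",    # LEFT-TO-RIGHT OVERRIDE
    "rlo": "\u202E",    # RIGHT-TO-LEFT OVERRIDE
    "lri": "\u2066",    # LEFT-TO-RIGHT ISOLATE
    "rli": "\u2067",    # RIGHT-TO-LEFT ISOLATE
    "fsi": "\u2068",    # FIRST STRONG ISOLATE
    "pdi": "\u2069",    # POP DIRECTIONAL ISOLATE
    "nnbsp": "\u202F",  # NARROW NO-BREAK SPACE
    "cgj": "\u034F",    # COMBINING GRAPHEME JOINER
    "fvs1": "\u180B",   # MONGOLIAN FREE VARIATION SELECTOR ONE
    "fvs2": "\u180C",   # MONGOLIAN FREE VARIATION SELECTOR TWO
    "fvs3": "\u180D",   # MONGOLIAN FREE VARIATION SELECTOR THREE
    "fvs4": "\u180F",   # MONGOLIAN FREE VARIATION SELECTOR FOUR
    "zwsp": "\u200B",   # ZERO WIDTH SPACE
}

_ENTITY_RE = re.compile(r"&([A-Za-z0-9]+);")


def expand_proposed(text: str) -> str:
    """Replace &name; with its character when name is in the proposed table.

    Any other reference (including real HTML entities such as &amp;)
    is left untouched.
    """
    def repl(match: re.Match) -> str:
        return PROPOSED_ENTITIES.get(match.group(1), match.group(0))

    return _ENTITY_RE.sub(repl, text)
```

For example, `expand_proposed("a&lri;b&pdi;c")` wraps `b` in an isolate pair, which is the kind of markup-free bidi control these names are meant to make easier to type and read in source.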

Full list

It took a while to figure out how to come up with a reasonable list. The following is from https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3Adi%3A%5D%5B%3Awhite_space%3A%5D-%5B%3ACn%3A%5D&g=&i=
but with some items manually excised because I felt they were not necessary. I marked in bold the ones for which we already have named entities. Ones I'm not sure about are in italics.
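The query in that URL decodes to `[:di:][:white_space:]-[:Cn:]`, that is, Default_Ignorable_Code_Point plus White_Space, minus unassigned code points. Python's standard library does not expose those two properties, so the sketch below only approximates the idea using general categories; it is an assumption-laden illustration, not a reimplementation of the UnicodeSet tool. The helper names `describe` and `approx_invisible` are hypothetical.

```python
import unicodedata

# General categories that cover many, but not all, of the listed characters:
# Cf = format, Zs = space separator, Zl / Zp = line / paragraph separator.
# This is an approximation: Default_Ignorable_Code_Point also includes
# characters in other categories (for example U+034F, which is Mn).
APPROX_CATEGORIES = {"Cf", "Zs", "Zl", "Zp"}


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME [category]' for a code point."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{cp:04X} {name} [{unicodedata.category(ch)}]"


def approx_invisible(cp: int) -> bool:
    """Rough test: is the code point a format character or a separator?"""
    return unicodedata.category(chr(cp)) in APPROX_CATEGORIES


if __name__ == "__main__":
    for cp in (0x00A0, 0x034F, 0x200B, 0x2028, 0x2066):
        print(describe(cp), approx_invisible(cp))
```

Note that U+034F COMBINING GRAPHEME JOINER is reported as not matching here, which shows why a category-based check cannot replace the property-based query used for the list.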

Latin 1 Supplement — Latin-1 punctuation and symbols

  • U+00A0 NO-BREAK SPACE &nbsp;
  • U+00AD SOFT HYPHEN &shy;

Combining Diacritical Marks — Grapheme joiner

  • U+034F COMBINING GRAPHEME JOINER

Arabic — Format character

  • U+061C ARABIC LETTER MARK

Hangul Jamo — Old initial consonants

  • U+115F HANGUL CHOSEONG FILLER

Hangul Jamo — Medial vowels

  • U+1160 HANGUL JUNGSEONG FILLER

Ogham — Space

  • U+1680 OGHAM SPACE MARK

Mongolian — Format controls

  • U+180B MONGOLIAN FREE VARIATION SELECTOR ONE
  • U+180C MONGOLIAN FREE VARIATION SELECTOR TWO
  • U+180D MONGOLIAN FREE VARIATION SELECTOR THREE
  • U+180E MONGOLIAN VOWEL SEPARATOR
  • U+180F MONGOLIAN FREE VARIATION SELECTOR FOUR

General Punctuation — Spaces

  • U+2000 EN QUAD
  • U+2001 EM QUAD
  • U+2002 EN SPACE &ensp;
  • U+2003 EM SPACE &emsp;
  • U+2004 THREE-PER-EM SPACE &emsp13;
  • U+2005 FOUR-PER-EM SPACE &emsp14;
  • U+2006 SIX-PER-EM SPACE
  • U+2007 FIGURE SPACE &numsp;
  • U+2008 PUNCTUATION SPACE &puncsp;
  • U+2009 THIN SPACE &thinsp; AND &ThinSpace;
  • U+200A HAIR SPACE &hairsp; AND &VeryThinSpace; AND part of &ThickSpace; (U+205F U+200A)

General Punctuation — Format character

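  • U+200B ZERO WIDTH SPACE &ZeroWidthSpace; AND &NegativeVeryThinSpace; AND &NegativeThinSpace; AND &NegativeMediumSpace; AND &NegativeThickSpace;
  • U+200C ZERO WIDTH NON-JOINER &zwnj;
  • U+200D ZERO WIDTH JOINER &zwj;
  • U+200E LEFT-TO-RIGHT MARK &lrm;
  • U+200F RIGHT-TO-LEFT MARK &rlm;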
  • U+202A LEFT-TO-RIGHT EMBEDDING
  • U+202B RIGHT-TO-LEFT EMBEDDING
  • U+202C POP DIRECTIONAL FORMATTING
  • U+202D LEFT-TO-RIGHT OVERRIDE
  • U+202E RIGHT-TO-LEFT OVERRIDE
  • U+2060 WORD JOINER &NoBreak;
  • U+2066 LEFT-TO-RIGHT ISOLATE
  • U+2067 RIGHT-TO-LEFT ISOLATE
  • U+2068 FIRST STRONG ISOLATE
  • U+2069 POP DIRECTIONAL ISOLATE

General Punctuation — Separators

  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

General Punctuation — Space

  • U+202F NARROW NO-BREAK SPACE
  • U+205F MEDIUM MATHEMATICAL SPACE &MediumSpace; AND part of &ThickSpace; (U+205F U+200A)

General Punctuation — Invisible operators

  • U+2061 FUNCTION APPLICATION &af; AND &ApplyFunction;
  • U+2062 INVISIBLE TIMES &it; AND &InvisibleTimes;
  • U+2063 INVISIBLE SEPARATOR &ic; AND &InvisibleComma;
  • U+2064 INVISIBLE PLUS
  • U+206D ACTIVATE ARABIC FORM SHAPING

CJK Symbols And Punctuation — CJK symbols and punctuation

  • U+3000 IDEOGRAPHIC SPACE

Hangul Compatibility Jamo — Special character

  • U+3164 HANGUL FILLER

Halfwidth And Fullwidth Forms — Halfwidth Hangul variants

  • U+FFA0 HALFWIDTH HANGUL FILLER

Shorthand Format Controls — Shorthand format controls

  • U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
  • U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
  • U+1BCA2 SHORTHAND FORMAT DOWN STEP
  • U+1BCA3 SHORTHAND FORMAT UP STEP

Musical Symbols — Beams and slurs

  • U+1D173 MUSICAL SYMBOL BEGIN BEAM
  • U+1D174 MUSICAL SYMBOL END BEAM
  • U+1D175 MUSICAL SYMBOL BEGIN TIE
  • U+1D176 MUSICAL SYMBOL END TIE
  • U+1D177 MUSICAL SYMBOL BEGIN SLUR
  • U+1D178 MUSICAL SYMBOL END SLUR
  • U+1D179 MUSICAL SYMBOL BEGIN PHRASE
  • U+1D17A MUSICAL SYMBOL END PHRASE

Emoji Variation Selectors - turns on and off colour

  • U+FE0E VARIATION SELECTOR-15
  • U+FE0F VARIATION SELECTOR-16
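As a small illustration of how these two selectors are used, the Python sketch below appends VS15 (text presentation) or VS16 (emoji presentation) to a base character. The helper name `with_presentation` is hypothetical, and whether a renderer honours the request depends on the font and platform and on the base character having a defined variation sequence.

```python
TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: request text (monochrome) presentation
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: request emoji (colour) presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Drop any trailing VS15/VS16 from base, then append the requested one."""
    stripped = base.rstrip(TEXT_VS + EMOJI_VS)
    return stripped + (EMOJI_VS if emoji else TEXT_VS)


if __name__ == "__main__":
    heart = "\u2764"  # HEAVY BLACK HEART
    for emoji in (False, True):
        seq = with_presentation(heart, emoji)
        # Show the code points, since the selector itself is invisible.
        print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Printing the code points rather than the string itself is deliberate: the selector has no glyph of its own, which is exactly why it belongs on a list of invisible characters.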

aphillips added the "DONE" label on Feb 20, 2024
@ghurlbot
Collaborator Author

Closed by @aphillips via IRC channel #i18n on irc.w3.org
