Canonical Ordering of Marks in Thai Script #18

Open
@r12a

I'm raising this issue to draw attention to a document by Peter Constable that is going through the Unicode committees. It's certainly something I have thought about before, and something that may well apply to other scripts too.

https://www.unicode.org/L2/L2018/18216-thai-order.pdf

It set me thinking that any time two sequences look the same but are not reordered to a single form during normalisation, you have a problem for string matching, and possibly also for security. I suspect that it would be useful to further constrain ordering in fonts, so that anything that doesn't follow the rule becomes visually evident to the user. I also suspect that it might be appropriate to have similar rules for other SE Asian scripts.
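To make the matching problem concrete, here is a minimal sketch using Python's standard `unicodedata` module. It compares the two orderings of SARA II and MAI EK that appear in the sequences listed in this issue, and contrasts them with a below-vowel case (SARA U plus MAI EK) that I've added only for illustration. The helper names `combining_classes` and `canonically_equivalent` are my own, and the result depends on the Unicode data shipped with your Python version.

```python
import unicodedata

# Two spellings that can look identical when rendered.
vowel_then_tone = "\u0E01\u0E35\u0E48"  # KO KAI, SARA II, MAI EK
tone_then_vowel = "\u0E01\u0E48\u0E35"  # KO KAI, MAI EK, SARA II

# Contrast case (illustrative assumption, not one of the screenshot sequences):
# SARA U and MAI EK both have non-zero combining classes.
below_then_tone = "\u0E01\u0E38\u0E48"  # KO KAI, SARA U, MAI EK
tone_then_below = "\u0E01\u0E48\u0E38"  # KO KAI, MAI EK, SARA U


def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_equivalent(a, b):
    """True if the two strings are identical after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for label, text in [
        ("vowel, tone", vowel_then_tone),
        ("tone, vowel", tone_then_vowel),
        ("below vowel, tone", below_then_tone),
        ("tone, below vowel", tone_then_below),
    ]:
        print(label, [f"U+{ord(ch):04X}" for ch in text], combining_classes(text))

    print("SARA II / MAI EK orders equivalent:",
          canonically_equivalent(vowel_then_tone, tone_then_vowel))
    print("SARA U / MAI EK orders equivalent:",
          canonically_equivalent(below_then_tone, tone_then_below))
```

The above-base vowel has combining class 0, so normalisation treats it as a starter and never moves the tone mark across it. The two SARA II / MAI EK spellings therefore stay distinct after normalisation, whereas the SARA U / MAI EK pair is reordered to a single form.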

I noticed the following interesting behaviour across 3 fonts, two of which are from the same stable. Each screenshot shows the same three sequences, with the code points in this order:
U+0E01 THAI CHARACTER KO KAI
U+0E34 THAI CHARACTER SARA I
U+0E38 THAI CHARACTER SARA U

U+0E01 THAI CHARACTER KO KAI
U+0E35 THAI CHARACTER SARA II
U+0E48 THAI CHARACTER MAI EK

U+0E01 THAI CHARACTER KO KAI
U+0E48 THAI CHARACTER MAI EK
U+0E35 THAI CHARACTER SARA II

Noto Serif Thai
[screenshot, 2018-08-29 16:58:17]

A webfont created from an older version of Noto Sans Thai
[screenshot, 2018-08-29 16:59:38]

Ayuthaya
[screenshot, 2018-08-29 17:00:23]

Some interesting variations on the theme there, in some cases preventing you from seeing that there's a different underlying order of code points, in others preventing you from seeing that you've done something that's not 'normal'.
