This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

UTN#11 versus OpenType Myanmar shaping #43

Open
simoncozens opened this issue Jan 21, 2021 · 8 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@simoncozens
Collaborator

UTN#11 ("Representing Myanmar in Unicode") specifies a suggested canonical order for storing syllabic elements, as well as some fairly sensible constraints on syllable structure. The OpenType Myanmar shaper, however, performs only minimal reordering: kinzi, medial ra, and pre-base vowels move before the consonant, and A VBlw (anusvara followed by a below-base vowel) becomes VBlw A. OpenType also constrains syllable structure very loosely.

The upshot of this is that equivalent sequences are not reordered and so produce different output:

$ hb-shape ~/work/myanmar/Noto/NSM-2.ttf -u '1000 102B 1036'
[ka=0+1124|_tall_aa=0+267|anusvara=0@206,374+0]

$ hb-shape ~/work/myanmar/Noto/NSM-2.ttf -u '1000 1036 102B'
[ka=0+1299|anusvara=0@-202,0+0|_tall_aa=0+267]
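To compare the two results programmatically, the bracketed hb-shape output can be split into per-glyph records. This is a minimal sketch, assuming the text output format `name=cluster@x_offset,y_offset+x_advance` with the offset part omitted when it is zero, as in the two runs above. `parse_hb_shape` is a hypothetical helper, not part of HarfBuzz.

```python
import re

# One glyph record: name=cluster, optional @x_offset,y_offset, then +x_advance.
# Assumption: horizontal text, so no y advance appears in the record.
GLYPH = re.compile(r"(?P<name>[^=|]+)=(?P<cluster>\d+)"
                   r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
                   r"\+(?P<adv>-?\d+)")

def parse_hb_shape(line):
    """Return a list of (name, cluster, dx, dy, advance) tuples."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        m = GLYPH.fullmatch(item)
        if m is None:
            raise ValueError(f"unrecognised glyph record: {item!r}")
        glyphs.append((m["name"], int(m["cluster"]),
                       int(m["dx"] or 0), int(m["dy"] or 0), int(m["adv"])))
    return glyphs

first = parse_hb_shape("[ka=0+1124|_tall_aa=0+267|anusvara=0@206,374+0]")
second = parse_hb_shape("[ka=0+1299|anusvara=0@-202,0+0|_tall_aa=0+267]")
print([g[0] for g in first])   # glyph order for 1000 102B 1036
print([g[0] for g in second])  # glyph order for 1000 1036 102B
```

Parsed this way, the two runs differ in glyph order, in the offsets applied to the anusvara, and in the advance of ka (1124 versus 1299).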

It would make sense for the shaper behaviour to match the syllable pattern of UTN11, and perform a strong canonical reordering.
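As a rough illustration of what "strong canonical reordering" could mean, here is a minimal sketch that sorts the marks following a base consonant by a rank table. The ranks are an assumption made for this example, loosely following the element order described in UTN#11 (medials, vowel e, upper vowel, lower vowel, aa vowel, anusvara). They should be checked against UTN#11 itself, and `reorder_cluster` is a hypothetical name, not shaper code.

```python
# Assumed, simplified ordering ranks for a handful of Myanmar marks.
RANK = {
    0x103B: 1,  # medial ya
    0x103C: 2,  # medial ra
    0x103D: 3,  # medial wa
    0x103E: 4,  # medial ha
    0x1031: 5,  # vowel sign e
    0x102D: 6, 0x102E: 6, 0x1032: 6,  # upper vowels i, ii, ai
    0x102F: 7, 0x1030: 7,             # lower vowels u, uu
    0x102B: 8, 0x102C: 8,             # tall aa, aa
    0x1036: 9,  # anusvara
}
UNRANKED = 99  # marks missing from the table are pushed to the end

def reorder_cluster(codepoints):
    """Keep the first code point as the base and stably sort the rest by rank."""
    base, marks = codepoints[:1], codepoints[1:]
    return base + sorted(marks, key=lambda cp: RANK.get(cp, UNRANKED))

a = reorder_cluster([0x1000, 0x102B, 0x1036])
b = reorder_cluster([0x1000, 0x1036, 0x102B])
print(a == b)
```

Under this assumed table both input orders collapse to the same sequence before shaping, so they would produce the same glyph run. A real implementation would also need to handle kinzi, stacked consonants, and the exceptions discussed later in this thread.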

@tiroj

tiroj commented Jan 21, 2021

Although Microsoft opted to maintain its dedicated Myanmar shaper, my understanding is that the cluster model and reordering it uses are close to those of USE, and Andrew Glass was at one stage considering passing Myanmar to USE.*

I sort of agree that it makes sense for a dedicated shaping engine to perform ordering according to UTN#11, but in general glyph ordering for display is often usefully less strict than character order normalisation, and a generic cluster model such as that employed by USE needs to be quite flexible. We’ve been bitten plenty of times by canonical ordering being too strict and then encountering real-world exceptions to that ordering.


*Which is why this test cluster shows up in USE presentations:
image

@lianghai
Contributor

> … The upshot of this is that equivalent sequences …

You need to explain in what sense these are “equivalent”.

@simoncozens
Collaborator Author

> Andrew Glass was at one stage considering passing Myanmar to USE.

Unfortunately it looks like this was tried but rejected. (harfbuzz/harfbuzz#1773)

I say "unfortunately" because I found another discrepancy between actual usage and the Microsoft spec. The sequence medial la / medial ha does occur in Mon, but is disallowed by current shapers. This is because the Microsoft spec puts both medial ha and Mon medial la in the same group (MH) and allows only one MH per cluster.

Not sure how to fix this: one option is to move medial la to its own group; another is to allow MH MH? instead of MH within the cluster definition.
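The difference between `MH` and `MH MH?` can be sketched with a toy cluster grammar. This is a deliberately reduced illustration, not the Microsoft cluster definition: it assumes a cluster is just a consonant followed by optional MH-class signs, and the names `CLASS_OF`, `STRICT`, and `RELAXED` are invented for the example.

```python
import re

# Toy classification: "C" for a consonant, "H" for the MH group.
CLASS_OF = {
    0x1000: "C",  # MYANMAR LETTER KA
    0x101C: "C",  # MYANMAR LETTER LA
    0x103E: "H",  # medial ha
    0x1060: "H",  # Mon medial la (same MH group, as described above)
}

STRICT = re.compile(r"CH?")       # at most one MH per cluster
RELAXED = re.compile(r"C(HH?)?")  # the "MH MH?" alternative

def classify(codepoints):
    """Map code points to a string of class letters; '?' marks unknown ones."""
    return "".join(CLASS_OF.get(cp, "?") for cp in codepoints)

def is_single_cluster(codepoints, pattern):
    """True if the whole sequence matches the pattern as one cluster."""
    return pattern.fullmatch(classify(codepoints)) is not None

la_ha = [0x1000, 0x1060, 0x103E]
print(is_single_cluster(la_ha, STRICT))   # rejected: second MH breaks the cluster
print(is_single_cluster(la_ha, RELAXED))  # accepted as one cluster
```

The other option mentioned, moving medial la to its own group, would instead add a separate class letter and a fixed position for it in the pattern, which also pins down the relative order of la and ha.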

A third, and potentially more future-proof, solution would be to reopen the USE/mym3 idea.

@simoncozens
Collaborator Author

@ohbendy, can you definitively confirm that medial la-ha is a real thing? I only ask because in UTN11, @mhosken has [U+103E, U+1060] in a mutually exclusive "Medial H" group, just like in the Microsoft cluster definition. If la and ha can both appear in a cluster, then both sources will need to change.

@ohbendy

ohbendy commented Sep 20, 2021

Ha yeah I checked these recently. Apparently medial La and medial Ha have never been possible in Mon language, but Old Burmese has the sequence 1039 101C 103E (so the medial La isn't the Mon medial La encoded at 1060). However it appears that Asho Chin has the sequence 1060 103E as in the last line here:

Screenshot 2021-09-20 at 13 57 27

I also noticed the Padauk font contains that ligature as 103E_1060 (since the order of medials otherwise follows alphabetic order I wonder why it's not 1060_103E) and 103D_1060; I'm not certain which language has that sequence.

We also find 103D 103D in the Tai languages of Northeast India and Northwest Burma, since 103D occurs as a vowel sign in those languages, and can be reduplicated.
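For readers following the code points, the sequences mentioned in this comment can be expanded into character names with Python's standard `unicodedata` module. This is only a lookup aid. The labels in `SEQUENCES` paraphrase the comment above and are not an independent claim about these languages.

```python
import unicodedata

# Sequences quoted in the comment, keyed by a short descriptive label.
SEQUENCES = {
    "Old Burmese (stacked la + medial ha)": [0x1039, 0x101C, 0x103E],
    "Asho Chin (Mon medial la + medial ha)": [0x1060, 0x103E],
    "Tai languages (reduplicated 103D)": [0x103D, 0x103D],
}

def describe(codepoints):
    """Return 'U+XXXX NAME' strings for each code point."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in codepoints]

for label, cps in SEQUENCES.items():
    print(label)
    for line in describe(cps):
        print("   ", line)
```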

@ohbendy

ohbendy commented Sep 20, 2021

Also just checked UTN11 version 5 which Martin sent me last year for comments. Here we find:

Screenshot 2021-09-20 at 14 09 18

And for Asho Chin:
Screenshot 2021-09-20 at 14 09 51

It's odd to me that the medial La gets stored after the Wa or Ha; that doesn't follow alphabetic order, and I'd bet linguistically it's not strictly correct either.

@simoncozens
Collaborator Author

Excellent, thanks. I'm going to raise a query/issue in MicrosoftTypography; will fix in HarfBuzz too.

@himorin added the i18n-tracker label Sep 20, 2021
@simoncozens
Collaborator Author

HarfBuzz now supports medial ha - medial la. :-)


5 participants