Test Myanmar shaping #6

Open
brawer opened this issue Oct 26, 2016 · 10 comments

Comments

@brawer
Collaborator

brawer commented Oct 26, 2016

https://github.com/googlei18n/noto-fonts/issues/769#issuecomment-254315022 has test cases for Myanmar shaping. Before adding them as test cases, we need to triple-check that these are actually Unicode strings and not in Zawgyi encoding.

အကျွန်ုပ်သည်
သွားလတံ္တနည်း
သဗ္ဗာသဝသုတ်
အကျွန်ုပ်သည်
သာဝတ္ထိ
ဤသို့
မြတ်စွာဘုရားသည်
သွားလတံ္တနည်း
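
A rough way to pre-screen strings like these before asking a native speaker is to look for storage-order patterns that Unicode Myanmar text should not contain. The sketch below is a crude heuristic, not a real detector. It assumes that in Unicode the vowel sign U+1031 is stored after its consonant and medials (Zawgyi stores it before the consonant), and that U+1039 is always followed by a consonant (Zawgyi uses U+1039 as a visible asat). The function name `looks_like_zawgyi` and the consonant range U+1000..U+1021 are choices made for this illustration; a string that passes may still be Zawgyi.

```python
import re

# Assumption: in Unicode, U+1031 follows a consonant (U+1000..U+1021)
# or a medial (U+103B..U+103E). Anything else hints at visual (Zawgyi) order.
_E_WITHOUT_BASE = re.compile(r"(?<![\u1000-\u1021\u103B-\u103E])\u1031")

# Assumption: in Unicode, U+1039 (virama) only appears before the
# consonant it stacks. Zawgyi reuses this code point for the asat.
_VIRAMA_WITHOUT_STACK = re.compile(r"\u1039(?![\u1000-\u1021])")


def looks_like_zawgyi(text):
    """Return True if text contains a pattern that hints at Zawgyi encoding."""
    return bool(_E_WITHOUT_BASE.search(text) or _VIRAMA_WITHOUT_STACK.search(text))


print(looks_like_zawgyi("\u1000\u1031"))  # consonant, then U+1031
print(looks_like_zawgyi("\u1031\u1000"))  # U+1031 stored before the consonant
```

A check like this can only raise suspicion; it does not replace review by someone who reads Burmese.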

@brawer
Collaborator Author

brawer commented Oct 26, 2016

Feedback from Google’s Burmese linguist: Most of the above are correct Unicode, but သွားလတံ္တနည်း should be သွားလတ္တံနည်း
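
The two spellings look nearly identical on screen, so it can help to dump the code points. As far as I can tell, the difference is where U+1036 (anusvara) sits relative to the stacked consonant: before the virama in the original, after the stacked letter in the correction. The snippet below is a small sketch using Python's standard `unicodedata` module; the two four-character fragments are my reading of the strings above, written as escapes so the order is unambiguous, and `codepoints` is just a helper name for this example.

```python
import unicodedata


def codepoints(text):
    """List each character as 'U+XXXX NAME' in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Assumed fragments of the two strings (not the full words).
original = "\u1010\u1036\u1039\u1010"   # TA, ANUSVARA, VIRAMA, TA
corrected = "\u1010\u1039\u1010\u1036"  # TA, VIRAMA, TA, ANUSVARA

for label, fragment in (("original", original), ("corrected", corrected)):
    print(label)
    for line in codepoints(fragment):
        print("  " + line)
```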

@brawer
Collaborator Author

brawer commented Oct 27, 2016

There’s a neat test case for Myanmar OpenType shapers at the end of the “Well-formed Clusters” section of the spec, just before “Reordering Characters”:

င်္က္ကျြွှေို့်ာှီ့ၤဲံ့းႍ

image

@davelab6, @behdad or @mjansche, are you aware of any font that can render it? If so, I’d ask the copyright owner if they’d be willing to allow us (Unicode) to incorporate the glyphs for just this one cluster into Unicode’s test suite for text rendering engines. They’d need to sign Unicode’s Contributor Licensing Agreement; I’ll handle the paperwork.

@mjansche

That's a rather contrived example. I have been using examples from UTN 11 as test cases for a similar purpose. Coincidentally I also prepared a list of frequent clusters that occur in a large corpus, which I've been meaning to push out. Stay tuned for that.

@brawer
Collaborator Author

brawer commented Oct 28, 2016

Oh cool. Please don't hesitate to send pull requests; much appreciated.

@mjansche

Now that I'm looking at the description of Well-formed Clusters in the OpenType spec, I notice that it doesn't seem to match the corresponding description in UTN 11. (Working code: https://github.com/googlei18n/language-resources/blob/master/third_party/unicode/utn11.py)
According to the regex in UTN 11, that cluster is not recognized as valid and/or in canonical storage order. This could well be a problem in the regex, but I think it points to a deeper mismatch between what fonts/shapers/renderers have to worry about vs. what is needed for representing actual text.

@brawer
Collaborator Author

brawer commented Oct 28, 2016

Adding @mhosken who wrote UTN11 for clarification.

@mhosken

mhosken commented Oct 28, 2016

FWIW, rendering with Padauk in a Graphite context (Firefox or LibreOffice) will give you a pretty strong test of a string's conformity to UTN#11. UTN#11 is stricter than the OpenType spec, and that's OK. I don't think it's necessarily the shaper's responsibility to be the encoding police. The only thing that would be bad is if the shaper marked something bad that UTN#11 says is good.

BTW, you are welcome to use Padauk for your test string. It is an OFL font, so Unicode needs no agreement to use it in the Unicode Standard book or anywhere else.
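
The one-way constraint above (a shaper may accept more than UTN#11, but should never reject what UTN#11 accepts) could be checked in a test suite along these lines. This is only a sketch: `is_well_formed` and `shaper_rejects` are hypothetical callables standing in for a UTN#11 validator and for whatever signal a rendering engine gives when it treats a cluster as broken, and the toy lambdas below are not real implementations of either.

```python
def wrongly_rejected(samples, is_well_formed, shaper_rejects):
    """Return samples that the validator accepts but the shaper rejects."""
    return [s for s in samples if is_well_formed(s) and shaper_rejects(s)]


# Toy stand-ins, for illustration only.
samples = ["\u1000\u1031", "\u1031\u1000"]
toy_validator = lambda s: not s.startswith("\u1031")
lenient_shaper = lambda s: False  # accepts everything
strict_shaper = lambda s: True    # rejects everything

print(wrongly_rejected(samples, toy_validator, lenient_shaper))
print(wrongly_rejected(samples, toy_validator, strict_shaper))
```

An empty result means the shaper is no stricter than the validator on the given samples; it says nothing about strings outside the sample set.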

@brawer
Collaborator Author

brawer commented Oct 28, 2016

@mhosken, your text certainly looks like a nice test case; can you post the Unicode string for it? (Sorry to ask, but my Burmese is nonexistent.)

image

@mhosken

mhosken commented Oct 31, 2016

It's your string from comment 3 above. But the Graphite font hasn't been set up to render 4 medials in sequence like that, because no language ever uses them all. Of course, there are also other medials used by minorities that are not in your string, so I suppose it could be madder. Hence the U+1031 not being reordered.

@behdad
Contributor

behdad commented Oct 31, 2016

Myanmar Text from Microsoft renders the original test correctly.
