Positioning of prescript vowel signs in non-conjunct clusters #51

Open
r12a opened this issue Mar 14, 2019 · 2 comments
Labels
l:bn Bengali language & script l:hi Hindi, Devanagari script l:ta Tamil language & script question Further information is requested useful-discussion

Comments

@r12a
Contributor

r12a commented Mar 14, 2019

I'm moving this question here so that it doesn't get swamped by other discussion at #34 (comment)

@tiroj wrote:

The individual font has a lot of control over how this sequence is displayed. Not only is it possible to display the sequence in the two ways described by @vivekpani, but also it is possible to force an older display convention in which the ि reordering is not blocked by the presence of the explicit halant in linear layout but instead moves to before the first ड.

Indeed, and not only older fonts do this. If you put a ZWJ in the middle of the हड्डियाँ conjunct, Noto Serif Devanagari produces

[Screenshot, 2019-03-13: Noto Serif Devanagari rendering]

whereas Noto Sans Devanagari produces

[Screenshot, 2019-03-13: Noto Sans Devanagari rendering]

My questions for the indic experts here are:

  1. are those both equally acceptable outcomes?
  2. should they produce different behaviours for selection, cursoring, line-breaking, etc?

(Although I used a ZWJ in this case, I assume that the principle applies equally to situations where a font doesn't have the means to produce a conjunct.)
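For anyone following along, here is a small Python sketch that lists the code points of the test string. It assumes the ZWJ was placed directly after the virama (the comment only says "in the middle of the conjunct"), so the exact position is an assumption.

```python
import unicodedata

# हड्डियाँ with a ZWJ inserted; placing it right after the virama is an
# assumption, not something stated in the thread.
WORD_WITH_ZWJ = "\u0939\u0921\u094D\u200D\u0921\u093F\u092F\u093E\u0901"


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for line in describe(WORD_WITH_ZWJ):
        print(line)
```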

Originally posted by @r12a in #34 (comment)

@r12a r12a added question Further information is requested l:hi Hindi, Devanagari script l:bn Bengali language & script l:ta Tamil language & script labels Mar 14, 2019
@tiroj

tiroj commented Mar 14, 2019

I can't answer your question about acceptability, but it may be helpful to understand what appears to be happening in the two fonts:

The Noto Serif illustration shows the behaviour when a font handles the display of ड (and presumably other consonant letters that don't have graphical half forms) as a full ड plus a halant mark character. The OpenType Layout 'dev2' shaping engine uses the presence of the halant character glyph after basic orthographic unit shaping* to reorder the ikar vowel sign only as far as the halant, and not to the beginning of the cluster (it also serves the same function in preventing reordering of reph in the opposite direction). This orthographic convention has been standard in Hindi since, as I understand, the 1960s. [Prior to the specification of 'dev2' shaping, under the pre-Windows Vista 'deva' model (which is still supported), ikar and reph would be reordered past the explicit halant. The provision of two separate OTL script codes allows font makers to produce fonts that support both 'deva' and 'dev2' shaping.]

The Noto Sans illustration shows the behaviour when ड plus halant is handled as a combined, nominal half form glyph (there are some other methods that could display in this way, but that's the most likely). This is something that we would do commonly in the 'deva' model, in which the visual presence of halant made no difference to reordering, but generally don't do in 'dev2', since the whole point of that shaping model is to support the convention in which ikar and reph don't reorder past the halant.
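The difference between the two behaviours can be modelled very roughly in code. This Python sketch is a toy, not the actual 'deva' or 'dev2' shaping algorithm: it only decides where the ikar (U+093F) lands in visual (glyph) order for a single consonant cluster, and it takes as a hypothetical `half_forms` parameter the set of consonants that the font renders with a (nominal) half form. Reph, conjunct ligatures and ZWJ/ZWNJ are ignored.

```python
VIRAMA = "\u094D"
I_SIGN = "\u093F"  # DEVANAGARI VOWEL SIGN I (ikar)


def place_ikar(cluster: str, model: str, half_forms: set[str]) -> str:
    """Toy visual ordering of <consonants + viramas + ikar>.

    'deva': ikar moves to the start of the cluster, ignoring any halant.
    'dev2': ikar moves back only as far as the last halant that stays
            visible, i.e. one following a consonant not in half_forms
            (a hypothetical stand-in for the font's design choice).
    """
    if not cluster.endswith(I_SIGN):
        raise ValueError("cluster must end with the ikar vowel sign")
    body = cluster[:-1]
    if model == "deva":
        insert_at = 0
    elif model == "dev2":
        insert_at = 0
        for i, ch in enumerate(body):
            if ch == VIRAMA and i > 0 and body[i - 1] not in half_forms:
                insert_at = i + 1  # ikar stops right after a visible halant
    else:
        raise ValueError(f"unknown model: {model!r}")
    return body[:insert_at] + I_SIGN + body[insert_at:]


def codepoints(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


DDA_DDI = "\u0921\u094D\u0921\u093F"  # ड्डि

# Noto Serif-like: ड has no half form, so the halant stays visible.
print(codepoints(place_ikar(DDA_DDI, "dev2", set())))       # U+0921 U+094D U+093F U+0921
# Noto Sans-like: ड + halant is treated as a nominal half form.
print(codepoints(place_ikar(DDA_DDI, "dev2", {"\u0921"})))  # U+093F U+0921 U+094D U+0921
```

The result is a display order for illustration only; the stored character sequence is identical in both cases.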

That said, the 'dev2' model is designed in a way that provides the font maker with flexibility as to how they want such clusters to display. Conjunct ligature shaping in the 'dev2' model is split between a pre-reordering 'cjct' feature that takes sequences with explicit halant as input, and a post-reordering 'pres' feature that takes half forms as input. And as the Noto Sans example shows, it is also possible to build the font in a way that treats letters with a visual halant (but not a separate halant glyph) as nominal half forms.
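To make the split concrete, here is a toy Python sketch of two ligature lookups. The glyph names (`dda`, `halant`, `dda.half`, `dda_dda`) are hypothetical, and the greedy matcher is a simplification of OpenType ligature substitution; it only shows which kind of glyph input each feature is written against.

```python
# Hypothetical glyph names; real fonts name their glyphs however they like.
CJCT = {("dda", "halant", "dda"): "dda_dda"}  # pre-reordering: explicit halant in input
PRES = {("dda.half", "dda"): "dda_dda"}       # post-reordering: half form in input


def apply_ligatures(glyphs: list[str], table: dict[tuple[str, ...], str]) -> list[str]:
    """Greedy left-to-right ligature substitution (simplified)."""
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        for key, ligature in table.items():
            if tuple(glyphs[i:i + len(key)]) == key:
                out.append(ligature)
                i += len(key)
                break
        else:  # no rule matched at position i
            out.append(glyphs[i])
            i += 1
    return out


# Glyph stream with a separate halant glyph: the 'cjct'-style rule matches.
print(apply_ligatures(["dda", "halant", "dda"], CJCT))  # ['dda_dda']
# Glyph stream containing a half-form glyph: the 'cjct'-style rule does not match...
print(apply_ligatures(["dda.half", "dda"], CJCT))       # ['dda.half', 'dda']
# ...but a 'pres'-style rule written against the half form does.
print(apply_ligatures(["dda.half", "dda"], PRES))       # ['dda_dda']
```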

So this difference in display is tied up in how the glyphs are built and presented, and I'm not sure how feasible it is to provide users with different experiences in terms of editing and line layout based on those display differences. At the character level, it's possible for (at least) three different cluster display options to be identically encoded, and those different display options might, themselves, result from multiple ways of handling the OpenType Layout.

In terms of linebreaking, I suspect the answer is that linebreaks should happen at phonetic syllable boundaries, which won't always correspond to graphical cluster boundaries. In that case, you need dictionary support. In Devanagari, particularly for modern Hindi where retroflex consonants tend to display with an explicit halant instead of traditional, vertical conjunct ligatures, you can't tell from either the character sequence or how it is displayed within a line whether a linebreaking opportunity occurs at the halant.
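A minimal Python sketch of why a dictionary is needed. The dictionary and its entries are hypothetical illustrations (offsets are code point indices at which a new phonetic syllable starts), and the naive rule is the one the comment warns against: treating every halant as a break opportunity. In हड्डियाँ (हड्-डि-याँ) the halant does sit at a syllable boundary, but in क्या (a single syllable) it does not.

```python
VIRAMA = "\u094D"

HADDIYAN = "\u0939\u0921\u094D\u0921\u093F\u092F\u093E\u0901"  # हड्डियाँ
KYA = "\u0915\u094D\u092F\u093E"                               # क्या

# Hypothetical syllable-boundary dictionary: word -> offsets where a new
# phonetic syllable begins. Illustrative entries only.
SYLLABLE_BREAKS = {
    HADDIYAN: [3, 5],  # हड् | डि | याँ
    KYA: [],           # one syllable: the halant is inside the onset cluster
}


def naive_virama_breaks(word: str) -> list[int]:
    """Offer a break after every virama (unreliable, see the comment above)."""
    return [i + 1 for i, ch in enumerate(word) if ch == VIRAMA and i + 1 < len(word)]


def dictionary_breaks(word: str) -> list[int]:
    """Return dictionary break offsets; unknown words get no internal breaks."""
    return SYLLABLE_BREAKS.get(word, [])


print(naive_virama_breaks(KYA), dictionary_breaks(KYA))  # [2] []
```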

In some other Indian scripts, ZWNJ is often used explicitly to mark a phonetic syllable division. So, for example, the English loanword 'software' is typically written in Telugu with the 'ftw' sequence as f+t+halant+ZWNJ+w, i.e. with an explicit halant on the ft cluster and a full form of w, rather than as a three-consonant cluster. That suggests that the presence of ZWNJ might be taken as a linebreak opportunity, but of course that assumes that the control character is being consistently used to make phonetic syllable divisions, which probably isn't always the case for any script, and definitely isn't the case with Devanagari. So, again, you'd have to involve dictionary support.
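The ZWNJ heuristic is easy to sketch, and its limitation is visible in the code: it finds a break only where the author actually typed a ZWNJ. The Telugu spelling below (సాఫ్ట్‌వేర్) is an assumption built from the f+t+halant+ZWNJ+w description, and `zwnj_break_offsets` is a hypothetical heuristic, not part of any line-breaking standard.

```python
ZWNJ = "\u200C"

# 'software' in Telugu, spelled (by assumption) as
# SA + AA | PHA + VIRAMA + TTA + VIRAMA + ZWNJ | VA + EE | RA + VIRAMA
SOFTWARE_TE = "\u0C38\u0C3E\u0C2B\u0C4D\u0C1F\u0C4D\u200C\u0C35\u0C47\u0C30\u0C4D"


def zwnj_break_offsets(text: str) -> list[int]:
    """Offsets just after each ZWNJ (hypothetical heuristic).

    Only trustworthy where ZWNJ is used consistently to mark syllable
    divisions, which is not the case for Devanagari.
    """
    return [i + 1 for i, ch in enumerate(text) if ch == ZWNJ and i + 1 < len(text)]


offsets = zwnj_break_offsets(SOFTWARE_TE)
print(offsets)  # [7]
```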


@lianghai

  1. are those both equally acceptable outcomes?

They are both acceptable, and how much each one is preferred varies significantly in different situations. Basically they’re spelling variations, like how other writing systems’ spelling variations are preferred to various degrees in different situations. The difference is that these spelling variations in Indic scripts (particularly Devanagari) were considered equivalent to some extent, and the ISCII encoding model handed this complexity to the font level, instead of resolving such variations directly in the encoding (as distinct character sequences) the way other scripts do.

  2. should they produce different behaviours for selection, cursoring, line-breaking, etc?

They probably really need to produce different behaviors for those use cases, as the font renderings (rather than the underlying character sequences) are what users see and expect to interact with. And getting boundary information directly from the font rendering process is probably the only viable way to do that.

If a single behavior is forced on both:

  • A maximized cluster would include <consonant, virama, consonant, vowel sign> no matter whether this sequence is rendered as one or two clusters. This would be awkward for the first case (rendered as two clusters).

  • A logic with minimized clusters would produce two clusters, <consonant, virama> and <consonant, vowel sign>, and there would then be no way of selecting these two clusters one by one that makes sense for the second case (rendered as one cluster).
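The two options can be sketched as segmentation rules. This Python toy is a heavy simplification (only the letters KA..HA count as consonants; nukta letters, independent vowels, ZWJ/ZWNJ and everything else are ignored), and neither function is the Unicode grapheme-cluster algorithm. Following the suggestion that boundaries should come from the rendering, the choice between them is driven by a hypothetical `rendered_as_one` flag standing in for what the font actually drew.

```python
VIRAMA = "\u094D"


def is_consonant(ch: str) -> bool:
    # Simplified: Devanagari KA (U+0915) through HA (U+0939) only.
    return "\u0915" <= ch <= "\u0939"


def minimal_clusters(text: str) -> list[str]:
    """Start a new cluster at every consonant: <C, virama> | <C, vowel sign>."""
    clusters: list[str] = []
    for ch in text:
        if is_consonant(ch) or not clusters:
            clusters.append(ch)
        else:
            clusters[-1] += ch
    return clusters


def maximal_clusters(text: str) -> list[str]:
    """Attach a consonant to the previous cluster when that cluster ends in a virama."""
    clusters: list[str] = []
    for ch in text:
        if clusters and (not is_consonant(ch) or clusters[-1].endswith(VIRAMA)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def clusters_for_display(text: str, rendered_as_one: bool) -> list[str]:
    """Pick the segmentation matching the rendering (hypothetical flag)."""
    return maximal_clusters(text) if rendered_as_one else minimal_clusters(text)


WORD = "\u0939\u0921\u094D\u0921\u093F\u092F\u093E\u0901"  # हड्डियाँ
print(len(minimal_clusters(WORD)), len(maximal_clusters(WORD)))  # 4 3
```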


3 participants