Todo hyphen #30

Open
mheijdra opened this issue Dec 3, 2018 · 16 comments
Labels
i:hyphenation (Hyphenation, a subset of Line-breaking & hyphenation), i:line_breaking (Line breaking & hyphenation), question

Comments

@mheijdra

mheijdra commented Dec 3, 2018

Although this group mainly discusses Mongolian written in the Traditional Mongolian script, rather than Todo, Manchu or Sibe, I would suggest incorporating a note on how the Todo hyphen is supposed to work.

@r12a
Contributor

r12a commented Dec 6, 2018

@mheijdra would you be able to propose some text for this? (If so, contact me at ishida@w3.org so I can take you through the necessary agreements.)

@lianghai

Btw, Badral Sanlig (@badaa) et al. mentioned at April’s Mongolian Working Group meeting that the so-called TODO SOFT HYPHEN is supposed to be used not only in Todo text but also in Mongolian/Hudum text (or is wanted there, if there isn’t an established usage yet). Badral might be able to provide more background information.

@xfq
Member

xfq commented Feb 15, 2021

From the Unicode Standard, Version 11.0, p. 536:

In writing Mongolian and Todo, U+1806 MONGOLIAN TODO SOFT HYPHEN is used at the beginning of the second line to indicate resumption of a broken word. It functions like U+2010 HYPHEN, except that U+1806 appears at the beginning of a line rather than at the end.
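As a quick check of the standard's comparison with U+2010, Python's built-in `unicodedata` module can report each character's name and General_Category. This is only a sketch: the helper name `describe` is illustrative, and it checks General_Category only. The line-break class (lb=BB for U+1806) is not exposed by `unicodedata`; it lives in the UCD's LineBreak.txt.

```python
import unicodedata

# Three hyphen-like characters mentioned in this thread.
CHARS = ["\u1806", "\u2010", "\u00ad"]


def describe(ch: str) -> tuple[str, str]:
    """Return the Unicode character name and General_Category of ch."""
    return unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    for ch in CHARS:
        name, category = describe(ch)
        print(f"U+{ord(ch):04X}  {name:<28}  gc={category}")
```

U+1806 and U+2010 both report Pd (dash punctuation), while U+00AD SOFT HYPHEN reports Cf (format). That matches the point that U+1806 is a visible character rather than a soft hyphen in the U+00AD sense.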

r12a added the i:hyphenation (Hyphenation, a subset of Line-breaking & hyphenation) label on Jan 25, 2022
r12a added the question label on Jan 11, 2023
@r12a
Contributor

r12a commented Jan 11, 2023

Folks, we need to clarify the situation regarding the Todo soft hyphen character for the CSS folks as well as for our gap analysis (which I suspect needs to be rewritten). cc @badaa @mheijdra @fantasai

My understanding is as follows:

  1. Mongolian words are not usually split during line-breaking (which is useful, because it avoids tricky questions about cursive behaviour at line breaks). Compound words, however, may contain hyphens, and in those cases the line may be broken at the hyphen. The hyphen should then not remain at the end of the first line, but should move to the start of the next line.
  2. Unicode provides U+1806 TODO SOFT HYPHEN for that. It has a line-break property value of BB (Break Before), so it should end up in the right place. Furthermore, it is not a formatting character (i.e. it is always visible, just like U+2010 HYPHEN, and doesn't appear or disappear depending on context). Also, it is used not only in Todo but also in Mongolian text.
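To make the Break Before behaviour in point 2 concrete, here is a minimal Python sketch of greedy line filling under a deliberately simplified break model. It is not an implementation of UAX #14: the sets `BREAK_BEFORE` and `BREAK_AFTER` and the functions `segments` and `wrap` are hypothetical, width is counted in code points rather than rendered advance, and Latin letters stand in for Mongolian words. U+2010 HYPHEN (lb=BA, break after) is included for contrast.

```python
# Simplified break model (NOT full UAX #14):
#   - a break is allowed after a space (before the next non-space),
#   - after U+2010 HYPHEN (lb=BA, "break after"),
#   - and before U+1806 MONGOLIAN TODO SOFT HYPHEN (lb=BB, "break before");
#   - nowhere else, since Mongolian words are not normally split.
BREAK_BEFORE = {"\u1806"}
BREAK_AFTER = {" ", "\u2010"}


def segments(text: str) -> list[str]:
    """Split text into pieces that may not be broken internally."""
    segs: list[str] = []
    current = ""
    prev = ""
    for ch in text:
        opportunity = ch in BREAK_BEFORE or (prev in BREAK_AFTER and ch != " ")
        if current and opportunity:
            segs.append(current)
            current = ""
        current += ch
        prev = ch
    if current:
        segs.append(current)
    return segs


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling; width is counted in code points (a simplification).
    Trailing spaces are dropped at line ends; an over-long segment gets its own line."""
    lines: list[str] = []
    line = ""
    for seg in segments(text):
        if line and len((line + seg).rstrip(" ")) > width:
            lines.append(line.rstrip(" "))
            line = ""
        line += seg
    if line:
        lines.append(line.rstrip(" "))
    return lines


if __name__ == "__main__":
    # Latin letters stand in for Mongolian words.
    print(wrap("abc\u1806def gh", 4))  # U+1806 starts the second line
    print(wrap("abc\u2010def", 4))     # U+2010 stays at the end of the first line
```

With a narrow width, U+1806 is carried to the start of the next line, while U+2010 stays at the end of the previous one, which is the difference the standard describes.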

Here are my questions:

  1. is my summary above correct?
  2. can anyone provide me with an example of such a compound word? I searched but didn't find U+1806 being used online. Perhaps it should be used rather than an ordinary hyphen for ᠠᠲ᠋ᠠ‍᠆ᠮᠠᠯᠢᠺ and ᠠᠳ᠋᠆ᠳᠢᠨ‍ (source)?
  3. is this hyphen only used for compound words, or should it be used generally for bits of text that are separated by hyphens? For example, would it be used for 2021-2030, or MK-DOS, or ᠪᠠᠷᠢᠮᠵᠢᠶ᠎ᠠ-8045, etc?

thanks for your help.
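One rough way to look for U+1806 in a corpus is to count which hyphen-like character sits between two Mongolian letters, skipping the ZWJ and free variation selectors that appear in the examples in question 2. The Python sketch below is a hypothetical helper (all names are illustrative). It treats U+1820 to U+18AA as "Mongolian letters", which is an approximation, and it only counts hyphens inside Mongolian words, so cases like 2021-2030 or ᠪᠠᠷᠢᠮᠵᠢᠶ᠎ᠠ-8045 from question 3 are deliberately left out.

```python
from collections import Counter

HYPHENS = {
    "\u002d": "HYPHEN-MINUS",
    "\u2010": "HYPHEN",
    "\u1806": "MONGOLIAN TODO SOFT HYPHEN",
}
# Invisible characters to skip: ZWJ and the Mongolian free variation selectors.
IGNORABLE = {"\u200d", "\u180b", "\u180c", "\u180d", "\u180f"}


def is_mongolian_letter(ch: str) -> bool:
    # Approximation: U+1820..U+18AA covers the Mongolian, Todo, Sibe, Manchu
    # and Ali Gali letters, plus a few marks and unassigned code points.
    return "\u1820" <= ch <= "\u18aa"


def neighbour(text: str, i: int, step: int) -> str:
    """Nearest character from index i in direction step, skipping IGNORABLE."""
    j = i + step
    while 0 <= j < len(text) and text[j] in IGNORABLE:
        j += step
    return text[j] if 0 <= j < len(text) else ""


def count_hyphens_in_mongolian_words(text: str) -> Counter[str]:
    """Count hyphen-like characters that have a Mongolian letter on both sides."""
    counts: Counter[str] = Counter()
    for i, ch in enumerate(text):
        if (ch in HYPHENS
                and is_mongolian_letter(neighbour(text, i, -1))
                and is_mongolian_letter(neighbour(text, i, +1))):
            counts[HYPHENS[ch]] += 1
    return counts
```

Run over a real corpus, a tally like this would show whether writers use U+1806 or an ordinary hyphen inside Mongolian words, which bears directly on question 2.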

@r12a
Contributor

r12a commented Jan 11, 2023

I tried it out on the major browser engines. For a test and results see w3c/line_paragraph_tests#84

Summary: Works on Blink & WebKit (though the latter doesn't produce readable text), but fails on Gecko.

@mheijdra
Copy link
Author

mheijdra commented Jan 11, 2023 via email

@r12a
Contributor

r12a commented Jan 11, 2023

perhaps it should be used rather than an ordinary hyphen for ᠠᠲ᠋ᠠ‍᠆ᠮᠠᠯᠢᠺ and ᠠᠳ᠋᠆ᠳᠢᠨ‍ ?

@mheijdra do you not think these compounds are candidates for examples?

@mheijdra
Author

mheijdra commented Jan 11, 2023 via email

@ddamato

ddamato commented Jan 12, 2023

Hi folks,

I'm not a native speaker of Mongolian, and I haven't studied the language in over a decade, but I still have all my reference material. Mongolian Grammar (D. Tserenpil, R. Kullman) gives the following description of hyphen usage in the script on p.402:

Marks local or temporal beginnings and ends.

On p.405, usage is extended for Cyrillic:

Hyphens are used to connect two words, numbers and their suffixes, and also to connect personal names: If the second of the two names begins with a vowel, then a hyphen is put in between them. In literature, words of similar meaning are equated using a hyphen.

I found an example of the word "July" on p.85. The Cyrillic uses the hyphen, the script uses a space.

I also happened to find an English compound word on p.198 "pitch-black" which is not a hyphen compound in either Cyrillic or the script; just broken into two separate words.

Since the language is classified as agglutinative, my assumption is that there are probably no hyphenated compound words native to the script, and that in the script the hyphen is only used as stated above, for date/time ranges.

As a final rough check, I paged through my Mongolian English Dictionary (F. Lessing) looking for words with Cyrillic hyphens and didn't come across any. If you have something you'd like me to look up in particular to view the script, I have the physical copy.

@badaa

badaa commented Jan 12, 2023

Actually, the TODO SOFT HYPHEN would be used a lot if it worked as expected. U+1806 is probably meant for attaching Mongolian suffixes to abbreviations or foreign words. For example, "Director General of the UN" is written in Mongolian as ᠨᠡ‍᠂ᠦ‍᠂ᠪᠠ‍᠂᠆ᠤᠨ ᠶᠡᠷᠦᠩᢈᠡᠢ ᠵᠠᢈᠢᠷᠤᠯ, and the abbreviation can be declined in every grammatical case and take other suffixes as well. Because Mongolian is morphologically rich, there are numerous such suffixes.
Now what is the problem?
The problem is that the suffixes must start without a Titem, which means they must take their medial form. So, to make their text display correctly, people use a normal hyphen + nirugu in this case.
That is what I can say off the top of my head. I will ask Jamiyan about other use cases.
Remark:
It should not be confused with the hyphen that is commonly used to write (compound) personal names in Mongolia. Personal names are written with a hyphen if the second word starts with a vowel. This is already standardized in the official Cyrillic script, so people always write their names with a hyphen in the Mongolian script too. For example, Алтан-Уул, Altan-Uul (meaning Goldberg).
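At the code point level, the two spellings badaa contrasts differ only in what joins the stem to the suffix. The Python sketch below builds both and lists their code points. It assumes the "normal hyphen" is U+002D HYPHEN-MINUS (it could equally be U+2010), and the function names are hypothetical. It says nothing about shaping: whether the suffix actually takes its medial form is decided by the font and shaping engine, not by this code.

```python
import unicodedata

NIRUGU = "\u180a"             # MONGOLIAN NIRUGU
TODO_SOFT_HYPHEN = "\u1806"   # MONGOLIAN TODO SOFT HYPHEN
HYPHEN_MINUS = "-"            # assumption: the "normal hyphen" of the workaround


def attach_suffix(stem: str, suffix: str, workaround: bool = False) -> str:
    """Join a stem (e.g. an abbreviation) and a suffix with U+1806,
    or with the hyphen + nirugu workaround described above."""
    joiner = HYPHEN_MINUS + NIRUGU if workaround else TODO_SOFT_HYPHEN
    return stem + joiner + suffix


def code_points(s: str) -> list[str]:
    """List each character as 'U+XXXX NAME' for inspection."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


if __name__ == "__main__":
    # Abbreviation and suffix as they appear in badaa's example.
    for workaround in (False, True):
        for cp in code_points(attach_suffix("ᠨᠡ‍᠂ᠦ‍᠂ᠪᠠ‍᠂", "ᠤᠨ", workaround)):
            print(cp)
        print()
```

Listing the code points this way makes it easy to see which of the two encodings a given text actually uses, independently of how it renders.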

@eric-albert

I pointed a few language experts to this conversation, and Tim Brookes of Endangered Alphabets says he may know someone who could answer the questions here. Contact him at tim@endangeredalphabets.com and he’ll help out.

@ZmongolCode

ZmongolCode commented Jan 13, 2023 via email

@ZmongolCode

ZmongolCode commented Jan 13, 2023 via email

@r12a
Contributor

r12a commented Jan 13, 2023

@ZmongolCode Thank you for your comments. While I share your frustration at the time it is taking to arrive at the best encoding model for Mongolian, and I sympathize with the need to keep Mongolian simple, it's not something we can fix here at the W3C. I recommend you contact the Unicode Consortium, who are the people working on that.

For clarity (for everyone participating in this discussion), this thread is not about MVS or FVS. It's about U+1806 TODO SOFT HYPHEN (or similar visible hyphens/dashes) and their relationship with line-breaking. If you'd like to discuss something else, feel free to start another issue.

Housekeeping note for those who are replying via email, rather than using the GitHub interface: please delete other people's emails from the bottom of your message before sending. That reduces clutter and makes it easier to follow the GH thread. Thanks.

@ZmongolCode

ZmongolCode commented Jan 13, 2023 via email

r12a added the i:line_breaking (Line breaking & hyphenation) label on May 1, 2024

8 participants