
Variable sized spaces in Thai #46

Open
r12a opened this issue Jan 29, 2021 · 26 comments
Labels
i:punctuation_etc Phrase & section boundaries question Further information is requested s:thai

Comments

@r12a
Contributor

r12a commented Jan 29, 2021

@srakrn pointed to this article, which describes two different sizes of space in Thai: large spaces between sentences, and small spaces in other places (eg. for separating subclauses).

Given that web browsers reduce multiple spaces in the source text to a single space for display, how do content authors normally achieve this size distinction?
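For readers less familiar with the collapsing behaviour mentioned above, here is a minimal Python sketch of a simplified `white-space: normal` model. The function name is hypothetical, and real engines follow the fuller css-text-3 rules (including segment-break transformation), which this ignores. It also shows that U+2003 EM SPACE is not collapsible white space, so it survives where a typed double space does not.

```python
import re

# Simplified model of CSS `white-space: normal`, the HTML default: every run
# of U+0020 SPACE, tab and line feed collapses to one SPACE. Other space
# characters, such as U+2003 EM SPACE, are not collapsible and pass through.
_COLLAPSIBLE = re.compile(r"[ \t\n]+")


def collapse_white_space(text: str) -> str:
    """Collapse each run of collapsible white space to a single SPACE."""
    return _COLLAPSIBLE.sub(" ", text)


source = "ประโยคแรก  ประโยคที่สอง"       # two SPACEs between sentences
print(collapse_white_space(source))      # the gap is now one SPACE
print(collapse_white_space("ก\u2003ข"))  # EM SPACE is kept
```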

@r12a r12a added question Further information is requested s:thai i:punctuation_etc Phrase & section boundaries labels Jan 29, 2021
@bact
Contributor

bact commented Jan 29, 2021

The width of small space is equal to the width of Thai character Ko Kai “ก” U+0E01.
The width of large space is double the size of the small space.
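The width rule above can be written as a small calculation. This is only a sketch: the function name is hypothetical, and the metrics in the usage line (a ก advance of 600 units in a 1000-unit em, at 16px) are made up for illustration, not taken from any real font.

```python
def thai_space_widths(font_size_px: float, ko_kai_advance: int, units_per_em: int):
    """Return (small, large, em_space) widths in CSS px.

    small:    advance width of ก (U+0E01), per the rule quoted above
    large:    twice the small space
    em_space: U+2003 EM SPACE, which is one em (the font size) wide
    """
    small = font_size_px * ko_kai_advance / units_per_em
    return small, 2 * small, font_size_px


# Hypothetical metrics, not from a real font.
small, large, em_space = thai_space_widths(16, 600, 1000)
print(small, large, em_space)  # 9.6 19.2 16
```

With these made-up metrics the large space is 19.2px while EM SPACE is 16px. The two coincide only when ก is exactly half an em wide, so using EM SPACE for the large space is an approximation whose accuracy depends on the font.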

  • There is a practice, inherited from the typewriter era, of using two space characters (hitting the space bar twice) for the large space;
  • this practice is discouraged (at least by NSTDA) in digital media, with the assumption/expectation that the visual spacing should be handled automatically by the word processor.
  • But since word processors (and web browsers) do not necessarily display different spaces in different places in the way the author wants, the em space (U+2003 EM SPACE) is suggested for the large space.
  • see https://www.nstda.or.th/th/nstda-knowledge/2894-spacing
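The NSTDA suggestion above could be applied mechanically to existing typewriter-style text. A minimal Python sketch, assuming that any run of two or more U+0020 SPACEs between visible characters marks a large space (the function name is hypothetical):

```python
import re

EM_SPACE = "\u2003"

# Two or more U+0020 SPACEs between visible characters: the typewriter-era
# large space. \S also excludes EM SPACE, since Python treats it as whitespace.
_LARGE_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")


def typewriter_to_em_space(text: str) -> str:
    """Replace each typewriter-style large space with one EM SPACE."""
    return _LARGE_SPACE.sub(EM_SPACE, text)


fixed = typewriter_to_em_space("ประโยคแรก  ประโยคที่สอง")
```

Leading and trailing runs of spaces are left alone, since they don't separate sentences.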

@dtinth

dtinth commented Jan 29, 2021

I got interested in the text spacing in Thai text when I received my graduation remark produced by the Office of His Majesty's Principal Private Secretary, which looks like this. There are 3 kinds of spaces found in the document:

  • The small space, between some words in the same sentence.
  • The large space, between sentences.
  • The smaller space, only used before MAIYAMOK (ๆ). I reckon that it may be used before PAIYANNOI (ฯ) as well but I may be wrong.

how do content authors normally achieve this size distinction

In my web publication (in which I intend to have 2 different spacings), right now I use <span style="margin-right: .25em;">&#10;</span> whenever I need a large space. After seeing @bact’s link, though, I am considering using an em space instead. However, I have no independent control over the em space’s size relative to the normal space’s size without modifying the font file.

But in practice, most of the time they don’t. Most Thai content on the web that I read uses a single space. I think Thai readers are used to it by now. Even the Office of the Royal Society’s web page about the rules of spacing uses a single space for both small and large spaces, so they are indistinguishable from each other. I have never seen &emsp; used anywhere in Thai text on the web; even the NSTDA page that suggests using em spaces uses only regular spaces in the article.
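The span workaround described in the comment above could be generated from typewriter-style source text. A minimal sketch, assuming runs of two or more spaces mark a large space; the function name is hypothetical, and a plain space inside the span is used here in place of the `&#10;` in the comment:

```python
import html
import re

# The span from the comment above, with a plain SPACE inside (an assumption).
LARGE_SPACE_HTML = '<span style="margin-right: .25em;"> </span>'


def to_html_with_large_spaces(text: str) -> str:
    """HTML-escape text, emitting a widened space for each run of 2+ SPACEs."""
    parts = re.split(r" {2,}", text)
    return LARGE_SPACE_HTML.join(html.escape(part) for part in parts)


print(to_html_with_large_spaces("ประโยคแรก  ประโยคที่สอง"))
```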

@bact
Contributor

bact commented Jan 29, 2021

The smaller space, only used before MAIYAMOK (ๆ). I reckon that it may be used before PAIYANNOI (ฯ) as well but I may be wrong.

See #19 (comment) (Maiyamok in justified paragraph)

@srakrn

srakrn commented Jan 30, 2021

I can confirm @dtinth’s point: I don't care. Most of the people I know don't seem to care either.

I can also confirm @bact’s point. My mum used to write lots of documents, and it appears that she used one space for the "small" space and two for "large" ones. To my knowledge this is much the same as how people typed documents in English using two spaces after a sentence-ending full stop.

@r12a
Contributor Author

r12a commented Feb 3, 2021

So i guess that the questions i have are, if people don't use the wide space much these days, is it because:

a. they'd like to, but just can't figure out how to do it, and there's no em-space key on their keyboard
b. technological gaps have forced a change to the typographic culture, and people have become happy with only a single-width space
c. people just don't really care about the distinction anyway.

Should i raise an issue in the gap analysis document (presumably not as a high priority) about the need to support different space widths, or should i not?

@srakrn

srakrn commented Feb 5, 2021

This is solely my two cents: yes, better raise the issue to gap analysis documents on low priority.

  • There will be people who don't care; we are not changing their behaviour, and to my understanding this won't have any consequences for them.
  • For those who care, if the characters used for spacing are standardised, linting and auto-replace tools will presumably be built to check compliance with the gap analysis document and the official style guide.
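A linting tool of the kind mentioned above might look roughly like this minimal Python sketch. The rules and function name are hypothetical and assume a style guide that standardises on U+2003 EM SPACE for the large space; no such tool or rule set is implied by the thread.

```python
import re
from typing import List, Tuple

# Hypothetical rules, assuming a style guide that standardises on
# U+2003 EM SPACE for the large space between sentences.
_RULES = [
    (re.compile(r" {2,}"), "multiple SPACEs: use U+2003 EM SPACE for a large space"),
    (re.compile(r" \u2003|\u2003 "), "SPACE adjacent to EM SPACE"),
]


def lint_spacing(text: str) -> List[Tuple[int, str]]:
    """Return (offset, message) findings, sorted by offset."""
    findings = []
    for pattern, message in _RULES:
        for match in pattern.finditer(text):
            findings.append((match.start(), message))
    return sorted(findings)


for offset, message in lint_spacing("ก  ข\u2003 ค"):
    print(offset, message)
```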

@r12a
Contributor Author

r12a commented Feb 15, 2021

By the way, some languages that use the Khmer script, such as Krung and Tampuan, separate words with narrow spaces such as U+2006 SIX-PER-EM SPACE, and also separate phrases with a wider space such as U+2003 EM SPACE. (Khmer script used for Cambodian language generally uses a normal SPACE for phrase separation, and no space around words.)
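Text that uses two distinct separators in this way can be segmented into phrases and words directly. A minimal Python sketch, with a hypothetical function name and placeholder strings rather than real Krung or Tampuan text:

```python
from typing import List

SIX_PER_EM_SPACE = "\u2006"  # separates words
EM_SPACE = "\u2003"          # separates phrases


def segment(text: str) -> List[List[str]]:
    """Split text into phrases on EM SPACE, then each phrase into words."""
    return [
        [word for word in phrase.split(SIX_PER_EM_SPACE) if word]
        for phrase in text.split(EM_SPACE)
        if phrase
    ]


# Placeholder words, not real Krung or Tampuan text.
print(segment("w1\u2006w2\u2003w3\u2006w4\u2006w5"))
# [['w1', 'w2'], ['w3', 'w4', 'w5']]
```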

@r12a
Contributor Author

r12a commented Feb 18, 2021

Created some text about this at #49 which appears in the gap-analysis document at https://www.w3.org/TR/2021/WD-thai-gap-20210218/#punctuation_etc

@andjc

andjc commented Feb 19, 2021 via email

@ohbendy

ohbendy commented Feb 19, 2021

Some Thai designers have mentioned to me before that a wordspace designed for Latin will be too small for Thai and one designed for Thai will be too big for Latin. (And we can sometimes see a distinction in its width between fonts made by Thai-native designers and fonts made by Latin-native designers.) It makes some kind of sense because the purpose and frequency are different. So it raises the question of whether the wordspace is technically the best character to use in Thai. Of course it wouldn't be practical to suggest any kind of change, just an observation.

I don't think I've ever specifically included an em-space in the Thai/Lao/Khmer fonts I've made, but can easily do so in future projects.

@andjc

andjc commented Feb 19, 2021 via email

@paepae

paepae commented Feb 19, 2021 via email

@ohbendy

ohbendy commented Feb 19, 2021

@andjc I'm afraid I don't have insight about the why, but as you mentioned above, if fonts and keyboards don't have a way to make the space smaller or larger, and if there's no general awareness of the practice, it seems unlikely people would be prompted to try other spaces.

@r12a
Contributor Author

r12a commented Feb 19, 2021

Thoughts off the top of my head: I agree that it's probable that most people won't worry about space width, but i imagine that some people who want to do careful typography, and perhaps some authors of ebooks, etc. may want to avail themselves of the opportunity to do this. Since there seems to be no clear indication of how it is to be done, i think it's helpful to establish the principle and discuss how it could work. If the EM SPACE works, then those people could probably find a way to use it even if it's not on a keyboard (though it would be better if it were, of course). But a gap in the font could be more problematic.

To improve the visibility for this thread, here's a note i included in the gap-analysis document about support for EM SPACE in Thai fonts: this test shows that no Thai fonts on Mac OS X or Windows 10 have a glyph for EM SPACE, with the exception of Arial Unicode MS and Tahoma. (For best results you need to download Adobe NotDef, and view the page on both Mac and Windows OS.)

Also, at #49 we started asking the following questions:

  1. Is there an issue related to the handling of white space (eg. rendering a line where EM SPACE appears at the end, or rendering source text that has EM SPACE at the end of a line)?
  2. What is the impact on justification? I believe EM SPACE doesn't expand, which may be an issue. Perhaps in some cases the regular spaces will expand to be as wide as or wider than the EM SPACE.

@r12a
Contributor Author

r12a commented Feb 19, 2021

I find myself wondering whether a better alternative might be to add a special white-space property/value for use with Thai/Khmer/etc that stipulates that multiple U+0020 SPACE characters should not be collapsed. That would presumably remove the issues around wrapping, justification, font support and keyboards. It's pretty easy and accessible for users to type a double space if they want to increase the gap between sentences. Just a thought.

Note that the key issue we face here is the tendency of HTML and other markup to reduce white space before rendering. In plain text, double spacing may well occur already (see @bact's comment about typewriters above).

Btw, fwiw, I just added EMSP to the Thai character app, to facilitate experimentation.

@r12a
Contributor Author

r12a commented Feb 19, 2021

cc @fantasai @frivoal

@andjc
Copy link

andjc commented Feb 20, 2021 via email

@fantasai

There already is a value that does not collapse consecutive spaces but allows wrapping: pre-wrap. However, it also preserves line breaks. In theory we could add a value that collapses breaks but not spaces and allows white space to be discarded e.g. between block elements. But there is a small problem: when we collapse line breaks we collapse them down to one space, and you're more likely to want to break at a sentence end (two spaces) than a phrase end within a sentence (one space). There's no way for us to tell if a line break in Thai ends a sentence.
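The problem with collapsing line breaks but not spaces can be shown with a toy model of such a value. This is not an existing CSS value; the function name and the exact collapsing rule are assumptions for illustration.

```python
import re

# Toy model of a *hypothetical* white-space value: runs of SPACE are kept,
# but each line break (with any spaces or tabs around it, and any further
# line breaks) collapses to one SPACE.
_BREAKS = re.compile(r"[ \t]*(?:\n[ \t]*)+")


def collapse_breaks_keep_spaces(text: str) -> str:
    return _BREAKS.sub(" ", text)


src = "ประโยคแรก  ประโยคที่สอง\nประโยคที่สาม"
print(collapse_breaks_keep_spaces(src))
# ประโยคแรก  ประโยคที่สอง ประโยคที่สาม
```

The typed double space survives, but the line break that ended the second sentence becomes a single space, indistinguishable from a phrase space.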

EM SPACE is currently considered a fixed-width space. Afaik it's expected to be exactly one em wide, and is therefore not adjusted by justification. There's also U+3000 IDEOGRAPHIC SPACE which is typically 1em wide and is allowed to be adjusted by justification, but it's not exactly double U+0020. It does make me wonder if Unicode needs a dedicated sentence-ending space codepoint... But if Khmer is using fixed-width spaces for word spaces and sentence-ending spaces already, we've got a problem if we can't justify them. :/ I guess an important question there becomes whether such spaces are supposed to be adjusted equally or proportionally.
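The equal-versus-proportional question above can be made concrete with a little arithmetic. This sketch assumes, hypothetically, a 16px font in which U+0020 is 4px wide and EM SPACE is 16px, with 12px of slack to distribute; the function name is made up.

```python
from typing import List


def justify(widths: List[float], extra: float, proportional: bool) -> List[float]:
    """Distribute `extra` px of justification slack over a line's spaces.

    equal:        every space grows by the same amount
    proportional: each space grows in proportion to its natural width
    """
    if not widths:
        return []
    total = sum(widths)
    if proportional and total > 0:
        return [w + extra * w / total for w in widths]
    share = extra / len(widths)
    return [w + share for w in widths]


# Hypothetical: two 4px SPACEs and one 16px EM SPACE, 12px of slack.
print(justify([4.0, 4.0, 16.0], 12.0, proportional=False))  # [8.0, 8.0, 20.0]
print(justify([4.0, 4.0, 16.0], 12.0, proportional=True))   # [6.0, 6.0, 24.0]
```

Equal adjustment shrinks the EM SPACE's 4:1 ratio to the SPACE down to 2.5:1, while proportional adjustment preserves it.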

Wrt white space at the end of the line, css-text-3 currently allows all (invisible) space characters to hang at the end of a line for all values of white-space except break-spaces, to allow for flush alignment of the visible text. I don't believe this behavior is implemented yet across browsers.
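Here is a rough sketch of what hanging trailing spaces means for flush (here, right) alignment. The set of hangable characters, the function name, and the advance widths are all assumptions for illustration; css-text-3 defines the real rules.

```python
from typing import List, Tuple

# Characters treated as invisible spaces in this sketch only; css-text-3
# defines which characters actually hang.
HANGABLE = {"\u0020", "\u2003", "\u2006"}


def right_align_start(glyphs: List[Tuple[str, float]], line_width: float, hang: bool) -> float:
    """Return the x position at which a right-aligned line should start.

    glyphs: (character, advance width) pairs for the line, in logical order.
    hang:   if True, trailing spaces hang beyond the line edge, so only the
            visible text is measured and it ends flush with the edge.
    """
    items = list(glyphs)
    if hang:
        while items and items[-1][0] in HANGABLE:
            items.pop()
    return line_width - sum(width for _, width in items)


# Hypothetical advances: two 10px Thai letters followed by a 16px EM SPACE.
line = [("ก", 10.0), ("ข", 10.0), ("\u2003", 16.0)]
print(right_align_start(line, 100.0, hang=False))  # 64.0
print(right_align_start(line, 100.0, hang=True))   # 80.0
```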

@r12a
Copy link
Contributor Author

r12a commented Feb 22, 2021

pre-wrap also appears to prevent justification by stretching of spaces in Gecko and WebKit. The justification still happens with Blink, though. test

@fantasai

Which I guess is to say, if fixed-width spaces are now supposed to be treated as variable-width spaces because that's how Unicode wants them to be used these days, someone should file an issue against css-text-3 about that; and if we need a sentence-ending space in Unicode someone should file an issue against Unicode.

@r12a
Contributor Author

r12a commented Feb 23, 2021

Just to be clear: what i noticed not being stretched when white-space is set to pre-wrap is U+0020 SPACE. I haven't yet checked whether or not that's expected behaviour per the spec.

@r12a r12a added i:punctuation_etc i:punctuation_etc Phrase & section boundaries and removed i:punctuation_etc Phrase & section boundaries i:punctuation_etc labels Mar 12, 2021
@fantasai

pre and pre-wrap are somewhat exempt from justification. “If an element’s white space is not collapsible, then the UA is not required to adjust its text for the purpose of justification and may instead treat the text as having no justification opportunities.” https://www.w3.org/TR/css-text-3/#text-align-property

@frivoal

frivoal commented Mar 18, 2021

I think the issue applies to Latin text as well. People who learned typing on typewriters often use two spaces after a sentence-ending period, and one space elsewhere. This practice is no longer terribly fashionable, and people who continue to do it get push-back and are told that, since we don't live in a monospaced world anymore, that's not the right way to achieve it… but there isn't really a right way to achieve it.

As @fantasai said earlier, I suspect the solution is either to change (in css-text?) the definition of (some of?) the various fixed-width spaces so that they can grow due to justification, or to introduce a new Unicode codepoint for a larger-than-U+0020-but-stretchable space. I suspect the former is more likely to be practical.

@r12a
Contributor Author

r12a commented Jan 10, 2023

I mentioned this briefly at https://www.w3.org/International/sealreq/thai/#space_widths

@r12a
Contributor Author

r12a commented Feb 1, 2024

In another issue @bact referred to https://www.si.mahidol.ac.th/th/division/soqd/admin/knowledges_files/373_18_1.pdf

I'm mentioning it here because the initial part of that page says that 2 spaces are needed between sentences.

[Screenshot of the opening of the PDF, 2024-02-01]

@bact
Contributor

bact commented Feb 1, 2024

The two-space practice from the typewriter era, for Thai, mimics handwriting, where the writer creates the larger gap simply by moving their hand a bit further after the last character of the previous sentence.

This two-space practice works well in a word processor but is quite ineffective in HTML, where multiple spaces are treated as just one space.

Without an explicit end-of-sentence symbol, it is also difficult for a web browser to render a space differently according to its semantics (a space between sentences vs a space between words).

9 participants