[css-text-4] Make ideograph-alpha and ideograph-numeric part of text-spacing: normal #6950

Closed
hfhchan opened this issue Jan 14, 2022 · 30 comments
Labels: Closed Accepted by CSSWG Resolution; css-text-4; i18n-clreq (Chinese language enablement); i18n-jlreq (Japanese language enablement); i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response)

Comments

@hfhchan

hfhchan commented Jan 14, 2022

The baseline behaviour of text-spacing should always turn on ideograph-alpha and ideograph-numeric when in the horizontal writing direction. In the vertical direction, it should be turned on when the western text is rotated [1] and turned off if the western text is not [2].

Microsoft Word has ideograph-alpha/ideograph-numeric turned on by default. The iOS UI also has this spacing, but I'm not sure if it's done on the rendering level, or spaces are injected during string interpolation.

On other platforms, it is not supported out of the box. Authors end up adding spaces themselves inconsistently, leading to horrible UIs like this:
[image]

CSS should support this turned on as the default.

There may also need to be an additional property for authors to specify the width of the inserted space they want. The default 1/4 space is fine.

Footnotes

  1. See CLREQ Figure 8.

  2. See CLREQ Figure 7.

@xfq xfq added css-text-4 i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. labels Jan 20, 2022
@frivoal
Collaborator

frivoal commented Jan 31, 2022

There's a double question here:

  • what is good behavior
  • what is compatible with existing content

I believe you're right about what's good, but given that it is not the historical behavior and that plenty of content was written assuming it wouldn't happen, this is likely not to be compatible.

We could maybe write an informative note giving a suggestion of an algorithm for the UA-dependent auto value, which could do what you suggest, including the dependencies on writing-modes, but I suspect we cannot make that behavior the default. If you'd like to insist on it, it would take some evidence that this is web compatible.

@hfhchan
Author

hfhchan commented Jan 31, 2022

I tried imagining scenarios where changing to more desirable defaults would make it incompatible with existing content, but I cannot come up with much.

Content which already has extra space characters injected wouldn't be affected, because ideograph-alpha and ideograph-numeric do not insert additional spaces between characters and spaces.

One case which may potentially be affected would be designs that emulate a terminal with mixed western/CJK text.

Since there's no guarantee that the browser font selected for CJK will be an integer multiple of the monospace font selected for western text, any site which wants pixel-perfect alignment will likely already be breaking characters into blocks with hard-coded widths and/or CSS grid. Ideograph-alpha and ideograph-numeric should not interact across blocks, so there would not be any impact if implemented that way.

About the behavior being turned off depending on the writing mode, I think I have approached it wrongly. It's more about the behavior of ideograph-alpha and ideograph-numeric itself, i.e. when turned on, spaces are only injected between characters in the same inline direction.
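
The claim above, that text which already contains typed spaces would not be affected, can be illustrated with a rough sketch. It assumes a deliberately simplified classification (only the main CJK Unified Ideographs blocks count as ideographs, and only ASCII letters and digits count as western text); the helper names are hypothetical and this is not the specification's algorithm.

```python
def is_han(ch: str) -> bool:
    """Simplified ideograph test (assumption): CJK Unified Ideographs and Extension A only."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def is_western(ch: str) -> bool:
    """Simplified western test (assumption): ASCII letters and digits only."""
    return ch.isascii() and ch.isalnum()


def autospace_boundaries(text: str) -> list[int]:
    """Return each index i where spacing would go between text[i - 1] and text[i]."""
    boundaries = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if (is_han(prev) and is_western(cur)) or (is_western(prev) and is_han(cur)):
            boundaries.append(i)
    return boundaries


print(autospace_boundaries("我用CSS排版"))    # boundaries on both sides of "CSS"
print(autospace_boundaries("我用 CSS 排版"))  # typed spaces: no ideograph/western adjacency
```

In the second string every ideograph is adjacent to a space rather than to a letter, so no boundary is found and nothing would be added.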

@frivoal
Collaborator

frivoal commented Jan 31, 2022

> I tried imagining scenarios where changing to more desirable defaults would make it incompatible with existing content, but I cannot come up with much.

Content that used to fit within an element of a given size becomes longer due to these inserted spaces, no longer fits, and now breaks into two lines.

If a paragraph of text goes from 7 to 8 lines, it's usually fine, but if a UI element like a button or a menu entry goes from 1 line to 2, there's not always room to fit that, and it can easily overflow.
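
As a rough illustration of how much a label could grow, here is a back-of-the-envelope calculation. The 1/4em gap and the 16px font size are assumptions chosen for the example, and the function name is hypothetical.

```python
def extra_width_px(boundaries: int, font_size_px: float, gap_em: float = 0.25) -> float:
    """Extra inline size added by autospacing: one gap per ideograph/western boundary."""
    return boundaries * gap_em * font_size_px


# A label with one western word in the middle has two boundaries.
print(extra_width_px(boundaries=2, font_size_px=16))
```

Whether a few extra pixels push a button or menu entry onto a second line depends entirely on how tightly the element was sized.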

@hfhchan
Author

hfhchan commented Jan 31, 2022

Note the current specified value trim-adjacent and allow-end for normal also causes content to no longer align on character grid boundaries.

Quad space is not much, has been universally used in commercial printing (sometimes more than quad, so the western text takes up a whole number of CJK characters), and is still better for alignment than a full space.

Also it's quite unlikely for UI to have mixed script CJK & Western; it is usually considered bad style. The vast majority of script mixing would be in titles, slogans and prose.

@hfhchan
Author

hfhchan commented Mar 5, 2022

Would there be an update to this issue?

@CharlesBelov

English reader here who has a bare acquaintance with Chinese text. As a webmaster, I would prefer that the default behavior, not inserting automatic spaces on language switch between CJK and Western, not be changed and that any new behaviors, such as automatically inserting spaces, need to be explicitly stated through a non-default property. This would prevent breaking legacy content.

I checked with two colleagues who have professional experience in Chinese-language media and here are their responses on the proposal:

Jay Lu:

“When I worked for the Chinese media, our style book called for no-space. I would recommend no-space because 1) our local press (Sing Tao and World Journal) follows no-space guide; 2) it looks better.”

Jessie Liang:

“As for newspapers and online articles, it's better to have no spaces before and after the English words, which look better. However, as for the subtitle on TV, you may or may not use a space because when you read a subtitle with a limited time, having a space before and after the English words would be easier to catch each word.”

@r12a
Contributor

r12a commented Mar 28, 2022

It's not an easy call, one way or the other.

If autospacing is made the default, it's possible that some web text somewhere might grow a little, although it's not clear to me how many cases like that exist, and it may be a small number.

On the other hand, if spacing is preferred and the behaviour has to be applied manually to all Chinese and Japanese text in the future, that's a pain in the neck (for all web pages going forward, forever). Especially in cases such as where text is converted automatically from markdown, or where strings from a database or field-based input from users are integrated into a document that doesn't have styling for Chinese, and so on.


I raised a separate issue for the following comments. See #7183

But i also find myself wondering about a couple of other aspects of this.

[1] @CharlesBelov makes a case for no autospacing in some types of text. Of course, one possibility would be to have a value that turns default autospacing off, except that we don't appear to have one, and that leads to another thought. It suggests that it would also not be possible to have, say, autospacing set for the document as a whole (by default or by a style declaration), and then turn it off for some parts of the content.

[2] Another feature that might be useful is the ability to say, "If spaces have been used in an existing text between Han and ASCII alphanumeric text, rather than adding an even larger gap if autospacing is turned on (1/4 em plus the existing space character), replace the existing space with a 1/4em gap." That would make it easier to have consistent spacing when mixing text from different sources on the same page (especially including user input via forms, where they may add spaces). That would also allow content authors to apply autospacing without being concerned that existing text would wrap onto 2 lines, as @frivoal fears.

Which makes me wonder whether it's right to treat autospacing as just a value of text-spacing, or whether it's better to have an autospacing property with values such as on|off|override-space|<width>. (I seem to vaguely remember that this is more like what was implemented in Internet Explorer in times past.)
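
A minimal sketch of the difference between inserting a gap and replacing an existing space, as described in [2]. The `GAP` marker, the function names, and the simplified character classes (main CJK Unified Ideographs blocks versus ASCII alphanumerics) are all assumptions made for illustration, not part of any specification.

```python
GAP = "<gap>"  # hypothetical marker standing in for the automatic gap


def is_han(ch: str) -> bool:
    """Simplified ideograph test (assumption)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def is_ascii_alnum(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def scripts_differ(a: str, b: str) -> bool:
    """True when one character is an ideograph and the other an ASCII letter or digit."""
    return (is_han(a) and is_ascii_alnum(b)) or (is_ascii_alnum(a) and is_han(b))


def autospace(text: str, replace_spaces: bool = False) -> list[str]:
    """Return the characters of text with GAP markers where automatic spacing applies."""
    out = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        if replace_spaces and ch == " " and 0 < i < last and scripts_differ(text[i - 1], text[i + 1]):
            out.append(GAP)  # swap the typed U+0020 for the automatic gap
            continue
        if i > 0 and scripts_differ(text[i - 1], ch):
            out.append(GAP)  # insert a gap at a direct ideograph/western boundary
        out.append(ch)
    return out


print(autospace("用CSS"))
print(autospace("用 CSS", replace_spaces=True))
```

With replacement enabled, text typed with and without a space ends up with the same single gap, which is the consistency described above for mixing text from different sources.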

@r12a
Contributor

r12a commented Mar 28, 2022

See #7183

By the way, i think that separating out the autospacing we're talking about here – moving Han and alphanumeric inline runs apart – from the punctuation-related text spacing feels intuitively better, too. When writing this stuff up so that others could learn about it, i found myself doing that anyway (see https://r12a.github.io/scripts/jpan/#letterspace).

@macnmm

macnmm commented Mar 28, 2022

These "spaces" need special handling not normally found in Western typography, in that they compress more readily than Western spaces would in a tight justification scenario, and they would do so before any adjustments would be applied to U+0020 characters' widths. So, this is a new concept and has complexity. The Japanese case is more so, in that spacing around full-width punctuation is added to the prioritized list of spacing to be adjusted. These adjustment considerations need to be accounted for in the feature or it may not work well.

@r12a
Contributor

r12a commented Mar 29, 2022

I think those features would be accounted for in the justification algorithm, no?

@CharlesBelov

CharlesBelov commented Mar 30, 2022

I'm not sure I've been understood. That Microsoft Word does autospacing when mixing CJK and Western text does not make it correct (or wrong, for that matter). My concern is that you appear to be proposing to implement something that changes default behavior whether a site wants it or not, resulting in breaking existing content, based on your perception that adding spacing looks better. I have had two professionals who come from journalism tell me that removing spacing looks better, with adding spaces being reserved for special use cases.

Again, I am not an expert in Chinese, so take my random search judiciously, but in visits to four Chinese newspaper websites, all four sites had examples of mixed content where there appear to be intentionally no spaces between CJK and Western text. No endorsement of the content is intended; these are merely examples of Chinese content with strings of Western text.

https://std.stheadline.com/daily/article/2452186/%E6%97%A5%E5%A0%B1-%E6%B8%AF%E8%81%9E-%E5%85%A7%E5%9C%B0%E8%88%87%E6%B8%AF%E7%A7%8B%E5%AD%A3%E5%89%8D%E6%81%90%E9%9B%A3%E9%80%9A%E9%97%9C (Sing Tao Daily)

https://language.chinadaily.com.cn/a/202203/29/WS6242d221a310fd2b29e53fc5.html (China Daily)

https://www.zaobao.com.sg/lifestyle/columns/story20220328-1256766 (Zaobao Singapore)

https://www.chinatimes.com/realtimenews/20220330001102-260403 (China Times, Taiwan)

I acknowledge I did see in my searching discussions on this issue asserting that web designers like to add spacing in these cases. So I am fully in support of adding the automated capability. I am only concerned with the idea of making it the default behavior to the point where it breaks existing content or changes existing behavior to enforce a behavior that appears to not have agreement between designers and journalists.

I want to urge that when the CSS Working Group discusses the outcome of this issue, that they respect legacy content. If I misunderstood that you intend added spacing to be the default behavior, my apologies.

@xfq
Member

xfq commented Mar 30, 2022

According to my observations, the two cases (non-ideographic letters and numerals) are not the same. But in either case, you can find examples with and without extra spacing on the market.

Note that I am only talking about print publications here. I'm not talking about the web because no browsers currently support text-spacing. Although it is possible to implement this behavior via JavaScript (e.g., 1 2 3 4), it is more troublesome and there is no standard way. And another issue is that some libraries implement this by adding ASCII spaces (U+0020), but the width of U+0020 is uneven font by font.
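
For reference, the makeshift approach of adding U+0020 can be sketched in a few lines. This is a hypothetical illustration, not the code of any of the libraries mentioned, and it assumes a simplified ideograph range (the base CJK Unified Ideographs block) and ASCII alphanumerics only.

```python
import re

# Assumption: only the base CJK Unified Ideographs block is treated as Han.
_HAN_THEN_ASCII = re.compile(r"([\u4e00-\u9fff])([A-Za-z0-9])")
_ASCII_THEN_HAN = re.compile(r"([A-Za-z0-9])([\u4e00-\u9fff])")


def insert_ascii_spaces(text: str) -> str:
    """Insert U+0020 at every direct boundary between Han and ASCII alphanumerics."""
    text = _HAN_THEN_ASCII.sub(r"\1 \2", text)
    return _ASCII_THEN_HAN.sub(r"\1 \2", text)


print(insert_ascii_spaces("我用CSS排版"))
```

The inserted character is an ordinary space, so its rendered width is whatever the font assigns to U+0020, which is the unevenness noted above.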


Here are a few examples:


Reference News in 2014:

[Image: Reference_News_2014]

There's extra spacing between runs of ideographs and non-ideographic numerals.


People's Daily in 2022:

[Image: PD_2022]

There's no extra spacing between runs of ideographs and non-ideographic letters.


ネットワークはなぜつながるのか 第2版 in Chinese:

[Image: ネットワークはなぜつながるのか]

There's extra spacing between runs of ideographs and non-ideographic letters.

@r12a
Contributor

r12a commented Mar 30, 2022

@CharlesBelov if by 'you' you mean me, i was not proposing a particular default, just voicing some factors that need to be taken into consideration.

I edited my comment a little to hopefully make it clearer.

The main point of my comment was the stuff below the line that i added. hth

@r12a
Contributor

r12a commented Mar 30, 2022

Btw, i'd urge a little caution in basing decisions solely on online text. It may indicate a fundamental preference for no autospacing. Or it may be done because without a proper way of typesetting online text so that it has the right size of gap, no gap looks better. It may even be the case that this is a new trend that is becoming normalised.

Across many languages i see approaches to typography that have been produced by the inability of technology to do what's really wanted, and several cases where this is just accepted, or even starting to become considered as normal.

So, I think it's also important to consider common approaches in quality printed text, where correct autospacing can be applied if desired, and @xfq provides several examples just above that suggest that autospacing is used in such.

I await with interest the advice of people (such as the clreq group) that have expertise in Chinese typography as to what should be the best default – although there may not be a clear answer even then.

@CharlesBelov

CharlesBelov commented Apr 4, 2022

The following examples are from the print version of SingTao Daily, April 4, 2022, as published in San Francisco, CA. These images are intended solely for the demonstration of newspaper typography and were selected by me. No endorsement or criticism of the source, content or issues covered is implied.
[Image: Samples for posting to issue]

The image shows excerpts of headline and story body text, demonstrating apparently intentional lack of a space when switching between Latin and Chinese characters.

I don't deny that some publications apparently (from the images in above posts) add a space. It's just that if some publications consistently and intentionally add a space and others consistently don't add spacing, we cannot assume any default desired behavior, and any enforcement of behavior must be explicitly stated through a targeted keyword rather than folding this behavior into unrelated keywords.

@xfq
Member

xfq commented Apr 20, 2022

The Chinese Layout Task Force discussed this issue. Most group participants think that ideally, there should be extra spacing between Han and Western text. The specific value can be discussed separately, but the TF members think 1/8em is an acceptable default value for the Web (for print publications, the spacing can be larger). Many people think that solid setting between Han and Western text is bad typography.

Also, some think that Japanese is more tolerant of not having the extra spacing, because unlike Chinese, the Japanese writing system uses a combination of a variety of scripts. The Japanese versions of some apps do not have extra spacing by default, while the Chinese versions of the same apps do. But this is original research of the clreq group, so I'll let the jlreq group share what they think.

If the CSS Working Group decides not to change the default due to compatibility issues, we (the clreq group) are fine with that.

The clreq group members think that this is an important issue, and maybe even the most important issue that CSS needs to solve now for Chinese typography. Because browsers don't support ideograph-alpha and ideograph-numeric currently, many authors add U+0020 manually, which eventually leads to all kinds of potential interoperability issues (like #4992) or bad typography (for example, U+0020 in some fonts are too wide or too narrow).

@hftf

hftf commented Jun 3, 2022

I just came across something possibly relevant to this issue. A user of the not-hugely-popular note-taking app Logseq wrote a plugin to render extra spaces between Chinese and Latin characters. This little mod – one of a few dozen Logseq plugins – might be a pithy example of how end-users tend to go about implementing makeshift solutions in the absence of a better default or wider browser support for a modern feature, and might also convey the importance of this aspect of typography as it was enough for someone a bit savvy to spend the time creating and then to publish it for others to use to their advantage.

@CharlesBelov

CharlesBelov commented Jun 21, 2022

I notice Microsoft Word has an option under File > Options > Proofing > Auto Correct Options... > AutoFormat as you type:

Delete needless spaces between Asian and Western text

Which indicates to me that, even if the default in Word might be to insert such spaces, there are enough subscribers to the worldview that the spaces are not to be used that someone would need a setting in Word to automatically and actively delete such spaces.

@xfq
Member

xfq commented Jun 22, 2022

That option is about manually added whitespace (U+0020), not automatically added spacing, and this issue is about automatically added spacing.

@CharlesBelov What you are looking for is probably Format > Paragraph... > Asian Typography. In the Asian Typography task pane, there are two checkboxes:

  • Automatically adjust space between Asian and Latin text
  • Automatically adjust space between Asian text and numbers

But whether an option exists has nothing to do with this issue, which is about the default behavior (because CSS already has such a property, and it's been around for years).

@himorin
Contributor

himorin commented Jun 23, 2022

Sorry for the late response; I caught this earlier but could not find time for it.
I'm checking with JL-TF participants whether we can write something more.

Several random notes:
As @xfq pointed out, the Editorial Guideline for Mozilla's Japanese translation (available only in Japanese) has a rule to insert U+0020 between CJK ideographs and western characters. It was originally agreed around 2000 and was also commonly shared among documentation sites like MSDN (at that time) through discussions among them. (On this point, would it be welcome to suppress such hand-input/hard-coded white space and use a narrower space via this CSS feature?)
Also, JL-TF is preparing a draft of line composition layout to be used as a part of (so-called) jlreq-d, with 1/4em as the default for spacing between CJK ideographs and western characters, but preferably configurable. Some old research exists in this area in Japan, and 1/6-1/4em seems to be preferred (as far as I remember...).

@xfq
Member

xfq commented Jun 24, 2022

> Because browsers don't support ideograph-alpha and ideograph-numeric currently, many authors add U+0020 manually

FWIW, here are some examples for Chinese:

@astearns astearns added this to 10:30-11:30 i18n in TPAC Friday 2022 Sep 13, 2022
@fantasai
Collaborator

Recommendation from i18n is to make this part of the initial value if Web-compatible.

fantasai added a commit to fantasai/csswg-drafts that referenced this issue Dec 29, 2022
fantasai added a commit to fantasai/csswg-drafts that referenced this issue Dec 30, 2022
fantasai added a commit that referenced this issue Dec 30, 2022
@fantasai
Collaborator

Edits committed. Florian and I went with specifying 1/8ic as the spacing, since we wanted to stay on the conservative side if we're inserting spaces by default. Please file follow-up issues about anything that needs further adjustment.

Also, there's a follow-up issue about replacing spaces with auto-spacing at #8263

@MurakamiShinyu
Collaborator

The text-spacing: normal definition is now:

normal -- Specifies the baseline behavior, equivalent to space-start trim-end trim-adjacent ideograph-alpha ideograph-numeric.

I am a little confused about the "baseline behavior".

Before ideograph-alpha and ideograph-numeric were added to the normal definition, the meaning was clear to me: each value included in the definition of normal is the default value, so when we specify a text-spacing value that differs only partially from normal, we can specify just the changed values and omit the values that are included in the normal definition.
For example, when just text-spacing: trim-start is specified, it implies the omitted normal values, trim-end and trim-adjacent, i.e., text-spacing: trim-start was equivalent to text-spacing: trim-start trim-end trim-adjacent.

However, after ideograph-alpha and ideograph-numeric were added to normal definition, it is no longer that simple. Is text-spacing: trim-start equivalent to text-spacing: trim-start trim-end trim-adjacent ideograph-alpha ideograph-numeric? That will be useful when we want to change only trim-start/space-start/space-first behavior and keep other normal behavior, but if so, how do we change just ideograph-alpha/ideograph-numeric behavior from normal?

Then I decided to interpret it as follows:

  • text-spacing: trim-start is still equivalent to text-spacing: trim-start trim-end trim-adjacent. The omitted default *-start/first, *-end, and *-adjacent are kept but ideograph-alpha and ideograph-numeric are turned off.
  • ideograph-alpha and ideograph-numeric are turned on only when these values are explicitly specified, or normal (or auto, probably) is specified.

(I implemented it in Vivliostyle.js as per this interpretation. vivliostyle/vivliostyle.js#1080)
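
The interpretation above can be sketched as a small resolver. This is an illustration of the reading described in the two bullet points, not the Vivliostyle.js code; the function name is hypothetical, the keyword groups are assumed from the "*-start/first", "*-end", and "*-adjacent" patterns, and other keywords (such as none or auto) are ignored.

```python
# Assumed keyword groups for the draft text-spacing grammar.
START = {"trim-start", "space-start", "space-first"}
END = {"trim-end", "space-end", "allow-end"}
ADJACENT = {"trim-adjacent", "space-adjacent"}
INTERSCRIPT = ("ideograph-alpha", "ideograph-numeric")

# The definition of normal quoted above.
NORMAL = ["space-start", "trim-end", "trim-adjacent", "ideograph-alpha", "ideograph-numeric"]


def resolve_text_spacing(value: str) -> list[str]:
    """Fill in omitted keywords under the interpretation described above."""
    keywords = value.split()
    if keywords == ["normal"]:
        return list(NORMAL)

    def pick(group: set[str], default: str) -> str:
        chosen = [k for k in keywords if k in group]
        return chosen[0] if chosen else default

    # Omitted start/end/adjacent keywords fall back to their normal defaults.
    resolved = [
        pick(START, "space-start"),
        pick(END, "trim-end"),
        pick(ADJACENT, "trim-adjacent"),
    ]
    # Interscript spacing is on only when explicitly specified.
    resolved += [k for k in INTERSCRIPT if k in keywords]
    return resolved


print(resolve_text_spacing("trim-start"))
print(resolve_text_spacing("trim-start ideograph-alpha"))
```

Under this reading there is no way to write a value that keeps ideograph-alpha and ideograph-numeric on without naming them, which is the usability gap discussed next.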


However this spec is not very clear and not very useful.

It would be better adding a new keyword, no-interscript-space, to turn off the interscript spacing.

The ideograph-alpha || ideograph-numeric part of the value definition becomes [no-interscript-space | ideograph-alpha || ideograph-numeric] and we can specify only this part to the text-spacing property when keeping other default (normal) behavior.

@MurakamiShinyu
Collaborator

Or maybe it would be better to change the ideograph-alpha || ideograph-numeric value to ideograph-alpha <number> || ideograph-numeric <number> where the <number> represents the space width as a multiple of the fullwidth advance measure. The normal definition would contain ideograph-alpha 0.125 ideograph-numeric 0.125. Examples:

  • text-spacing: ideograph-alpha 0 ideograph-numeric 0 turns off the ideograph-alpha/ideograph-numeric spacing and keeps other normal behavior.
  • text-spacing: trim-start only changes the *-start/first part of normal behavior. i.e, is equivalent to text-spacing: trim-start trim-end trim-adjacent ideograph-alpha 0.125 ideograph-numeric 0.125.
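
A sketch of how the proposed <number> form might resolve, for comparison. The syntax is only the proposal in this comment, the function name and dictionary keys are hypothetical, the keyword groups are assumed as before, and malformed input is not handled.

```python
# Assumed keyword groups for the draft text-spacing grammar.
GROUPS = {
    "start": {"trim-start", "space-start", "space-first"},
    "end": {"trim-end", "space-end", "allow-end"},
    "adjacent": {"trim-adjacent", "space-adjacent"},
}
INTERSCRIPT = {"ideograph-alpha", "ideograph-numeric"}


def resolve_with_widths(value: str) -> dict:
    """Resolve a value where each ideograph-* keyword is followed by a <number>."""
    # Start from the proposed normal definition; specified parts override it.
    resolved = {
        "start": "space-start",
        "end": "trim-end",
        "adjacent": "trim-adjacent",
        "ideograph-alpha": 0.125,
        "ideograph-numeric": 0.125,
    }
    tokens = value.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in INTERSCRIPT:
            resolved[token] = float(tokens[i + 1])  # width as a multiple of the fullwidth advance
            i += 2
            continue
        for name, members in GROUPS.items():
            if token in members:
                resolved[name] = token
        i += 1
    return resolved


print(resolve_with_widths("trim-start"))
print(resolve_with_widths("ideograph-alpha 0 ideograph-numeric 0"))
```

Here every omitted part keeps its normal value, and a width of 0 serves as the off switch for interscript spacing.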

@frivoal
Collaborator

frivoal commented Jan 9, 2023

@MurakamiShinyu I raised your two comments above as a separate issue, #8288, as I think they will be more easily tracked this way.

jakearchibald pushed a commit to jakearchibald/csswg-drafts that referenced this issue Jan 16, 2023
fantasai added a commit that referenced this issue Feb 21, 2023
* Remove non-useful keyword combinations #4246 #8288
* Split into longhands #4246 #7183 #8288
* Ensure off values for each thing #8288 #6950
* Add insert|replace to allow replacing incorrect space characters #318 #8263 #7183
* Make space-first the initial value #2462
* Allow hanging-punctuation to hang leading ideographic spaces #2462
* Move no-compress to text-justify #7079

See https://lists.w3.org/Archives/Public/www-style/2023Feb/0002.html