revamp wcwidth #1467

jerch · 2018-05-23T07:12:07Z

wcwidth needs several enhancements:

interface to configure width of ambiguous chars at runtime, maybe locale dependent (see https://chromium.googlesource.com/native_client/nacl-newlib/+/master-backup/newlib/libc/string/wcwidth.c). Maybe include this into 2.
interface to load widths from some kind of outer source, see Emoji skin modifiers don't work #942 (comment)
This would make it possible to overload the default width set with some custom values. Those values could be based on backend or frontend or elseone's needs.
Still in doubt - maybe merge wcwidth with future grapheme handling. Most of the wcwidth rules are a subset of the grapheme clustering. Once we have working grapheme handling merging them might save a lot of runtime. This is somewhat doubtful because grapheme handling might not work at an early stage with current cell based buffer model and should be added at a later stage (closer to the renderer). See also Grapheme support #1468.

The text was updated successfully, but these errors were encountered:

jerch · 2018-05-25T05:00:35Z

While playing around with the custom setting in #1470 which addresses 2. from the list above it became clear that the emulator should primarily follow the wcwidth behavior of the backend system (where the pty is running) to avoid all kinds of weird cursor and eol behavior. This is due intelligent backend libs like libreadline that anticipate the terminal behavior and are used in almost every terminal program nowadays.

It can be solved with another node package that collects the wcwidth output from the server side. For not nodejs based server components it might need a small C helper to get those data.

jerch · 2018-06-13T14:04:32Z

Kinda offtopic here but related to recent issues - it is kinda hard to get wcwidth to respect the backend situation in a platform independent fashion because of the following:

newer wcwidth impls are locale dependent. Plain nodejs does a bad job regarding locale settings (cant handle it at all in default builds lol). This can only be fixed by some custom C++ binding, bummer.
Given 1# the next question is - which wcwidth should we link? Due to poor update policy some OSes have an outdated wcwidth in their clib (OSX), Windows has none at all. cmdline apps fix those awkward env situations by providing their own wcwidth (as we did in xterm.js) with additional heuristics to get something useful shown. While linking against the clib's wcwidth should work correctly under most Linux and BSD distros, OSX support might need additional maps per OS version and appXY. For Windows it might need a fix in winpty to get the cursor movement after a char insert, no clue yet how WSL does play into this.
browser based builds of xterm.js: JS in browsers dont offer a wcwidth binding. Only chance to sync it with the backend is to get the wcwidth data from the server component to the frontend somehow in a feasible way. Polling char by char is no option due to high latency.
and 2. clearly leads to an additional native tool (node module and/or some helper app). 3. needs some thinking about creating runtime data and providing it to the browser based frontend.

mofux · 2018-06-13T14:15:13Z

5. When using xterm.js as an SSH frontend, there is no way to reliably determine the wcwith of the target system.

@egmontkob Do you know how VTE solves this, especially for the SSH case?

jerch · 2018-06-13T14:37:57Z

Here is a quite elaborated view on this problem: mobile-shell/mosh#234 (comment)

mofux · 2018-06-13T16:50:10Z

@jerch What a mess 😢Seems unsolvable without having a living global specification for wcwidth that everybody can follow.

egmontkob · 2018-06-13T18:06:27Z

VTE simply uses whatever's returned by g_unichar_iswide(), which is already sub-ideal, we have a pending bug 772890 to perhaps use glibc's wcwidth() instead. In case these methods are locale-dependent (are they really? even among the UTF-8 ones?), simply the process's locale is used (in case of gnome-terminal it's the locale set up by systemd --user).

Plus there's an API option in VTE (compatibility preference in gnome-terminal) for ambiguous-width characters to be treated as wide. This changes the width of several characters. The set is somewhat arbitrary, and is subject to accidental as well as planned further changes, see bug 791452. Alas there's no locale definition to back this up, I vaguely recall there's a long-standing open (or rejected?) glibc bug on sourceware, as far as I recall it had a four-digit bug number, but the last time I tried I couldn't find it. This means that apps can't really use this behavior of the terminal emulator. Ideally I guess there'd be no such API in VTE, rather you could switch (both VTE and the apps running inside) into a different locale with a different wcwidth.

That still wouldn't solve the problem of ssh, though. And wouldn't solve how the locale used by the terminal emulator would match up with the locale of the app. Maybe be need to introduce an escape sequence for that?? And then juggle with multiple locales inside a single terminal emulator process (e.g. in case of gnome-terminal), using newlocale/uselocale/whatever_l and friends.

Seems unsolvable without having a living global specification for wcwidth that everybody can follow.

Even worse, a specification couldn't solve it either, see the mess caused by Unicode 9.0 changing the width of certain already existing codepoints (e.g. VTE 772812 and tons of duplicates out there). We'd either need a specification that's guaranteed not to change in backwards incompatible ways (Unicode seems not to make such a guarantee), or introduce some kind of versioning. I guess versioning is required anyways, an app should be able to tell if the terminal emulator doesn't yet recognize a new codepoint.

And of course, design all of these in a way that can even work across ssh, across different OSes with different notations for locales. Sigh.

Maybe, maybe just ignoring this whole problem set and living with the tiny breakages every now and then is better than overengineering it??

mofux · 2018-06-13T18:49:09Z

@egmontkob Thanks for your deep insights, very helpful!

Maybe, maybe just ignoring this whole problem set and living with the tiny breakages every now and then is better than overengineering it??

I'm about to agree 😅

@jerch Maybe it is a good foundation to have a pluggable wcwidth implementation, ideally one that can be hot-plugged (so it re-evaluates the CHAR_WIDTH stored in our buffer once it gets swapped out, then triggers a renderer refresh). This would at least allow us to ship different implementations and have the user switch between them (live) to see which one works best. It would also provide a way for a backend to determine its wcwidth characteristics and then choose a matching implementation in the frontend.

jerch · 2018-06-14T04:53:57Z

Lol what a nightmare, seems to be unfixable without the versioning thing. This is also an issue with the other unicode algos like the grapheme splitting and maybe bidi (have not yet read deeper into the latter). I like the idea to have some escape sequence asking for unicode support XY, but thats a different story, it wont help here unless specified and supported by a critical mass of cmdline apps. @egmontkob Are there any efforts drafted yet into this direction?

@mofux Yeah gonna stick to the pluggable concept, not sure yet if it makes sense to deliver preconfigured unicode version sets, not sure how closely duplexers like tmux, screen and such follow the unicode spec or do their "own thing". About live patching the widths: also not sure yet how to get this done in a sane way - we dont record the input data atm and the buffer is likely to change all over if the first char is misaligned (with all weirdness like changing the cursor position and the line wraps).

egmontkob · 2018-06-14T07:32:05Z

Are there any efforts drafted yet into this direction?

No, not that I'm aware of.

Note that a specification without a reasonably easy way of implementing it by emulators and apps is unlikely to get adopted. And here for the apps we'd at least need a way to query the Unicode version (I'm not aware of such method), and for the emulators we'd need to even support more of them at the same time (there are much fewer emulators than apps, so maybe inlining the databases is okay here).

Tyriar · 2019-10-07T17:33:37Z

@jerch can/should we close this in favor of #1709?

jerch · 2019-10-07T17:40:44Z

Yes I think we have there all needed bits that need to be wrapped up. Closing this issue.

jerch mentioned this issue May 23, 2018

Support options to treat ambiguous width characters as double width #1453

Closed

Tyriar added the type/enhancement Features or improvements to existing features label May 23, 2018

jerch mentioned this issue May 24, 2018

Revamp wcwidth #1470

Closed

jerch mentioned this issue Jun 12, 2018

Emoji can leave artifacts in right cell #1502

Closed

Tyriar mentioned this issue Jun 13, 2018

Wrong cursor position in intergrated terminal with speical character left of prompt microsoft/vscode#51739

Closed

Tyriar mentioned this issue Aug 31, 2018

Something wrong in terminal while using MSYS 2 with zsh. microsoft/vscode#57195

Closed

jerch mentioned this issue Sep 24, 2018

Update wcwidth/CharWidth.ts #1707

Closed

jerch closed this as completed Oct 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revamp wcwidth #1467

revamp wcwidth #1467

jerch commented May 23, 2018 •

edited

Loading

jerch commented May 25, 2018

jerch commented Jun 13, 2018

mofux commented Jun 13, 2018

jerch commented Jun 13, 2018

mofux commented Jun 13, 2018

egmontkob commented Jun 13, 2018

mofux commented Jun 13, 2018

jerch commented Jun 14, 2018

egmontkob commented Jun 14, 2018

Tyriar commented Oct 7, 2019

jerch commented Oct 7, 2019

revamp wcwidth #1467

revamp wcwidth #1467

Comments

jerch commented May 23, 2018 • edited Loading

jerch commented May 25, 2018

jerch commented Jun 13, 2018

mofux commented Jun 13, 2018

jerch commented Jun 13, 2018

mofux commented Jun 13, 2018

egmontkob commented Jun 13, 2018

mofux commented Jun 13, 2018

jerch commented Jun 14, 2018

egmontkob commented Jun 14, 2018

Tyriar commented Oct 7, 2019

jerch commented Oct 7, 2019

jerch commented May 23, 2018 •

edited

Loading