Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revamp wcwidth #1467

Closed
jerch opened this issue May 23, 2018 · 11 comments
Closed

revamp wcwidth #1467

jerch opened this issue May 23, 2018 · 11 comments
Labels
type/enhancement Features or improvements to existing features

Comments

@jerch
Copy link
Member

jerch commented May 23, 2018

wcwidth needs several enhancements:

  1. interface to configure width of ambiguous chars at runtime, maybe locale dependent (see https://chromium.googlesource.com/native_client/nacl-newlib/+/master-backup/newlib/libc/string/wcwidth.c). Maybe include this into 2.
  2. interface to load widths from some kind of outer source, see Emoji skin modifiers don't work #942 (comment)
    This would make it possible to overload the default width set with some custom values. Those values could be based on backend or frontend or elseone's needs.
  3. Still in doubt - maybe merge wcwidth with future grapheme handling. Most of the wcwidth rules are a subset of the grapheme clustering. Once we have working grapheme handling merging them might save a lot of runtime. This is somewhat doubtful because grapheme handling might not work at an early stage with current cell based buffer model and should be added at a later stage (closer to the renderer). See also Grapheme support #1468.
@Tyriar Tyriar added the type/enhancement Features or improvements to existing features label May 23, 2018
@jerch jerch mentioned this issue May 24, 2018
@jerch
Copy link
Member Author

jerch commented May 25, 2018

While playing around with the custom setting in #1470 which addresses 2. from the list above it became clear that the emulator should primarily follow the wcwidth behavior of the backend system (where the pty is running) to avoid all kinds of weird cursor and eol behavior. This is due intelligent backend libs like libreadline that anticipate the terminal behavior and are used in almost every terminal program nowadays.

It can be solved with another node package that collects the wcwidth output from the server side. For not nodejs based server components it might need a small C helper to get those data.

@jerch
Copy link
Member Author

jerch commented Jun 13, 2018

Kinda offtopic here but related to recent issues - it is kinda hard to get wcwidth to respect the backend situation in a platform independent fashion because of the following:

  1. newer wcwidth impls are locale dependent. Plain nodejs does a bad job regarding locale settings (cant handle it at all in default builds lol). This can only be fixed by some custom C++ binding, bummer.

  2. Given 1# the next question is - which wcwidth should we link? Due to poor update policy some OSes have an outdated wcwidth in their clib (OSX), Windows has none at all. cmdline apps fix those awkward env situations by providing their own wcwidth (as we did in xterm.js) with additional heuristics to get something useful shown. While linking against the clib's wcwidth should work correctly under most Linux and BSD distros, OSX support might need additional maps per OS version and appXY. For Windows it might need a fix in winpty to get the cursor movement after a char insert, no clue yet how WSL does play into this.

  3. browser based builds of xterm.js: JS in browsers dont offer a wcwidth binding. Only chance to sync it with the backend is to get the wcwidth data from the server component to the frontend somehow in a feasible way. Polling char by char is no option due to high latency.

  4. and 2. clearly leads to an additional native tool (node module and/or some helper app). 3. needs some thinking about creating runtime data and providing it to the browser based frontend.

@mofux
Copy link
Contributor

mofux commented Jun 13, 2018

5. When using xterm.js as an SSH frontend, there is no way to reliably determine the wcwith of the target system.

@egmontkob Do you know how VTE solves this, especially for the SSH case?

@jerch
Copy link
Member Author

jerch commented Jun 13, 2018

Here is a quite elaborated view on this problem: mobile-shell/mosh#234 (comment)

@mofux
Copy link
Contributor

mofux commented Jun 13, 2018

@jerch What a mess 😢Seems unsolvable without having a living global specification for wcwidth that everybody can follow.

@egmontkob
Copy link

VTE simply uses whatever's returned by g_unichar_iswide(), which is already sub-ideal, we have a pending bug 772890 to perhaps use glibc's wcwidth() instead. In case these methods are locale-dependent (are they really? even among the UTF-8 ones?), simply the process's locale is used (in case of gnome-terminal it's the locale set up by systemd --user).

Plus there's an API option in VTE (compatibility preference in gnome-terminal) for ambiguous-width characters to be treated as wide. This changes the width of several characters. The set is somewhat arbitrary, and is subject to accidental as well as planned further changes, see bug 791452. Alas there's no locale definition to back this up, I vaguely recall there's a long-standing open (or rejected?) glibc bug on sourceware, as far as I recall it had a four-digit bug number, but the last time I tried I couldn't find it. This means that apps can't really use this behavior of the terminal emulator. Ideally I guess there'd be no such API in VTE, rather you could switch (both VTE and the apps running inside) into a different locale with a different wcwidth.

That still wouldn't solve the problem of ssh, though. And wouldn't solve how the locale used by the terminal emulator would match up with the locale of the app. Maybe be need to introduce an escape sequence for that?? And then juggle with multiple locales inside a single terminal emulator process (e.g. in case of gnome-terminal), using newlocale/uselocale/whatever_l and friends.

Seems unsolvable without having a living global specification for wcwidth that everybody can follow.

Even worse, a specification couldn't solve it either, see the mess caused by Unicode 9.0 changing the width of certain already existing codepoints (e.g. VTE 772812 and tons of duplicates out there). We'd either need a specification that's guaranteed not to change in backwards incompatible ways (Unicode seems not to make such a guarantee), or introduce some kind of versioning. I guess versioning is required anyways, an app should be able to tell if the terminal emulator doesn't yet recognize a new codepoint.

And of course, design all of these in a way that can even work across ssh, across different OSes with different notations for locales. Sigh.

Maybe, maybe just ignoring this whole problem set and living with the tiny breakages every now and then is better than overengineering it??

@mofux
Copy link
Contributor

mofux commented Jun 13, 2018

@egmontkob Thanks for your deep insights, very helpful!

Maybe, maybe just ignoring this whole problem set and living with the tiny breakages every now and then is better than overengineering it??

I'm about to agree 😅

@jerch Maybe it is a good foundation to have a pluggable wcwidth implementation, ideally one that can be hot-plugged (so it re-evaluates the CHAR_WIDTH stored in our buffer once it gets swapped out, then triggers a renderer refresh). This would at least allow us to ship different implementations and have the user switch between them (live) to see which one works best. It would also provide a way for a backend to determine its wcwidth characteristics and then choose a matching implementation in the frontend.

@jerch
Copy link
Member Author

jerch commented Jun 14, 2018

Lol what a nightmare, seems to be unfixable without the versioning thing. This is also an issue with the other unicode algos like the grapheme splitting and maybe bidi (have not yet read deeper into the latter). I like the idea to have some escape sequence asking for unicode support XY, but thats a different story, it wont help here unless specified and supported by a critical mass of cmdline apps. @egmontkob Are there any efforts drafted yet into this direction?

@mofux Yeah gonna stick to the pluggable concept, not sure yet if it makes sense to deliver preconfigured unicode version sets, not sure how closely duplexers like tmux, screen and such follow the unicode spec or do their "own thing". About live patching the widths: also not sure yet how to get this done in a sane way - we dont record the input data atm and the buffer is likely to change all over if the first char is misaligned (with all weirdness like changing the cursor position and the line wraps).

@egmontkob
Copy link

Are there any efforts drafted yet into this direction?

No, not that I'm aware of.

Note that a specification without a reasonably easy way of implementing it by emulators and apps is unlikely to get adopted. And here for the apps we'd at least need a way to query the Unicode version (I'm not aware of such method), and for the emulators we'd need to even support more of them at the same time (there are much fewer emulators than apps, so maybe inlining the databases is okay here).

@Tyriar
Copy link
Member

Tyriar commented Oct 7, 2019

@jerch can/should we close this in favor of #1709?

@jerch
Copy link
Member Author

jerch commented Oct 7, 2019

Yes I think we have there all needed bits that need to be wrapped up. Closing this issue.

@jerch jerch closed this as completed Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Features or improvements to existing features
Projects
None yet
Development

No branches or pull requests

4 participants