How does weechat want zero width spaces to work? #1669
Comments
WeeChat uses glibc to determine how wide characters are, and thus how much the cursor should be moved. On my system (Arch Linux with glibc version 2.33), glibc returns a width of 0 for ZWS, so then WeeChat doesn't expect the cursor to be moved. You can check this in WeeChat by running: I don't know if glibc has reported a different width for ZWS in earlier versions, but it would surprise me. In the first issue you link to you said that Konsole and xterm treat ZWS as a regular space. I don't know if this was different four years ago, but at least on my machine now it's not. In Konsole it's invisible, and in xterm it's rendered as a dotted border around the preceding character. The cursor is not moved in either. |
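For checking the same thing outside WeeChat, here is a minimal sketch that asks the C library directly via `ctypes`. It is not the command referred to above. The helper name `libc_wcwidth` is made up for illustration, and `ctypes.CDLL(None)` assumes a POSIX system (Linux, macOS). The result depends on the platform and on the active locale, which is the point of this thread, so no expected output is shown.

```python
import ctypes
import locale


def libc_wcwidth(ch: str) -> int:
    """Return what the C library's wcwidth() reports for a single character."""
    try:
        # wcwidth() consults LC_CTYPE, so adopt the user's locale first.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # keep whatever locale is already active
    # Assumption: POSIX. CDLL(None) exposes symbols already loaded into the
    # process, which includes the C library.
    libc = ctypes.CDLL(None)
    libc.wcwidth.argtypes = [ctypes.c_wchar]
    libc.wcwidth.restype = ctypes.c_int
    return libc.wcwidth(ch)


if __name__ == "__main__":
    for label, ch in [("U+200B", "\u200b"), ("U+26C4", "\u26c4")]:
        print(label, libc_wcwidth(ch))
```

Running this on two machines (for example Arch Linux and macOS) should make any disagreement between their C libraries visible.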
I see that On macOS, It looks like weechat interprets a width of -1 as 1:
I don't know if this is the right choice. The example it gives of a snowman in particular is wrong because since Unicode 9 all emoji are width 2. FWIW, in iTerm2 I have to keep a list of characters with the DI (Default Ignorable Code Point) property to avoid moving the cursor for them. |
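The idea of keeping a list of Default Ignorable code points can be sketched roughly as follows. This is not iTerm2's code: the range table is a small, deliberately partial subset of the Unicode `Default_Ignorable_Code_Point` property, and `is_default_ignorable` and `advance_cursor` are hypothetical names used only for illustration.

```python
# Partial, illustrative subset of Default_Ignorable_Code_Point ranges.
# A real implementation would generate the full table from the Unicode data files.
DEFAULT_IGNORABLE_RANGES = [
    (0x00AD, 0x00AD),  # soft hyphen
    (0x200B, 0x200F),  # zero width space, joiners, directional marks
    (0x202A, 0x202E),  # bidirectional embedding controls
    (0x2060, 0x206F),  # word joiner, invisible operators, isolates
    (0xFE00, 0xFE0F),  # variation selectors
    (0xFEFF, 0xFEFF),  # zero width no-break space (BOM)
]


def is_default_ignorable(ch: str) -> bool:
    """True if ch falls in one of the ranges listed above."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in DEFAULT_IGNORABLE_RANGES)


def advance_cursor(column: int, ch: str, width: int = 1) -> int:
    """Return the new cursor column; ignorable characters do not move it."""
    if is_default_ignorable(ch):
        return column
    return column + width
```

With this table, a zero-width space leaves the column unchanged while an ordinary letter advances it by one.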
Hm, yes, that's strange. On my machine with glibc 2.36 it returns 0, tested with this code:
I do see that the man page says this though:
Which seems to contradict what I get. Given this line, the behavior on macOS, and the fact that the comment in the code you pasted says that U+26C4 returns -1 (while I get 2 for this character now), could there have been a change of behavior in wcswidth at some point that was not reflected in the man page? I see the commit message for the code you pasted links to https://savannah.nongnu.org/bugs/?40115, but it doesn't contain much info. It seems strange to me that it sets the length to 1 instead of 0 if wcswidth returns -1. Do you remember why, @flashcode? |
By the way, in https://gitlab.com/gnachman/iterm2/-/commit/04036736f13742668037fb89fc269c9aad88f252 you say that xterm and Konsole advance the cursor on ZWS, but if I run |
U+26C4 will give -1 on macOS because macOS's wcwidth is horrible and probably hasn't been updated since Emoji was invented. A modern OS will give you 2, of course. So it's not that wc(s)width changed on macOS, it's that the rest of the world changed and macOS didn't :) I suspect that xterm and Konsole's behavior depends on wcwidth or something similar and will be platform-specific. tmux encountered similar difficulties and switched to using utf8proc on macOS. I think that's the right call—wc(s)width should be considered harmful on that platform, unfortunately. See here for their wcwidth wrapper: https://github.com/tmux/tmux/blob/master/compat/utf8proc.c#L24 |
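To illustrate deriving widths from Unicode data instead of the platform's `wcwidth`, here is a rough sketch using Python's `unicodedata` module as a stand-in for utf8proc. It is not tmux's wrapper, and the rules are simplified assumptions: control, format, and combining characters get width 0, East Asian Wide/Fullwidth characters get width 2, and everything else gets width 1. The function name `cell_width` is hypothetical.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate terminal cell width of one character from Unicode data.

    Simplified assumption: this never returns -1, unlike wcwidth().
    """
    category = unicodedata.category(ch)
    # Cc = control, Cf = format (includes U+200B), Mn/Me = combining marks.
    if category in ("Cc", "Cf", "Mn", "Me"):
        return 0
    # W = wide, F = fullwidth (CJK and, since Unicode 9, many emoji).
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def text_width(text: str) -> int:
    """Sum of per-character widths; ignores grapheme clustering."""
    return sum(cell_width(ch) for ch in text)
```

Because the table ships with the interpreter rather than the operating system, the answer for U+200B or U+26C4 does not change between Linux and macOS. It only changes with the Unicode version the runtime was built against.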
@trygveaa: yes, some comments about this code:
I found many other problems with unicode chars, as well as other issues opened (for example for soft-hyphens), so I'm currently writing a specification to rework this and propose a new behavior. I'll post the link here once it's ready to be shared. |
@gnachman, @trygveaa: I wrote the specification: https://specs.weechat.org/specs/2022-003-fix-unicode-display.html Please tell me what you think about the proposed changes before I implement them. I can make them available on a testing branch before merging into master. |
I pushed the branch Please ping me if you find differences with the specification or display bugs (chat and bars). |
@gnachman said wcwidth on macOS returns -1 for U+26C4. So does this mean that this character will be stripped away on macOS now? If so, that's not good. If wcwidth on macOS works so poorly, I think you should consider using something else (this also ties into what I wrote on IRC with wcwidth being wrong for some emojis, which might cause issues depending on the terminal emulator). |
Yes, if -1 is returned by wcwidth for U+26C4, it will not be displayed. |
So it seems the behavior is changed from displaying it with the incorrect width (leading to render issues in some cases), to not displaying it at all. This could be considered a regression, even though the bug lies in wcwidth on macOS. Either way, it seems the fix would be to not use wcwidth/wcswidth from glibc. |
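The options for a character whose reported width is -1 can be compared in a small sketch: count it as one cell (the earlier behavior discussed in this thread), do not display it (the behavior confirmed for the new branch), or consult another width source. The function and the policy names `"as-one"`, `"drop"`, and `"fallback"` are hypothetical and do not come from WeeChat's code.

```python
from typing import Callable, Optional


def display_width(
    ch: str,
    raw_width: int,
    policy: str,
    fallback: Optional[Callable[[str], int]] = None,
) -> Optional[int]:
    """Return the cells to reserve for ch, or None if ch should not be shown.

    raw_width is whatever the platform's wcwidth() reported for ch.
    """
    if raw_width >= 0:
        return raw_width  # the platform gave a usable answer
    if policy == "as-one":
        return 1  # display with a possibly incorrect width
    if policy == "drop":
        return None  # do not display the character at all
    if policy == "fallback" and fallback is not None:
        return fallback(ch)  # ask a different width table
    raise ValueError(f"unsupported policy: {policy!r}")
```

For U+26C4 on a platform that reports -1, `"as-one"` risks misaligned rendering, `"drop"` loses the character, and `"fallback"` depends entirely on the quality of the second table.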
Question
Weechat seems to use zero-width spaces differently than other apps.
I modified iTerm2 to advance the cursor on a zero-width space in response to https://gitlab.com/gnachman/iterm2/-/issues/5397. But now in https://gitlab.com/gnachman/iterm2/-/issues/9786 we see that the opposite is expected.
How does Weechat actually want zero-width space to work? Should the cursor move? TBH I found it surprising that you'd want the cursor to move for this non-spacing character, so if it's up in the air I'd prefer to change it to not expect movement.
Is there more to it than this?