Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Rendering of Characters Composed of Multiple Code Points in kitty #3600

Closed
mrmeowski opened this issue Jun 24, 2023 · 44 comments
Closed

Comments

@mrmeowski
Copy link

mrmeowski commented Jun 24, 2023

Description

Characters composed of multiple Unicode code points (e.g., 馃憤馃徎 which is composed of U+1F44D and U+1F3FB) do not render properly when running tmux inside kitty.

Steps to Reproduce

  1. Run kitty without any configuration from a different terminal: kitty --config NONE.
  2. Run printf "Test\t\t|\n'馃憤' Test\t|\n'馃憤馃徎' Test\t|\n" > test.txt.
  3. Run cat test.txt.
  4. Observe that the | characters are aligned properly.
  5. Run tmux without any configuration inside kitty: tmux -vv -f /dev/null new.
  6. Run cat test.txt.
  7. Observe that the | characters are misaligned.

This example is not drastic, but this bug causes problems when typing commands with multi-code-point characters in the shell or editing files with multi-code-point characters in vim/neovim: (kovidgoyal/kitty#6355 (comment)).

Screenshots

kitty without tmux:

image

kitty + tmux:

image

Gnome Terminal:

image

Gnome Terminal + tmux:

image

Environment Details

  • tmux version (tmux -V): tmux 3.3a

  • $TERM inside tmux (echo $TERM): tmux-256color

  • $TERM outside tmux (echo $TERM): xterm-kitty

  • kitty version (kitty --version): kitty 0.28.1 created by Kovid Goyal

  • Platform (uname -sp): Linux unknown (Arch Linux)

Logs

Logs of starting tmux in kitty using tmux -vv -f /dev/null new and running cat test.txt followed by exit:

tmux-client-14609.log
tmux-out-14611.log
tmux-server-14611.log

Potential Culprit

kitty combines a character with multiple code points per the Unicode standard. That is, it combines U+1F44D and U+1F3FB and displays 馃憤馃徎 instead of 馃憤 followed by 馃徎. It seems like tmux expects that those two code points will be displayed separately. The mismatch between the character width that tmux expects and the width that kitty actually uses causes this issue.

Additional Context

The relevant bug on the kitty side: kovidgoyal/kitty#6355.

@nicm
Copy link
Member

nicm commented Jun 24, 2023

Thanks, your analysis looks correct. tmux, like GNOME terminal, does not currently support this feature. Until it does, I suggest you either avoid the characters, or configure Kitty not to do this.

If you (or anyone else) want to look at adding this, the implementation would not be very difficult - it is just a form of combining characters which is already supported (look at how ZWJ is handled in screen_write_cell). The trickiest thing would be working out which codepoints it applies to, since we don't want to carry around a load of Unicode data. Possibly utf8proc can provide this, I would be OK if - initially at least - this is only supported with utf8proc. We would also need an extension terminfo(5) capability and corresponding tmux "terminal feature" to tell tmux whether the outside terminal supports it.

@nicm
Copy link
Member

nicm commented Jun 24, 2023

Possibily we could assume UTF8PROC_CATEGORY_SK has width 0, although of course this will break GNOME terminal. You could try this anyway: x.diff.txt

I don't think there is a similar way to get this information from libc... I think we would need a table. It would be nice if it was a (server) option - people are occasionally complaining that their custom PUA codepoints have the wrong width and an option would let them set their own widths as well as fixing up new Unicode versions without having to upgrade/restart tmux.

diff --git a/compat/utf8proc.c b/compat/utf8proc.c
index dd4ab27f..2d21e273 100644
--- a/compat/utf8proc.c
+++ b/compat/utf8proc.c
@@ -33,6 +33,13 @@ utf8proc_wcwidth(wchar_t wc)
                 */
                return (1);
        }
+    if (cat == UTF8PROC_CATEGORY_SK) {
+        /*
+         * Modifier symbols are only useful when combined, so treat as zero
+         * width.
+         */
+        return (0);
+    }
        return (utf8proc_charwidth(wc));
 }

@nicm
Copy link
Member

nicm commented Jun 24, 2023

You can also try this (against master): tmux-codepoint-widths.diff.txt

It adds an option codepoint-widths allowing you to override the widths of codepoints, so you can do:

set -as codepoint-widths "U+1F3FB=0,U+1F3FC=0,U+1F3FD=0,U+1F3FE=0,U+1F3FF=0"

I thought of including these in the option by default, but it appears there are other major terminals that do not support this yet so perhaps that is premature...

For these specific characters it may be better to just hardcode a list and add a terminfo(5) capability, but maybe that is overkill.

@mrmeowski
Copy link
Author

Thanks a lot for the quick response, Nicholas!

I tested the diff, and it does indeed resolve the issue in my first comment (#3600 (comment)). However, there still seems to be an issue with Unicode characters that are composed of multiple code points (e.g., 馃嚜馃嚫 which is composed of U+1F1EA and U+1F1F8).

If we run printf "Test\t\t|\n'馃憤' Test\t|\n'馃嚜馃嚫' Test\t|\n" > test.txt followed by cat test.txt, the file content gets displayed properly inside and outside tmux:

kitty (without tmux):

image

kitty + tmux:

image

However, if we open test.txt in neovim using nvim -u NONE -c 'set colorcolumn=20' test.txt, there's still an issue with the character width:

kitty (without tmux):

image

kitty + tmux:

image

In this case, I don't think we can unconditionally set the widths of U+1F1EA and U+1F1F8 to static values using codepoint-widths, right?

@nicm
Copy link
Member

nicm commented Jul 1, 2023

All programs must agree on the widths: terminal, tmux and vim. If they don't agree, things will look funny. If you configure these characters to be width 2 in tmux, you will need to configure vim in the same way, or tmux will move the cursor more or fewer positions than it expects.

@mrmeowski
Copy link
Author

Yes, I understand that, but for certain code points, I don't think the widths can be statically specified in codepoint-widths. For example, the Armenian flag emoji (馃嚘馃嚥) is composed of U+1F1E6 (馃嚘) + U+1F1F2 (馃嚥), and the Moroccan flag emoji (馃嚥馃嚘) is composed of the same code points but in the reverse order: U+1F1F2 (馃嚥) + U+1F1E6 (馃嚘) (try removing the space between 馃嚘 馃嚥 and 馃嚥 馃嚘, and the flags should be rendered).

In the case of the Armenian flag, we want 馃嚘 to have a width of 1 but 馃嚥 to have a width of zero. In the case of the Moroccan flag, we want 馃嚥 to have a width of 1, but 馃嚘 to have a width of 0. When the code points are separated (e.g., by a space between them), we want each of them to have a width of 1. Therefore, I don't think we should statically define the width of the code points in codepoint-widths. Instead, I think tmux should be reasoning about the width of an entire character (grapheme) rather than the width of a code point.

What do you think?

@mrmeowski
Copy link
Author

Perhaps we could use utf8proc_grapheme_break_stateful to automatically determine the real width of a grapheme? That way, we don't need to manually/statically specify the width of individual code points.

@nicm
Copy link
Member

nicm commented Jul 1, 2023

Yes we would need the ability to specify codepoint sequences rather than individual codepoints, and tmux would need to be able to understand "this codepoint might be followed by X or Z" in the same way as it does for keys.

I don't really like depending on utf8proc because that means it won't work without it.

@nicm
Copy link
Member

nicm commented Jul 5, 2023

Can you please try this: tmux-unicode.diff.txt

This adds a unicode-widths option instead; you can't use U+ now, you either need to enter the characters as-is or use \u/\U:

set -as unicode-widths '馃嚘馃嚥=2'
set -as unicode-widths '馃嚥馃嚘=2'
set -as unicode-widths '馃憤馃徎=2'

It works for me but this is very much a work in progress:

  • It is not very efficient as it caches the option values as a list. This should be a tree but I had changed it to a list to do it a different way and I don't have time to change it back. The cache should also store the final combined character so it can be used immediately, instead of building it in input_utf8_finish.

  • This requires that each codepoint that makes up the unicode-widths list be valid and have a valid width. So for exotic/recent characters you are likely to need to build with utfproc because wcwidth will probably not have a width for them. This limits the usefulness of this for defining widths for PUA characters, so it would be good to allow missing widths here.

  • I suspect we are missing some calls to input_utf8_finish.

  • If we are processing strings of UTF-8 in input.c instead of individual characters, we should do the same to merge zero width characters, ZWJ and so on, and remove the code in screen-write.c (this would probably be simplier in any case).

  • I think we could probably try to automatically work out the width if the user is using utf8proc, and only require them to be defined by the user for wcwidth (this could probably be done later however).

@mrmeowski
Copy link
Author

mrmeowski commented Jul 9, 2023

Thanks for continuing to work on this, Nicholas!

I applied the patch in #3600 (comment) on top of 237ee6f.

This works as expected when cating test.txt (created using printf "Test\t\t|\n'馃憤' Test\t|\n'馃憤馃徎' Test\t|\n'馃嚜馃嚫' Test\t|\n" > test.txt) and setting the following tmux options: set -as unicode-widths '馃憤=2' ; set -as unicode-widths '馃憤馃徎=2' ; set -as unicode-widths '馃嚜馃嚫=2', but, unfortunately, this doesn't resolve the issue in neovim. Here is a video of the issue:

tmux-unicode-bug.webm

@nicm
Copy link
Member

nicm commented Jul 9, 2023

Show the tmux server log please. If it works with cat - that is, following text is correctly aligned both when initially printed and when redrawing (C-b r or change window or session and back), then this is probably a mismatch in how vim and tmux see the widths.

@mrmeowski
Copy link
Author

Using refresh-client didn't seem to change the incorrect rendering

I recorded a new session:

tmux-unicode-bug.webm.webm

Here are the logs associated with the recording above:

tmux-client-30064.log
tmux-out-30066.log
tmux-server-30066.log

@nicm
Copy link
Member

nicm commented Jul 15, 2023

I meant if it is correct without vim and still correct after refresh-client without vim.

@mrmeowski
Copy link
Author

Yes, outside of neovim, the text is properly aligned when running cat test.txt followed by refresh-client (refresh-client doesn't actually change the rendering in that case).

@nicm
Copy link
Member

nicm commented Aug 9, 2023

Sorry for the delay here. I am not sure nvim is sending the same characters as actually in the file. Can you open the text file in nvim in script (inside tmux) and show me the typescript file?

@nicm
Copy link
Member

nicm commented Aug 9, 2023

I had a look at this change again, and I think we will need to do it a different method.

The problem is that these newer changes to Unicode are badly designed, particularly for stream protocols. If tmux gets a U+1F44D, it has no way to know whether it should be shown as it is or whether it will be followed by U+1F3FB.

So we can't do this the way I had intended, it will need to work like zero width characters where it goes back and replaces the original character when it sees the new one.

@nicm
Copy link
Member

nicm commented Aug 9, 2023

There are actually only a couple of hundred of these characters, so we could just list them and work out whether to combine based on the list.

Please try this instead of the last diff (only lightly tested, eg I didn't try ZWJ): tmux-unicode-new.diff.txt

You should not need to do any configuration for this (there is no option anymore).

@nicm
Copy link
Member

nicm commented Aug 9, 2023

ZWJ still seems to work, for example printf '|\360\237\244\267\342\200\215\342\231\202\357\270\217|'.

But please test and see if it works for your symbols.

@mrmeowski
Copy link
Author

Thanks for looking into this again! I applied the diff on top of 237ee6f, but it still didn't work. Here's a video of the issue:

tmux-unicode-bug.webm

Noteworthy in the video is how cating test.txt is different inside and outside of tmux:

outside tmux:

inside tmux:

Similarly, opening test.txt in neovim displays the text normally outside tmux, but the output is mangled inside tmux:

outside tmux:

inside tmux:

Are you able to replicate this at your end? Here's some information that might help with replicating:

@nicm
Copy link
Member

nicm commented Aug 20, 2023

It works for me. Are you sure you applied the diff and were running the right tmux binary? If so please show tmux logs from running tmux -Ltest -f/dev/null -vv new, cat test.txt then exit only.

Please also show me the typescript file from running script then opening the file in vim inside and outside tmux (separate files).

@mrmeowski
Copy link
Author

Are you sure you applied the diff and were running the right tmux binary?

My bad! I accidentally applied an older diff. Sorry for the confusion.

I applied the new diff (#3600 (comment)) on top of e3a8b84.

There still seems to be an issue. Here's the result of cating test.txt inside and outside tmux:

outside tmux:

inside tmux:

Here are the logs from running tmux -L test -f /dev/null -vv new /bin/bash --norc --noprofile, cat test.txt followed by exit:

tmux-client-15010.log
tmux-out-15012.log
tmux-server-15012.log

The issue with displaying the characters in neovim also persists:

outside tmux:

script output: nvim_outside_tmux.script.txt

inside tmux:

script output: nvim_inside_tmux.script.txt

During the testing, I was using /bin/bash --norc --noprofile when starting shells (e.g., started kitty using kitty --config NONE /bin/bash --norc --noprofile) to avoid any (unlikely) interference from the shell configs. The bash version I'm using is 5.1.16.

@nicm
Copy link
Member

nicm commented Aug 21, 2023

This log is not from the right build either, can you please check you are running the right binary? The server log should contain the text UTF8_WRITE_.

nvim moves the cursor around inside the characters but I will look at that later.

@mrmeowski
Copy link
Author

You are absolutely correct! I forgot that my alias for the locally-built tmux would not get used when running the shell with --norc (duh!). So third time is the charm?

For cat test.txt, the problem only seems to be with the flag now.

outside tmux:

inside tmux:

Here are the logs from running ~/tmux/tmux -L test -f /dev/null -vv new /bin/bash --norc --noprofile, cat test.txt followed by exit:

tmux-client-2283.log
tmux-out-2285.log
tmux-server-2285.log

With tmux + neovim, there is a problem with the flag and the color of the colored thumbs-up emoji.

outside tmux:

script output: outside_tmux.script.txt

inside tmux:

script output: inside_tmux.script.txt

When inside tmux, the problem with the colored thumbs-up emoji in neovim is resolved after highlighting the line in neovim:

script output: inside_tmux_highlight.script.txt

@nicm
Copy link
Member

nicm commented Aug 22, 2023

OK test.txt works for me but if I do this it is wrong:

printf '|\360\237\207\252\360\237\207\270|'

The reason is that the kernel and utf8proc both think this character is width 1.

1692644272.325423 input_top_bit_set 4 '\360\237\207\252' (width 1)
1692644272.326210 input_top_bit_set 4 '\360\237\207\270' (width 1)

However, it appears your terminal and mine both think that the character is width 2.

I don't know if any terminals will make this width 1, so we could probably just force them to width 2. Try this please: tmux-unicode-combine.diff.txt

I think I know what is wrong with vim but I am not going to work on that right now until this is working.

@nicm
Copy link
Member

nicm commented Sep 1, 2023

I have applied this to OpenBSD now since it works for me, it will be in GitHub later. Let me know if you see any further problems with this part of it.

Now I'll have a look at vim...

@nicm
Copy link
Member

nicm commented Sep 1, 2023

I made this fix as well and now trying your script file from vim does not misbehave (I have applied this change also). Can you let me know if you still see problems either by applying it as well as the last diff, or wait until GitHub syncs?

--- screen-write.c      1 Sep 2023 14:29:11 -0000       1.219
+++ screen-write.c      1 Sep 2023 16:01:24 -0000
@@ -1792,6 +1792,8 @@ screen_write_collect_add(struct screen_w
        u_int                            sx = screen_size_x(s);
        int                              collect;

+       ctx->flags &= ~SCREEN_WRITE_COMBINE;
+
        /*
         * Don't need to check that the attributes and whatnot are still the
         * same - input_parse will end the collection when anything that isn't

@0-issue
Copy link

0-issue commented Sep 1, 2023

@nicm The recent commits (in master) break multiple apps using utf box drawing characters. Here's a screenshot from fzf combining before and after:

EDIT: This sometimes does not happen in leftmost pane for same application, but always does in right pane of a vertical split.

after change (run fzf command in tmux repo):
Screenshot 2023-09-01 at 11 10 44 AM

before change (run fzf command in tmux repo):
Screenshot 2023-09-01 at 11 11 27 AM

@nicm
Copy link
Member

nicm commented Sep 1, 2023

Am I meant to guess which characters? Where are the logs?

@0-issue
Copy link

0-issue commented Sep 1, 2023

@nicm Ah, I thought you would have fzf installed, since it's so famous. Here are the logs:

tmux-client-35806.log
tmux-out-35808.log
tmux-server-35808.log

@nicm
Copy link
Member

nicm commented Sep 1, 2023

Does this fix it?

--- a/utf8-combined.c
+++ b/utf8-combined.c
@@ -1114,10 +1114,8 @@ utf8_try_combined(const struct utf8_data *ud, const struct utf8_data *last,
         * later; also force the width to two.
         */
        if (last == NULL) {
-               if (utf8_find_combined_first(ud) != NULL) {
-                       *width = 2;
+               if (utf8_find_combined_first(ud) != NULL)
                        return (UTF8_WRITE_MAYBE_COMBINE);
-               }
                return (UTF8_WRITE_NOW);
        }

@nicm
Copy link
Member

nicm commented Sep 1, 2023

Never mind, it is a different bug, please try this or wait for it to reach GitHub:

--- utf8-combined.c	1 Sep 2023 14:29:11 -0000	1.1
+++ utf8-combined.c	1 Sep 2023 18:43:08 -0000
@@ -962,7 +962,7 @@ utf8_combined_first_cmp(struct utf8_comb
 		return (-1);
 	if (ud1->size > ud2->size)
 		return (1);
-	return (memcmp(ud1->data, ud2->data, sizeof *ud1->data));
+	return (memcmp(ud1->data, ud2->data, ud1->size));
 }
 RB_HEAD(utf8_combined_tree, utf8_combined_first);
 RB_GENERATE_STATIC(utf8_combined_tree, utf8_combined_first, entry,
@@ -979,7 +979,7 @@ utf8_combined_second_cmp(const void *vp1
 		return (-1);
 	if (ud1->size > ud2->size)
 		return (1);
-	return (memcmp(ud1->data, ud2->data, sizeof *ud1->data));
+	return (memcmp(ud1->data, ud2->data, ud1->size));
 }

 static int

@0-issue
Copy link

0-issue commented Sep 1, 2023

@nicm Yes, the latest patch of yours fixes it. Thanks!

@nicm
Copy link
Member

nicm commented Sep 2, 2023

I'm going to close this to get it out of the way but please let me know if you still see issues with vim and I will reopen. Thanks!

@nicm nicm closed this as completed Sep 2, 2023
@mrmeowski
Copy link
Author

Sorry for my (very) late reply, and thanks a lot for all your work on this, Nicholas!

I tested the latest changes at master (f68d35c), and there still seems to be some issues outside and inside neovim.

For outside neovim, here is exactly how to reproduce:

  1. Open kitty using kitty --config NONE /bin/bash --norc --noprofile.
  2. Run printf "Test\t\t|\n'馃憤' Test\t|\n'馃憤馃徎' Test\t|\n'馃嚜馃嚫' Test\t|\n" > test.txt.
  3. Run cat test.txt.
  4. Observe that the vertical bars are aligned properly:
    image
  5. Run ~/tmux/tmux -L test -f /dev/null -vv new /bin/bash --norc --noprofile.
  6. Run cat test.txt.
  7. Observe that the vertical bars are not aligned:
    image

Here are the tmux logs for this session:
tmux-client-17039.log
tmux-out-17041.log
tmux-server-17041.log

For inside neovim:

  1. Open kitty using kitty --config NONE /bin/bash --norc --noprofile.
  2. Run printf "Test\t\t|\n'馃憤' Test\t|\n'馃憤馃徎' Test\t|\n'馃嚜馃嚫' Test\t|\n" > test.txt.
  3. Run nvim -u NONE -c 'set colorcolumn=20' test.txt.
  4. Observe that the color column is aligned properly:
    image
  5. Run ~/tmux/tmux -L test -f /dev/null -vv new /bin/bash --norc --noprofile.
  6. Run nvim -u NONE -c 'set colorcolumn=20' test.txt.
  7. Observe that the color column's misalignment:
    image

Here are the tmux logs for this session:
tmux-client-34960.log
tmux-out-34962.log
tmux-server-34962.log

Here are the script files for running neovim inside and outside tmux (recorded using script -c 'nvim -u NONE -c "set colorcolumn=20" test.txt' <filename>):

inside_tmux.script.txt
outside_tmux.script.txt

Software Versions:

  • kitty: 0.29.2
  • Bash: 5.1.16(1)-release
  • Linux kernel: 6.4.12

@mrmeowski
Copy link
Author

mrmeowski commented Sep 9, 2023

It's worth noting that the issue is not just about cosmetics. For example, when editing text.txt in neovim in tmux, typing Gfe in normal mode should position the cursor on top of the "e" in "Test" on the fourth line, but it positions it over the "t" instead:

image.

The cursor is positioned correctly with Gfe when editing test.txt in neovim outside of tmux:

image

@nicm
Copy link
Member

nicm commented Sep 13, 2023

Ah, it looks OK for me because iTerm2 has DECSLRM so the way tmux does cursor positioning is different and this hides the problem.

It was working but I broke it by accident, please try this:x.diff.txt

@mrmeowski
Copy link
Author

Thanks for the quick fix! I tried 8191c58 which contains the diff above. It fixes the misalignment of the vertical bars when cating test.txt inside tmux. However, the neovim issue still persists.

Do you need me to capture some logs at 8191c58?

@nicm
Copy link
Member

nicm commented Sep 14, 2023

No, I'll have a look at vim at some point.

@nicm nicm reopened this Sep 14, 2023
@nicm
Copy link
Member

nicm commented Sep 15, 2023

Is this better? I was trying to get away from reaching into the existing data to check if we need to combine because that is going to be slow, but I think that is what we will have to do: width.diff.txt

@nicm
Copy link
Member

nicm commented Sep 15, 2023

Don't bother with that one, it doesn;t work with actual zero width characters, will send a new one later.

@nicm
Copy link
Member

nicm commented Sep 15, 2023

Try this please: width2.diff.txt

I'm not wild about calling mbtowc to look up the character in the table but maybe it will be alright...

@mrmeowski
Copy link
Author

Thanks for quickly looking into this, Nicholas! I applied the latest diff to 9f9156c and cannot reproduce the issue anymore 馃帀

@nicm
Copy link
Member

nicm commented Sep 15, 2023

OK I have applied this now, please let me know if you see any further problems.

@nicm nicm closed this as completed Sep 15, 2023
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

3 participants