Unicode problem #1057

Closed
oblitum opened this issue Sep 7, 2017 · 101 comments

@oblitum

oblitum commented Sep 7, 2017

~ uname -sp && tmux -V && echo $TERM
Linux unknown
tmux master
xterm-termite
  • System is ArchLinux.

Test run with:

  • tmux -Ltest kill-server
  • tmux -vv -Ltest -f/dev/null new
  • vim ./emojis

Logs and test file used:

Results:

[screenshot: overflow]

Issue:

The overflow on column 80 doesn't happen for vim outside tmux. This issue pervades all console applications.

@oblitum
Author

oblitum commented Sep 7, 2017

It's especially problematic with terminal IM applications such as weechat. Too many emojis == broken UI.

@nicm
Member

nicm commented Sep 7, 2017

This is most likely a problem with your system locale implementation. It is telling tmux these codepoints are width 1:

1504743323.298691 input_utf8_close 4 '\360\237\244\227' (width 1)
1504743323.298807 input_utf8_close 4 '\360\237\214\237' (width 1)

But in fact they should be width 2.

You can try building tmux against libutf8proc, or see if there is a later version of your libc, or see if you can find a font and terminal where the width of these symbols agrees with libc.
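
For reference, the octal escapes in these log lines are just the UTF-8 bytes of each character. A minimal Python sketch (not part of tmux; the helper name `decode_octal_escapes` is made up for illustration) decodes them and prints the East Asian Width class that Python's own Unicode tables assign, which may differ from what a given libc reports:

```python
import re
import unicodedata


def decode_octal_escapes(escaped: str) -> str:
    """Turn a string like r'\360\237\244\227' into the character it encodes."""
    # Each \NNN group is one byte written in octal.
    raw = bytes(int(octal, 8) for octal in re.findall(r"\\([0-7]{3})", escaped))
    return raw.decode("utf-8")


for escaped in (r"\360\237\244\227", r"\360\237\214\237"):
    ch = decode_octal_escapes(escaped)
    # A class of 'W' or 'F' means the character is meant to take two columns.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"east_asian_width={unicodedata.east_asian_width(ch)}")
```

The two sequences decode to U+1F917 and U+1F31F, the two emoji discussed later in this thread.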

@oblitum
Author

oblitum commented Sep 7, 2017

Hi, I didn't understand how my locale implementation is the issue, for what it's worth I use:

LANG=en_US.utf8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=C
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

But when you say the implementation, do you mean ArchLinux is broken? I've had this issue for quite a long time; I can't remember whether it has been there since day one with tmux. If I have to apply patches, they are probably needed in the official distribution packages too. Replacing libc doesn't sound like a good idea, because it would affect the entire system. Replacing the terminal/font sounds a bit ad hoc; I was a user of xfce4-terminal before termite and the problem was the same. So it seems the best option left is to try a tmux build against libutf8proc. I don't know exactly how to do it, so I'll have a look. I still wonder: if it's an issue with the system's locale implementation, why is only tmux affected? All terminal applications work fine when run outside tmux.

@oblitum
Author

oblitum commented Sep 7, 2017

Sadly, building with --enable-utf8proc made things worse. The former problem didn't get solved, and typing in the command line is now completely unpredictable. For example, my prompt looks like:

~/Desktop ❯❯❯  |

Where | is cursor position, and it's now quite far from ❯❯❯.
I then type something:

~/Desktop ❯❯❯  ls|

And force tab completion:

~/Desktop ❯❯❯ ls |

The text gets moved back one char but the cursor is left in the same position, with one additional space before it. Now if I press backspace all the way to erase the text, I can't; it only erases up to:

~/Desktop ❯❯❯ l|

I can't erase the first char. It's left there only visually, because internally it was erased: pressing Enter doesn't try to run a nonexistent command l. This is just one case; as a whole it's quite unpredictable now.

@nicm
Member

nicm commented Sep 7, 2017

Did you do a full "make clean" before rebuilding with utf8proc?

Probably these codepoints changed width in one of the newer Unicode versions.

Latest utf8proc is Unicode 9.0; glibc 2.23 is Unicode 8.0 and 2.24/2.25 is Unicode 9.0; glib (used by VTE) is Unicode 10.0 as of 2.53.4 and 9.0 as of 2.50.1 and 8.0 as of 2.47.1.

Anyway you need libc/utf8proc and terminal/font to be on the same version or you will have problems with symbols where the width disagrees.

Or it could just be a bug or disagreement in one or the other.

@oblitum
Author

oblitum commented Sep 7, 2017

I've installed the package from AUR from scratch. I'll try the former test with more fundamental large unicode characters that didn't change width between unicode versions.

@oblitum
Author

oblitum commented Sep 7, 2017

The only thing not worth is trying with utf8proc I think.... I'll try with it again, with a different terminal prompt, because my unicode one may be interfering.

@nicm
Member

nicm commented Sep 7, 2017

Try something like ハ (printf "\343\203\217") which is width 2 on all the platforms I have here.

@oblitum
Author

oblitum commented Sep 7, 2017

OK, the problem doesn't show up with:

|       |
|  ハ☺  |
|       |

So, according to your Unicode version listing, this is quite a mess to try to solve. No idea whether I'll figure it out. Thanks for the help.

@oblitum oblitum closed this as completed Sep 7, 2017
@oblitum
Author

oblitum commented Sep 7, 2017

cc #836 as it's related.

@oblitum
Author

oblitum commented Sep 8, 2017

@nicm can you tell me which emoji font you use that displays 🤗🌟 in vim under tmux correctly (without the overflow issue)? I've changed to emojione and it had no effect.

My setup, where this test always produces the same issue:

  • glibc 2.25 -> Unicode 9.0
  • glib 2.52 -> Unicode 9.0
  • With or without --enable-utf8proc

Only difference is that my prompt containing ❯❯❯ gets messy when --enable-utf8proc is passed. If I remove ❯❯❯ it gets back to normal, so with --enable-utf8proc these characters cause some issue. I've tried them with two fonts (Monoid and monofur for Powerline) but it didn't matter.

@oblitum
Author

oblitum commented Sep 8, 2017

by the way, these version numbers correspond to the current state of ArchLinux.

@oblitum oblitum reopened this Sep 8, 2017
@nicm
Member

nicm commented Sep 8, 2017

I don't have a font that shows the symbols, my font just shows them as boxes that cover two normal size characters. But on Ubuntu with glibc 2.24, tmux is correctly told that these are width 2. If I build with utf8proc (I have 2.0.2) they are also width 2:

1504853923.543282 input_utf8_close 4 '\360\237\244\227' (width 2)

You can check by doing:

rm tmux*.log
tmux -Ltest -vv new
printf '\360\237\244\227'
exit
grep input_utf8_close tmux-server-*.log

Make sure your log shows this is width 2 and see if you still have problems.
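
If the log is large, the same check can be scripted. The Python sketch below assumes the log lines keep exactly the `input_utf8_close` format shown in this thread; the function name `reported_widths` and the regular expression are illustrative, not something tmux ships.

```python
import re

# Matches lines such as:
# 1504853923.543282 input_utf8_close 4 '\360\237\244\227' (width 2)
LINE_RE = re.compile(
    r"input_utf8_close (\d+) '((?:\\[0-7]{3})+)' \(width (\d+)\)"
)


def reported_widths(log_text):
    """Yield (character, width tmux was told) for each matching log line."""
    for match in LINE_RE.finditer(log_text):
        octal_bytes = re.findall(r"\\([0-7]{3})", match.group(2))
        ch = bytes(int(o, 8) for o in octal_bytes).decode("utf-8")
        yield ch, int(match.group(3))


# Two lines copied from this thread: the first from the broken setup,
# the second from a setup that reports the expected width.
sample = r"""
1504743323.298691 input_utf8_close 4 '\360\237\244\227' (width 1)
1504853923.543282 input_utf8_close 4 '\360\237\244\227' (width 2)
"""
for ch, width in reported_widths(sample):
    print(f"U+{ord(ch):04X} reported width {width}")
```

To run it against a real log, read the contents of the `tmux-server-*.log` file and pass the text to `reported_widths` in place of `sample`.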

@nicm
Member

nicm commented Sep 8, 2017

[screenshot from 2017-09-08 08-04-32]

@nicm
Member

nicm commented Sep 8, 2017

Is this still a problem for you with xterm or with gnome-terminal? gnome-terminal with Courier works fine for me:
[screenshot]

@oblitum
Author

oblitum commented Sep 8, 2017

@nicm cat doesn't work as a test. It's OK for me with cat too. The issue only surfaces inside terminal programs.

@nicm
Member

nicm commented Sep 8, 2017

No it is definitely OK:

[screenshot]

Or you can see by entering copy mode and moving across the character; tmux will move two columns over the double-width characters:
[screenshots: a1, a2]

@oblitum
Author

oblitum commented Sep 8, 2017

I'll try with your font and check logs... though I have some Noto emoji fonts which I may need to uninstall or something to have plain Courier.

@nicm
Member

nicm commented Sep 8, 2017

Try to cat this test file instead, this is what vim is sending for you: newtest.txt

@nicm
Member

nicm commented Sep 8, 2017

The reason it works outside tmux is that vim is using absolute cursor positioning (so it draws the characters, then it moves to column 80 directly). tmux is using relative cursor positioning (so it draws the characters, then moves right by 52 columns), which means it needs accurate widths for the characters.
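
To make the difference concrete, here is a toy Python model (not tmux or vim code; the function names and the numbers are illustrative assumptions). A run of emoji is drawn that the terminal renders two columns wide while the sender believes they are one column wide, and the cursor is then moved either absolutely or relatively. The model ignores wrapping and clamping at the right margin.

```python
def final_column_absolute(target_col, n_chars, believed_width, real_width):
    """Absolute positioning: 'go to column target_col'. Widths do not matter."""
    return target_col


def final_column_relative(target_col, n_chars, believed_width, real_width):
    """Relative positioning: 'move right by N', with N computed from the
    believed cursor position rather than the real one."""
    believed_col = n_chars * believed_width   # where the sender thinks it is
    real_col = n_chars * real_width           # where the terminal really is
    move_right = target_col - believed_col    # the relative move that gets sent
    return real_col + move_right


# 10 emoji, target column 80: sender believes width 1, terminal draws width 2.
print(final_column_absolute(80, 10, 1, 2))  # 80
print(final_column_relative(80, 10, 1, 2))  # 90, i.e. 10 columns too far
```

When the believed and real widths agree, both functions return the same column, which is why getting the width tables in sync removes the overflow.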

@oblitum
Author

oblitum commented Sep 8, 2017

OK, changed to Courier and used cat with new test, same behavior. Outside tmux OK, inside not OK.

@oblitum
Author

oblitum commented Sep 8, 2017

Will try to eliminate any trace of fallback emoji fonts.

@nicm
Member

nicm commented Sep 8, 2017

You need to persuade either libc or utf8proc to tell tmux the right widths for these symbols or they are not going to work.

@oblitum
Author

oblitum commented Sep 8, 2017

By the way, just like to comment that I've just talked with another person on Mac iTerm2 and there was no problem there. Except that inside ssh he got issues, only the first char was shown.

@oblitum
Author

oblitum commented Sep 8, 2017

> You need to persuade either libc or utf8proc to tell tmux the right widths for these symbols or they are not going to work.

I have no idea how to do that. I'm hoping font change will do it. Or maybe terminal emulator.

@nicm
Member

nicm commented Sep 8, 2017

No those will not help I suspect. What utf8proc version do you have?

@nicm
Member

nicm commented Sep 8, 2017

I think you need to build with utf8proc 2.0.

@oblitum
Author

oblitum commented Sep 8, 2017

@nicm utf8proc on Arch community repos is 1.3.1. I've just removed the noto-fonts-emoji that was responsible for my emojis, now I'm just getting the block chars instead. Still, same issue. So I think font will make no difference indeed.

@oblitum
Author

oblitum commented Sep 8, 2017

Hi @nicm, any reason this one doesn't fit? It seems the characters and the overflowing is there. Sending a full log is problematic because of sensitive information while I browse mutt, that's the reason I've cut it to the point I enter the inoffensive test email screen.

@nicm
Member

nicm commented Sep 8, 2017

this one does not tell me how wide your terminal is

@nicm
Member

nicm commented Sep 8, 2017

Oh I think it is 240 columns.

@oblitum
Author

oblitum commented Sep 8, 2017

You can consider it's the same as my original issue log, I didn't change anything in that regard. I've started tmux the same way, just opened mutt and browsed to the affected email. Screen, etc is unchanged.

@nicm
Member

nicm commented Sep 8, 2017

Your terminal is 239 columns. This is a problem with mutt: it is sending too many spaces after the Unicode characters, probably because it thinks they are width 1 instead of 2. This is what it is sending to tmux: mutt.txt. It wraps on a 239-column terminal.
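
A rough illustration of that failure mode, in Python rather than mutt's C: a program pads a line with spaces up to the terminal width using the widths it believes, and the terminal then lays the line out using the real widths. The `pad_line` helper and both width functions are hypothetical stand-ins, not mutt code.

```python
def pad_line(text, columns, width_of):
    """Pad text with spaces so that, by width_of's reckoning, it fills columns."""
    used = sum(width_of(ch) for ch in text)
    return text + " " * max(0, columns - used)


def believed_width(ch):
    # Stand-in for an outdated width table: every character counts as 1.
    return 1


def real_width(ch):
    # Stand-in for the terminal: the two emoji from this thread are 2 wide.
    return 2 if ch in "\U0001F917\U0001F31F" else 1


line = pad_line("\U0001F917\U0001F31F hello", 239, believed_width)
displayed = sum(real_width(ch) for ch in line)
print(displayed)          # 241
print(displayed > 239)    # True: two columns spill past the right margin
```

Each miscounted emoji adds one extra column, so a line with many of them overshoots the margin by that many columns.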

@nicm
Member

nicm commented Sep 8, 2017

Any program that is using wcwidth() is likely to incorrectly display the emoji codepoints and any program using utf8proc is likely to incorrectly display the * symbol. Anything using glib should be OK.

@oblitum
Author

oblitum commented Sep 8, 2017

OK, thanks for the analysis. I suspected mutt was doing that... This is a Unicode nightmare.

@oblitum
Author

oblitum commented Sep 8, 2017

@nicm will you add the glib option upstream? It would be nice to have it.

@nicm
Member

nicm commented Sep 8, 2017

mutt uses wcwidth() so you will need to wait until your libc is updated.

glibc is Unicode 9.0 as of 2.26 and Unicode 10.0 as of 2.27. I am guessing Ubuntu have backported it because I only have 2.24 and it seems OK.

If you wait for 6 months or a year everything will catch up and these symbols will work.

@nicm
Member

nicm commented Sep 8, 2017

Actually it might be 2.25 for 9.0 and 2.26 for 10.0, their changelog is pretty unclear. Anyway eventually you will have it.

@oblitum
Author

oblitum commented Sep 8, 2017

Ah, OK then. Didn't think of viewing it as a matter of time.

@oblitum
Author

oblitum commented Sep 8, 2017

> If you wait for 6 months or a year everything will catch up and these symbols will work.

@nicm One question. If glibc claims Unicode 9.0, those symbols should have been working already right? So it means that glibc is currently broken regarding their Unicode support statement or that they adopt it gradually?

@nicm
Member

nicm commented Sep 8, 2017

I don't know if those symbols are changed in 9.0 or 10.0 and I don't know what version of Unicode glibc 2.25 actually is.

@oblitum
Author

oblitum commented Sep 8, 2017

OK, at least they claim Unicode 9 on their NEWS file.

@oblitum
Author

oblitum commented Sep 8, 2017

Scratch that, I've confused glib NEWS with glibc NEWS, I don't know about the latter.

@oblitum
Author

oblitum commented Sep 8, 2017

For what it's worth, I checked their NEWS and there's no claim of full support for Unicode 9; they jump from 8 to 10.

@oblitum
Author

oblitum commented Sep 8, 2017

Sorry to bother but I have another question I'd like to clear up. Just to fully diagnose the mutt use case and why it wraps like in that screenshot at the bottom.

The stack without tmux is:

  • mutt calculates chars using glibc, the emojis are counted as width 1.
  • termite uses glib, for which the chars are width 2.

It displays correctly but I don't know the explanation.

The stack with tmux is:

  • mutt calculates chars using glibc, the emojis are counted as width 1.
  • original tmux from master using glibc receives the char stream from mutt and counts emojis as width 1 too.
  • termite uses glib, for which the chars are width 2.

It displays incorrectly but I don't know the explanation.

@oblitum
Author

oblitum commented Sep 9, 2017

> If you wait for 6 months or a year everything will catch up and these symbols will work.

@nicm I don't know whether it was b/c I asked about the update on #ArchLinux yesterday, but they upgraded yesterday, and... IT'S FIXED!

I guess my last question doesn't matter anymore, except for sanity's sake.

@oblitum
Author

oblitum commented Sep 15, 2017

This escalated fast, now I have color emojis on termite without asking.

@tsujigiri

tsujigiri commented Oct 12, 2017

I had the same issue as in #1057 (comment), which disappeared when I installed libutf8proc 2 via libutf8proc-julia and rebuilt tmux (the current version in Arch Linux). I'm using fish (the shell) and my only remaining problem is that fish's right prompt still doesn't align with the right border of the shell, because apparently fish doesn't get the correct character widths either. But I can do without the right prompt for now. Thanks, @oblitum, for figuring this out! This already improved my quality of life significantly! #firstworldproblems

@anubhavcodes

I am having the same problem with tmux built with/without libutf8proc on macOS High Sierra. Does anyone know a fix here?

@oblitum
Author

oblitum commented Jan 17, 2018

@neo1691 the correct fix is to figure out how to get Unicode width calculation in sync across all the programs involved, for your platform. This has been working on ArchLinux since Sep 15, 2017 for exactly that reason: glibc was upgraded to 2.26, which supports Unicode 10, and many programs rely on it (tmux, mutt, etc). Termite, which uses glib, not glibc, has compatible Unicode char widths, so nothing breaks.

@accessd

accessd commented Apr 26, 2018

@neo1691 If you are using iTerm try this solution gpakosz/.tmux#60 (comment)

@anubhavcodes

anubhavcodes commented Apr 26, 2018 via email

@anubhavcodes

@accessd I finally got time to look into this after setting up everything again on a new Mac. Thanks a lot for linking the solution. The problem is solved, however there is one issue that still persists.

When iterm is in window mode (not full screen), and you try to run a command that goes onto the next line, iterm makes it wrap onto the same line. Like the screenshot below:

[screenshot 2018-05-29 16 49 23]

Any idea what might be causing this?

@lock

lock bot commented Feb 15, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Feb 15, 2020