Unicode Support / Cursor issues Fix #740

Merged
merged 3 commits into from Nov 18, 2016

Projects

None yet

9 participants

@BenoitAverty
Contributor
BenoitAverty commented Sep 23, 2016 edited

This is based on the work of @MrRio in his PR #535, so thanks to him for the work, I couldn't have done this without him.

This PR makes hterm add a <span> around every unicode character, even when they have default style. The span has the width of the font, which basically forces it to be a monospace font.

See this screenshot with vtop :
2016-09-23-224938_891x682_escrotum

And you can see that the cursor is still well positioned, even with my prompt abusing wide characters :
2016-09-23-225103_895x682_escrotum

There are still some problems that can't be solved (in my opinion, if anyone has a brilliant idea I'll take it)

  • Font rendering is done using OS libraries. I noticed that on some systems, some monospace fonts (for example SourceCodePro, which I use) are not monospace at all font-sizes. I'll post some screenshots on my debian work laptot on monday. Seems to be fine on ubuntu though. This can only be solved in Electron (even chromium probably).
  • The double-width chars are detected by hterm using wc-width npm library, but it only works on some specific characters (i.e Chinese characters). If there are wider characters that don't use this property (font Awesome glyphs for example), they will overlap with their neighbours, as you can see here:
    2016-09-23-225600_894x683_escrotum

This is probably not a problem in prompts because there are spaces around, and there are not wide chars side-by-side all that often. We could theoretically measure each characters before inserting them, but then we'd need to fix hterm's cursor position.

What's left to do:

  • @MrRio's todo in the "matchesContainer" function
  • Wait for someone else to test with their own use cases.

All feedback appreciated !

@MrRio
Collaborator
MrRio commented Sep 26, 2016

Nice!

@MrRio
Collaborator
MrRio commented Sep 26, 2016

matchesContainer should only return false when the line contains unicode chars (or more narrowly, if the previous char contains unicode chars)

@BenoitAverty
Contributor

As promised, here is a screenshot of HyperTerm at the same revision but on my debian VM. The problem is that some characters are larger than others, even though the font is monospace

2016-09-26-133836_956x1019_scrot

This is more obvious on the following 2 screenshots. All I did was change the font size from 14 to 16.
2016-09-26-134455_956x1019_scrot
2016-09-26-134445_956x1019_scrot

The only way I see to fix this would be to wrap every character in a , like I do currently with unicode chars, but I'm afraid this would hurt performance. Maybe we could make it optional, so that users that have problems can try to activate it ?

@MrRio
Collaborator
MrRio commented Sep 26, 2016

Can it be done for every char within a line if that line contains a unicode char?

@BenoitAverty
Contributor

@MrRio The debian problem is not related to unicode unfortunately, so no there's no way to detect when it's needed (other than maybe measuring each char independently and comparing that to the font-width). In any case I think it's a subject for another PR, I'll try to get to it after this one is closed.

Regarding the "matchesContainer" function, I added a finer control on that. I'd appreciate for someone else to test everything on other setups than mine (MacOS, other fonts, other prompts, other workflows...) because the way we monkeypatch hterm like this makes it very hard to ensure I covered every case. I think Hyperterm should use its own fork of hterm, it would be easier to maintain and scale (at the price of a higher entry cost). But again, that's another subject.

After reviews and tests, this PR should be almost ready to merge.

lib/hterm.js
@@ -1,16 +1,42 @@
import {hterm, lib} from 'hterm-umdjs';
+import unicodeStringUtils from 'unicode-string-utils';
@dotcypress
dotcypress Sep 27, 2016 Contributor

I wanna propose runes library for splitting unicode strings. Because unicode-string-utils don't support some emoji subset.(👩‍👩‍👧‍👦,❤️,👍🏽 etc)

@BenoitAverty
BenoitAverty Sep 28, 2016 Contributor

@dotcypress Hey, thanks for the feedback. Can you provide me with steps that produce unexpected result because of unicodeStringUtils? I haven't been able to create a problematic test case on my computer.

@dotcypress
dotcypress Sep 28, 2016 Contributor

Try curl https://gist.githubusercontent.com/dotcypress/35da7cec74643bc095bbe5c9a14f66f6/raw/51fa6f1e6080ce39a2d712175f077373d8b580cb/test.txt | cat

iTerm2 result:
screen shot 2016-09-28 at 11 05 21 am

Hyperterm result:

screen shot 2016-09-28 at 11 06 20 am

Hyperterm with your changes:
screen shot 2016-09-28 at 11 07 58 am

@tsl0922 tsl0922 referenced this pull request in tsl0922/ttyd Sep 30, 2016
Closed

CJK and IME support #10

@rauchg
Contributor
rauchg commented Oct 2, 2016

What are the performance implications of creating so many <span> elements?

@BenoitAverty
Contributor

@rauchg I didn't measure it precisely, but I don't think having many spans is a big problem (atom has way more I think, for example). A potential problem would be in applications that write a lot of unicode characters, like in vtop for example, because there are new spans created constantly (as opposed to a fixed hig-number of spans). It could be noticeable on slower or average computers.

@matheuss
Collaborator
matheuss commented Nov 1, 2016

Friendly ping @BenoitAverty 🤗

@BenoitAverty
Contributor

Hello matheuss. I'm not sure how to advance this PR. I think it's ready to merge, but I was only able to test on linux. I don't think Emoji support is done in this PR, but the cursor is OK and the char-width issues are mostly fixed, if we ignore platform-specific problems like I mentionned on debian (these problems come from chromium so we can't do anything about them except add a flag that enables a span around each chars, probably should do that in another PR).

I'm more than happy to work on this again if someone finds a bug, a regression, or something that should be added before merging.

Don't hesitate to ask, or tell me when I should rebase.

Thanks for looking at this :)

@matheuss
Collaborator
matheuss commented Nov 2, 2016

Amazing @BenoitAverty 😄 I'll test it today and then merge/give feedback 🙌

@winneon
winneon commented Nov 15, 2016

@matheuss Any update on the status of this?

@matheuss
Collaborator

@winneon I just discussed it with @rauchg recently, and: we're merging it very soon, before the next release 😄

@matheuss
Collaborator

@rauchg and I discussed this PR recently, and we agreed that we needed some benchmarks. The first thing that came to our minds was @indutny's latetyper. But it's not very useful in this context because it only types zs – I tried it and this PR produced almost the same results that Hyper 0.8.3 produced.

With that in mind, I moved to more visual (and less scientific) tests.

CPU usage

screen shot 2016-11-16 at 4 44 43 am

screen shot 2016-11-16 at 4 46 19 am

screen shot 2016-11-16 at 4 45 25 am

screen shot 2016-11-16 at 4 46 24 am

How it feels

v0.8.3:
kapture 2016-11-16 at 4 51 27

BenoitAverty:unicode:
kapture 2016-11-16 at 4 52 29

(Both GIFs were recorded @ 30fps and downscaled from 1416x1080 to 800x610)

Conclusions

I don't think that having lots of spans is a problem (looking into my Atom right now, I see 2550; with vtop open on this PR, i see 2879). Apparently, the problem begins when you have lots of moving spans. As per the GIF above, some frames are taking way too much time to render.

If you have something like 3 special chars per line, you should be fine – 3 static spans per line will not need much CPU to be rendered. But when you're running something like vtop, with hundreds of moving spans, it's likely that you'll experience some lags – at least with a setup like mine (details below).

I also noticed that if I resize the window while vtop is running, Hyper freezes for ~10 seconds – such behavior is not present on v0.8.3.

PS: my setup: MacBook 13" Retina, 2.6GHz i5.
PS2: both builds (v0.8.3 and this PR) were running in production mode (npm run dist). Also, I can guarantee that the scenario between each screenshot/GIF was the same.

@matheuss
Collaborator
matheuss commented Nov 16, 2016 edited

With that on mind, @BenoitAverty @MrRio and @rauchg: how could we improve this approach? There's room for improvement? There's any other approach(s) that is worth a try?

@BenoitAverty
Contributor

Thanks for taking the time to do this 👍

I don't have any idea currently on a completely different approach. The problem is that there's no way to avoid certain characters being larger than others. Given that fact, the only way to do anything on those characters is to wrap them in a span, whatever we do with that span (use width currently).

One Idea that I had at the beginning was to wrap several characters in spans instead of only one, and set the width of that bigger span to the theoretical width of the entire string, but this didn't work for some reason (probably a margin issue or something). @MrRio tried it before me and had the same result.

A dirty workaround would be to put this PR behind a configuration flag. That way people experiencing cursor issues could fix it, and people experiencing performance issues could go back to current behaviour for the time being. I see several drabacks to this:

  • If the default is to have the spans, some people will be annoyed by performances and not bother to look in the config file
  • If the default is to not have it, people will continue to open issues on this, or completely forget about hyper.
  • I don't know if having experimental config flags like this is something you want for hyper. It'd probably be better to have a "beta" or "next" release.
  • The code would become kind of complex

If i have some time, i'll try again with the larger spans, and if I have another idea I'll share it here, but currently I don't have any other idea.

@BenoitAverty
Contributor

After much investigations with @matheuss, here are some more information:

The problem with vtop happens because most fonts don't have the braille characters used to draw the graphs. As a result, chromium falls back on the first font it finds with braille, which is often not monospaced (or at least not the same size). Apple Braille on MacOS, DejaVu Sans on linux (DejaVu Sans Mono doesn't have braille unfortunately)

This particular problem can be solved by including another font, for example FreeMono which has braille and is monospaced. This can be done without creating any more spans, but:

  • I'm not sure it's work with any font. Maybe the character width doesn't always match
  • It doesn't solve anything regarding emojis, fontawesome glyphs... and as a result it doesn't fix the cursor issues : we'd have to find another way to move the cursor to the right position.

Anyway, that explains the problem with vtop but doesn't solve the problem with chars that are larger even in a monospaced font.

@rginda
rginda commented Nov 17, 2016

Hi, I'm the original author of hterm. If you'd like to try to upstream this fix into hterm rather than monkeypatching or forking, I can help.

@matheuss
Collaborator

@rginda that'd be amazing! Do you have any ideas on how to approach this?

@rauchg rauchg merged commit da4858a into zeit:master Nov 18, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@matheuss
Collaborator
matheuss commented Nov 18, 2016 edited

Thank you SO MUCH for your work on this, @BenoitAverty and @MrRio!
We (@rauchg) discussed extensively the performance issues that this method introduces, and we concluded that it's better to merge it as it is for now (so our next release can have it) and work on the performance improvements later.

Once again, thank you! 🙌

@rauchg
Contributor
rauchg commented Nov 18, 2016 edited

Since @rginda chimed in, this is a good case study to review together. @MrRio the creator of vtop can also comment accordingly.

We had to start wrapping special characters in <span> to introduce a "double width" through custom styling. But that's introducing a huge regression of performance in vtop (and perhaps other terminal apps)

I suspect the issue is that (without having read any code yet :P) MrRio re-paints aggressively by clearing the screen and re-appending lines a bunch of times per second. Which is what he should do, and I'm sure lots of terminal apps work this way.

I think that because we're seeing, with this PR in place, a huge number of <span> elements being created per second, and that seems to be the root cause of the slowdown.

If that theory is correct (and even if it's not), I think an optimization worth investigating is to recycle x-row by not acting on the ANSI clear instruction immediately. I suspect that this would involve hterm working with something like a v-dom, where it makes the changes that would be eventually rendered to the screen in-memory, and then dumps the resulting state of the terminal whenever there's time or at a certain interval.

So, my questions are:

  • Do you think the theory for why it's slowing down with vtop is correct?
  • Did you already contemplate this optimization? If it's worthwhile:
    • Do you have any recommendations on how to go about implementing it?
@BenoitAverty BenoitAverty deleted the BenoitAverty:unicode branch Nov 18, 2016
@carloscabanero
carloscabanero commented Nov 18, 2016 edited

Wanted to jump in as I tried to fix this a while ago for Blink too. I identified it on Spacemacs and just running the UTF-demo.
The font fallback is how I'm solving this at the moment (falling back to Menlo in my case, and maybe you guys can package it?). It really is good in 99.9% of the cases.
I tried spans and larger spans on Unicode chars. Larger spans will cause other glitches on the screen on certain sections (like "vertical lines"). The reason why I dumped it is because as @BenoitAverty wondered and I can confirm, with other fonts (if I recall I tried with Anonymous?) it is actually on almost all the characters. and I couldn't find a reliable way on when to apply it.
@rauchg suggestion is a great idea but if I understand it, it might require to change the rendering completely to always go through a renderbuffer, and seems like a big change on hterm for something that really is a problem with the font.
Hope we can all figure out something together! :)

@rginda
rginda commented Nov 18, 2016

@matheuss: re: Upstreaming patches to libapps/hterm.

Google can only accept patches from people who have signed their contributor license agreement. IANAL/TL;DR: You won't try to go after Google for patents or other IP rights based on your contributions, and you're under no obligation to support the work you contribute.

If you're ok with that, sign and continue:

This command clones the repo and fetches a commit-msg hook needed by the code review server:

   git clone https://chromium.googlesource.com/apps/libapps && (cd libapps && curl -Lo `git rev-parse --git-dir`/hooks/commit-msg https://gerrit-review.googlesource.com/tools/hooks/commit-msg ; chmod +x `git rev-parse --git-dir`/hooks/commit-msg)

Then make your changes and post them to gerrit for review:

   git push origin HEAD:refs/for/master

This will upload a new change to the libapps project on chromium-review.googlesource.com

Add me (rginda@chromium.org) as a reviewer, and I'll have a look or find someone else who can. Hyperterm folks should feel free to comment on the review too.

It probably makes sense to try some smaller patches first to get the hang of it.

@rginda
rginda commented Nov 18, 2016

I've always wondered if hterm.Screen would work better if it just had span for each character cell. I think there'd be a performance hit for append-only uses like cat /var/log/messages, but a win for tmux, vtop, less, and other full-screen apps. My initial assumption was that append-only was the primary use case.

We could have both screen types, and swap based on the whether the terminal was displaying the primary vs alternate screen. Then the issue would be that some things that work in the alternate screen wouldn't work as well in the primary screen.

@dotcypress
Contributor
dotcypress commented Nov 18, 2016 edited

I found couple of problems related to unicode issue.

  1. Seems like Electron's IPC has bug with UTF8 processing, see screenshot:
    UPD: Electron's IPC is ok, issue related to hrerm too

screen shot 2016-11-18 at 1 44 19 pm

  1. hrerm UTF8 issue:
    https://chromium.googlesource.com/apps/libapps/+/HEAD/libdot/js/lib_utf8.js#117
    https://chromium.googlesource.com/apps/libapps/+/master/hterm/js/hterm_terminal.js#1530

@rginda @rauchg @BenoitAverty @matheuss

@dotcypress dotcypress referenced this pull request Nov 19, 2016
Merged

🙌 unicode/emoji fix #1018

@BenoitAverty
Contributor

@rginda I think most terminal applications render each character separately to force them in a grid, instead of rendering the text itself. The way to do that for hterm would indeed be to make a grid using a span for each char.

The performance hit would be big though, but could be mitigated using virtual dom.

The main problem is that it's probably a complete rewrite of the hterm.Screen module 😅

@rginda
rginda commented Nov 21, 2016

@BenoitAverty Yes, it would be a complete rewrite of hterm.Screen. In fact, it would make sense to start with a new file so the two implementations could be compared side-by-side. The rewrite would probably end up much simpler than the current version, and much more predictable in the face of emoji and not-really-monospace fonts.

I don't know for sure that it would be a big performance hit. It may be, but the only way to know is to try it out and measure it under different use cases. hterm.ScrollPort already takes care of the Virtual DOM, and probably won't care if you swap out hterm.Screen for a different implementation.

@rauchg
Contributor
rauchg commented Nov 23, 2016

Agreed that it wouldn't necessarily decrease performance in a noticeable way, if done right (disclaimer: based on my intuition)

@iovis9
iovis9 commented Jan 6, 2017

Now it shifts the text one character to the left when I'm inside tmux (works perfectly otherwise).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment