Emoji width #4

Closed
gwenn opened this issue Mar 6, 2016 · 16 comments

Comments

@gwenn

gwenn commented Mar 6, 2016

I am not sure but the displayed width of emoji seems to be at least 2:

"❤️"
"12"
let w = unicode_width::UnicodeWidthStr::width("\u{2764}\u{fe0f}");
assert_eq!(2, w); // (left: `2`, right: `1`)
@kwantam
Member

kwantam commented Mar 7, 2016

The Unicode standard defines which characters should be considered wide and which should not. To my knowledge, emoji are not considered wide characters by the standard. Note also that width refers to the number of columns when displayed in a monospaced font; any character can appear wider when displayed in a proportional font.

(Anecdotally, the heart symbol above occupies one column in my Unicode-aware terminal.)

@casey

casey commented Apr 3, 2017

From reading this, I believe that as of Unicode 9, emoji are now wide characters.

It also seems that as of unicode-width 0.1.4, emoji are considered to be wide characters, so this can be closed.

PS Thanks for writing this library!

@gwenn
Author

gwenn commented May 1, 2017

unicode-width 0.1.4 returns 1...

@casey

casey commented May 2, 2017

Ah, it looks like unicode-width 0.1.4 reports that a ❤️ is one column wide, and a 😗 is two columns wide. I didn't specifically test the heart character, just emoji.

ogham added a commit to ogham/exa that referenced this issue May 17, 2017
The change in 0828133 means that the width of emoji are used. I think the issue unicode-rs/unicode-width#4 means that the wrong width is being calculated for emoji, and there happens to be one in the example.
@ogham

ogham commented May 17, 2017

I thought I was experiencing this, but it turns out that my terminal was just getting the widths wrong and I was seeing it the wrong way!

@typesanitizer

Apart from this, there is a problem with compound emojis. The current implementation just splits things up into characters and adds all the widths. That may not be correct in the presence of compound emojis like 👩‍🔬 = 👩 + ZWJ + 🔬 , as all the individual emojis have width 2.
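
To make the arithmetic concrete, here is a minimal, standard-library-only sketch of the two candidate answers for a single ZWJ sequence. The `toy_width` table and both helper names are assumptions made up for illustration; they are not the crate's data or API.

```rust
const ZWJ: char = '\u{200D}';

/// Toy per-code-point width table (an assumption for illustration only,
/// NOT the data used by unicode-width).
fn toy_width(c: char) -> usize {
    match c {
        ZWJ => 0,                       // zero width joiner takes no column
        '\u{1F300}'..='\u{1FAFF}' => 2, // rough range of pictographic emoji
        _ => 1,
    }
}

/// Width when every code point is measured on its own and the results are
/// summed, which is what "split into characters and add" amounts to.
fn width_if_split(s: &str) -> usize {
    s.chars().map(toy_width).sum()
}

/// Width if the renderer fuses the whole ZWJ sequence into one glyph.
/// Only meaningful when `s` is a single ZWJ sequence: the fused glyph is
/// assumed to be as wide as its widest component.
fn width_if_fused(s: &str) -> usize {
    s.chars().map(toy_width).max().unwrap_or(0)
}

fn main() {
    // WOMAN + ZWJ + MICROSCOPE
    let scientist = "\u{1F469}\u{200D}\u{1F52C}";
    println!("split: {}", width_if_split(scientist));
    println!("fused: {}", width_if_fused(scientist));
}
```

With this toy table, the woman scientist sequence comes out as 4 when split and 2 when fused, so a caller has to know which rendering the terminal will produce before either number is useful.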

@Manishearth
Member

I don't think handling that is what this crate is about -- this crate implements a spec, a spec which doesn't attempt to deal with emoji.

@typesanitizer

The docs say "we provide the width in columns". For characters in X, Y, Z categories, we do A, B, C. AIUI Emoji don't really fall into those categories, so I'd naively expect the result to be whatever makes the most sense (if there is one such result). Depending on the user's system -- whether the compound emoji can be rendered properly or not (in which case, it shows up as two separate emoji) -- the computed width will be different. The crate picks the width you'd get when it shows as split up, which is a reasonable choice.

However, since there are two reasonable answers here, I think if the precise scope and limitations of the crate were made clearer, then the behavior for compound emoji wouldn't be an issue. I'm happy to open a PR to add this clarification if you agree.

@Manishearth
Member

if there is one such result

There kinda isn't: the concept of "width" you're asking for is a matter of font as well as of context (many terminals will not use emoji presentation, which means those will display as two).

The crate does already mention that it follows the UAX #11 rules. Feel free to add to the readme that this may not match actual rendered column width.

@canndrew

I'd been using this crate on the assumption that UnicodeWidthStr::width would give the actual displayed width in columns. It's a shame that that assumption doesn't hold :/

Is there a non-trivial subset of strings for which the displayed column width is exactly specified and we can rely on it being accurate for any standards-compliant terminal? If so, can we add another method to UnicodeWidthStr which returns an Option<usize>? That way my terminal GUI library can know when it might have lost track of the cursor position.
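
As a rough sketch of what such an `Option<usize>`-returning method could look like, the following uses only the standard library and takes the most conservative subset imaginable: printable ASCII. The function name `checked_ascii_width` is hypothetical and nothing like it exists in unicode-width; the choice of subset is likewise an assumption, not something the thread settled on.

```rust
/// Hypothetical helper (NOT part of unicode-width): returns `Some(width)`
/// only when every character is printable ASCII (U+0020..=U+007E), where
/// one character per column is assumed to hold on any terminal.
/// Anything else yields `None`, meaning "the width cannot be promised".
fn checked_ascii_width(s: &str) -> Option<usize> {
    if s.chars().all(|c| (' '..='~').contains(&c)) {
        // For pure ASCII, the byte length equals the character count.
        Some(s.len())
    } else {
        None
    }
}

fn main() {
    for text in ["hello", "tab\there", "\u{2764}\u{FE0F}"] {
        match checked_ascii_width(text) {
            Some(w) => println!("{:?}: {} columns", text, w),
            None => println!("{:?}: width not guaranteed", text),
        }
    }
}
```

A terminal GUI library could treat `None` as the signal to re-query or reset the cursor position instead of trusting a computed column count.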

@keidax

keidax commented Apr 3, 2019

With regard to UAX #11, the recommendations state:

UTS51 emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

and as best as I can tell from this definition, "\u{2764}\u{fe0f}" would be a valid emoji presentation sequence.

In other words, it seems like the most "correct" behavior for a character with a text presentation by default, like U+2764, would be

assert_eq!(1, UnicodeWidthStr::width("\u{2764}"));
assert_eq!(1, UnicodeWidthStr::width("\u{2764}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{2764}\u{fe0f}"));

And for a character with an emoji presentation by default:

assert_eq!(2, UnicodeWidthStr::width("\u{26a1}"));
assert_eq!(1, UnicodeWidthStr::width("\u{26a1}\u{fe0e}"));
assert_eq!(2, UnicodeWidthStr::width("\u{26a1}\u{fe0f}"));

Of course, the rendering of this also seems to vary by OS and browser:

❤︎
❤️

⚡︎
⚡️
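
The rule proposed in the assertions above can be sketched with the standard library alone. This is only an illustration: `presentation_width` and `default_width` are hypothetical names, and the default-width table covers just the two characters from this comment, not the real Unicode data.

```rust
const VS15: char = '\u{FE0E}'; // variation selector 15: text presentation
const VS16: char = '\u{FE0F}'; // variation selector 16: emoji presentation

/// Toy default width for a lone base character. Only U+26A1 is listed as
/// emoji-presentation-by-default; a real table would come from Unicode data.
fn default_width(base: char) -> usize {
    match base {
        '\u{26A1}' => 2,
        _ => 1,
    }
}

/// Width of a base character optionally followed by one variation selector.
/// Returns `None` for the empty string or for input that is not of the form
/// "base" or "base + VS15/VS16".
fn presentation_width(s: &str) -> Option<usize> {
    let mut chars = s.chars();
    let base = chars.next()?;
    match (chars.next(), chars.next()) {
        (None, _) => Some(default_width(base)), // lone base character
        (Some(VS16), None) => Some(2),          // emoji presentation: wide
        (Some(VS15), None) => Some(1),          // text presentation: narrow
        _ => None,                              // out of scope for this sketch
    }
}

fn main() {
    for s in ["\u{2764}", "\u{2764}\u{FE0E}", "\u{2764}\u{FE0F}",
              "\u{26A1}", "\u{26A1}\u{FE0E}", "\u{26A1}\u{FE0F}"] {
        println!("{:?} -> {:?}", s, presentation_width(s));
    }
}
```

The selector, when present, decides the width on its own; the base character's default only matters when no selector follows it.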

@wez

wez commented Nov 5, 2019

I don't really know much about this space, but here's my attempt at dealing with this in a terminal emulator.

/// Returns the number of cells visually occupied by a sequence
/// of graphemes
pub fn unicode_column_width(s: &str) -> usize {
    use unicode_segmentation::UnicodeSegmentation;
    s.graphemes(true).map(grapheme_column_width).sum()
}

/// Returns the number of cells visually occupied by a grapheme.
/// The input string must be a single grapheme.
pub fn grapheme_column_width(s: &str) -> usize {
    // Due to this issue:
    // https://github.com/unicode-rs/unicode-width/issues/4
    // we cannot simply use the unicode-width crate to compute
    // the desired value.
    // Let's check for emoji-ness for ourselves first
    use unicode_width::UnicodeWidthStr;
    use xi_unicode::EmojiExt;
    for c in s.chars() {
        if c.is_emoji_modifier_base() || c.is_emoji_modifier() {
            // treat modifier sequences as double wide
            return 2;
        }
    }
    UnicodeWidthStr::width(s)
}

wez added a commit to wez/wezterm that referenced this issue Nov 5, 2019
I noticed while scrolling `emoji-test.txt` that some of the combined
emoji sequences rendered very poorly.  This was due to the unicode
width being reported as up to 4 in some cases.

Digging into it, I discovered that the unicode width crate uses a
standard calculation that doesn't take emoji combination sequences
into account (see unicode-rs/unicode-width#4).

This commit takes a dep on the xi-unicode crate as a lightweight way
to gain access to emoji tables and test whether a given grapheme is
part of a combining sequence of emoji.
@worldmind

I'm not sure, but I suppose the example from this article is related to this issue:

use unicode_width::UnicodeWidthStr;

fn main() {
    println!("{}", "🤦🏼‍♂️".width());
}

This returns 5, but the article's author thinks it should be 2.
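
One plausible reading of where the 5 comes from is a per-code-point sum. The sketch below uses only the standard library and a toy width table (an assumption for illustration, not the crate's actual data) to break the sequence into its scalar values.

```rust
/// Toy per-code-point width table (illustrative assumption only).
fn toy_width(c: char) -> usize {
    match c {
        // zero width joiner and the two variation selectors
        '\u{200D}' | '\u{FE0E}' | '\u{FE0F}' => 0,
        // rough range of pictographic emoji, including skin tone modifiers
        '\u{1F300}'..='\u{1FAFF}' => 2,
        _ => 1,
    }
}

/// Pairs every scalar value in `s` with its toy width.
fn breakdown(s: &str) -> Vec<(u32, usize)> {
    s.chars().map(|c| (c as u32, toy_width(c))).collect()
}

fn main() {
    // FACE PALM + skin tone modifier + ZWJ + MALE SIGN + VS16
    let facepalm = "\u{1F926}\u{1F3FC}\u{200D}\u{2642}\u{FE0F}";
    let parts = breakdown(facepalm);
    for (code_point, width) in &parts {
        println!("U+{:04X} -> {}", code_point, width);
    }
    let total: usize = parts.iter().map(|&(_, w)| w).sum();
    println!("total = {}", total);
}
```

Under this table the five scalar values contribute 2 + 2 + 0 + 1 + 0 = 5, whereas a terminal that renders the sequence as one emoji glyph would typically use 2 columns.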

@Manishearth
Member

Right, this crate is dealing with a different notion of width.

@christianparpart

@keidax is actually right. I came here not as a Rust dev, but more as a VTE dev, because I had forgotten where in the huge mass of Unicode (emoji) specs I had read that emoji presentation is always considered to be East Asian Wide (2 columns in monospaced fonts) -- so thanks for also providing the links, @keidax.

Sadly, many VTEs and even client apps still get this wrong, but that seems to be slowly changing (Kitty, for example, gets a lot of it right).

@Jules-Bertholet
Contributor

#41 added support for U+FE0F. (Emoji ZWJ sequences and skintone modifiers remain unsupported, however.)
