content: implement support for unicode emojis in messages #245

rajveermalviya · 2023-08-01T18:39:37Z

Fixes: #58

Context

Implement parsing of the unicode emojis encoded in <span class="emoji emoji-*"> elements and then generating direct TextSpans that contains the unicode codepoint for emoji and falling back to displaying emoji_name in case the parsing had failed.

This PR also vendors the "Noto Color Emoji" font in COLRv1 format (4.53 MB) that was downloaded from it's latest release.

Alternatives

As mentioned in #58 there are many alternative ways to handle the displaying of emojis. AFAIK using a font is the most efficient way to render emojis in Flutter (even for icons it prefers fonts). Using a font also provides us a way to render the emojis that are in the message content & other strings (topic/stream name). Especially on older servers which may fail to parse the newer emojis and fail to encode them in <span class="emoji"> elements and rather including the unicode codepoints in the text message content as is. (Albeit we can use the spritesheet method there too, but not without parsing each character of the message content to check for emojis.)

Now coming to different possible routes for the loading of the fonts:

Vendoring them in the app (for now the PR does this)
Using a package like google_fonts that downloads the fonts from font.google.com server and dynamically load the font in the app.
Implement the functionality from google_fonts ourselves that downloads & caches the preferred emoji font from the realm server (this will probably require that server hosts the font files) and then load the downloaded font in the app using FontLoader API.

Testing

Updated the current ones.

gnprice

Thanks @rajveermalviya! See review comments below.

We'll also want some unit tests for this code before merging it.

gnprice · 2023-08-04T01:26:03Z

lib/model/content.dart


-  final String text;
+  final String emojiName;
+  final String? emojiUnicode;


Can you describe the situation where this is expected to be null?

Yeah, after reading the surrounding code a bit more, it seems like it would be unreachable.
In previous revision, when tryParseEmojiCodeToUnicode returned null, parser would set emojiUnicode to null too. And in the widgets layer the emojiName with the fallback container would be displayed.

But thinking about it a bit more; if tryParseEmojiCodeToUnicode fails then we should just emit "unimplemented" from the parser rather than just adding a silent workaround. And also it will be pretty rare that tryParseEmojiCodeToUnicode would fail as it is mostly unreachable because of the regexp check in parser and if it does fail it would be a bug in the server.

gnprice · 2023-08-04T01:27:06Z

lib/model/content.dart

@@ -568,7 +570,8 @@ class _ZulipContentParser {
        && classes.length == 2
        && classes.contains('emoji')
        && classes.every(_emojiClassRegexp.hasMatch)) {
-      return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode);
+      final emojiUnicode = tryParseEmojiCodeToUnicode(classes.elementAt(1).replaceFirst('emoji-', ''));


This elementAt(1) implicitly assumes the emoji class comes first, and the emoji-12345 class after that. But the conditions above it don't ensure that — they might come in the other order.

ack, now uses .firstWhere()

gnprice · 2023-08-04T01:29:55Z

lib/model/content.dart

+  // Ported from https://github.com/zulip/zulip-mobile/tree/main/src/emoji/data.js
+  //
+  // Which was in turn ported from https://github.com/zulip/zulip/blob/main/zerver/models.py


When linking to code on GitHub for reference, use a permalink. Links to main can break when the file is renamed, or perhaps worse can silently become unhelpful when the relevant code moves elsewhere or otherwise changes.

GitHub has a handy keyboard shortcut y to get a permalink when you're looking at a page like this.

(I guess we didn't do that in the code you're borrowing from in zulip-mobile, oops. Still, let's fix it here.)

gnprice · 2023-08-04T02:29:11Z

lib/model/content.dart

+  String? tryParseEmojiCodeToUnicode(String code) {
+    try {
+      return String.fromCharCodes(code.split('-').map((hex) => int.parse(hex, radix: 16)));
+    } catch (_) {


Yeah, catch (_) is a bit of a code smell :-)

Generally if there's some exception we expect to sometimes happen, and doesn't represent a bug in the code, we'll want to be explicit about what we expect and just catch that exception type specifically. That way if some unrelated exception is thrown as a result of a bug, the exception propagates just like it normally would.

gnprice · 2023-08-04T02:30:35Z

lib/widgets/app.dart

@@ -12,6 +12,7 @@ class ZulipApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    final theme = ThemeData(
+      fontFamilyFallback: const <String>['sans-serif', 'Noto Color Emoji'],


Can you say more about what this line is doing and why?

The commit message says "correctly propogate emoji font fallback", but this doesn't appear to be propagating anything; instead it's introducing something new.

ack, added a comment explaining the reason and moved to a separate commit.

gnprice · 2023-08-04T03:38:28Z

For the commit adding the Noto Color Emoji font to assets, here's a revision for the commit message:

font: Add Noto Color Emoji v2.038, in COLRv1 format

These files are from:
  https://github.com/googlefonts/noto-emoji/tree/v2.038/fonts

The main difference is just to make it a bit more explicit exactly where these files come from, because the existing URL https://github.com/googlefonts/noto-emoji/releases/tag/v2.038 doesn't provide the individual font file as a download. It does lead to the whole source tree; but there's a lot of files in there (and one wouldn't necessarily expect the font file to be checked into the source tree anyway, as presumably it's the result of building the tree), so it's not obvious from that where to find these files.

(And, mentioning this mostly for my own notes: I did just go and download those files myself, and verified that I get the same contents as what's in this PR.)

rajveermalviya · 2023-08-06T04:37:16Z

Thanks for the review @gnprice, I have rebased and updated with the suggested changes.

rajveermalviya · 2023-08-10T13:39:17Z

(Just rebased to main, and squashed some commits)

rajveermalviya · 2023-08-10T19:05:17Z

(rebased again, to fix conflicts with main branch)

rajveermalviya · 2023-08-18T18:43:42Z

(rebased to main; with updated commit messages)

gnprice

Thanks @rajveermalviya for the revision! Comments below.

gnprice · 2023-08-22T03:31:15Z

lib/model/content.dart

-  static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)?$");
+  static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)*$");


Hmm interesting, good catch.

This is basically fixing a bug, right? We would have not produced a UnicodeEmojiNode when the emoji had multiple codepoints, and instead produced an UnimplementedNode.

Let's include a test that exercises that case, in the same commit that fixes the bug.

gnprice · 2023-08-22T03:37:59Z

lib/model/content.dart

+      final className = classes
+        .firstWhere((className) => className.startsWith('emoji-'), orElse: () => '')
+        .replaceFirst('emoji-', '');


The replaceFirst means this isn't really a "class name" anymore. A good name might be emojiCode — I believe this string is now the same as the Zulip "emoji code" for the emoji.

gnprice · 2023-08-22T03:42:16Z

lib/model/content.dart

+      final className = classes
+        .firstWhere((className) => className.startsWith('emoji-'), orElse: () => '')
+        .replaceFirst('emoji-', '');
+      if (className.isEmpty) return unimplemented();


Can the orElse / className.isEmpty case here actually happen?

I think it can't, even with malformed data from the server — that is, I think our own code, and in fact the few lines of code around this spot, ensure that it can't happen. In the condition above, we check that

there are two classes

one of them is just emoji

the other matches the regexp

And I think any string that matches the regexp and isn't emoji has to start with emoji-.

So at this spot we can just assume that we do find a class starting with emoji-, and barrel ahead using the ! operator or equivalent. If the resulting logic seems unclear, we can cheerfully add an assert to explicitly verify (but at zero cost when in release mode) that we did find such a class.

gnprice · 2023-08-22T03:52:51Z

lib/model/content.dart

@@ -582,7 +584,22 @@ class _ZulipContentParser {
        && classes.length == 2
        && classes.contains('emoji')
        && classes.every(_emojiClassRegexp.hasMatch)) {
-      return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode);
+      // TODO: Check if font contains the glyph for the specific unicode codepoints;


Hmm, interesting thought.

I'm not sure we'll actually want to do this, though. The question is: if the answer is that the font doesn't seem to have the emoji, then what do we do instead?

One possible answer would be an UnimplementedNode. But in a UX for general release (i.e. post #194), I think the best option for handling that would likely be to just show the Unicode anyway. And I think this case wouldn't be real informative for #190. It'd tell us that the server has updated to a newer list of emoji than we have in our font, and we should update our emoji font. But updating emoji lists is going to be an annual housekeeping task for both Zulip server/web and us for the foreseeable future, and I think we can find a lower-effort way than this to get reminded of it.

if the answer is that the font doesn't seem to have the emoji, then what do we do instead?

We could fallback to the previous behavior (before this PR) - showing the emoji name: :wave:
But yeah the emoji list will have to be maintained accordingly.

Oh yeah we can't do that because if font doesn't have that emoji that will mean the "emoji list" won't have that new emoji the user typed too. Also if all the projects will update the emoji sets at same time - that will mean server doesn't know about the new emoji too and won't send us an emoji_name.
So, I guess we can't do this - removing the todo.

gnprice · 2023-08-22T03:58:23Z

lib/model/content.dart

+      //   eg. unicode v15 set of emojis. Note that in both of these options parsing
+      //   each character of the whole message text will be required to make the
+      //   checks reliable, because emojis can be present in non encoded form in the
+      //   message text. (For example, when server hasn't been updated to account for
+      //   newer unicode versions.)


And yeah, if there are emoji in the message as just straight-up Unicode emoji, encoded in the normal Unicode way, then I am definitely happy to pass them right through to the Text widget and not worry about them.

If they're not in a font we bundled, they may still — or may not — be in a system font on the user's device. In that way they have the same status as if the message sender had used some script we didn't plan for, out of the 100+ scripts in Unicode.

gnprice · 2023-08-22T04:07:01Z

lib/widgets/app.dart

@@ -14,6 +14,14 @@ class ZulipApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    final theme = ThemeData(
+      // This correctly sets up a the font fallback for normal text that


Suggested change

// This correctly sets up a the font fallback for normal text that

// This sets up a font fallback for normal text that

gnprice · 2023-08-22T04:12:55Z

test/model/content_test.dart

+    // "🪿"
+    '<p>🪿</p>',
+    const TextNode('\u{1fabf}'));


nit: clearest to read if these either all use the \u notation, or all have the literal emoji in the source

(either way it's the same string, just two different ways in the source code of expressing it)

gnprice · 2023-08-22T04:14:40Z

test/model/content_test.dart

    // ":thumbs_up:"
    '<p><span aria-label="thumbs up" class="emoji emoji-1f44d" role="img" title="thumbs up">:thumbs_up:</span></p>',
-    const UnicodeEmojiNode(text: ':thumbs_up:'));
+    const UnicodeEmojiNode(emojiName: ':thumbs_up:', codepoints: '\u{1f44d}'));


Hmm, I guess this example shows that the thing labeled emojiName isn't exactly the name — it's the name (which here is thumbs_up) plus colons to decorate it.

One good name would be just colonsName.

Or perhaps better yet, drop that field entirely, since we're no longer using it. (Easy to add it back later if we start using it again for something.)

That will mean squashing the commit that changes how widgets/content.dart consumes UnicodeEmojiNode into the commit that changes what those nodes contain and how we parse them. I think squashing those is probably cleanest anyway.

gnprice · 2023-08-22T04:18:06Z

lib/widgets/content.dart

@@ -620,23 +619,6 @@ class UserMention extends StatelessWidget {
 //     borderRadius: BorderRadius.all(Radius.circular(3))));
 }

-class MessageUnicodeEmoji extends StatelessWidget {


Let's squash deleting this into the same commit that makes it no longer needed.

Meanwhile, from the commit message:

This widget displayed the emoji name as a fallback in a bordered container. But now that we rely on TextSpan to display the emoji codepoints text, we can't reliably determine the fallback condition _for now_.

I don't understand the last bit here. What fallback condition are you thinking of; and what would need to change between "for now" and being able to determine it?

This commits talks about the possibility to determine if the emojiUnicode is present in a supported emoji list. If not we could fallback to showing this container. See discussion.

nvm, see comment

gnprice · 2023-08-22T04:27:24Z

lib/model/content.dart

+  // >   "unified" in the following data, with "non_qualified" taking
+  // >   precedence when both present:
+  // >   https://raw.githubusercontent.com/iamcal/emoji-data/master/emoji_pretty.json
+  String? tryParseEmojiCodeToUnicode(String code) {


nit: this puts this function in amongst the functions for parsing block content; it's parsing inline content, so it should instead go amongst the parsing of inline content.

Specifically let's put it above parseInlineContent. In general this parser class has the more general and more toward-the-root logic at the bottom, and the more specific and more toward-the-leaves logic at the top.

rajveermalviya · 2023-08-22T06:33:33Z

Thanks for the review @gnprice, revision pushed!

rajveermalviya · 2023-08-22T06:35:39Z

(and rebased to main)

gnprice

Thanks @rajveermalviya for all your work on this! Looks good with just a couple of nits (below); I'll fix and merge.

Happy to see that it works nicely — here's on my Pixel 5:

before	after

In fact, it even effectively fixes zulip/zulip#11767 ! We just yesterday merged a fix for that in the server's main branch, but old messages will remain affected. Here in this client implementation, though, because we translate the Unicode emoji right back into Unicode, even the old mojibake messages show up perfectly:

before	after

That basically comes as a bonus from our decision to use Unicode (and ship an emoji font for consistency across devices) instead of managing these glyphs as images.

gnprice · 2023-08-24T21:50:37Z

lib/model/content.dart

@@ -582,7 +604,14 @@ class _ZulipContentParser {
        && classes.length == 2
        && classes.contains('emoji')
        && classes.every(_emojiClassRegexp.hasMatch)) {
-      return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode);
+      final emojiCode = classes
+        .firstWhereOrNull((className) => className.startsWith('emoji-'))!


Can simplify slightly:

Suggested change

.firstWhereOrNull((className) => className.startsWith('emoji-'))!

.firstWhere((className) => className.startsWith('emoji-'))

gnprice · 2023-08-24T22:09:08Z

lib/widgets/content.dart

@@ -460,8 +460,7 @@ class _InlineContentBuilder {
      return WidgetSpan(alignment: PlaceholderAlignment.middle,
        child: UserMention(node: node));
    } else if (node is UnicodeEmojiNode) {
-      return WidgetSpan(alignment: PlaceholderAlignment.middle,
-        child: MessageUnicodeEmoji(node: node));
+      return TextSpan(text: node.emojiUnicode, recognizer: _recognizer);


nit in commit message:

- Generate a tappable TextSpan for each unicode emoji

It's true that this will be tappable if it's inside something tappable (i.e., a link). But this line in the commit message sounds like it's saying the span will be tappable in its own way, related somehow to its status as an emoji. Which is a plausible feature — you could imagine tapping (maybe more likely long-tapping, but possibly just tapping) to pull up the name of the emoji in a tooltip, or something like that.

So just s/tappable // clarifies it, I think.

gnprice · 2023-08-24T22:58:08Z

Looks good with just a couple of nits (below); I'll fix and merge.

Well, I ran into a small road bump: this PR was causing a test to break, once rebased atop main. The issue is the test's fault, though, not this PR's. Sent #285 to fix that, and meanwhile I'll push my rebased and nits-fixed version of this PR back to the PR branch.

These files are from: https://github.com/googlefonts/noto-emoji/tree/v2.038/fonts

An emoji with multiple codepoints is encoded as `emoji-xxxxxx-xxxxxx-..` (where xxxxxx corresponds to a hex sequence). Current regexp matches strings with either zero or single occurence of subgroup, changing it to match string with either zero or infinitely repeating subgroup.

- Parse the zulip emoji code class names to unicode codepoints - Generate a TextSpan for each unicode emoji - Remove fallback emoji name container Fixes: zulip#58

gnprice · 2023-08-25T19:42:24Z

OK, and merging! Thanks again @rajveermalviya for the contribution.

tl;dr: When this Flutter issue is resolved: flutter/flutter#134897 try adding the Noto Color Emoji font back again (COLRv1 format) and set it as a fallback font behind some higher-priority font wherever we want to show text that might contain emojis. Before we do that, we're at the mercy of the system font, which might not be updated to handle newer emojis. This seems especially true of Android devices in the wild. ----- In zulip#245, we added a COLRv1 version of this font, probably because of a warning that made us doubt if other formats would work on iOS: https://github.com/googlefonts/noto-emoji#using-notocoloremoji > NotoColorEmoji uses the CBDT/CBLC color font format, which is > supported by Android and Chrome/Chromium OS. Windows supports it > starting with Windows 10 Anniversary Update in Chrome and Edge. On > macOS, only Chrome supports it, while on Linux it will support it > with some fontconfig tweaking, see issue zulip#36. Currently we do not > build other color font formats. (It seems like that part of the README wasn't updated when the COLRv1 format was added, and hasn't been updated since.) And I guess there wasn't a clear iOS compatibility problem with the COLRv1 format, so we used that. But, in zulip#245, it seems like we didn't do a good manual test that the COLRv1 version of this font could actually handle emojis in the app on iOS. That testing shows that it can't (or at least couldn't on my phone or in a simulator). When the font is in the effective list of fallbacks, it takes responsibility for rendering the emoji, before the system font gets a chance to, but then it fails to render it, and a blank space appears. We suspect this Flutter issue ("Rendering of COLRv1 fonts is broken"): flutter/flutter#134897 We don't know why that Flutter issue doesn't, or at least doesn't always, prevent emojis from successfully rendering in the font on *Android*. I followed the issue's reproduction recipe with the COLRv1 font they specified, and indeed the glyphs weren't showing up on Android (but they did on web). So apparently the specific COLRv1 font is a variable. Since we don't know what all the variables are, and we have other short-term priorities, just take the font out of the picture for now, with a plan to reinstate it once we're sure it won't cause emojis to fail to render. Discussion: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Emoji.20rendering.20issue.20.28iOS.29/near/1683965

gnprice reviewed Aug 4, 2023

View reviewed changes

rajveermalviya force-pushed the unicode-emoji branch 2 times, most recently from 3069f62 to 721c58a Compare August 6, 2023 04:21

rajveermalviya force-pushed the unicode-emoji branch from 721c58a to b997135 Compare August 8, 2023 15:23

rajveermalviya requested a review from gnprice August 8, 2023 18:22

rajveermalviya force-pushed the unicode-emoji branch 2 times, most recently from 70be18d to cca88c2 Compare August 10, 2023 13:38

rajveermalviya force-pushed the unicode-emoji branch from cca88c2 to 8ef079b Compare August 10, 2023 19:04

rajveermalviya force-pushed the unicode-emoji branch 2 times, most recently from 312bcdb to 19bf5b2 Compare August 18, 2023 18:41

gnprice reviewed Aug 22, 2023

View reviewed changes

rajveermalviya force-pushed the unicode-emoji branch from 19bf5b2 to 6965a15 Compare August 22, 2023 06:32

rajveermalviya requested a review from gnprice August 22, 2023 06:33

rajveermalviya force-pushed the unicode-emoji branch from 6965a15 to a67621b Compare August 22, 2023 06:35

gnprice approved these changes Aug 24, 2023

View reviewed changes

gnprice mentioned this pull request Aug 24, 2023

msglist test: Make double-fetch-glitch test robust to layout changes #285

Merged

gnprice force-pushed the unicode-emoji branch from a67621b to 925a157 Compare August 24, 2023 22:58

rajveermalviya added 5 commits August 25, 2023 12:41

font: Add Noto Color Emoji v2.038, in COLRv1 format

760154d

These files are from: https://github.com/googlefonts/noto-emoji/tree/v2.038/fonts

msglist: Propagate global text styles

f41d62d

app: Set global font family fallback for unicode emojis

593d88a

content: Add support for displaying unicode emojis

9e82157

- Parse the zulip emoji code class names to unicode codepoints - Generate a TextSpan for each unicode emoji - Remove fallback emoji name container Fixes: zulip#58

gnprice force-pushed the unicode-emoji branch from 925a157 to 9e82157 Compare August 25, 2023 19:42

gnprice merged commit 9e82157 into zulip:main Aug 25, 2023

rajveermalviya deleted the unicode-emoji branch August 25, 2023 19:43

chrisbobbe mentioned this pull request Aug 31, 2023

iOS: Non-emoji text wrongly being rendered with "Noto Color Emoji" font #289

Closed

chrisbobbe mentioned this pull request Nov 21, 2023

ui: Remove "Noto Color Emoji", broken on iOS (and maybe Android?) #403

Merged

		static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)?$");
		static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)*$");

	// This correctly sets up a the font fallback for normal text that
	// This sets up a font fallback for normal text that

	.firstWhereOrNull((className) => className.startsWith('emoji-'))!
	.firstWhere((className) => className.startsWith('emoji-'))

content: implement support for unicode emojis in messages #245

content: implement support for unicode emojis in messages #245

Conversation

rajveermalviya commented Aug 1, 2023

Context

Alternatives

Testing

gnprice left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gnprice commented Aug 4, 2023 • edited Loading

rajveermalviya commented Aug 6, 2023

rajveermalviya commented Aug 10, 2023

rajveermalviya commented Aug 10, 2023

rajveermalviya commented Aug 18, 2023

gnprice left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rajveermalviya commented Aug 22, 2023

rajveermalviya commented Aug 22, 2023

gnprice left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gnprice commented Aug 24, 2023

gnprice commented Aug 25, 2023

gnprice commented Aug 4, 2023 •

edited

Loading