emoji: Use some more informational source for generating mapping information. #3923

HarshitOnGitHub · 2017-03-04T04:08:17Z

Fixes: #3972.

zulipbot · 2017-03-04T04:08:17Z

Hello @zulip/server-emoji members, this pull request references an issue with the area: emoji label, so you may want to check it out!

timabbott · 2017-03-05T00:25:12Z

Just took a quick look to get a feel for how you're doing this. Re; the iamcal data, I think we might want to implement a system where tools/provision does a clone+checkout of the repo, and we just have to update the commit ID in use, rather than checking in the JSON. But I'd wait to do the actual work of writing that provisioning tooling until we're confident we're happy with that data set (it's a requirement to merge, not to prototype and figure out how this should work!).

HarshitOnGitHub · 2017-03-05T18:47:21Z

@timabbott @rishig I have pushed here a working prototype to showcase what iamcal's emoji.json is offering to us. It's just a working prototype not for review. Can you have a look!?

timabbott · 2017-03-06T00:52:59Z

Cool, this seems pretty good; @rishig as our resident emoji expert, do you want to check it out as well? You need to provision, and it's only in the compose emoji picker (not reactions).

I did notice a few things:

I think this breaks the symlink farm or something, because I tried sending some emoji, and got 404s. Didn't investigate.
We don't support skin tone modifiers in our markdown right now, so we should probably hide that category until we can implement it (and we might want to make it a toggle UI-wise anyway when we do).
emoji_codes.js is still only 68K uncompressed, so while we could probably optimize it, I don't think we strictly need to for now.

In emoji_codes.js, do we want the category mapping to be to emoji names? I was kinda thinking we might want it to go like this:

We have a bunch of codepoint -> (list_of_emoji_names) objects.
Categories are represented as a map from categories to lists of codes.

but @rishig may have thoughts on the long-term structure.

timabbott · 2017-03-06T01:06:18Z

For the picker, I just noticed https://github.com/needim/wdt-emoji-bundle, which uses the iamcal data set (just saw this linked). It could be worth seeing if integrating (or forking) that would be an easier path than making a nice picker ourselves from bootstrap popovers.

If we decide we don't want to use that, I think it's reasonable for you to start working on a system for managing emoji.json from a Git ID in their repository. You can read the code deleted in ce94fb2 to see a reasonable way to do this without a lot of code.

HarshitOnGitHub · 2017-03-07T05:28:03Z

@timabbott The main problem with https://github.com/needim/wdt-emoji-bundle is that it currently has no support for custom emojis and also there are some glitches like text not showing while I type in the textbox etc.

timabbott · 2017-03-07T06:32:53Z

Yeah, OK, sounds like it may be buggy. But since it's MIT-licensed, we can probably borrow some of their CSS at least :)

rishig · 2017-03-08T00:40:09Z

In emoji_codes.js, do we want the category mapping to be to emoji names? 
I was kinda thinking we might want it to go like this:

    We have a bunch of codepoint -> (list_of_emoji_names) objects.
    Categories are represented as a map from categories to lists of codes.

but @rishig may have thoughts on the long-term structure.

yeah, I think category -> list of unicode codepoints is probably good as a long term structure.

We'll also need a category for zulip.png (the "unicode codepoint" there would be zulip) , and for any future custom zulip emoji. Let's just call that category "Zulip".

HarshitOnGitHub · 2017-03-10T19:31:19Z

I am stuck on this PR from the past one and half week without much progress. With every change that I am making I am getting a feeling that our emoji system is getting more and more complex and fragile and maintaining it in the future will be a real pain. I think we need a more structured plan to fix our emoji system from the starting itself. @timabbott @rishig thoughts?

timabbott · 2017-03-10T19:35:25Z

Well, I think once we have everything working well off the iamcal emoji data sprite sheets, I think it should be possible to remove most of the older code, and thus simplify the system. The problem is that before doing that we'll need to migrate historical messages (or otherwise put in a backwards-compatibility system), but I think that should be manageable.

But until we're ready to do that, the system will definitely increase in complexity for a while.

timabbott · 2017-03-10T19:35:56Z

What problems are you stuck on currently @HarshitOnGitHub? I can start reviewing and merging pieces if that'd be helpful.

HarshitOnGitHub · 2017-03-10T19:46:29Z

There is no specific problem as such but as I make some new change that change tends to break some other part of the code and due to the complexity it sometimes become really difficult to track if it's going to break something and most of the times it does and I need to add some kind of handcoded exceptional logic to fix it which I think will make it harder to make changes at a later stage.

timabbott · 2017-03-10T19:57:32Z

Yeah, that's life working with a complicated system that doesn't have tests :).

I suspect we can mostly solve the long-term maintainability problem by writing tests for the places where we needed to hardcode an exception (e.g. Node tests that inspect emoji_codes.js and check that certain emoji map in a certain way). I think those should be pretty quick to write.

I think the medium-term plan should be for new messages just use the iamcal data set (and not even use the old emoji-map.json) and actually just use their sprite sheets directly as well (I guess we'll have to move the zulip emoji back to an independent image), since then the only outputs we need to produce are emoji_codes.js and sprite sheet CSS output pieces. That seems like something we should be able to maintain pretty easily once we get there. But this is definitely a significant migration project.

HarshitOnGitHub · 2017-03-10T20:08:07Z

Yeah, I think removing our sprite generation code is going to reduce a lot of complexity.

rishig · 2017-03-10T20:16:03Z

I definitely remember feeling overwhelmed by the complexity of the emoji system when I was dealing with it.

For backwards compatibility, is the symlink farm sufficient? Realm emoji are just external links anyway, right?

HarshitOnGitHub · 2017-03-10T20:24:10Z

symlink farm is sufficient for maintaining the backward compatibility as long as we can generate it without errors. We are using emoji_map.json for generating the symlink farm and it differs from the iamcal's data set at a number of points. My last commit fixes just that. But I have a list of several emojis which I am not able to extract from iamcal's data set as they would require hand coded mapping and hence no symlinks are being generated for them at this stage.

rishig · 2017-03-10T22:00:23Z

There may be a misunderstanding here. I think the proposal is to have two completely independent systems:

System 1, only used for rendering old messages: symlink farm (generated by emoji_map.json), which point to png files extracted from the ttf files. No changes are needed here.
System 2, for new messages: We use the iamcal data set, and the iamcal sprite sheets. The rendered emoji look something like <span class=emoji-u1234>:simple_smile:</span>, and a bunch of generated CSS causes this to appear like 🙂 rather than :simple_smile:.

So there should be no interaction between the old names and the new names, and no symlinks should be generated from the iamcal data. Does that make sense? We can also move this to chat.zulip.com, just at-mention me there if you start a thread.

HarshitOnGitHub · 2017-03-11T07:30:47Z

@rishig

System 1, only used for rendering old messages: symlink farm (generated by emoji_map.json), which
point to png files extracted from the ttf files. No changes are needed here.

The symlinks do not directly point to the png files extracted from the ttf file rather they to the png files in out/unicode which are essentially copies of png files extracted from the ttf files after being renamed using a mapping generated from NotoColorEmoji.ttx and iamcal's dataset and there are some codepoints on which either emoji_map and iamcal's dataset don't agree or they are not bound to any short name in iamcal's dataset.

HarshitOnGitHub · 2017-09-28T18:34:33Z

@timabbott @rishig Since it was almost impossible to rebase the old branch I have redone this from scratch and this seems to be ready for a review. This apart from switching us to use sprite sheets also fixes a longstanding issue of number emojis being not searchable in emoji picker ever since we introduced emoji aliases. Also includes some fixes to notifications.py so that we don't break emoji rendering there.
Sorry for a single large commit but it was necessary to maintain commit atomicity.

P.S.: The test suite that I had added for relative URLs in digest emails is working well. It prevented me from introducing a bug related to zulip emoji.

timabbott · 2017-09-28T18:50:05Z

zerver/lib/notifications.py

+        content)
+    content = content.replace(' class="emoji"', ' style="height: 20px;"')
+
+    return content


Just a comment: I think we may want to eventually migrate this code path to use xml.ElementTree for doing the rewriting; that would likely be a lot more robust than the regular expressions we've been doing.

But it's not a regression, so OK for now :)

Yeah I was also thinking of replacing these regex with lxml while writing code for replacing spans with img tags.

timabbott · 2017-09-28T18:51:09Z

It's great that the "big commit" is only ~100LOC, much of it changes in tests. I just looked through it, and the first commit at least looks good. I'll plan to test-deploy to chat.zulip.org today.

HarshitOnGitHub · 2017-09-28T18:58:52Z

It's great that the "big commit" is only ~100LOC, much of it changes in tests.

Haha, doing it took more than I had expected. Will it make sense to make a tool for automatically updating markdown_test_cases.json fixtures once user verifies that changes are actually correct.

timabbott · 2017-09-28T19:07:16Z

I've for a while wanted a pretty-printer at least for the diff when you get a failure in markdown_test_cases failures, since it can be really hard to read as it is. I think there's an issue for it.

timabbott · 2017-09-28T21:59:02Z

A few small issues:

In the typeahead, the emoji seem like they're a pixel too low.
Also, typeahead_helper.js is missing complete node test coverage and thus failing CI.
When emojis_by_unicode is removed, you forgot to remove the definition.
We should bump PROVISION_VERSION to 10.0 somewhere in this branch, since we'll need build_emoji to be rerun when switching branches in either direction.

Also, can you summarize the impact expected from changing name_to_codepoint to be based on iamcal? I want to make sure we have thought through everything. Potential issues to think about:

Existing emoji in messages. This should be OK, since we don't access the emoji libraries at all when interacting with these
Emoji reactions. This should be OK if all of the code for what emoji to show is based on using the emoji_code, and the emoji_name is just used to print a name. I'm not sure we have that quite yet; at least, the unique_together on the data model is still tied to emoji_name.

Can we split the syntax change in the sprite sheet from that without much effort? If I'm not mistaken, 90% of the code in the first commit is the sprite sheet switch, and I'd love to be able to merge that piece right away (and get feedback) before confronting the name change problem, which is a lot more risky.

rishig · 2017-09-28T22:03:51Z

I thought the name change was only for new codepoints? I.e. if a codepoint has a name in the current system, it's keeping its name till a bigger naming overhaul that should happen in a couple of weeks.

HarshitOnGitHub · 2017-09-28T22:11:31Z

Also, typeahead_helper.js is missing complete node test coverage and thus failing CI.

I had fixed it, may be forgot to push the new version, will push it now.

When emojis_by_unicode is removed, you forgot to remove the definition.

Oops will fix it.

We should bump PROVISION_VERSION to 10.0 somewhere in this branch, since we'll need build_emoji to be rerun when switching branches in either direction.

Yeah I had realized later and added a separate commit to bump the provision version while fixing the typeahead coverage issue, will push now.

Also there are no name changes in this PR. I have used the same names as we were using earlier.

HarshitOnGitHub · 2017-09-28T22:15:09Z

Also, can you summarize the impact expected from changing name_to_codepoint to be based on iamcal? I want to make sure we have thought through everything. Potential issues to think about:

Existing emoji in messages. This should be OK, since we don't access the emoji libraries at all when interacting with these
Emoji reactions. This should be OK if all of the code for what emoji to show is based on using the emoji_code, and the emoji_name is just used to print a name. I'm not sure we have that quite yet; at least, the unique_together on the data model is still tied to emoji_name.

Another thing to think about it is notifications which we have already fixed. For emoji reactions we are already using the iamcal's version of name_to_codepoint with some hacky changes that I just removed in this PR.

HarshitOnGitHub · 2017-09-28T22:18:02Z

Actually changing the name_to_codepoint to be iamcal based is going solve a number of problems. Till now there was an inconsistency in name_to_codepoint and codepoint_to_name data structures about which I had to remember everytime I made a change but both of them are now based on iamcal.

This commit switches to use sprite sheets for rendering emojis in all the remaining places, i.e., message bodies and composebox typeahead. This commit also includes some changes to notifications.py file so that the spans used for rendering emojis can be converted to corresponding image tags so that we don't break the emoji rendering in missed message emails since we can't use sprite sheets there. Fixes: zulip#3972.

This hack was used to fix the broken flag emojis in emoji-picker. It was broken due to the incomplete migration to iamcal dataset. See issue zulip#4775 for more details.

…tly. This hack was used to fix the broken number emojis in the emoji picker. It was broken because of the partial migration to the iamcal dataset. See issue zulip#4775 for more details.

This commit removes some unused data structures: `emojis_by_unicode` and `default_unicode_emojis` to be particular.

This list contained a list of codepoints of all the emojis. We now have `codepoint_to_name` in emoji_codes.js and hence this is not required.

…hanges.

timabbott · 2017-09-29T18:23:46Z

Merged, thanks @HarshitOnGitHub! This has been one of the more epic projects in Zulip; huge thanks for all your work to make this happen!

HarshitOnGitHub force-pushed the Issue_3895 branch 2 times, most recently from fa327cd to bbe4c59 Compare March 4, 2017 13:01

HarshitOnGitHub force-pushed the Issue_3895 branch from c0de089 to 3b4a929 Compare March 5, 2017 18:38

HarshitOnGitHub mentioned this pull request Mar 5, 2017

emoji: Duplicate emoji entries in emoji picker. #3952

Closed

HarshitOnGitHub mentioned this pull request Mar 8, 2017

Change bugdown emoji rendering to use the sprite sheet rather than hardcoded URLs #3972

Closed

HarshitOnGitHub force-pushed the Issue_3895 branch 2 times, most recently from 1c3ca55 to bd13829 Compare March 10, 2017 19:28

synicalsyntax added the needs review label Mar 16, 2017

synicalsyntax assigned rishig and timabbott and unassigned rishig Mar 16, 2017

timabbott mentioned this pull request Aug 28, 2017

emoji: Fix mis-alignment of realm emoji div with other div in emoji picker. #6089

Closed

HarshitOnGitHub mentioned this pull request Sep 1, 2017

Add support for using different emoji icon sets like EmojiOne and Twemoji #209

Closed

HarshitOnGitHub force-pushed the Issue_3895 branch from e92dc55 to b22d3ce Compare September 28, 2017 18:20

zulipbot added needs more work and removed needs review labels Sep 28, 2017

timabbott reviewed Sep 28, 2017

View reviewed changes

HarshitOnGitHub force-pushed the Issue_3895 branch from b22d3ce to cce5148 Compare September 28, 2017 19:35

zulipbot added needs review and removed needs more work labels Sep 28, 2017

HarshitOnGitHub force-pushed the Issue_3895 branch from cce5148 to dd95a9c Compare September 28, 2017 22:24

HarshitOnGitHub added 7 commits September 29, 2017 17:59

build_emoji: Remove the hack used for displaying flag emojis correctly.

da6986a

This hack was used to fix the broken flag emojis in emoji-picker. It was broken due to the incomplete migration to iamcal dataset. See issue zulip#4775 for more details.

build_emoji: Remove the hack used for displaying number emojis correc…

fcfa7c3

…tly. This hack was used to fix the broken number emojis in the emoji picker. It was broken because of the partial migration to the iamcal dataset. See issue zulip#4775 for more details.

emoji.js: Remove some unused data structures.

4dcfe29

This commit removes some unused data structures: `emojis_by_unicode` and `default_unicode_emojis` to be particular.

build_emoji: Remove the redundant codepoints list.

78b7430

This list contained a list of codepoints of all the emojis. We now have `codepoint_to_name` in emoji_codes.js and hence this is not required.

version: Bump the provision version for recent emoji infrastructure c…

853e01b

…hanges.

typeahead: Fix the vertical positioning of emojis.

6b6e9d4

HarshitOnGitHub force-pushed the Issue_3895 branch from dd95a9c to 6b6e9d4 Compare September 29, 2017 17:03

timabbott closed this Sep 29, 2017

zulipbot removed the needs review label Sep 29, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

emoji: Use some more informational source for generating mapping information. #3923

emoji: Use some more informational source for generating mapping information. #3923

HarshitOnGitHub commented Mar 4, 2017 •

edited

zulipbot commented Mar 4, 2017

timabbott commented Mar 5, 2017

HarshitOnGitHub commented Mar 5, 2017

timabbott commented Mar 6, 2017

timabbott commented Mar 6, 2017 •

edited

HarshitOnGitHub commented Mar 7, 2017

timabbott commented Mar 7, 2017 •

edited

rishig commented Mar 8, 2017 •

edited

HarshitOnGitHub commented Mar 10, 2017

timabbott commented Mar 10, 2017

timabbott commented Mar 10, 2017

HarshitOnGitHub commented Mar 10, 2017

timabbott commented Mar 10, 2017 •

edited

HarshitOnGitHub commented Mar 10, 2017

rishig commented Mar 10, 2017

HarshitOnGitHub commented Mar 10, 2017

rishig commented Mar 10, 2017

HarshitOnGitHub commented Mar 11, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott Sep 28, 2017

HarshitOnGitHub Sep 28, 2017

timabbott commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott commented Sep 28, 2017

timabbott commented Sep 28, 2017

rishig commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott commented Sep 29, 2017

emoji: Use some more informational source for generating mapping information. #3923

emoji: Use some more informational source for generating mapping information. #3923

Conversation

HarshitOnGitHub commented Mar 4, 2017 • edited

zulipbot commented Mar 4, 2017

timabbott commented Mar 5, 2017

HarshitOnGitHub commented Mar 5, 2017

timabbott commented Mar 6, 2017

timabbott commented Mar 6, 2017 • edited

HarshitOnGitHub commented Mar 7, 2017

timabbott commented Mar 7, 2017 • edited

rishig commented Mar 8, 2017 • edited

HarshitOnGitHub commented Mar 10, 2017

timabbott commented Mar 10, 2017

timabbott commented Mar 10, 2017

HarshitOnGitHub commented Mar 10, 2017

timabbott commented Mar 10, 2017 • edited

HarshitOnGitHub commented Mar 10, 2017

rishig commented Mar 10, 2017

HarshitOnGitHub commented Mar 10, 2017

rishig commented Mar 10, 2017

HarshitOnGitHub commented Mar 11, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott Sep 28, 2017

Choose a reason for hiding this comment

HarshitOnGitHub Sep 28, 2017

Choose a reason for hiding this comment

timabbott commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott commented Sep 28, 2017

timabbott commented Sep 28, 2017

rishig commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

HarshitOnGitHub commented Sep 28, 2017

timabbott commented Sep 29, 2017

HarshitOnGitHub commented Mar 4, 2017 •

edited

timabbott commented Mar 6, 2017 •

edited

timabbott commented Mar 7, 2017 •

edited

rishig commented Mar 8, 2017 •

edited

timabbott commented Mar 10, 2017 •

edited