Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emoji: Use some more informational source for generating mapping information. #3923

Closed
wants to merge 7 commits into from

Conversation

HarshitOnGitHub
Copy link
Member

@HarshitOnGitHub HarshitOnGitHub commented Mar 4, 2017

Fixes: #3972.

@zulipbot
Copy link
Member

zulipbot commented Mar 4, 2017

Hello @zulip/server-emoji members, this pull request references an issue with the area: emoji label, so you may want to check it out!

@HarshitOnGitHub HarshitOnGitHub force-pushed the Issue_3895 branch 2 times, most recently from fa327cd to bbe4c59 Compare March 4, 2017 13:01
@timabbott
Copy link
Sponsor Member

Just took a quick look to get a feel for how you're doing this. Re; the iamcal data, I think we might want to implement a system where tools/provision does a clone+checkout of the repo, and we just have to update the commit ID in use, rather than checking in the JSON. But I'd wait to do the actual work of writing that provisioning tooling until we're confident we're happy with that data set (it's a requirement to merge, not to prototype and figure out how this should work!).

@HarshitOnGitHub
Copy link
Member Author

@timabbott @rishig I have pushed here a working prototype to showcase what iamcal's emoji.json is offering to us. It's just a working prototype not for review. Can you have a look!?

@timabbott
Copy link
Sponsor Member

Cool, this seems pretty good; @rishig as our resident emoji expert, do you want to check it out as well? You need to provision, and it's only in the compose emoji picker (not reactions).

I did notice a few things:

  • I think this breaks the symlink farm or something, because I tried sending some emoji, and got 404s. Didn't investigate.
  • We don't support skin tone modifiers in our markdown right now, so we should probably hide that category until we can implement it (and we might want to make it a toggle UI-wise anyway when we do).
  • emoji_codes.js is still only 68K uncompressed, so while we could probably optimize it, I don't think we strictly need to for now.

In emoji_codes.js, do we want the category mapping to be to emoji names? I was kinda thinking we might want it to go like this:

  • We have a bunch of codepoint -> (list_of_emoji_names) objects.
  • Categories are represented as a map from categories to lists of codes.

but @rishig may have thoughts on the long-term structure.

@timabbott
Copy link
Sponsor Member

timabbott commented Mar 6, 2017

For the picker, I just noticed https://github.com/needim/wdt-emoji-bundle, which uses the iamcal data set (just saw this linked). It could be worth seeing if integrating (or forking) that would be an easier path than making a nice picker ourselves from bootstrap popovers.

If we decide we don't want to use that, I think it's reasonable for you to start working on a system for managing emoji.json from a Git ID in their repository. You can read the code deleted in ce94fb2 to see a reasonable way to do this without a lot of code.

@HarshitOnGitHub
Copy link
Member Author

@timabbott The main problem with https://github.com/needim/wdt-emoji-bundle is that it currently has no support for custom emojis and also there are some glitches like text not showing while I type in the textbox etc.

anim

@timabbott
Copy link
Sponsor Member

timabbott commented Mar 7, 2017

Yeah, OK, sounds like it may be buggy. But since it's MIT-licensed, we can probably borrow some of their CSS at least :)

@rishig
Copy link
Member

rishig commented Mar 8, 2017

In emoji_codes.js, do we want the category mapping to be to emoji names? 
I was kinda thinking we might want it to go like this:

    We have a bunch of codepoint -> (list_of_emoji_names) objects.
    Categories are represented as a map from categories to lists of codes.

but @rishig may have thoughts on the long-term structure.

yeah, I think category -> list of unicode codepoints is probably good as a long term structure.

We'll also need a category for zulip.png (the "unicode codepoint" there would be zulip) , and for any future custom zulip emoji. Let's just call that category "Zulip".

@HarshitOnGitHub
Copy link
Member Author

I am stuck on this PR from the past one and half week without much progress. With every change that I am making I am getting a feeling that our emoji system is getting more and more complex and fragile and maintaining it in the future will be a real pain. I think we need a more structured plan to fix our emoji system from the starting itself. @timabbott @rishig thoughts?

@timabbott
Copy link
Sponsor Member

Well, I think once we have everything working well off the iamcal emoji data sprite sheets, I think it should be possible to remove most of the older code, and thus simplify the system. The problem is that before doing that we'll need to migrate historical messages (or otherwise put in a backwards-compatibility system), but I think that should be manageable.

But until we're ready to do that, the system will definitely increase in complexity for a while.

@timabbott
Copy link
Sponsor Member

What problems are you stuck on currently @HarshitOnGitHub? I can start reviewing and merging pieces if that'd be helpful.

@HarshitOnGitHub
Copy link
Member Author

There is no specific problem as such but as I make some new change that change tends to break some other part of the code and due to the complexity it sometimes become really difficult to track if it's going to break something and most of the times it does and I need to add some kind of handcoded exceptional logic to fix it which I think will make it harder to make changes at a later stage.

@timabbott
Copy link
Sponsor Member

timabbott commented Mar 10, 2017

Yeah, that's life working with a complicated system that doesn't have tests :).

I suspect we can mostly solve the long-term maintainability problem by writing tests for the places where we needed to hardcode an exception (e.g. Node tests that inspect emoji_codes.js and check that certain emoji map in a certain way). I think those should be pretty quick to write.

I think the medium-term plan should be for new messages just use the iamcal data set (and not even use the old emoji-map.json) and actually just use their sprite sheets directly as well (I guess we'll have to move the zulip emoji back to an independent image), since then the only outputs we need to produce are emoji_codes.js and sprite sheet CSS output pieces. That seems like something we should be able to maintain pretty easily once we get there. But this is definitely a significant migration project.

@HarshitOnGitHub
Copy link
Member Author

Yeah, I think removing our sprite generation code is going to reduce a lot of complexity.

@rishig
Copy link
Member

rishig commented Mar 10, 2017

I definitely remember feeling overwhelmed by the complexity of the emoji system when I was dealing with it.

For backwards compatibility, is the symlink farm sufficient? Realm emoji are just external links anyway, right?

@HarshitOnGitHub
Copy link
Member Author

symlink farm is sufficient for maintaining the backward compatibility as long as we can generate it without errors. We are using emoji_map.json for generating the symlink farm and it differs from the iamcal's data set at a number of points. My last commit fixes just that. But I have a list of several emojis which I am not able to extract from iamcal's data set as they would require hand coded mapping and hence no symlinks are being generated for them at this stage.

@rishig
Copy link
Member

rishig commented Mar 10, 2017

There may be a misunderstanding here. I think the proposal is to have two completely independent systems:

  • System 1, only used for rendering old messages: symlink farm (generated by emoji_map.json), which point to png files extracted from the ttf files. No changes are needed here.
  • System 2, for new messages: We use the iamcal data set, and the iamcal sprite sheets. The rendered emoji look something like <span class=emoji-u1234>:simple_smile:</span>, and a bunch of generated CSS causes this to appear like 🙂 rather than :simple_smile:.

So there should be no interaction between the old names and the new names, and no symlinks should be generated from the iamcal data. Does that make sense? We can also move this to chat.zulip.com, just at-mention me there if you start a thread.

@HarshitOnGitHub
Copy link
Member Author

@rishig

System 1, only used for rendering old messages: symlink farm (generated by emoji_map.json), which
point to png files extracted from the ttf files. No changes are needed here.

The symlinks do not directly point to the png files extracted from the ttf file rather they to the png files in out/unicode which are essentially copies of png files extracted from the ttf files after being renamed using a mapping generated from NotoColorEmoji.ttx and iamcal's dataset and there are some codepoints on which either emoji_map and iamcal's dataset don't agree or they are not bound to any short name in iamcal's dataset.

@HarshitOnGitHub
Copy link
Member Author

@timabbott @rishig Since it was almost impossible to rebase the old branch I have redone this from scratch and this seems to be ready for a review. This apart from switching us to use sprite sheets also fixes a longstanding issue of number emojis being not searchable in emoji picker ever since we introduced emoji aliases. Also includes some fixes to notifications.py so that we don't break emoji rendering there.
Sorry for a single large commit but it was necessary to maintain commit atomicity.

P.S.: The test suite that I had added for relative URLs in digest emails is working well. It prevented me from introducing a bug related to zulip emoji.

content)
content = content.replace(' class="emoji"', ' style="height: 20px;"')

return content
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment: I think we may want to eventually migrate this code path to use xml.ElementTree for doing the rewriting; that would likely be a lot more robust than the regular expressions we've been doing.

But it's not a regression, so OK for now :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was also thinking of replacing these regex with lxml while writing code for replacing spans with img tags.

@timabbott
Copy link
Sponsor Member

It's great that the "big commit" is only ~100LOC, much of it changes in tests. I just looked through it, and the first commit at least looks good. I'll plan to test-deploy to chat.zulip.org today.

@HarshitOnGitHub
Copy link
Member Author

It's great that the "big commit" is only ~100LOC, much of it changes in tests.

Haha, doing it took more than I had expected. Will it make sense to make a tool for automatically updating markdown_test_cases.json fixtures once user verifies that changes are actually correct.

@timabbott
Copy link
Sponsor Member

I've for a while wanted a pretty-printer at least for the diff when you get a failure in markdown_test_cases failures, since it can be really hard to read as it is. I think there's an issue for it.

@timabbott
Copy link
Sponsor Member

A few small issues:

  • In the typeahead, the emoji seem like they're a pixel too low.
    image
  • Also, typeahead_helper.js is missing complete node test coverage and thus failing CI.
  • When emojis_by_unicode is removed, you forgot to remove the definition.
  • We should bump PROVISION_VERSION to 10.0 somewhere in this branch, since we'll need build_emoji to be rerun when switching branches in either direction.

Also, can you summarize the impact expected from changing name_to_codepoint to be based on iamcal? I want to make sure we have thought through everything. Potential issues to think about:

  • Existing emoji in messages. This should be OK, since we don't access the emoji libraries at all when interacting with these
  • Emoji reactions. This should be OK if all of the code for what emoji to show is based on using the emoji_code, and the emoji_name is just used to print a name. I'm not sure we have that quite yet; at least, the unique_together on the data model is still tied to emoji_name.

Can we split the syntax change in the sprite sheet from that without much effort? If I'm not mistaken, 90% of the code in the first commit is the sprite sheet switch, and I'd love to be able to merge that piece right away (and get feedback) before confronting the name change problem, which is a lot more risky.

@rishig
Copy link
Member

rishig commented Sep 28, 2017

I thought the name change was only for new codepoints? I.e. if a codepoint has a name in the current system, it's keeping its name till a bigger naming overhaul that should happen in a couple of weeks.

@HarshitOnGitHub
Copy link
Member Author

Also, typeahead_helper.js is missing complete node test coverage and thus failing CI.

I had fixed it, may be forgot to push the new version, will push it now.

When emojis_by_unicode is removed, you forgot to remove the definition.

Oops will fix it.

We should bump PROVISION_VERSION to 10.0 somewhere in this branch, since we'll need build_emoji to be rerun when switching branches in either direction.

Yeah I had realized later and added a separate commit to bump the provision version while fixing the typeahead coverage issue, will push now.

Also there are no name changes in this PR. I have used the same names as we were using earlier.

@HarshitOnGitHub
Copy link
Member Author

Also, can you summarize the impact expected from changing name_to_codepoint to be based on iamcal? I want to make sure we have thought through everything. Potential issues to think about:

Existing emoji in messages. This should be OK, since we don't access the emoji libraries at all when interacting with these
Emoji reactions. This should be OK if all of the code for what emoji to show is based on using the emoji_code, and the emoji_name is just used to print a name. I'm not sure we have that quite yet; at least, the unique_together on the data model is still tied to emoji_name.

Another thing to think about it is notifications which we have already fixed. For emoji reactions we are already using the iamcal's version of name_to_codepoint with some hacky changes that I just removed in this PR.

@HarshitOnGitHub
Copy link
Member Author

Actually changing the name_to_codepoint to be iamcal based is going solve a number of problems. Till now there was an inconsistency in name_to_codepoint and codepoint_to_name data structures about which I had to remember everytime I made a change but both of them are now based on iamcal.

This commit switches to use sprite sheets for rendering emojis
in all the remaining places, i.e., message bodies and composebox
typeahead. This commit also includes some changes to notifications.py
file so that the spans used for rendering emojis can be converted
to corresponding image tags so that we don't break the emoji rendering
in missed message emails since we can't use sprite sheets there.

Fixes: zulip#3972.
This hack was used to fix the broken flag emojis in emoji-picker.
It was broken due to the incomplete migration to iamcal dataset.
See issue zulip#4775 for more details.
…tly.

This hack was used to fix the broken number emojis in the emoji picker.
It was broken because of the partial migration to the iamcal dataset.
See issue zulip#4775 for more details.
This commit removes some unused data structures: `emojis_by_unicode` and
`default_unicode_emojis` to be particular.
This list contained a list of codepoints of all the emojis. We now have
`codepoint_to_name` in emoji_codes.js and hence this is not required.
@timabbott
Copy link
Sponsor Member

Merged, thanks @HarshitOnGitHub! This has been one of the more epic projects in Zulip; huge thanks for all your work to make this happen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants