Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate emoji storage to group aliases together for each canonical name. #1084

Merged
merged 4 commits into from
Jul 23, 2021

Conversation

zee-bit
Copy link
Member

@zee-bit zee-bit commented Jul 15, 2021

What does this PR do?

Migrates our emoji storage structure in unicode_emojis.py to a new structure where
for each canonical_name we keep its aliases grouped together, instead of considering
them as separate emojis.

Tested?

  • Manually
  • Existing tests (adapted, if necessary)
  • New tests added (for any new behavior)
  • Passed linting & tests (each commit)

Notes & Questions

I need to test the reactions PR on top of this to determine the authenticity of this migration.

Interactions

@zee-bit zee-bit added further discussion required Discuss this on #zulip-terminal on chat.zulip.org PR blocks other PR PR needs review PR requires feedback to proceed labels Jul 15, 2021
setup.cfg Show resolved Hide resolved
@zee-bit zee-bit force-pushed the update-emoji-storage-by-name branch from 2fdf218 to 46a7cf3 Compare July 16, 2021 19:32
@zulipbot zulipbot added the size: XL [Automatic label added by zulipbot] label Jul 16, 2021
@neiljp neiljp mentioned this pull request Jul 18, 2021
4 tasks
Copy link
Member

@preetmishra preetmishra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zee-bit Thanks for working on this. This looks good overall. 👍 Though, as there has been a discussion on CZO, generating the new file midway or separately in a commit would bring discrepancies in model.py.

@neiljp neiljp added PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback and removed PR needs review PR requires feedback to proceed labels Jul 19, 2021
@zee-bit zee-bit force-pushed the update-emoji-storage-by-name branch 2 times, most recently from d7339ae to 1a32875 Compare July 19, 2021 13:31
@zee-bit zee-bit added PR needs review PR requires feedback to proceed and removed PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback labels Jul 19, 2021
@zee-bit zee-bit force-pushed the update-emoji-storage-by-name branch 2 times, most recently from 94c5569 to 23a95d5 Compare July 21, 2021 22:09
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zee-bit You've got the correct principle here. To ease the migration and avoid the mid-commit regression (which should be breaking tests?), a possible refactoring would be to first adjust the emoji data generation function to return an additional set of data ie. the pre-prepared emoji names, adapting the autocomplete to use it. You can then regenerate the emoji file and the emoji manipulation internal to the model at the same time in the next commit.

tools/convert-unicode-emoji-data Outdated Show resolved Hide resolved
Comment on lines 180 to 183
self.active_emoji_data = self.generate_all_emoji_data(
self.initial_data["realm_emoji"]
)
self.all_emoji_names = self.generate_all_emoji_names()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We update emoji when we get a realm emoji change event, so we at least want to call both in each case, or have one method that returns a tuple and updates both at the same time.

@neiljp neiljp added PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback and removed PR needs review PR requires feedback to proceed labels Jul 22, 2021
This commit adds a new instance variable in model that stores the list of
all emoji names generated during model initialization by the
generate_emoji_data function.

The generate_emoji_data function now return a tuple of all active emoji
data and all emoji names sorted. This also migrates the autocomplete
emoji function to use the list of all_emoji_names returned from this,
instead of generating it multiple times during runtime whenever
autocomplete emoji is called.

Tests amended.
This commit updates the convert-unicode-emoji-data script to store all the
unicode emoji codepoints in their extended format as received from the
server.

This script will map each emoji_name (aka canonical_name) to its data
containing the emoji_code and aliases that share the same emoji_code.
We sort this dictionary in ascending order of key (emoji_name) and store
it as an OrderedDict to maintain the order of sorting.

We also turn off black formatting by wrapping the file with fmt: off/on to
disable black from formatting the dictionaries afterwards - since that is
a generated file, it should not be modified.
E501 ignores max-line-length property - this is already defined by black.
We add this error code in flake8's ignore list in setup.cfg file.
This commit generates the EMOJI_DATA from tools/convert-unicode-emoji-data
in the new format that maps each emoji_name to its corresponding emoji's
code, and aliases that share the same emoji_code, stored in an OrderedDict.
The EMOJI_DATA is sorted in ascending order of emoji_name and this ordering
is maintained via OrderedDict. The generated EMOJI_DATA is stored in
unicode_emojis.py file.

This also updates the helper function in model that generates all emoji
data to follow and adapt to this new format of storing emoji's and updates
the EmojiData type to indicate each emoji now includes an aliases field.
This also migrates the emoji fixtures in conftest i.e. realm_emoji_data,
unicode_emojis and zulip_emoji to the new format of storing emoji_data.

Tests amended.
@zee-bit zee-bit force-pushed the update-emoji-storage-by-name branch from 23a95d5 to 0aecbeb Compare July 22, 2021 15:03
@zee-bit zee-bit added PR needs review PR requires feedback to proceed and removed PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback labels Jul 22, 2021
@neiljp
Copy link
Collaborator

neiljp commented Jul 23, 2021

@zee-bit This flows much cleaner with no mid-PR regressions - great! Merging this now 🎉

@neiljp neiljp merged commit 0eaa4f2 into zulip:main Jul 23, 2021
@neiljp neiljp removed the PR needs review PR requires feedback to proceed label Jul 23, 2021
@neiljp neiljp added this to the Next Release milestone Jul 23, 2021
@neiljp neiljp added area: refactoring and removed further discussion required Discuss this on #zulip-terminal on chat.zulip.org labels Jul 23, 2021
@zulipbot
Copy link
Member

Hello @zulip/server-refactoring members, this pull request was labeled with the "area: refactoring" label, so you may want to check it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: refactoring PR blocks other PR size: XL [Automatic label added by zulipbot]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants