🪺 Update for Unicode 14
Unicode 14 is here, and with it come some breaking changes in `emoji-regex`, which mean we need to change how we generate our own regular expressions.

To accomplish this, we're moving to the slightly-more-upstream package, `emoji-test-regex-pattern`, and using its `java` regex variation.

This also means removing all but the one regular expression. We previously marked `Emoji` as deprecated, but the plan now is to mark `Text` as deprecated also.
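
Because this commit also moves the gem version from 3.2.3 to 14.0.0 (see the gemspec change), a Gemfile pinned to the old major version will presumably not pick up the new release on its own. The following is a minimal sketch of that behaviour using RubyGems' own version classes; the `accepts?` helper and the example constraints are illustrative assumptions, not part of this repository.

```ruby
require 'rubygems' # usually loaded already; explicit so the sketch stands alone

# Version numbers taken from the gemspec change in this commit.
OLD_VERSION = Gem::Version.new('3.2.3')
NEW_VERSION = Gem::Version.new('14.0.0')

# Hypothetical helper: would a Gemfile constraint accept the given version?
def accepts?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(version)
end

puts accepts?('~> 3.2', OLD_VERSION)  # the old pin matches the old release
puts accepts?('~> 3.2', NEW_VERSION)  # the old pin does not match 14.0.0
puts accepts?('~> 14.0', NEW_VERSION) # an updated pin matches 14.0.0
```

Under these assumptions, consumers would need to change their constraint (for example to `~> 14.0`) to receive the Unicode 14 pattern.
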
ticky committed Sep 29, 2021
1 parent f4a553b commit b1bfc4c
Showing 9 changed files with 25 additions and 262 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
emoji_regex (3.2.3)
emoji_regex (14.0.0)

GEM
remote: https://rubygems.org/
94 changes: 8 additions & 86 deletions README.md
@@ -2,13 +2,13 @@

[![Gem Version](https://badge.fury.io/rb/emoji_regex.svg)](https://rubygems.org/gems/emoji_regex) [![Node & Ruby CI](https://github.com/ticky/ruby-emoji-regex/workflows/Node%20&%20Ruby%20CI/badge.svg)](https://github.com/ticky/ruby-emoji-regex/actions?query=workflow%3A%22Node+%26+Ruby+CI%22)

A set of Ruby regular expressions for matching Unicode Emoji symbols.
A Ruby regular expression for matching Unicode Emoji symbols.

## Background

This is based upon the fantastic work from [Mathias Bynens'](https://mathiasbynens.be/) [`emoji-regex`](https://github.com/mathiasbynens/emoji-regex) Javascript package. `emoji-regex` is cleverly assembled based upon data from the Unicode Consortium.
This is based upon the fantastic work from [Mathias Bynens'](https://mathiasbynens.be/) [`emoji-test-regex-pattern`](https://github.com/mathiasbynens/emoji-test-regex-pattern) package. `emoji-test-regex-pattern` is cleverly assembled based upon data from the Unicode Consortium.

The regular expressions provided herein are derived from that pacakge.
The regular expressions provided herein are derived from that package.

## Installation

@@ -18,29 +18,7 @@ gem install emoji_regex

## Usage

`emoji_regex` provides these regular expressions:

* `EmojiRegex::RGIEmoji` is the regex you most likely want. It matches all emoji recommended for general interchange, as defined by [the Unicode standard's `RGI_Emoji` property](https://unicode.org/reports/tr51/#def_rgi_set). In a future version, this regular expression will be renamed to `EmojiRegex::Regex` and all other regexes removed.

* `EmojiRegex::Regex` is deprecated, and will be replaced with `RGIEmoji` in a future major version. It matches emoji which present as emoji by default, and those which present as emoji when combined with `U+FE0F VARIATION SELECTOR-16`.

* `EmojiRegex::Text` is deprecated, and will be removed in a future major version. It matches emoji which present as text by default (regardless of variation selector), as well as those which present as emoji by default.

### RGI vs Emoji vs Text Presentation

`RGI_Emoji` is a property of emoji symbols, defined in [Unicode Technical Report #51](https://unicode.org/reports/tr51/#def_rgi_set) which marks emoji as being supported by major vendors and therefore expected to be usable generally. In most cases, this is the property you will want when seeking emoji characters.

`Emoji_Presentation` is another such property, [defined in UTR#51](http://unicode.org/reports/tr51/#Emoji_Properties_and_Data_Files) which controls whether symbols are intended to be rendered as emoji by default.

Generally, for emoji which re-use Unicode code points which existed before Emoji itself was introduced to Unicode, `Emoji_Presentation` is `false`. `Emoji_Presentation` may be `true` but `RGI_Emoji` false for characters with non-standard emoji-like representations in certain conditions. Notable cases are the Emoji Keycap Sequences (#️⃣, 1️⃣, 9️⃣, *️⃣, etc.) which are sequences composed of three characters; the base character, an `U+FE0F VARIATION SELECTOR-16`, and finally the `U+20E3 COMBINING ENCLOSING KEYCAP`.

These characters, therefore, are matched to varying degrees of precision by each of the regular expressions included in this package;

- `#` is matched only by `EmojiRegex::Text` as it is considered to be a text part of a possible emoji.
- `#️` is matched by `EmojiRegex::Regex` as well as `EmojiRegex::Text` as it has `Emoji_Presentation` despite not being a generally accepted Emoji or recommended for general interchange.
- `#️⃣` is matched by all three regular expressions, as it is recommended for general interchange.

It's most likely that the regular expression you want is `EmojiRegex::RGIEmoji`! ☺️
`emoji_regex` provides the `EmojiRegex::Regex` regular expression, which matches emoji, as defined by [the Unicode standard's `emoji-test` data file](https://unicode.org/Public/emoji/14.0/emoji-test.txt).

### Example

@@ -49,78 +27,24 @@ require 'emoji_regex'

text = <<TEXT
\u{231A}: ⌚ default Emoji presentation character (Emoji_Presentation)
\u{2194}: ↔ default text presentation character
\u{2194}\u{FE0F}: ↔️ default text presentation character with Emoji variation selector
#: # default text presentation character
#\u{FE0F}: #️ default text presentation character with Emoji variation selector
#\u{FE0F}\u{20E3}: #️⃣ default text presentation character with Emoji variation selector and combining enclosing keycap
\u{1F469}: 👩 Emoji modifier base (Emoji_Modifier_Base)
\u{1F469}\u{1F3FF}: 👩🏿 Emoji modifier base followed by a modifier
TEXT

puts 'EmojiRegex::RGIEmoji'
text.scan EmojiRegex::RGIEmoji do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

puts ''

puts 'EmojiRegex::Regex'
text.scan EmojiRegex::Regex do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

puts ''

puts 'EmojiRegex::Text'
text.scan EmojiRegex::Text do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

```

Console output:

```text
EmojiRegex::RGIEmoji
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
Matched sequence 👩🏿 — code points: 2
EmojiRegex::Regex
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
Matched sequence 👩🏿 — code points: 2
EmojiRegex::Text
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔ — code points: 1
Matched sequence ↔ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence # — code points: 1
Matched sequence # — code points: 1
Matched sequence #️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
@@ -161,14 +85,12 @@ bundle exec rake spec

### Versioning Policy

Since [Version 1.0.0](https://github.com/ticky/ruby-emoji-regex/releases/tag/v1.0.0), Ruby Emoji Regex's versions have followed that of the `emoji-regex` package, minus 6 major versions.
Since [Version 14.0.0](https://github.com/ticky/ruby-emoji-regex/releases/tag/v14.0.0), Ruby Emoji Regex's versions have followed that of the Unicode standard itself.

Each published version of Ruby Emoji Regex will aim to:
- Include any changes in the provided regex in a version matching that of the `emoji-regex` package, keeping the major and minor versions in step.
- When a patch revision of `emoji-regex` is released, if its changes affect the Ruby port meaningfully, a version will be released with the same or greater patch version.
- If a change is required to correct a bug specific to the Ruby port, the patch number will be incremented.
Ruby Emoji Regex is based upon the [`emoji-test-regex-pattern`](https://github.com/mathiasbynens/emoji-test-regex-pattern) package.

Likewise, and so far coincidentally, versions of Ruby Emoji Regex follow the Unicode Standard's version, minus 10 major versions. Therefore, version 1 included Unicode 11, version 2 Unicode 12, and 3 Unicode 13.
- If a patch revision of `emoji-test-regex-pattern` is released, and if its changes affect the Ruby port meaningfully, a version will be released with the same or greater patch version.
- If a change is required to correct a bug specific to the Ruby port, the patch number will be incremented.

### Ruby Compatibility Policy

2 changes: 1 addition & 1 deletion emoji_regex.gemspec
@@ -3,7 +3,7 @@ Gem::Specification.new do |s|
s.summary = 'Emoji Regex'
s.description = 'A set of Ruby regular expressions for matching Unicode Emoji symbols.'
s.homepage = 'https://github.com/ticky/ruby-emoji-regex'
s.version = '3.2.3'
s.version = '14.0.0'
s.authors = ['Jessica Stokes']
s.email = 'hello@jessicastokes.net'
s.license = 'MIT'