🪺 Update for Unicode 14 #33

Merged
merged 4 commits into from
Jan 13, 2022
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
emoji_regex (3.2.3)
emoji_regex (14.0.0.pre.1)

GEM
remote: https://rubygems.org/
94 changes: 8 additions & 86 deletions README.md
@@ -2,13 +2,13 @@

[![Gem Version](https://badge.fury.io/rb/emoji_regex.svg)](https://rubygems.org/gems/emoji_regex) [![Node & Ruby CI](https://github.com/ticky/ruby-emoji-regex/workflows/Node%20&%20Ruby%20CI/badge.svg)](https://github.com/ticky/ruby-emoji-regex/actions?query=workflow%3A%22Node+%26+Ruby+CI%22)

A set of Ruby regular expressions for matching Unicode Emoji symbols.
A Ruby regular expression for matching Unicode Emoji symbols.

## Background

This is based upon the fantastic work from [Mathias Bynens'](https://mathiasbynens.be/) [`emoji-regex`](https://github.com/mathiasbynens/emoji-regex) Javascript package. `emoji-regex` is cleverly assembled based upon data from the Unicode Consortium.
This is based upon the fantastic work from [Mathias Bynens'](https://mathiasbynens.be/) [`emoji-test-regex-pattern`](https://github.com/mathiasbynens/emoji-test-regex-pattern) package. `emoji-test-regex-pattern` is cleverly assembled based upon data from the Unicode Consortium.

The regular expressions provided herein are derived from that pacakge.
The regular expressions provided herein are derived from that package.

## Installation

@@ -18,29 +18,7 @@ gem install emoji_regex

## Usage

`emoji_regex` provides these regular expressions:

* `EmojiRegex::RGIEmoji` is the regex you most likely want. It matches all emoji recommended for general interchange, as defined by [the Unicode standard's `RGI_Emoji` property](https://unicode.org/reports/tr51/#def_rgi_set). In a future version, this regular expression will be renamed to `EmojiRegex::Regex` and all other regexes removed.

* `EmojiRegex::Regex` is deprecated, and will be replaced with `RGIEmoji` in a future major version. It matches emoji which present as emoji by default, and those which present as emoji when combined with `U+FE0F VARIATION SELECTOR-16`.

* `EmojiRegex::Text` is deprecated, and will be removed in a future major version. It matches emoji which present as text by default (regardless of variation selector), as well as those which present as emoji by default.

### RGI vs Emoji vs Text Presentation

`RGI_Emoji` is a property of emoji symbols, defined in [Unicode Technical Report #51](https://unicode.org/reports/tr51/#def_rgi_set) which marks emoji as being supported by major vendors and therefore expected to be usable generally. In most cases, this is the property you will want when seeking emoji characters.

`Emoji_Presentation` is another such property, [defined in UTR#51](http://unicode.org/reports/tr51/#Emoji_Properties_and_Data_Files) which controls whether symbols are intended to be rendered as emoji by default.

Generally, for emoji which re-use Unicode code points which existed before Emoji itself was introduced to Unicode, `Emoji_Presentation` is `false`. `Emoji_Presentation` may be `true` but `RGI_Emoji` false for characters with non-standard emoji-like representations in certain conditions. Notable cases are the Emoji Keycap Sequences (#️⃣, 1️⃣, 9️⃣, *️⃣, etc.) which are sequences composed of three characters; the base character, an `U+FE0F VARIATION SELECTOR-16`, and finally the `U+20E3 COMBINING ENCLOSING KEYCAP`.

These characters, therefore, are matched to varying degrees of precision by each of the regular expressions included in this package;

- `#` is matched only by `EmojiRegex::Text` as it is considered to be a text part of a possible emoji.
- `#️` is matched by `EmojiRegex::Regex` as well as `EmojiRegex::Text` as it has `Emoji_Presentation` despite not being a generally accepted Emoji or recommended for general interchange.
- `#️⃣` is matched by all three regular expressions, as it is recommended for general interchange.

It's most likely that the regular expression you want is `EmojiRegex::RGIEmoji`! ☺️
`emoji_regex` provides the `EmojiRegex::Regex` regular expression, which matches emoji, as defined by [the Unicode standard's `emoji-test` data file](https://unicode.org/Public/emoji/14.0/emoji-test.txt).
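
As a rough sketch of how the single `EmojiRegex::Regex` constant might be used after this change: the snippet below prefers the real gem when it is installed. The fallback pattern is a hypothetical stand-in covering only the two sample emoji, included so the snippet runs without the gem; it is not the gem's actual pattern.

```ruby
# Sketch only: use the real gem if available.
begin
  require 'emoji_regex'
  pattern = EmojiRegex::Regex
rescue LoadError
  # Hypothetical stand-in for illustration; NOT the gem's pattern.
  # Matches WOMAN (optionally followed by the dark skin tone modifier) or WATCH.
  pattern = /\u{1F469}\u{1F3FF}?|\u{231A}/
end

sample = "watch \u{231A} and woman \u{1F469}\u{1F3FF}"

# String#scan returns every non-overlapping match as a string.
matches = sample.scan(pattern)

# String#length counts code points for UTF-8 strings, so a modifier
# sequence reports more than one.
matches.each { |m| puts "#{m} has #{m.length} code point(s)" }
```

Because only one constant remains, callers no longer need to choose between `RGIEmoji`, `Regex`, and `Text`.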

### Example

@@ -49,78 +27,24 @@ require 'emoji_regex'

text = <<TEXT
\u{231A}: ⌚ default Emoji presentation character (Emoji_Presentation)
\u{2194}: ↔ default text presentation character
\u{2194}\u{FE0F}: ↔️ default text presentation character with Emoji variation selector
#: # default text presentation character
#\u{FE0F}: #️ default text presentation character with Emoji variation selector
#\u{FE0F}\u{20E3}: #️⃣ default text presentation character with Emoji variation selector and combining enclosing keycap
\u{1F469}: 👩 Emoji modifier base (Emoji_Modifier_Base)
\u{1F469}\u{1F3FF}: 👩🏿 Emoji modifier base followed by a modifier
TEXT

puts 'EmojiRegex::RGIEmoji'
text.scan EmojiRegex::RGIEmoji do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

puts ''

puts 'EmojiRegex::Regex'
text.scan EmojiRegex::Regex do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

puts ''

puts 'EmojiRegex::Text'
text.scan EmojiRegex::Text do |emoji|
puts "Matched sequence #{emoji} — code points: #{emoji.length}"
end

```

Console output:

```text
EmojiRegex::RGIEmoji
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
Matched sequence 👩🏿 — code points: 2

EmojiRegex::Regex
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
Matched sequence 👩🏿 — code points: 2

EmojiRegex::Text
Matched sequence ⌚ — code points: 1
Matched sequence ⌚ — code points: 1
Matched sequence ↔ — code points: 1
Matched sequence ↔ — code points: 1
Matched sequence ↔️ — code points: 2
Matched sequence ↔️ — code points: 2
Matched sequence # — code points: 1
Matched sequence # — code points: 1
Matched sequence #️ — code points: 2
Matched sequence #️ — code points: 2
Matched sequence #️⃣ — code points: 3
Matched sequence #️⃣ — code points: 3
Matched sequence 👩 — code points: 1
Matched sequence 👩 — code points: 1
Matched sequence 👩🏿 — code points: 2
@@ -161,14 +85,12 @@ bundle exec rake spec

### Versioning Policy

Since [Version 1.0.0](https://github.com/ticky/ruby-emoji-regex/releases/tag/v1.0.0), Ruby Emoji Regex's versions have followed that of the `emoji-regex` package, minus 6 major versions.
Since [Version 14.0.0](https://github.com/ticky/ruby-emoji-regex/releases/tag/v14.0.0), Ruby Emoji Regex's versions have followed that of the Unicode standard itself.

Each published version of Ruby Emoji Regex will aim to:
- Include any changes in the provided regex in a version matching that of the `emoji-regex` package, keeping the major and minor versions in step.
- When a patch revision of `emoji-regex` is released, if its changes affect the Ruby port meaningfully, a version will be released with the same or greater patch version.
- If a change is required to correct a bug specific to the Ruby port, the patch number will be incremented.
Ruby Emoji Regex is based upon the [`emoji-test-regex-pattern`](https://github.com/mathiasbynens/emoji-test-regex-pattern) package.

Likewise, and so far coincidentally, versions of Ruby Emoji Regex follow the Unicode Standard's version, minus 10 major versions. Therefore, version 1 included Unicode 11, version 2 Unicode 12, and 3 Unicode 13.
- If a patch revision of `emoji-test-regex-pattern` is released, and if its changes affect the Ruby port meaningfully, a version will be released with the same or greater patch version.
- If a change is required to correct a bug specific to the Ruby port, the patch number will be incremented.
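
To illustrate how the new version string orders against the old one, here is a minimal sketch using RubyGems' own `Gem::Version` class. The `14.0.0` final release is an assumption taken from the policy text above, not something this diff publishes.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default in modern Ruby)

old_version = Gem::Version.new('3.2.3')
pre_release = Gem::Version.new('14.0.0.pre.1')
final       = Gem::Version.new('14.0.0') # assumed eventual release

# A version containing letters is treated as a prerelease by RubyGems.
puts pre_release.prerelease?          # true

# The jump from 3.x to 14.x still sorts as an upgrade.
puts old_version < pre_release        # true

# Prereleases sort before the matching final release.
puts pre_release < final              # true

# Under the new policy the major segment mirrors the Unicode version.
puts final.segments.first             # 14
```

One consequence worth noting: because `14.0.0.pre.1` is a prerelease, `gem install emoji_regex` without `--pre` would not select it by default.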

### Ruby Compatibility Policy

2 changes: 1 addition & 1 deletion emoji_regex.gemspec
@@ -3,7 +3,7 @@ Gem::Specification.new do |s|
s.summary = 'Emoji Regex'
s.description = 'A set of Ruby regular expressions for matching Unicode Emoji symbols.'
s.homepage = 'https://github.com/ticky/ruby-emoji-regex'
s.version = '3.2.3'
s.version = '14.0.0.pre.1'
s.authors = ['Jessica Stokes']
s.email = 'hello@jessicastokes.net'
s.license = 'MIT'