This repository has been archived by the owner on Sep 18, 2021. It is now read-only.

Add twttr.txt.modifyIndices{FromUTF16ToUnicode(), FromUnicodeToUTF16()} #39

Merged (2 commits) on Feb 7, 2012

Conversation

keitaf (Contributor) commented Jan 31, 2012

The extract*() functions in twitter-text-js return entities with UTF-16 based indices, in which each Unicode supplementary character (a code point above U+FFFF, stored as a surrogate pair) is counted as two characters.

However, the Twitter API and twitter-text-rb produce indices based on Unicode code points, in which each supplementary character is counted as a single character.
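To make the mismatch concrete, here is a small illustration (not code from the PR) that measures the same string both ways. It assumes a modern JavaScript engine, because it relies on Array.from splitting a string by code point, which arrived in ES2015, well after this 2012 change. The sample text with U+1F600 is made up for illustration.

```javascript
// U+1F600 lies above U+FFFF, so UTF-16 stores it as the surrogate pair
// "\uD83D\uDE00": two code units but one Unicode code point.
const text = "\uD83D\uDE00 #hashtag";

// UTF-16 view: String#length and String#indexOf count code units.
const utf16Length = text.length;             // 11
const utf16Start = text.indexOf("#hashtag"); // 3

// Code-point view: the string iterator used by Array.from yields whole
// code points, so the surrogate pair is counted once.
const unicodeLength = Array.from(text).length;                     // 10
const unicodeStart = Array.from(text.slice(0, utf16Start)).length; // 2

console.log(utf16Length, utf16Start, unicodeLength, unicodeStart);
```

Every offset after the supplementary character differs by one between the two schemes, which is exactly the discrepancy between twitter-text-js and the Twitter API / twitter-text-rb described above.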

This adds two new methods, twttr.txt.modifyIndicesFromUTF16ToUnicode() and twttr.txt.modifyIndicesFromUnicodeToUTF16(), which can be used to convert entity indices from UTF-16 based to Unicode based, and vice versa.
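The sketch below shows one way such a pair of conversions could work; it is not the merged implementation. The function names ending in "Sketch" and the helpers utf16ToCodePointIndex, codePointToUtf16Index and convertEntityIndices are hypothetical, the entity shape { indices: [start, end] } is an assumption, and the sketch returns converted copies rather than modifying the entities in place (the PR's "modify" naming may mean the real methods mutate them).

```javascript
// Hypothetical helpers for illustration; not the twitter-text-js source.
// Entities are assumed to look like { indices: [start, end], ... }.

// UTF-16 code-unit offset -> code-point offset: count the code points
// in the prefix that ends at the given offset.
function utf16ToCodePointIndex(text, utf16Index) {
  return Array.from(text.slice(0, utf16Index)).length;
}

// Code-point offset -> UTF-16 offset: take that many code points and
// measure their length in code units.
function codePointToUtf16Index(text, codePointIndex) {
  return Array.from(text).slice(0, codePointIndex).join("").length;
}

// Return copies of the entities with both ends of `indices` converted,
// leaving the input entities untouched.
function convertEntityIndices(text, entities, convert) {
  return entities.map(function (entity) {
    return Object.assign({}, entity, {
      indices: [convert(text, entity.indices[0]), convert(text, entity.indices[1])]
    });
  });
}

function modifyIndicesFromUTF16ToUnicodeSketch(text, entities) {
  return convertEntityIndices(text, entities, utf16ToCodePointIndex);
}

function modifyIndicesFromUnicodeToUTF16Sketch(text, entities) {
  return convertEntityIndices(text, entities, codePointToUtf16Index);
}

// Usage: one supplementary character before a hashtag.
const text = "\uD83D\uDE00 #hashtag";
const utf16Entities = [{ hashtag: "hashtag", indices: [3, 11] }];
const unicodeEntities = modifyIndicesFromUTF16ToUnicodeSketch(text, utf16Entities);
const roundTrip = modifyIndicesFromUnicodeToUTF16Sketch(text, unicodeEntities);
console.log(unicodeEntities[0].indices, roundTrip[0].indices);
```

Each conversion rescans the text prefix, so the cost grows with text length times entity count. That is fine for tweet-sized input; a single left-to-right pass building an offset table would scale better for long texts with many entities.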

    var c1 = text.charCodeAt(i);
    var c2 = text.charCodeAt(i + 1);
    // high surrogate (U+D800..U+DBFF) followed by low surrogate (U+DC00..U+DFFF)
    if (0xD800 <= c1 && c1 <= 0xDBFF && 0xDC00 <= c2 && c2 <= 0xDFFF) {
      // supplementary character

An i++ here would make it explicit that we have already dealt with the next character (the low surrogate) as well.
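Here is what the suggested i++ looks like in a complete loop. countCodePoints is a hypothetical helper written for illustration in the same style as the diff excerpt, not code from the PR: it counts code points by scanning UTF-16 code units, and the extra increment skips the low surrogate once a pair has been recognized.

```javascript
// Hypothetical helper: count Unicode code points by scanning UTF-16 code units.
function countCodePoints(text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    var c1 = text.charCodeAt(i);
    // charCodeAt returns NaN past the end, so the range checks below
    // simply fail on the last code unit.
    var c2 = text.charCodeAt(i + 1);
    if (0xD800 <= c1 && c1 <= 0xDBFF && 0xDC00 <= c2 && c2 <= 0xDFFF) {
      // supplementary character: the low surrogate at i + 1 is part of it,
      // so skip it explicitly instead of relying on a later check.
      i++;
    }
    count++;
  }
  return count;
}

console.log(countCodePoints("\uD83D\uDE00 #hashtag")); // 10
```

An unpaired surrogate falls through to the single-unit path and is counted as one character, which matches how the ES2015 string iterator treats lone surrogates.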


j3h commented Feb 6, 2012

LGTM

keitaf pushed a commit that referenced this pull request Feb 7, 2012
Add twttr.txt.modifyIndices{FromUTF16ToUnicode(), FromUnicodeToUTF16()}
keitaf merged commit 3347d04 into punct_before_url on Feb 7, 2012
caniszczyk deleted the unicode_supplementary branch on March 19, 2014 21:40
2 participants