Normalization - NFD #17

Merged
merged 24 commits into from Apr 27, 2012

Contributor

timothyandrew commented Apr 14, 2012

I've written a basic implementation of NFD normalization here. I'm testing it against NormalizationTest.txt, and the tests pass for everything except the Hangul (Korean) characters, which apparently need a different approach to decompose.
The file is 2.3MB, so we'll probably have to find a better way to test this. Any ideas?

There are 5 columns for each character in the file, c1 through c5. For NFD, the following must hold true:

c3 == toNFD(c1) == toNFD(c2) == toNFD(c3)
c5 == toNFD(c4) == toNFD(c5)
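Those invariants can be sketched for a single test line. This is a hedged illustration, not the PR's code: `to_nfd` here is a toy recursive decomposer over a one-entry mapping table, standing in for the real normalizer.

```ruby
# Toy decomposition table (a tiny subset of UnicodeData.txt), just enough
# for U+1E0A LATIN CAPITAL LETTER D WITH DOT ABOVE.
MAPPING = { "1E0A" => ["0044", "0307"] }

# Recursively replace each code point with its decomposition mapping.
def to_nfd(code_points)
  code_points.flat_map { |cp| MAPPING.key?(cp) ? to_nfd(MAPPING[cp]) : [cp] }
end

# Columns c1..c5 of the test line "1E0A;1E0A;0044 0307;1E0A;0044 0307;"
c1, c2, c3, c4, c5 = "1E0A;1E0A;0044 0307;1E0A;0044 0307;".split(";").map(&:split)

[c1, c2, c3].all? { |c| to_nfd(c) == c3 }  # => true
[c4, c5].all? { |c| to_nfd(c) == c5 }      # => true
```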

A bunch of these comparisons look up code points that don't exist in UnicodeData.txt (such as 6BCD). Is this expected, or am I missing something? I've solved this for the moment by using an Array of 15 empty strings when the code_point isn't found:

unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point) || Array.new(size=15, obj="")

But I'm not entirely sure if this is the best way to do it.

Contributor

KL-7 commented Apr 14, 2012

@timothyandrew, are those tabs in both the specs and base.rb? I think you should re-indent these files with two spaces and check your editor's settings.

@KL-7 KL-7 and 1 other commented on an outdated diff Apr 14, 2012

lib/normalizers/base.rb
@@ -0,0 +1,21 @@
+# encoding: UTF-8
+
+module TwitterCldr
+ module Normalizers
+ class Base
+ class << self
+ def code_point_to_char(code_point)
+ [code_point.upcase.hex].pack('U*')
+ end
+ def char_to_code_point(char)
+ code_point = char.unpack('U*').first.to_s(16).upcase
+ #Pad to atleast 4 digits
+ until code_point.length >= 4
@KL-7

KL-7 Apr 14, 2012

Contributor

Wouldn't it be better to do smth like code_point = '0' * (4 - code_point.size) + code_point if code_point.size < 4 instead of looping and maybe extract that 4 into some constant with a meaningful name?

@KL-7

KL-7 Apr 14, 2012

Contributor

As you're returning from the method right after that you can even do

code_point.size >= 4 ? code_point : '0' * (4 - code_point.size) + code_point
@timothyandrew

timothyandrew Apr 18, 2012

Contributor

That looks much better.

@timothyandrew

timothyandrew Apr 18, 2012

Contributor

Whoops, looks like there's a builtin String function for this. So I guess we can do

code_point.rjust(4, '0')
@KL-7

KL-7 Apr 18, 2012

Contributor

Cool! I suspected there should be smth, but I must have been too lazy at the moment to find it :)
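Putting the thread's conclusion together, the padding helper ends up as a one-liner (a sketch; example characters are illustrative):

```ruby
def char_to_code_point(char)
  # unpack('U*') yields the Unicode code point; rjust zero-pads to 4 hex digits.
  char.unpack('U*').first.to_s(16).upcase.rjust(4, '0')
end

char_to_code_point("\u05D0")    # => "05D0" (padded from "5D0")
char_to_code_point("\u{1F3E9}") # => "1F3E9" (already long enough, unchanged)
```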

@KL-7 KL-7 and 1 other commented on an outdated diff Apr 14, 2012

lib/normalizers/canonical/nfd.rb
@@ -0,0 +1,48 @@
+# encoding: UTF-8
+
+module TwitterCldr
+ module Normalizers
+ class NFD < Base
+ class << self
+ def normalize_code_points(code_points)
+ code_points = code_points.map { |code_point| decompose code_point }.flatten
+ reorder code_points
+ code_points
+ end
+
+ #Recursively replace the given code point with the values in its Decomposition_Mapping property
+ def decompose(code_point)
+ unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point) || Array.new(size=15, obj="")
@KL-7

KL-7 Apr 14, 2012

Contributor

What kind of syntax is Array.new(size=15, obj="")? I'm afraid I've never seen arguments passed like that in Ruby. And as far as I understand, all you need from that array is the 5th element that you retrieve on the next line. Is there any point in creating an array at all in that case?

@timothyandrew

timothyandrew Apr 17, 2012

Contributor

Array.new(size=15, obj="") creates an array of 15 empty strings. I chose to create the entire array just so the next statement would work whether TwitterCldr::Shared::UnicodeData returned nil or not:

decomposition_mapping = unicode_data[5].split

I think it'd require more code to perform the nil check and then return an empty string if it is nil:

unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point)
if unicode_data
  decomposition_mapping = unicode_data[5].split
else
  decomposition_mapping = "".split
end

What do you think?

@KL-7

KL-7 Apr 17, 2012

Contributor

First, I'm still interested in what kind of syntax Array.new(size=15, obj="") is. I assume it's equivalent to Array.new(15, ""), but I've never seen it before in Ruby. Second, without going into the depths of the Ruby implementation, which I'm not very familiar with, don't you think creating a whole new array of objects is not worth the small amount of code it saves for this check? I think it's really nice to write smth like element = hash.fetch('key', []) to get an empty array if the key is not present in the hash, but in this particular case creating an array of 15 elements seems like too much to me. Anyway, I'm just sharing my opinion.

Regarding your changes, why "".split? It'll simply return an empty array anyway. How about that:

unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point)
decomposition_mapping = unicode_data ? unicode_data[5].split : []
@timothyandrew

timothyandrew Apr 17, 2012

Contributor

On closer inspection, it seems like this is something that UnicodeData#for_code_point should handle, since the "missing" code points I'm trying to handle by using Array.new(size=15, obj="") actually do exist, but are not explicitly specified. I've detailed my reasoning here.

@KL-7 KL-7 and 1 other commented on an outdated diff Apr 14, 2012

lib/normalizers/canonical/nfd.rb
+ #Swap any two adjacent code points A & B if ccc(A) > ccc(B) > 0
+ def reorder(code_points)
+ (code_points.size).times do
+ code_points.each_with_index do |cp, i|
+ unless cp == code_points.last
+ ccc_a, ccc_b = combining_class_for(cp), combining_class_for(code_points[i+1])
+ if (ccc_a > ccc_b) and (ccc_b > 0)
+ code_points[i], code_points[i+1] = code_points[i+1], code_points[i]
+ end
+ end
+ end
+ end
+ end
+
+ def combining_class_for(code_point)
+ (TwitterCldr::Shared::UnicodeData.for_code_point(code_point) || Array.new(size=15, obj=""))[3].to_i
@KL-7

KL-7 Apr 14, 2012

Contributor

Same concern about array creation here. It might not make a big impact on performance, memory or smth, but I think it's unnecessary here. We can check that for_code_point returned an array and get its 3rd element, but if it returned nil simply return an empty string. What do you think?

@timothyandrew

timothyandrew Apr 17, 2012

Contributor

Please see comment above.
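The reorder method in the diff above is effectively a bounded bubble sort over canonical combining classes. A toy version with a hard-coded ccc table (the real code looks classes up via UnicodeData; the table here is an assumption for illustration):

```ruby
# Toy combining-class table: 'a' (ccc 0), COMBINING GRAVE ACCENT BELOW
# (ccc 220), COMBINING ACUTE ACCENT (ccc 230).
CCC = { "0061" => 0, "0316" => 220, "0301" => 230 }

# Swap adjacent code points A, B whenever ccc(A) > ccc(B) > 0.
def reorder(code_points)
  code_points.size.times do
    (0...code_points.size - 1).each do |i|
      a, b = CCC[code_points[i]], CCC[code_points[i + 1]]
      code_points[i], code_points[i + 1] = code_points[i + 1], code_points[i] if a > b && b > 0
    end
  end
  code_points
end

# The acute (230) must move after the grave-below (220):
reorder(["0061", "0301", "0316"])  # => ["0061", "0316", "0301"]
```

Starters with ccc 0 never move, which is what the `b > 0` guard ensures.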

@KL-7 KL-7 commented on the diff Apr 14, 2012

lib/normalizers/canonical/nfd.rb
+ unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point) || Array.new(size=15, obj="")
+ decomposition_mapping = unicode_data[5].split
+
+ #Return the code point if compatibility mapping or if no mapping exists
+ if decomposition_mapping.first =~ /<.*>/ or decomposition_mapping.empty?
+ code_point
+ else
+ decomposition_mapping.map do |decomposition_code_point|
+ decompose(decomposition_code_point)
+ end.flatten
+ end
+ end
+
+ #Swap any two adjacent code points A & B if ccc(A) > ccc(B) > 0
+ def reorder(code_points)
+ (code_points.size).times do
@KL-7

KL-7 Apr 14, 2012

Contributor

Why the parentheses here?

@timothyandrew

timothyandrew Apr 17, 2012

Contributor

Just thought it made it a little more readable than code_points.size.times do

@KL-7

KL-7 Apr 17, 2012

Contributor

I see, but afaik people usually chain methods as much as they need without adding any unnecessary parentheses.

@KL-7 KL-7 and 1 other commented on an outdated diff Apr 14, 2012

lib/normalizers/canonical/nfd.rb
+ module Normalizers
+ class NFD < Base
+ class << self
+ def normalize_code_points(code_points)
+ code_points = code_points.map { |code_point| decompose code_point }.flatten
+ reorder code_points
+ code_points
+ end
+
+ #Recursively replace the given code point with the values in its Decomposition_Mapping property
+ def decompose(code_point)
+ unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point) || Array.new(size=15, obj="")
+ decomposition_mapping = unicode_data[5].split
+
+ #Return the code point if compatibility mapping or if no mapping exists
+ if decomposition_mapping.first =~ /<.*>/ or decomposition_mapping.empty?
@KL-7

KL-7 Apr 14, 2012

Contributor

There are a lot of disputes in the Ruby community about using and and or in conditions. These operators have different precedence from their alternatives && and ||, which requires more careful usage in conditions. As a result, some companies, like Github, simply forbid using or and and at all (see styleguide), while others reserve them for chaining operations like validate and save. There are still some people that use them just because they are more verbose, but I believe these proud folks are in the minority. Honestly, I used to use them myself, but under the pressure of the community I switched to && and || in conditions.

@camertron, do you have some recommendations on that topic in Twitter's styleguides?
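The precedence difference KL-7 describes can be shown in two lines: assignment binds tighter than `or` but looser than `||`.

```ruby
a = false || true   # parsed as a = (false || true)  -> a is true
b = false or true   # parsed as (b = false) or true  -> b is false

[a, b]  # => [true, false]
```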

@timothyandrew

timothyandrew Apr 17, 2012

Contributor

I did not know this! I've changed the code to use && and ||.

Collaborator

camertron commented Apr 16, 2012

@KL-7 Twitter uses || and && almost exclusively because of the precedence reasons you just stated. The only time I've seen the others used is to do something like product.save and return. Personally, I like the look and feel of and and or, but it's usually better to use && and || instead.

@timothyandrew any word on getting this to work with Hangul characters?

Contributor

timothyandrew commented Apr 17, 2012

@camertron Not yet, I've been sick with a fever the last couple of days and haven't really gotten much work done. But I will keep pushing to this branch as I commit so you can keep track of my progress.
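For context on why Hangul needs a different approach: Hangul syllables are not decomposed via UnicodeData.txt lookups at all; UAX #15 specifies an arithmetic decomposition. A sketch of that standard algorithm (constant names mirror the spec, not necessarily this PR's code):

```ruby
# Constants from UAX #15: base code points and counts for Hangul syllables.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
T_COUNT = 28        # trailing consonants (incl. "none")
N_COUNT = 588       # V_COUNT (21) * T_COUNT

# Decompose a precomposed Hangul syllable into L, V (and optionally T) jamo.
def decompose_hangul(code_point)
  s_index = code_point - S_BASE
  l = L_BASE + s_index / N_COUNT
  v = V_BASE + (s_index % N_COUNT) / T_COUNT
  t = T_BASE + s_index % T_COUNT
  t == T_BASE ? [l, v] : [l, v, t]
end

# The UAX #15 example: U+D4DB -> U+1111 U+1171 U+11B6
decompose_hangul(0xD4DB).map { |cp| cp.to_s(16).upcase }  # => ["1111", "1171", "11B6"]
```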

Contributor

timothyandrew commented Apr 17, 2012

KL-7 commented:
@timothyandrew, is that tabs in both specs and base.rb? I think you should re-indent these files with two spaces and check settings of your editor.

@KL-7 Good catch! I'd forgotten to set the global preference for spaces in Sublime.

Contributor

timothyandrew commented Apr 17, 2012

As I've just found out, not all code points are explicitly mentioned in UnicodeData.txt. Some code points exist implicitly: they are part of a range of code points that all share the same data, and UnicodeData.txt includes only the first and last element of such ranges.

For example,

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;

This indicates a range (the names are enclosed in < >; the first name ends in First and the second ends in Last). It implicitly means that all code points between 4E00 and 9FCC exist, with the same data as the first and last elements.

Therefore, UnicodeData.for_code_point('4E11') should return something like:

["4E11", "<CJK Ideograph>", "Lo", "0", "L", "", "", "", "", "N", "", "", "", "", ""]

But it actually returns nil.
Fixed in 2d7a38b.
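A minimal sketch of such a range fallback (not the actual fix in 2d7a38b; the range table, helper name, and truncated field list are illustrative):

```ruby
# Illustrative table: code point ranges whose members share the same
# UnicodeData fields (only the first few fields shown).
IMPLICIT_RANGES = {
  ("4E00".hex.."9FCC".hex) => ["<CJK Ideograph>", "Lo", "0", "L"]
}

# Return synthesized data for a code point that falls inside a range,
# or nil if it isn't covered by any range.
def implicit_data_for(code_point)
  value = code_point.hex
  IMPLICIT_RANGES.each do |range, fields|
    return [code_point] + fields if range.cover?(value)
  end
  nil
end

implicit_data_for("4E11")  # => ["4E11", "<CJK Ideograph>", "Lo", "0", "L"]
implicit_data_for("0041")  # => nil (explicitly listed, not in a range)
```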

Contributor

timothyandrew commented Apr 18, 2012

@camertron I've got a few questions regarding this:

  1. We can't package NormalizationTest.txt as a part of the library because it's 2.3MB. Also, the test using it (in nfd_spec.rb) runs for about 30 seconds. I guess we can't test the entire file, so maybe we can test a relevant subset of it? I'm not sure what to do here.

  2. The code right now normalizes code points, not strings. We need an interface to Normalizers::NFD.normalize_code_points. Is it better to add a method to the String class:

    "café".normalize(:NFD)

    or simply use something like this?

Normalizers::NFD.normalize("café")

@KL-7 KL-7 and 1 other commented on an outdated diff Apr 18, 2012

lib/shared/unicode_data.rb
@@ -12,8 +12,25 @@ def for_code_point(code_point)
range.include? code_point.to_i(16)
end
- TwitterCldr.get_resource("unicode_data", target.first)[code_point.to_sym] if target
- end
+ if target
+ block_data = TwitterCldr.get_resource("unicode_data", target.first)
+ block_data[code_point.to_sym] or get_range_start(code_point, block_data)
@KL-7

KL-7 Apr 18, 2012

Contributor

How about || for consistency?

@timothyandrew

timothyandrew Apr 19, 2012

Contributor

As I understood from the link you posted, || and && are for boolean expressions, while and and or are for chaining expressions together. Since this is an instance of the latter, wouldn't or be correct?

@KL-7

KL-7 Apr 19, 2012

Contributor

Again, I'm just sharing my opinion, but in this case you're not chaining actions; you're checking that a value is there and then processing it. For such cases I usually use &&. Honestly, my attitude changed a bit later, and it looks like I use the && form in almost all cases now. So it's up to you.

What really surprised me, now, when I looked at it closer, is that you're checking existence of a value in a hash at some key and then pass both hash and the key further. That feels strange. Basically, you check some value and then do nothing specifically about this value. Wouldn't it be better to move that check into the get_range_start method? I didn't read this method yet, so it's just an idea for now.

@KL-7

KL-7 Apr 19, 2012

Contributor

Ah, sorry @timothyandrew, I completely misunderstood this line. You're returning either the value from the hash directly or pass this hash into the other method in order to find it inside of one of the range-blocks. I think || is more appropriate here, but Hash#fetch with block calling this method might be even better here.

@timothyandrew

timothyandrew Apr 19, 2012

Contributor

If I use Hash#fetch with a block, wouldn't I be able to access only the code_point from inside the block? I don't see a way to pass block_data to get_range_start without making block_data a class variable.

block_data.fetch(code_point.to_sym) do |code_point_sym|
  #Can't access block_data from here, only code_point
end
@timothyandrew

timothyandrew Apr 19, 2012

Contributor

Oops, I can just use block_data inside the block, right? Sorry!
I guess Hash#fetch works well for this case, then.

@KL-7

KL-7 Apr 19, 2012

Contributor

Just try it:

block_data.fetch(code_point.to_sym) { get_range_start(code_point, block_data) }

Ruby blocks are closures, which means you can access from inside them anything that was available in the context where they were created.

@KL-7

KL-7 Apr 19, 2012

Contributor

@timothyandrew, no problem. You can even store that block in a variable, pass it into other methods several times and it still will be able to reference the method and both arguments you need to pass into it. That's an incredibly powerful feature of Ruby blocks.
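A quick, self-contained illustration of that closure behaviour with Hash#fetch (the data here is made up; the point is that the fallback block only runs on a miss and can see surrounding locals):

```ruby
block_data = { :"00E9" => "LATIN SMALL LETTER E WITH ACUTE" }
prefix = "fallback for"

# On a hit the block is skipped; on a miss it runs with the key and can
# reference `prefix` (and `block_data`) from the enclosing scope.
hit  = block_data.fetch(:"00E9") { |key| "#{prefix} #{key}" }
miss = block_data.fetch(:"4E11") { |key| "#{prefix} #{key}" }

hit   # => "LATIN SMALL LETTER E WITH ACUTE"
miss  # => "fallback for 4E11"
```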

@camertron camertron commented on the diff Apr 19, 2012

lib/normalizers/canonical/nfd.rb
+
+module TwitterCldr
+ module Normalizers
+ class NFD < Base
+ @@hangul_constants = {:SBase => "AC00".hex, :LBase => "1100".hex, :VBase => "1161".hex, :TBase => "11A7".hex,
+ :Scount => 11172, :LCount => 19, :VCount => 21, :TCount => 28, :NCount => 588, :Scount => 1172}
+ class << self
+ def normalize_code_points(code_points)
+ code_points = code_points.map { |code_point| decompose code_point }.flatten
+ reorder code_points
+ code_points
+ end
+
+ #Recursively replace the given code point with the values in its Decomposition_Mapping property
+ def decompose(code_point)
+ unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point)
@camertron

camertron Apr 19, 2012

Collaborator

Wouldn't it be cool if for_code_point returned an instance of something like TwitterCldr::Shared::UnicodeData::CodePoint? That way we wouldn't have to use array indices to access the code point data. In other words, you could do unicode_data.code_point instead of unicode_data[0]. What do you think?

@timothyandrew

timothyandrew Apr 19, 2012

Contributor

That's a great idea…I actually started implementing something like this, but let it go early. In my version, for_code_point returns a hash of values. I can just zip up a list of keys with the values returned by for_code_point to get me my hash:

keys = [:codepoint, :name, :category, :combining_class, :bidi_class, :decomposition, :digit_value, :non_decimal_digit_value, :numeric_value, :bidi_mirrored, :unicode1_name, :iso_comment, :simple_uppercase_map, :simple_lowercase_map, :simple_titlecase_map]
Hash[keys.zip UnicodeData.for_code_point('1F3E9')]

which gives me:

{:codepoint=>"1F3E9", :name=>"LOVE HOTEL", :category=>"So", :combining_class=>"0", :bidi_class=>"ON", :decomposition=>"", :digit_value=>"", :non_decimal_digit_value=>"", :numeric_value=>"", :bidi_mirrored=>"N", :unicode1_name=>"", :iso_comment=>"", :simple_uppercase_map=>"", :simple_lowercase_map=>"", :simple_titlecase_map=>""}

Wouldn't that be simpler than returning an instance of TwitterCldr::Shared::UnicodeData::CodePoint?

@KL-7

KL-7 Apr 19, 2012

Contributor

@timothyandrew, you can use the Struct class for that purpose. Creating a CodePoint struct will be pretty easy, and in return you'll be able to do unicode_data.codepoint, and any typo in the name of an attribute won't go unnoticed.

@timothyandrew

timothyandrew Apr 19, 2012

Contributor

@KL-7 Yeah, I think that's the perfect solution for this. Thanks for the idea; I had no idea ruby had something like this! :)
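A sketch of the Struct approach (the member names are one possible choice, taken from the UnicodeData.txt column layout; the sample fields are U+00E9's actual UnicodeData line):

```ruby
CodePoint = Struct.new(
  :code_point, :name, :category, :combining_class, :bidi_class,
  :decomposition, :digit_value, :non_decimal_digit_value, :numeric_value,
  :bidi_mirrored, :unicode1_name, :iso_comment, :simple_uppercase_map,
  :simple_lowercase_map, :simple_titlecase_map
)

fields = ["00E9", "LATIN SMALL LETTER E WITH ACUTE", "Ll", "0", "L",
          "0065 0301", "", "", "", "N", "", "", "00C9", "", "00C9"]
cp = CodePoint.new(*fields)

cp.decomposition     # => "0065 0301"
cp.combining_class   # => "0"
```

Unlike hash access, a misspelled member (say, cp.decompositoin) raises NoMethodError instead of silently returning nil.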

@timothyandrew

timothyandrew Apr 19, 2012

Contributor

The change is at pull request #19.

Collaborator

camertron commented Apr 19, 2012

@timothyandrew,

  1. Ah yes, that's definitely too large. I would suggest only selecting ten or so of the tests from each code point block and writing a nice big comment in the test file explaining what you've done. You might consider including the URL to the full test file so that curious users can re-run the tests on the full suite if they want to. I'd be happy to post the full test file under the Downloads section on Github too :)
  2. The convention that's already established for TwitterCLDR is to extend native classes like String and have them return a localized instance, for example an instance of LocalizedString. Fair warning, @KL-7 is working on plurals right now and has already added a LocalizedString class that we'll have to intelligently merge with when your normalization work is complete. LocalizedString should have a method called normalize that simply delegates to Normalizers::NFD.normalize. Then you can do something like this:
"café".localize.normalize(:NFD)
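The delegation camertron describes could look roughly like this. Everything here is a hypothetical sketch: LocalizedString is @KL-7's in-progress class, and the NFD stub below uses Ruby's later built-in String#unicode_normalize (2.2+) purely as a stand-in for this PR's normalizer.

```ruby
module TwitterCldr
  module Normalizers
    class NFD
      # Stand-in so the sketch runs; the real implementation is this PR.
      def self.normalize(string)
        string.unicode_normalize(:nfd)
      end
    end
  end

  class LocalizedString
    def initialize(str)
      @str = str
    end

    # Delegate to the requested normalizer, defaulting to NFD.
    def normalize(form = :NFD)
      TwitterCldr::Normalizers.const_get(form).normalize(@str)
    end
  end
end

class String
  def localize
    TwitterCldr::LocalizedString.new(self)
  end
end

"café".localize.normalize(:NFD)  # => "cafe" + U+0301 (é decomposed)
```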
Contributor

timothyandrew commented Apr 19, 2012

@camertron,

  1. That sounds great, I'll work on cutting down NormalizationTest.txt to a saner number of tests.
  2. All right, so I'll just write Normalizers::NFD.normalize for now, and we'll add LocalizedString#normalize when @KL-7 is done with plurals. That okay?

@timothyandrew timothyandrew commented on the diff Apr 19, 2012

spec/normalizers/canonical/nfd_spec.rb
@@ -0,0 +1,49 @@
+# encoding: UTF-8
+
+require 'spec_helper'
+
+include TwitterCldr::Normalizers
+
+describe NFD do
+ describe "#normalize" do
+ NFD.normalize("庠摪饢鼢豦樄澸脧鱵礩翜艰").should == "庠摪饢鼢豦樄澸脧鱵礩翜艰"
+ NFD.normalize("䷙䷿").should == "䷙䷿"
+ NFD.normalize("ᎿᎲᎪᏨᎨᏪᎧᎵᏥ").should == "ᎿᎲᎪᏨᎨᏪᎧᎵᏥ"
+ NFD.normalize("ᆙᅓᆼᄋᇶ").should == "ᆙᅓᆼᄋᇶ"
+ NFD.normalize("…‾⁋
⁒‒′‾⁖").should == "…‾⁋
⁒‒′‾⁖"
+ NFD.normalize("ⶾⷕⶱⷀ").should == "ⶾⷕⶱⷀ"
@timothyandrew

timothyandrew Apr 19, 2012

Contributor

I used some Java code to generate these strings.

@camertron

camertron Apr 21, 2012

Collaborator

Ha! Awesome.

timothyandrew added some commits Apr 19, 2012

@timothyandrew timothyandrew Use Hash#fetch instead of `or` 2a260d6
@timothyandrew timothyandrew Reduce the size of NormalizationTest.txt
Leave Part0 intact.
Choose 10 elements randomly for each block from Part1.
Choose 10 elements randomly from Part2.
Choose 10 elements randomly from Part3.
7b0b34c
Contributor

timothyandrew commented Apr 19, 2012

I used this script to reduce the size of NormalizationTest.txt.
Part0 is intact; 10 cases are picked randomly from Part1 for each block in Blocks.txt, and 10 cases each are picked randomly from Part2 and Part3.

I've added a notice at the top of the file detailing the changes I've made to it.

The test takes about a second and a half to run:

± % time rspec **/nfd*spec*rb
rspec **/nfd*spec*rb  1.39s user 0.12s system 99% cpu 1.511 total

And all the tests run in about 2 seconds:

± % time rake
rake  1.90s user 0.22s system 99% cpu 2.122 total

@camertron is this still too slow?

Contributor

timothyandrew commented Apr 20, 2012

I wanted to stress-test my implementation of the NFD algorithm, so I wrote up a small test (running on jRuby) that takes random unicode strings and normalizes them using the Java normalizer as well as this normalizer and compares the results.

The test fails occasionally, but every time it's because of a code point that was introduced after Unicode 4.0.0, which is the version the Java normalizer is apparently based on.

EDIT: Oops, forgot to mention. The random string generator throws up a lot of characters that are valid Unicode code points but have nothing assigned to them, so they don't exist in UnicodeData.txt. The Java normalizer simply returns these code points as-is, so I've made that change here as well.

EDIT 2: I guess the Normalizer in Java 7 has been updated to work with Unicode 6. The tests aren't failing at all on JDK 1.7.0_04.

Collaborator

camertron commented Apr 21, 2012

@timothyandrew Great work stress-testing and investigating the issues with the JDK, I appreciate your attention to detail :) Two seconds, while not ideal, is perfectly fine in my book for spec performance, especially considering running them all took 30 seconds - that's a big improvement even if the tests aren't completely comprehensive. The Translation Center's specs take over 10 minutes to run if that's any consolation ^_^

In answer to your previous question, yes, let's wait for @KL-7's changes to LocalizedString and then we can merge this in (he's almost done).

I've read over all the code comments between you and @KL-7, but I must admit I'm starting to get lost in all the discussion threads. Where are we with this PR? Could you summarize a bit?

@camertron camertron and 1 other commented on an outdated diff Apr 21, 2012

lib/normalizers/base.rb
@@ -0,0 +1,18 @@
+# encoding: UTF-8
+
+module TwitterCldr
+ module Normalizers
+ class Base
+ class << self
+ def code_point_to_char(code_point)
+ [code_point.upcase.hex].pack('U*')
+ end
+ def char_to_code_point(char)
+ code_point = char.unpack('U*').first.to_s(16).upcase
+ #Pad to atleast 4 digits
@camertron

camertron Apr 21, 2012

Collaborator

Spelling ^_^

@timothyandrew

timothyandrew Apr 22, 2012

Contributor

Oops. Saying it that way is more like the norm here in India. 😃
Will make the change.

Contributor

timothyandrew commented Apr 22, 2012

@camertron,

Great. :)

The discussions we had were mostly about stylistic changes. In addition to the NFD algorithm, this pull request changes UnicodeData#for_code_point so that it supports the character ranges in UnicodeData.txt. More info here.

Apart from that, I think this pull request is ready to go. Once it's merged in, I'll update PR #19 so that this code uses the Struct instead of the array indices.

Also, my JRuby tests fail very rarely on Java 7. Maybe one test case fails for every 1000. I'm not sure if this is because of Java 7 using a slightly outdated version of Unicode, or if it's a bug in my code. I will look into it, but I don't think it's a big enough issue to warrant delaying merging in this PR.

timothyandrew added some commits Apr 22, 2012

@timothyandrew timothyandrew Fix spelling mistake in comment. d5c4955
@timothyandrew timothyandrew [NFD Bug] Compare indexes instead of elements.
Because the elements keep moving around, we need to use indexes to find
the last element of the array and not run the swapping code for it.
7dfd9f2
Contributor

timothyandrew commented Apr 22, 2012

I did find a minor bug in my NFD implementation, which I fixed in 7dfd9f2.
I'm assuming that the Java 7 normalizer is based on Unicode 6.0.0, not 6.1.0, because the only time a test case failed (in around 10000 cases) was:

String -> "響溺茶輸撚祖盛郞勒刺轢"
Code points -> ["FACA", "F9EC", "F9FE", "FAC2", "F991", "FA50", "FAA7", "FA2E", "F952", "F9FF", "F98D"]
Normalized using Ruby -> ["97FF", "6EBA", "8336", "8F38", "649A", "7956", "76DB", "90DE", "52D2", "523A", "8F62"]
Normalized using Java -> ["97FF", "6EBA", "8336", "8F38", "649A", "7956", "76DB", "FA2E", "52D2", "523A", "8F62"]

The only discrepancy between the Java and Ruby versions is the 8th code point. In the Java normalizer, FA2E stays FA2E, but in the Ruby normalizer it gets decomposed to 90DE. FA2E was introduced in Unicode 6.1.0 (link).

Collaborator

camertron commented Apr 27, 2012

@timothyandrew hahaaa!! We're better than Java. Are you ready for this to be merged in? I think you'll need to merge master into your branch first - Github is saying it can't automatically merge (conflicts no doubt). I'd be happy to merge it by hand if you like. Also, could you combine this PR and PR 19 so we don't have merged code depending on unmerged code? Thanks!

Contributor

timothyandrew commented Apr 27, 2012

@camertron Yeah, I pulled master in, so this should be ready to go.

No part of this code depends on PR 19; it still uses array indices to access code point data. Once this is merged in, I can update PR 19 so all references to UnicodeData#for_code_point treat it as a Struct rather than an Array.
I think keeping it separate might be better just because changing the format of how we treat a code point isn't directly related to this PR. What do you think?

@camertron camertron added a commit that referenced this pull request Apr 27, 2012

@camertron camertron Merge pull request #17 from timothyandrew/nfd-normalizer
Adding Unicode NFD normalization capabilities.
a04853a

@camertron camertron merged commit a04853a into twitter:master Apr 27, 2012

Collaborator

camertron commented Apr 27, 2012

Ok, you're all clear for PR 19. Thanks for the clarification, btw.

Collaborator

camertron commented Apr 27, 2012

Yup, I'm on it.

On Fri, Apr 27, 2012 at 3:59 PM, Chris Aniszczyk <
reply@reply.github.com

wrote:

build is broken now: http://travis-ci.org/#!/twitter/twitter-cldr-rb


Reply to this email directly or view it on GitHub:
#17 (comment)
