Fix for issue #4

'&amp;#3346;' was incorrectly being decoded as equivalent to '&#3346;'. This
was due to the two-pass decoding of named entities followed by numeric
entities. This is fixed by moving to a single gsub for both.
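
To make the failure mode concrete, here is a minimal standalone sketch, not the library's actual code. The constants NAMED, NAMED_RE and NUMERIC_RE and both method names are hypothetical stand-ins for the decoder's internal map and regexps, and the entity map is cut down to three entries. It contrasts chaining two gsub calls with a single gsub over an alternation.

```ruby
# Hypothetical, trimmed-down entity map (assumption: the real map is far larger).
NAMED = { 'amp' => 38, 'lt' => 60, 'gt' => 62 }.freeze

# Assumed stand-ins for the decoder's named and numeric entity patterns.
NAMED_RE   = /&([a-z][a-z0-9]*);/i
NUMERIC_RE = /&#(?:([0-9]{1,7})|x([0-9a-f]{1,6}));/i

# Two passes: the output of the named pass is scanned again by the
# numeric pass, so '&amp;' followed by '#3346;' gets decoded twice.
def decode_two_pass(source)
  source.to_s.gsub(NAMED_RE) {
    (codepoint = NAMED[$1]) ? [codepoint].pack('U') : $&
  }.gsub(NUMERIC_RE) {
    $1 ? [$1.to_i].pack('U') : [$2.to_i(16)].pack('U')
  }
end

# One pass: gsub never rescans text it has already replaced, so each
# part of the input is decoded at most once.
def decode_single_pass(source)
  source.to_s.gsub(/#{NAMED_RE}|#{NUMERIC_RE}/) {
    if $1        # named entity
      (codepoint = NAMED[$1]) ? [codepoint].pack('U') : $&
    elsif $2     # decimal numeric entity
      [$2.to_i].pack('U')
    else         # hexadecimal numeric entity
      [$3.to_i(16)].pack('U')
    end
  }
end

puts decode_two_pass('&amp;#3346;')    # over-decodes to the character U+0D12
puts decode_single_pass('&amp;#3346;') # stops at the literal text &#3346;
```

Interpolating a Regexp into another keeps its capture groups, which is why the named match lands in $1 and the decimal and hexadecimal matches in $2 and $3.
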
threedaymonk committed Jan 29, 2011
1 parent be3462b commit 9e6c585079df8b5281b574c3ff7ebb98ac447fbe
Showing with 15 additions and 4 deletions.
  1. +8 −4 lib/htmlentities/decoder.rb
  2. +7 −0 test/entities_test.rb
@@ -7,10 +7,14 @@ def initialize(flavor)
     end
     def decode(source)
-      source.to_s.gsub(@named_entity_regexp) {
-        (codepoint = @map[$1]) ? [codepoint].pack('U') : $&
-      }.gsub(/&#(?:([0-9]{1,7})|x([0-9a-f]{1,6}));/i) {
-        $1 ? [$1.to_i].pack('U') : [$2.to_i(16)].pack('U')
+      source.to_s.gsub(/#{@named_entity_regexp}|&#(?:([0-9]{1,7})|x([0-9a-f]{1,6}));/i) {
+        if $1
+          (codepoint = @map[$1]) ? [codepoint].pack('U') : $&
+        elsif $2
+          [$2.to_i].pack('U')
+        else
+          [$3.to_i(16)].pack('U')
+        end
       }
     end
@@ -194,6 +194,13 @@ def test_should_encode_without_error_when_KCODE_is_not_UTF_8
     end
   end
+  # Reported by ckruse
+  def test_should_decode_only_first_element_in_masked_entities
+    input = '&amp;#3346;'
+    expected = '&#3346;'
+    assert_decode expected, input
+  end
+
   def test_should_ducktype_parameter_to_string_before_encoding
     pseudo_string = PseudoString.new('foo')
     assert_decode('foo', pseudo_string)
