Feature request: only encode HTML special characters #10

andrewhavens · 2012-08-13T16:15:42Z

I'm not sure how to do this with your library, but I would like the ability to encode only special characters. For example, I have a block of HTML which contains UTF-8 characters, but I don't want to encode the HTML tags. Is there a way to pass an option, or configure the encoder, to skip the < and > characters which make up an HTML tag?

threedaymonk · 2012-08-18T15:06:04Z

It can't do it. That's a bit of an ill-defined problem, though. The good news is that if you're doing what I think you're doing, there may be a very simple solution.

If you just want to make your HTML ASCII-safe, by encoding everything above 0x7F, that's really easy (in Ruby 1.9):

s = "<em>日本語</em>"
s.gsub(/\P{ASCII}/){ "&##{$&.unpack('U').first};" }
# => "<em>&#26085;&#26412;&#35486;</em>"
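
For reuse, the one-liner above could be wrapped in a small method. This is only a sketch: the name `ascii_safe_html` is made up for illustration and is not part of the gem, and it assumes the input string is valid UTF-8 on Ruby 1.9 or later. ASCII characters, including `<`, `>` and `&`, pass through untouched, so it does not escape anything for you.

```ruby
# Hypothetical helper (not part of the htmlentities gem).
# Assumes `html` is a valid UTF-8 string on Ruby 1.9+.
def ascii_safe_html(html)
  # \P{ASCII} matches any single character outside the ASCII range;
  # each match is replaced by its decimal numeric character reference.
  html.gsub(/\P{ASCII}/) { |ch| "&##{ch.ord};" }
end

ascii_safe_html("<em>日本語</em>")
# => "<em>&#26085;&#26412;&#35486;</em>"
```

Because tags and existing entities are plain ASCII, the markup comes out unchanged and only the non-ASCII text is rewritten.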

batter · 2014-02-20T23:15:23Z

@threedaymonk - Nice gem. Sorry for asking, but do you know if there's a way to accomplish the same thing you did with your code sample above in Ruby 1.8.7? The Regexp class doesn't have all the nice character classes and properties in 1.8.7 like 1.9 does.

We have an application currently stuck on Ruby18 and we're working towards moving it to Ruby19 or higher but in the meantime I'm trying to determine what the ideal solution for encoding characters (but omitting HTML tags) is, and finding it difficult to work with encoded characters as a whole when compared to using Ruby19+. Any advice would be appreciated!

threedaymonk · 2014-02-21T00:54:32Z

@batter You can probably do it using something like this (not tested on 1.8.7, but works on 2.1 and doesn't use any post-1.8 features as far as I know):

s.unpack("U*").map { |c| c > 0x7F ? "&##{c};" : [c].pack("c") }.join("")
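
The same idea as a method with a usage example, in case that is easier to drop into an app. Again a sketch only: `ascii_safe_html_compat` is a made-up name, the input is assumed to be UTF-8 bytes, and like the line above it has been checked on a modern Ruby rather than on 1.8.7 itself.

```ruby
# Hypothetical helper, for illustration only.
# Assumes `html` holds UTF-8 encoded text.
def ascii_safe_html_compat(html)
  # unpack("U*") decodes the UTF-8 bytes into an array of code points.
  html.unpack("U*").map { |cp|
    # Code points above 0x7F become numeric character references;
    # ASCII code points are turned back into single characters.
    cp > 0x7F ? "&##{cp};" : cp.chr
  }.join
end

ascii_safe_html_compat("<em>日本語</em>")
# => "<em>&#26085;&#26412;&#35486;</em>"
```

This avoids Unicode property classes in the regexp entirely, which is the part 1.8.7 lacks.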

batter · 2014-02-21T15:05:19Z

@threedaymonk - Thanks a bunch! I'm going to do some more extensive testing but that seems to do the trick and it isn't raising any errors on 1.8.7. Any tips or pointers on documentation I can read to try to become a little more well versed with working with encodings in Ruby?

threedaymonk closed this as completed Oct 19, 2012
