
Feature request: only encode HTML special characters #10

Closed
andrewhavens opened this issue Aug 13, 2012 · 4 comments

Comments

@andrewhavens

I'm not sure how to do this with your library, but I would like the ability to encode only special characters. For example, I have a block of HTML that contains UTF-8 characters, but I don't want to encode the HTML tags. Is there a way to pass an option, or configure the encoder, so that it skips the < and > characters that make up an HTML tag?

@threedaymonk
Owner

It can't do it. That's a bit of an ill-defined problem, though. The good news is that if you're doing what I think you're doing, there may be a very simple solution.

If you just want to make your HTML ASCII-safe by encoding everything above 0x7F, that's really easy (in Ruby 1.9):

```ruby
s = "<em>日本語</em>"
s.gsub(/\P{ASCII}/){ "&##{$&.unpack('U').first};" }
# => "<em>&#26085;&#26412;&#35486;</em>"
```
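If the same transformation is needed in several places, it can be wrapped in a method. The sketch below is a minimal example, assuming Ruby 1.9 or later with UTF-8 strings; `ascii_safe_html` and `decode_decimal_refs` are hypothetical helper names, not part of this gem. It uses `String#ord`, which gives the same code point as `unpack('U').first`. The decoder turns decimal references back into characters to show that no text is lost, with the caveat that it would also decode any `&#NNN;` references that were already in the input.

```ruby
# Hypothetical helpers (not part of this gem); assumes Ruby 1.9+ and UTF-8 input.

# Replace each non-ASCII character with a decimal numeric character
# reference. ASCII characters, including the < and > of tags, are untouched.
def ascii_safe_html(html)
  html.gsub(/\P{ASCII}/) { |ch| "&##{ch.ord};" }
end

# Convert decimal references such as &#26085; back into UTF-8 characters.
# Caveat: this also decodes references that were present before encoding.
def decode_decimal_refs(html)
  html.gsub(/&#(\d+);/) { [$1.to_i].pack("U") }
end

s = "<em>日本語</em>"
encoded = ascii_safe_html(s)
puts encoded                            # prints <em>&#26085;&#26412;&#35486;</em>
puts decode_decimal_refs(encoded) == s  # prints true
```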

@batter

batter commented Feb 20, 2014

@threedaymonk - Nice gem. Sorry to ask, but do you know if there's a way to do what your code sample above does in Ruby 1.8.7? In 1.8.7 the Regexp class doesn't have the handy character classes and properties that 1.9 has.

We have an application that is currently stuck on Ruby 1.8, and we're working towards moving it to Ruby 1.9 or higher. In the meantime, I'm trying to work out the best way to encode characters while leaving HTML tags alone, and I'm finding encoded characters much harder to work with than in Ruby 1.9+. Any advice would be appreciated!

@threedaymonk
Owner

@batter You can probably do it using something like this (not tested on 1.8.7, but works on 2.1 and doesn't use any post-1.8 features as far as I know):

```ruby
s.unpack("U*").map { |c| c > 0x7F ? "&##{c};" : [c].pack("c") }.join("")
```
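Wrapped in a method with comments, the same approach might look like the sketch below. It is only a sketch: `ascii_safe_html_compat` is a hypothetical name, and, as noted above, the expression is reported to work on Ruby 2.1 but has not been tested on 1.8.7. Because it works on the integer code points returned by `unpack("U*")` rather than on regexp properties, it does not rely on 1.9's Unicode-aware Regexp.

```ruby
# Hypothetical helper name. Uses only String#unpack and Array#pack,
# so it avoids the \p{...} regexp properties that Ruby 1.8.7 lacks.
def ascii_safe_html_compat(html)
  html.unpack("U*").map { |c|
    # c is an integer code point: encode anything above 0x7F as a decimal
    # numeric character reference, and emit ASCII (including < and >) as-is.
    c > 0x7F ? "&##{c};" : [c].pack("c")
  }.join("")
end

puts ascii_safe_html_compat("<em>日本語</em>")
# prints <em>&#26085;&#26412;&#35486;</em>
```

Unlike the `gsub` version, this rebuilds the string one code point at a time, which keeps it simple to port but allocates a small string per character.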

@batter

batter commented Feb 21, 2014

@threedaymonk - Thanks a bunch! I'm going to do some more extensive testing, but that seems to do the trick, and it isn't raising any errors on 1.8.7. Do you have any tips, or pointers to documentation I could read, to become better versed in working with encodings in Ruby?


3 participants