Feature request: only encode HTML special characters #10
Comments
It can't do it. That's a bit of an ill-defined problem, though. The good news is that if you're doing what I think you're doing, there may be a very simple solution. If you just want to make your HTML ASCII-safe by encoding everything above 0x7F, that's really easy (in Ruby 1.9):
s = "<em>日本語</em>"
s.gsub(/\P{ASCII}/){ "&##{$&.unpack('U').first};" }
# => "<em>&#26085;&#26412;&#35486;</em>"
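For reference, the one-liner above can be wrapped in a small helper. This is only a sketch: the method name `ascii_safe` is made up for illustration and is not part of the gem, and it assumes the input is a valid UTF-8 string on Ruby 1.9 or later.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of the gem): replaces every non-ASCII
# character with a decimal numeric character reference and leaves
# ASCII characters, including the < and > of tags, untouched.
def ascii_safe(html)
  html.gsub(/\P{ASCII}/) { |ch| "&##{ch.unpack('U').first};" }
end

puts ascii_safe("<em>日本語</em>")
```

Because only characters above 0x7F are matched, existing markup and any `&` already in the text pass through unchanged.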
@threedaymonk - Nice gem. Sorry for asking, but do you know if there's a way to accomplish the same thing as your code sample above in Ruby 1.8.7? We have an application currently stuck on Ruby 1.8, and we're working towards moving it to Ruby 1.9 or higher. In the meantime I'm trying to determine the ideal solution for encoding characters (while leaving HTML tags alone), and I'm finding encoded characters much harder to work with than in Ruby 1.9+. Any advice would be appreciated!
@batter You can probably do it using something like this (not tested on 1.8.7, but works on 2.1 and doesn't use any post-1.8 features as far as I know):
s.unpack("U*").map { |c| c > 0x7F ? "&##{c};" : [c].pack("c") }.join("")
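As a rough sketch, the same expression can be put behind a method so it is easy to test against sample input. The name `ascii_safe_18` is hypothetical, and as noted above the approach has not been verified on 1.8.7 here; the input is assumed to be UTF-8 bytes.

```ruby
# encoding: utf-8

# Hypothetical helper: walks the string's code points and emits a
# decimal numeric character reference for anything above 0x7F.
# ASCII code points are packed back into single bytes unchanged.
def ascii_safe_18(html)
  html.unpack("U*").map { |c| c > 0x7F ? "&##{c};" : [c].pack("c") }.join("")
end

puts ascii_safe_18("<em>日本語</em>")
```

This avoids `\P{ASCII}` and block-form `gsub` on multibyte characters, which is why it is a candidate for older interpreters.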
@threedaymonk - Thanks a bunch! I'm going to do some more extensive testing but that seems to do the trick and it isn't raising any errors on 1.8.7. Any tips or pointers on documentation I can read to try to become a little more well versed with working with encodings in Ruby?
I'm not sure how to do this with your library, but I would like the ability to encode only special characters. For example, I have a block of HTML which has UTF-8 characters, but I don't want to encode the HTML tags. Is there a way that we can pass an option, or configure the encoder, to skip the < and > characters which make up an HTML tag?