Option to exclude some characters from being decoded #18

Open
gucki opened this issue Dec 12, 2014 · 8 comments

Comments

@gucki

gucki commented Dec 12, 2014

It'd be great if one could specify characters to exclude from decoding. For example, when sanitizing or normalizing HTML, the entities for < and > are good candidates to leave encoded.

@gucki
Author

gucki commented Dec 12, 2014

My current workaround is:

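require "htmlentities"

class HTMLEntities
  # Register a custom flavor so HTMLEntities.new accepts its name.
  FLAVORS << "xhtml1_customized"

  # Start from the xhtml1 entity table (name => codepoint) and drop the
  # entities that should stay encoded. Numeric references such as &#60;
  # do not go through this table and are still decoded.
  MAPPINGS["xhtml1_customized"] = MAPPINGS["xhtml1"].reject { |name, _codepoint| %w[lt gt amp].include?(name) }
end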

And then do HTMLEntities.new("xhtml1_customized").decode(string).

@threedaymonk
Owner

That sounds like a really nice feature. Do you have any ideas about what the interface might ideally look like?

@gucki
Author

gucki commented Dec 12, 2014

Not really, but probably something really simple like HTMLEntities.new("xhtml1").decode(string, :exclude => ["lt", "gt", "amp"])?
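Nothing in this thread suggests the gem has such an option yet. A minimal sketch of the proposed behaviour as a wrapper, assuming only that the coder responds to `decode(String)` (as `HTMLEntities.new` does); `decode_excluding_names` and `REFERENCE` are hypothetical names, not part of the gem:

```ruby
# Hypothetical helper, not part of the htmlentities gem: decodes every
# character reference in +source+ except named ones listed in +exclude+.
# +coder+ is any object with #decode(String), such as HTMLEntities.new.

# One reference: named (&lt;), decimal (&#60;) or hexadecimal (&#x3C;).
REFERENCE = /&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i

def decode_excluding_names(coder, source, exclude:)
  source.gsub(REFERENCE) do |ref|
    name = ref[1..-2] # strip the leading "&" and the trailing ";"
    # Entity names are case-sensitive (&Eacute; vs &eacute;), so compare exactly.
    exclude.include?(name) ? ref : coder.decode(ref)
  end
end
```

For example, `decode_excluding_names(HTMLEntities.new, "&lt;b&gt; &eacute; &#60;", exclude: %w[lt gt amp])` should return `"&lt;b&gt; é <"`: the numeric reference `&#60;` is still decoded, because only the name `lt` was excluded.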

@Jerska

Jerska commented Mar 22, 2015

👍 That's the only missing thing in this library.
But I would specify the list the other way around, by character rather than by entity name, since the same character can have several encodings (&lt;, &#60; and &#x3C; all stand for <): HTMLEntities.new.decode(str, exclude: ['<', '>', '&'])
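Matching on the decoded character instead of the entity name handles the multiple-encodings case. A minimal sketch, again as a hypothetical wrapper (`decode_except_chars` and `CHAR_REFERENCE` are illustrative names, not gem API) that decodes each reference on its own and keeps the original text whenever the result is excluded:

```ruby
# Hypothetical helper, not part of the htmlentities gem. Each reference is
# decoded individually; if the decoded character is in +exclude+, the
# reference is left exactly as written. Because the check happens after
# decoding, &lt;, &#60; and &#x3C; are all kept when "<" is excluded.
# +coder+ is any object with #decode(String), such as HTMLEntities.new.

# One reference: named (&lt;), decimal (&#60;) or hexadecimal (&#x3C;).
CHAR_REFERENCE = /&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i

def decode_except_chars(coder, source, exclude:)
  source.gsub(CHAR_REFERENCE) do |ref|
    decoded = coder.decode(ref)
    exclude.include?(decoded) ? ref : decoded
  end
end
```

With `HTMLEntities.new` as the coder, `"&lt;&eacute;lan&#62;"` with `exclude: ["<", ">", "&"]` should come back as `"&lt;élan&#62;"`.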

@sb4m

sb4m commented Aug 18, 2015

+1

2 similar comments
@skatiruas

+1

@zstrad44

zstrad44 commented Feb 1, 2019

+1

@smithtim

How about a parameter to only decode "safe" codepoints?

The encode method already makes a distinction between "safe" and "unsafe" codepoints. By default, it only encodes unsafe codepoints:

string = "<élan>"
coder.encode(string) # => "&lt;élan&gt;"

The decode method could take a parameter to only decode safe codepoints:

string = "&lt;&eacute;lan&gt;"
coder.decode(string, :safe) # => "&lt;élan&gt;"
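Until something like `:safe` exists, a rough approximation with the existing API is a decode/encode round trip that relies on the default `encode` behaviour described above. A minimal sketch; `decode_safe` is a hypothetical name, not part of the gem:

```ruby
# Hypothetical helper approximating the proposed decode(string, :safe):
# decode every reference, then call #encode with no instructions, which
# (per the behaviour described above) re-encodes only unsafe codepoints.
# +coder+ is any object with #decode and #encode, such as HTMLEntities.new.
def decode_safe(coder, source)
  coder.encode(coder.decode(source))
end
```

For the example above, `decode_safe(HTMLEntities.new, "&lt;&eacute;lan&gt;")` should give `"&lt;élan&gt;"`. The round trip has side effects the proposed option would not: literal `<`, `>` and `&` already present in the input get encoded too, so it only suits text content, not markup, and a numeric reference such as `&#60;` comes back in whatever form `encode` emits for that character.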

7 participants