add *_with_custom_entities versions of all `unescape_*` methods
#261
These versions expect an additional parameter: a hashmap containing custom entity definitions. A typical use case is for the user to parse those entity definitions from the DOCTYPE. An example is provided: examples/custom_entities.rs.
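To make the idea concrete, here is a minimal, standalone sketch of the workflow described above: collect entity declarations from a DOCTYPE into a hashmap, then use that map while unescaping. It is not the code from this PR. The names `parse_entities` and `unescape_with_custom` are hypothetical, the map type `HashMap<Vec<u8>, Vec<u8>>` is an assumption, and the parser only handles double-quoted internal entities; predefined entities such as `&amp;` and numeric references are deliberately left out.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical helper: collect `<!ENTITY name "value">` declarations from
/// the raw text of a DOCTYPE. Only double-quoted internal entities are handled.
fn parse_entities(doctype: &str) -> HashMap<Vec<u8>, Vec<u8>> {
    let mut entities = HashMap::new();
    let mut rest = doctype;
    while let Some(start) = rest.find("<!ENTITY") {
        rest = &rest[start + "<!ENTITY".len()..];
        // Name: the first whitespace-delimited token after the keyword.
        let trimmed = rest.trim_start();
        let name_end = match trimmed.find(char::is_whitespace) {
            Some(i) => i,
            None => break,
        };
        let name = &trimmed[..name_end];
        // Value: the text between the next pair of double quotes.
        let after_name = &trimmed[name_end..];
        let open = match after_name.find('"') {
            Some(i) => i,
            None => break,
        };
        let value_start = &after_name[open + 1..];
        let close = match value_start.find('"') {
            Some(i) => i,
            None => break,
        };
        entities.insert(
            name.as_bytes().to_vec(),
            value_start[..close].as_bytes().to_vec(),
        );
        rest = &value_start[close + 1..];
    }
    entities
}

/// Hypothetical sketch: replace every `&name;` with its value from `custom`.
/// Input without any entity is returned borrowed, without allocating.
fn unescape_with_custom<'a>(
    raw: &'a [u8],
    custom: &HashMap<Vec<u8>, Vec<u8>>,
) -> Result<Cow<'a, [u8]>, String> {
    let mut out: Option<Vec<u8>> = None;
    let mut last = 0; // end of the previously copied region
    let mut i = 0;
    while i < raw.len() {
        if raw[i] == b'&' {
            let end = raw[i..]
                .iter()
                .position(|&b| b == b';')
                .map(|p| i + p)
                .ok_or_else(|| format!("unterminated entity at byte {}", i))?;
            let name = &raw[i + 1..end];
            let value = custom
                .get(name)
                .ok_or_else(|| format!("unknown entity: {}", String::from_utf8_lossy(name)))?;
            let buf = out.get_or_insert_with(Vec::new);
            buf.extend_from_slice(&raw[last..i]);
            buf.extend_from_slice(value);
            last = end + 1;
            i = end + 1;
        } else {
            i += 1;
        }
    }
    Ok(match out {
        Some(mut buf) => {
            buf.extend_from_slice(&raw[last..]);
            Cow::Owned(buf)
        }
        None => Cow::Borrowed(raw),
    })
}
```

With a DOCTYPE declaring `<!ENTITY msg "hello world">`, calling `unescape_with_custom(b"say &msg;!", &entities)` on the parsed map would yield the bytes of `say hello world!`, while an undeclared name returns an error.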
Thanks!
I've made a few comments. I also feel there is a lot of repeated code. Not a big deal, but if you find a way to make it a little more DRY, that would be much appreciated!
src/escapei.rs
Outdated
parse_hexadecimal(&bytes[2..])
} else if bytes.starts_with(b"#") {
parse_decimal(&bytes[1..])
bytes if bytes[0] == b'#' => {
bytes if bytes.starts_with(b"#")
(so that an empty bytes does not panic on the bytes[0] index)
absolutely!
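As a small illustration of the point agreed on here, the sketch below uses `starts_with` in the match guards, so an empty slice simply falls through to the last arm instead of panicking the way `bytes[0]` would. The `EntityKind` enum and `classify` function are hypothetical and exist only for this example. The `#x` prefix for hexadecimal references is an assumption based on the `&bytes[2..]` slice in the diff.

```rust
/// Hypothetical classification of the text between `&` and `;`.
#[derive(Debug, PartialEq)]
enum EntityKind {
    Hex,
    Decimal,
    Named,
}

/// `starts_with` returns false for an empty slice, so no arm ever indexes
/// into `bytes`, and empty input cannot cause an out-of-bounds panic.
fn classify(bytes: &[u8]) -> EntityKind {
    match bytes {
        b if b.starts_with(b"#x") => EntityKind::Hex,
        b if b.starts_with(b"#") => EntityKind::Decimal,
        _ => EntityKind::Named,
    }
}
```

The order of the arms matters: the `#x` guard has to come before the `#` guard, otherwise hexadecimal references would be classified as decimal.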
src/escapei.rs
Outdated
parse_hexadecimal(&bytes[2..])
} else if bytes.starts_with(b"#") {
parse_decimal(&bytes[1..])
bytes if bytes[0] == b'#' => {
same comment
let decoded = reader.decode(&*self.value);
let unescaped =
unescape_with(decoded.as_bytes(), custom_entities).map_err(Error::EscapeError)?;
String::from_utf8(unescaped.into_owned()).map_err(|e| Error::Utf8(e.utf8_error()))
Are we sure it'll be UTF-8? Shouldn't we use reader.decode instead?
I lean towards keeping things simple: what about requiring that custom_entities contains UTF-8?
Alright!
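To show why that requirement matters, here is a minimal sketch of the final conversion step in isolation. `String::from_utf8` validates the whole buffer, so a single non-UTF-8 byte contributed by a custom entity value makes the call fail. The `into_string` helper is hypothetical and returns a plain `Utf8Error` rather than the crate's own error type.

```rust
use std::str::Utf8Error;

/// Hypothetical final step: turn unescaped bytes into a `String`.
/// This fails if any byte sequence, including one that came from a
/// custom entity value, is not valid UTF-8.
fn into_string(unescaped: Vec<u8>) -> Result<String, Utf8Error> {
    String::from_utf8(unescaped).map_err(|e| e.utf8_error())
}
```

For instance, an entity value holding the single Latin-1 byte `0xE9` would be rejected, while the same character encoded as UTF-8 (`0xC3 0xA9`) passes through unchanged.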
About making the code more DRY: yes, it would be better, but I wanted to test the idea with you before refactoring... I'll go make the change (and address your comments above).
This is great, thanks!