Optimize PHP html_entity_decode function #18092
…mize scanning for '&' and ';' using memchr. Use memcpy instead of character-by-character copying.
d166abe to 66f5709
ext/standard/html.c
Outdated
char *output_ptr = ZSTR_VAL(output);
int doctype = flags & ENT_HTML_DOC_TYPE_MASK;

assert(*input_end == '\0');
If you change the input parameter to a zend_string* you'd be guaranteed this. Moreover, please use ZEND_ASSERT() instead.
ext/standard/html.c
Outdated
unsigned code = 0, code2 = 0;
const char *entity_end_ptr = NULL;
int valid_entity = 1;
- int valid_entity = 1;
+ bool valid_entity = true;
@Girgias are you going to review the logic as well? Just checking if I should look into this or if you are happy to handle it all?
Please do review the logic, I only had a cursory glance :)
Ok I will check it out next week if no one is quicker.
fix logic
9b3e96d to f093c30
Improvements affect the C function traverse_for_entities:
- Use memchr to search for '&' instead of scanning character by character.
- Use memchr to locate ';' to determine potential entity boundaries instead of process_named_entity_html, avoiding unnecessary per-character validations.
- Use memcpy instead of character-by-character copying.

Benchmark for 4K-character strings:
All tests pass.
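For readers who want to see the scanning strategy in isolation, here is a minimal standalone sketch. It is not the code from this PR: decode_sketch and its three-entry entity table are hypothetical, and it ignores numeric entities, doctype flags, and charset handling. It only illustrates finding '&' and ';' with memchr and copying literal runs with memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-entry table; the real decoder supports far more. */
struct entity { const char *name; size_t len; char ch; };
static const struct entity entities[] = {
    { "amp", 3, '&' }, { "lt", 2, '<' }, { "gt", 2, '>' },
};

/* Decodes in[0..in_len) into out, which must hold at least in_len + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t decode_sketch(const char *in, size_t in_len, char *out)
{
    assert(in != NULL && out != NULL);

    const char *p = in;
    const char *end = in + in_len;
    char *q = out;

    while (p < end) {
        /* Jump straight to the next '&' instead of testing every byte. */
        const char *amp = memchr(p, '&', (size_t)(end - p));
        if (amp == NULL) {
            /* No more entities: bulk-copy the tail. */
            memcpy(q, p, (size_t)(end - p));
            q += end - p;
            break;
        }

        /* Bulk-copy the literal run in front of the '&'. */
        memcpy(q, p, (size_t)(amp - p));
        q += amp - p;

        /* Look for the ';' that would close a candidate entity. */
        const char *semi = memchr(amp + 1, ';', (size_t)(end - (amp + 1)));
        if (semi == NULL) {
            /* No terminator anywhere: the rest is literal text. */
            memcpy(q, amp, (size_t)(end - amp));
            q += end - amp;
            break;
        }

        size_t name_len = (size_t)(semi - (amp + 1));
        int matched = 0;
        for (size_t i = 0; i < sizeof entities / sizeof entities[0]; i++) {
            if (entities[i].len == name_len &&
                memcmp(amp + 1, entities[i].name, name_len) == 0) {
                *q++ = entities[i].ch;
                p = semi + 1;
                matched = 1;
                break;
            }
        }
        if (!matched) {
            /* Not a known entity: emit '&' literally and resume after it. */
            *q++ = '&';
            p = amp + 1;
        }
    }

    *q = '\0';
    return (size_t)(q - out);
}
```

Calling decode_sketch on "a &amp; b &lt; c" with a sufficiently large buffer yields "a & b < c". One design caveat of this sketch: the ';' search runs to the end of the input, so a production version would likely bound it by the longest possible entity name to avoid rescanning long inputs that contain many stray '&' characters.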