preg_match_all returns false with PREG_BAD_UTF8_ERROR (4) #105
Comments
In extract_headings, you can test the subject during debugging with:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Help or assistance is needed from developers who are experienced in working with UTF-8 and PHP.
I'm not experienced enough in the matter to provide a solution to the original problem, but I know that parsing HTML with regex is generally considered a bad idea. Maybe switching to some DOM library will solve both the incorrect UTF-8 problem and the general limitations of HTML regex-parsing?
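To make the DOM suggestion concrete, here is a minimal sketch using PHP's bundled `DOMDocument` and `DOMXPath` classes. The function name `extract_headings_dom` and the shape of the returned array are hypothetical and not part of the plugin. The `<?xml encoding="UTF-8">` prefix is a common workaround to make libxml read the fragment as UTF-8. How libxml copes with invalid byte sequences in real post content would still need to be tested.

```php
<?php
// Hypothetical helper: collect h1-h6 headings from an HTML fragment with DOM
// instead of a regular expression. Requires the dom extension.
function extract_headings_dom($html)
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings
    // for the malformed markup that real-world content often contains.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints to libxml that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]';

    $headings = array();
    foreach ($xpath->query($query) as $node) {
        $headings[] = array(
            'level' => (int) substr($node->nodeName, 1), // "h2" becomes 2
            'text'  => trim($node->textContent),         // text without inner tags
        );
    }
    return $headings;
}
```

Whether this is fast enough compared with the regex approach is an open question; building a DOM tree for every post is likely to cost more than a single `preg_match_all` call.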
@hieptd This code won't do what @zedzedzed needs.
dsent is correct. Additionally, a solution for the code snippet provided was found in WordPress's remove_accents function, as mentioned in #70 and rolled out in version 1509. I'm troubleshooting why preg_match_all fails completely when there is a bad UTF-8 character in the subject. I'm also after options, opinions, thoughts, and alternatives that aren't too slow (costly to compute). I know there have been big improvements in PHP 7, but deferring to its release cannot be an option until WordPress core requires it as a minimum.
preg_match_all with the u modifier returns false when there are bad UTF-8 characters in the pattern or subject, and it stops the matching process, resulting in no matches and hence no TOC for the page. In all cases, it has been caused by bad characters in the subject.
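The failure described above can be reproduced in isolation. This is a minimal sketch: the heading pattern and the sample subject are made up for illustration and are not the plugin's actual pattern or any real post content.

```php
<?php
// 0xC3 starts a two-byte UTF-8 sequence and must be followed by a
// continuation byte (0x80-0xBF); 0x28 "(" is not one, so the subject is invalid.
$subject = "<h2>Valid heading</h2><h2>Bad \xC3\x28 heading</h2>";

// Hypothetical heading pattern with the u (UTF-8) modifier.
$pattern = '/<h([1-6])[^>]*>(.*?)<\/h\1>/isu';

// With the u modifier, PCRE validates the whole subject before matching,
// so a single bad byte makes the entire call fail.
$result = preg_match_all($pattern, $subject, $matches);
$error  = preg_last_error();

var_dump($result);                        // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)

// A cheap validity check that needs no extra extension: an empty pattern
// with the u modifier only succeeds on well-formed UTF-8.
$isValidUtf8 = preg_match('//u', $subject) === 1;
var_dump($isValidUtf8);                   // bool(false)
```

Note that even the first, perfectly valid heading is lost, which matches the "no TOC for the page" symptom.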
Is there a way to suppress the error and continue regardless?
Is there a WordPress core function that may be useful to filter the_content?
Why isn't it failing for other WordPress core things, considering it is the_content after all?
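One possible way to "continue regardless", sketched under assumptions: detect `PREG_BAD_UTF8_ERROR` and retry the same pattern without the u modifier, so that matching proceeds byte by byte. The function name `preg_match_all_tolerant` and the pattern are hypothetical. In byte mode, `.` and character classes match single bytes rather than characters, and captured text may still contain the invalid bytes, so this is a fallback rather than a fix. On the WordPress side, core ships `wp_check_invalid_utf8()`, which may be worth testing against the_content; it is not used here so that the sketch stays runnable outside WordPress.

```php
<?php
// Hypothetical helper: try UTF-8 mode first, fall back to byte mode when
// the subject contains invalid UTF-8. $patternBody is the pattern without
// delimiters or modifiers; "/" is used as the delimiter.
function preg_match_all_tolerant($patternBody, $subject, &$matches)
{
    $result = preg_match_all('/' . $patternBody . '/isu', $subject, $matches, PREG_SET_ORDER);

    if ($result === false && preg_last_error() === PREG_BAD_UTF8_ERROR) {
        // Retry without the u modifier: no UTF-8 validation is performed,
        // so bad bytes in the subject no longer abort the whole match.
        $result = preg_match_all('/' . $patternBody . '/is', $subject, $matches, PREG_SET_ORDER);
    }

    return $result;
}

// Usage with a made-up subject containing one invalid sequence (0xC3 0x28).
$subject = "<h2>Valid heading</h2><h2>Bad \xC3\x28 heading</h2>";
$count   = preg_match_all_tolerant('<h([1-6])[^>]*>(.*?)<\/h\1>', $subject, $found);
```

The retry only costs a second pass on pages that actually contain bad bytes, so well-formed content pays nothing extra.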