New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Multibyte Strings #703

Closed
raamdev opened this Issue Mar 4, 2016 · 31 comments

Comments

Projects
None yet
2 participants
@raamdev
Contributor

raamdev commented Mar 4, 2016

Forked from #635.

@jaswsinc writes...

We should do a better job (even better than WP itself) in dealing with UTF-8. Given the amount of string manipulation that occurs in Comet Cache, we should:

  • Adopt the use of all mb_*() functions where applicable.
  • Always pass the /u flag to any preg_*() functions that we use.
  • Ensure that WP PHP RV requires the mbstring extension to prevent issues with sites that don't have it installed.

This work can probably be moved to a separate issue though; i.e., we should have a separate issue that will be for UTF-8 enhancements. See this article.

Related GitHub issue: #640 (comment)


Also noting here at an announcement was made in Comet Cache v160222 to any site owner that did not have the mbstring extension installed that the PHP mbstring extension would be required by Comet Cache after March 1st.

@raamdev raamdev added the todo label Mar 4, 2016

@jaswrks jaswrks referenced this issue Mar 4, 2016

Closed

PHP v5.4+ Codebase Enhancements #635

5 of 5 tasks complete
@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

A first and very easy step here would be to search for all calls to:

  • preg_match
  • preg_replace
  • preg_replace_callback

and on the end of the regex pattern add a u modifier.

e.g., /pattern/ becomes /pattern/u
e.g., /pattern/i becomes /pattern/ui

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

Next, we need to search for instances of:

  • strpos
  • stripos

and a decision needs to be made that is based on the following.

  • If strpos is being used to detect the number of bytes, leave it as-is.
  • If strpos is being used to detect the number of characters (or to search for another string, most prevalent in Comet Cache), change it to mb_strpos or mb_stripos respectively.
@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

Next, search for:

  • strtolower
  • strtoupper

And change all of those to mb_strtolower() or mb_strtoupper() respectively.

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

Same thing for substr; i.e., should become mb_substr.

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

Next, we will need to see which other UTF-8-incompatible functions we are using in Comet Cache; i.e., see which functions we use in PHP core for which there is no UTF-8 mb_* variation available. In those cases, we will need to build a replacement of our own as a polyfill.

e.g., There is no mb_str_ireplace, and there is no mb_str_pad() either. PHP is also missing mb_substr_replace(). I'm not sure if Comet Cache needs any of these (or others), but we need to dig through and see which might need polyfills.

Examples of a UTF-8 compatible polyfills:

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

Referencing full list of existing mb_ functions provided by the mbstring extension.
http://php.net/manual/en/ref.mbstring.php

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • Noting that the trim(), rtrim() and ltrim() PHP functions are generally safe to use on multibyte data. However, if we find instances of these functions where we are attempting to trim characters outside of the ASCII range, we will need to use preg_replace() with the /u modifier instead.

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Mar 4, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented Mar 4, 2016

If strpos is being used to detect the number of bytes, leave it as-is.
If strpos is being used to detect the number of characters (or to search for another string, most prevalent in Comet Cache), change it to mb_strpos or mb_stripos respectively.

Can you give an example of those two? I'm not seeing anything that looks like strpos() is being used to detect the number of bytes...

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 4, 2016

  • [x]

Sorry, I stated that incorrectly. What I should have been talking about there was strlen() instead of strpos(). All instances of strpos() and stripos() should be updated to their mb_ counterparts.


Examples of strlen() looking for (or testing) the number of bytes:

if(!strlen($variable)) {
   // Empty.
}
header('Content-Length: '.strlen($export));

Examples of strlen() looking for the total number of characters.

if (strlen($string) > $max_length) {
$hook = GLOBAL_NS.substr($hook, strlen('zencache'));

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Mar 4, 2016

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Mar 4, 2016

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Mar 4, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented Mar 6, 2016

@jaswsinc writes...

Noting that the trim(), rtrim() and ltrim() PHP functions are generally safe to use on multibyte data. However, if we find instances of these functions where we are attempting to trim characters outside of the ASCII range, we will need to use preg_replace() with the /u modifier instead.

Can you give me an example of where we're "attempting to trim characters outside of the ASCII range"? Trim functions are usually used on variables that have hard-to-predict (dynamic) values, so this seems like a case of challenging best-guess work.

@raamdev raamdev modified the milestone: Next Release Mar 6, 2016

@jaswrks

This comment has been minimized.

Member

jaswrks commented Mar 6, 2016

Example where it should be updated:

$string = trim($string, '®');

Should become:

$string = preg_replace('/^®++$/u', '', $string);

I'm not sure if there are any cases like this in Comet Cache, so there might be nothing to do.


Examples that are fine like they are:

$string = trim($string);
$string = rtrim($string, '/');
$string = ltrim($string, '/');
$string = trim($string, '-');

@raamdev raamdev modified the milestones: Future Release, Next Release Apr 4, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented May 11, 2016

Reviewed all instances of trim(), rtrim() and ltrim(); not seeing anything that needs to be changed.

raamdev added a commit to websharks/comet-cache-pro that referenced this issue May 11, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented May 11, 2016

@jaswsinc writes in #703 (comment)...

There is no mb_str_ireplace, and there is no mb_str_pad() either. PHP is also missing mb_substr_replace(). I'm not sure if Comet Cache needs any of these (or others), but we need to dig through and see which might need polyfills.

I reviewed occurrences of str_ireplace() and substr_replace() in the codebase and none seemed to require Multibyte replacements.

I think the work to add support for multibyte strings is complete except for the analysis of any other areas where we might need polyfills. Would you mind taking a look when you get a chance and let me know if I missed anything here?

Once you're done, I think websharks/comet-cache-pro#225 is ready to merge if you agree (all that's left is to add the check to WP PHP RV).

raamdev added a commit to websharks/comet-cache-pro that referenced this issue May 11, 2016

@raamdev raamdev modified the milestones: Next Release, Future Release May 11, 2016

@jaswrks

This comment has been minimized.

Member

jaswrks commented May 17, 2016

Outstanding! I just reviewed the changes that you introduced and those all look great to me. In particular, I think that this line is going to be a very important fix that will prevent corruption of the wp-config.php file in certain cases where it contains UTF-8 chars. I've seen that reported now and again in both CC and in s2Member, and I think this will fix that.

I'm taking a closer look now and the rest of the codebase, so I will post another update shortly. This file in particular needs some work; i.e., substr_replace() there is problematic. As you mentioned, a polyfill will be needed, or an alternate approach maybe.

@raamdev

This comment has been minimized.

Contributor

raamdev commented Jun 3, 2016

@jaswsinc writes...

I'm taking a closer look now and the rest of the codebase, so I will post another update shortly

Ping. ↑ 😄

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 3, 2016

I reviewed occurrences of str_ireplace() and substr_replace() in the codebase and none seemed to require Multibyte replacements.

Anything which currently uses str_ireplace() can be updated to use preg_replace('//ui', ...) instead, for unicode compatibility. So for instance: https://github.com/websharks/comet-cache-pro/search?utf8=%E2%9C%93&q=str_ireplace

$wp_path_rel_to_home = str_ireplace($home, '', $siteurl); /* $siteurl - $home */

Becomes:

$wp_path_rel_to_home = preg_replace('/'.preg_quote($home, '/').'/ui', '', $siteurl); /* $siteurl - $home */

That way if the URL contains a multibyte character it won't get mangled when doing string replacements.

In WordPress (as it stands), this is not going to do anything super useful at the moment, because WordPress is not perfect in the way it deals with UTF-8 yet, which means that $home containing unicode may break, regardless, in various areas of the WP core.

However, from my perspective it's the difference between wrong and right in the way this is handled. As WordPress improves its support for unicode throughout, it will make it possible for WordPress home/blog URLs to contain unicode characters (e.g., emojis) and when that happens we want Comet Cache to be prepared for it.


I have some other comments I'll post shortly.

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

This line can become:

$cache_dir_tmp_regex = preg_replace('/'.preg_quote(preg_quote($cache_dir.'/', '/'), '/').'/ui', '', $cache_dir_tmp_regex, 1);

This line can become:

$_host_cache_dir_tmp_regex = preg_replace('/'.preg_quote(preg_quote($_host_cache_path.'/', '/'), '/').'/ui', '', $_host_cache_dir_tmp_regex, 1);

This line can become:

$_host_cache_dir_tmp_regex = preg_replace('/'.preg_quote(preg_quote($_host_cache_dir.'/', '/'), '/').'/ui', '', $_host_cache_dir_tmp_regex, 1);

And then, this file could be removed and/or deprecated, because those methods are no longer used by anything.

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

The stripos() examples in this file could be updated to show mb_stripos().
https://github.com/websharks/comet-cache-pro/blob/feature/703/src/includes/templates/ac-plugin.txt

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

These lines can be replaced with:

$advanced_cache_contents = preg_replace(
    [
        '/'.preg_quote("'%%".GLOBAL_NS.'_'.$_option."%%'", '/').'/ui',
        '/'.preq_quote("'%%".GLOBAL_NS.'_'.preg_replace('/^cache_/ui', '', $_option)."%%'", '/').'/ui',
    ],
    $_value,
    $advanced_cache_contents
);
@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

This line can be replaced with:

$advanced_cache_contents = preg_replace('/'.preg_quote("'%%".GLOBAL_NS."_PLUGIN_FILE%%'", '/').'/ui', $plugin_file, $advanced_cache_contents);
@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

In this file there are a few instances of strtolower() that still should be changed to mb_strtolower().

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

In this file also. Convert all instances of strtolower() into mb_strtolower().

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 4, 2016

Everything else looks great. I'd like to do another round of testing once the above changes are implemented though.

@raamdev raamdev self-assigned this Jun 6, 2016

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Jun 7, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented Jun 7, 2016

@jaswsinc This is ready for another round of tests. Your latest suggestions have been committed in websharks/comet-cache-pro@edadf24.

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 7, 2016

@raamdev The code sample I gave here has a misspelled word preq instead of preg.

raamdev added a commit to websharks/comet-cache-pro that referenced this issue Jun 7, 2016

@raamdev

This comment has been minimized.

Contributor

raamdev commented Jun 7, 2016

a misspelled word preq instead of preg.

Ha, glad somebody caught that. 😳

@raamdev

This comment has been minimized.

Contributor

raamdev commented Jun 13, 2016

@jaswsinc writes...

Everything else looks great. I'd like to do another round of testing once the above changes are implemented though.

Were you able to run those tests? Can I merge now, or would you like to do some more testing?

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 14, 2016

I'm running some additional tests now...

@jaswrks

This comment has been minimized.

Member

jaswrks commented Jun 16, 2016

I ran some tests on this branch and I haven't seen any problems thus far. Looks good-to-go.

@raamdev

This comment has been minimized.

Contributor

raamdev commented Jun 20, 2016

Next Release Changelog:

  • Enhancement: Added full support for UTF-8 (multibyte strings). This release adds full support for UTF-8 throughout the Comet Cache codebase, greatly enhancing Comet Cache's ability to deal with file paths and URLs that may contain UTF-8 characters. Props @jaswsinc. Issue #703.

@raamdev raamdev closed this Jun 20, 2016

raamdev added a commit that referenced this issue Jul 7, 2016

Phing release of v160706 with the following changes:
- **New Feature! Apache Optimizations.** This release includes a completely new option panel for Apache Performance Tuning. Current options for Apache tuning include GZIP Compression, Leverage Browser Caching, Enforce Canonical URLs, and Send Access-Control-Allow-Origin Header (for Static CDN Filters). These options automatically add or remove from your `.htaccess` file the appropriate configuration based on the options you enable or disable (all options are disabled by default, so your `.htaccess` file is not modified unless you say so). If you prefer to update your `.htaccess` file manually, the necessary configuration can be viewed beneath each option. Props @jaswsinc, @renzms. See [Issue #789](#789).
- **New Feature!** A new "Enable GZIP Compression" option has been added to the new Apache Optimizations panel. This option will automatically add the appropriate configuration to your `.htaccess` file to enable GZIP compression. This option is disabled by default. The old "GZIP Compression" panel has been removed in favor of the new option inside Apache Optimizations. Props @renzms, @jaswsinc. See [Issue #764](#764).
- **New Feature!** Multisite Host Exclusion Patterns. It's now possible to exclude entire sites from the cache in a Multisite Network environment. Domain mapping is also supported! See _Comet Cache → Plugin Options → Host Exclusion Patterns_. If you're running a Multisite Network with Sub-Directories, you can exclude sites using the existing URI Exclusion Patterns feature. Props @kristineds. See [Issue #754](#754).
- **New Feature (Pro)!** A new "Leverage Browser Caching" option has been added to the new Apache Optimizations panel. This option will automatically add the appropriate configuration to your `.htaccess` file to enable Browser Caching. This option is disabled by default. Props @renzms, @jaswsinc. See [Issue #764](#764).
- **New Feature (Pro)!** A new "Enforce Canonical URLs" option has been added to the new Apache Optimizations panel. This options adds the appropriate `.htaccess` code to enforce the correct canonical URLs according to your WordPress Permalink settings (Comet Cache detects if the Permalink Structure ends with a trailing slash, or without a trailing slash). Props @renzms, @jaswsinc. See [Issue #554](#554).
- **Bug Fix**: In some scenarios the Cron Event that cleans up expired cache files (`_cron_comet_cache_cleanup`) would never run, or the Next Run time would constantly reset to 1 minute away from running every time a page was reloaded. We suspect this is a race condition and in attempt to work around this issue we now skip all of our Cron-related checks if Cron is currently in the middle of running a process. Props @xberg and @lkraav for help reporting. See [Issue #653](#653).
- **Bug Fix**: If your site uses aliased domains, Comet Cache now properly considers all possible domain variations when it clears the cache on WP Standard installations. Props @kristineds, @jaswsinc, @yoffe, and @VR51. See [Issue #608](#608).
- **Bug Fix** (Pro): Fixed a bug where Comet Cache would appear to prevent WordPress from redirecting Permalinks that don't include a trailing slash, to the URL that does include a trailing slash. This was due to the fact that Comet Cache loads very early on (for caching purposes) and as a result the WordPress `redirect_canonical()` function never gets run. This was fixed by adding an option to the new Apache Optimizations panel that allows you to Enforce Canonical URLs. Props @renzms, @jaswsinc. See [Issue #554](#554).
- **UX Bug Fix** (Pro): If you had your WordPress Dashboard login details saved by your browser, the browser autofill would automatically fill in the Pro Plugin Updater fields with those details, which then needed to be replaced with your actual Pro license details. The browser autofill has been disabled for those fields (tested in Chrome, Firefox, and Safari). Props @renzms. See [Issue #741](#741).
- **Enhancement**: Added links the Options Page for the Comet Cache [Twitter](http://twitter.com/cometcache) and [Facebook](http://facebook.com/cometcache) accounts. Props @renzms. [Issue #771](#771).
- **Enhancement:** Added full support for UTF-8 (multibyte strings). This release adds full support for UTF-8 throughout the Comet Cache codebase, greatly enhancing Comet Cache's ability to deal with file paths and URLs that may contain UTF-8 characters. Props @jaswsinc. [Issue #703](#703).
- **UI Enhancements**: Improved the Logged-In Users and the Client-Side Caching options panels to dim additional options when the feature is disabled. Additionally, the "Enable HTML Compression for Logged-In Users?" option has been relocated from the HTML Compressor option panel to the more appropriate Logged-In Users option panel. See [Issue #768](#768).
- **UX Enhancement**: Improved the inline docs for Auto-Clear List of Custom URLs to clarify that full URLs must be provided. Props @renzms. See [Issue #781](#781).
- **Enhancement** (Pro): The Pro Plugin Updater has been improved to allow for better compatibility with hosting platforms that use Apache's ModSecurity. In some cases, site owners were seeing a 404 error when attempting to update the Pro version using the Pro Plugin updater because certain ModSecurity rules were blocking the Pro Updater requests. The Pro Plugin Updater now uses WP Transients to store the necessary metadata, which works around the issue with ModSecurity. Props to @seozones for reporting and @jaswsinc for help fixing this. [Issue #416](#416).
- **Enhancement** (Pro): When Static CDN Filters are enabled, it's now possible to disable the automatic insertion of rules into your `.htaccess` file that are designed to prevent issues with [CORS](https://cometcache.com/kb-article/what-are-cross-origin-request-blocked-errors/). See _Apache Optimizations → Send Access-Control-Allow-Origin Header?_ See [Issue #787](#787).
- **Enhancement** (Pro): The HTML Notes added to the bottom of a cached page now specify if the page was cached as the result of an HTTP Request or if it was cached by the Auto-Cache Engine. Props @kristineds. See [Issue #292](#292).
- **Enhancement** (Pro): The Auto-Cache Engine now supports a fallback to cURL using the WP HTTP API. If your PHP configuration has `allow_fopen_url=0`, the Auto-Cache Engine will use the fallback to download the XML Sitemap and parse it from a temporary file. If you want to force the use of this fallback even when `allow_fopen_url=1`, you can use [a filter](#440 (comment)). See [Issue #440](#440).
- **UI Enhancement** (Pro): A second button has been added to the bottom of the Pro Plugin Updater page that allows you to "Save and Update Comet Cache Pro" in one step. Props @renzms. See [Issue #741](#741).
- **UI Enhancement** (Pro): The "Cache Stats" button in Admin Bar is now linked to the Cache Stats page. Instead of hovering over the button and then clicking "More Info" inside the popup panel, you can now just click the "Cache Stats" button to go directly to the Cache Stats page. Props @Presskopp, @renzms. See [Issue #780](#780).
- **Comment Mail Compatibility:** Improved compatibility with the Comment Mail plugin by automatically clearing the cache whenever Comment Mail options are changed. Many of the Comment Mail options affect front-end portions of the site, so it's important that the cache is cleared whenever Comment Mail options change. See [Comment Mail Issue #278](websharks/comment-mail#278 (comment)).
- **PHP Compatibility:** Improved compatibility back to PHP 5.2 (the lowest version allowed by WordPress). Comet Cache still requires PHP 5.4+, but if you install Comet Cache on a site running PHP 5.2, it will now fail gracefully with a Dashboard notice indicating PHP 5.4+ is required, instead of producing a fatal error. See [Issue #784](#784).
- **WP-CLI Compatibility**: Fixed a bug with deactivating Comet Cache using WP-CLI. Doing so was producing a "Invalid argument; host token empty!" error message. This has been resolved. Props @MarioKnight @jaswsinc @renzms. See [Issue #728](#728).
- Renamed `COMET_CACHE_ALLOW_BROWSER_CACHE` constant to `COMET_CACHE_ALLOW_CLIENT_SIDE_CACHE`. Backwards compatibility has been maintained.
- Renamed `allow_browser_cache` plugin option to `allow_client_side_cache`.
@raamdev

This comment has been minimized.

Contributor

raamdev commented Jul 7, 2016

Comet Cache v160706 has been released and includes changes from this GitHub Issue. See the v160706 announcement for further details.


This issue will now be locked to further updates. If you have something to add related to this GitHub Issue, please open a new GitHub Issue and reference this one (#703).

@websharks websharks locked and limited conversation to collaborators Jul 7, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.