HTML Compressor + antispambot() with umlauts results in cached blank page #871

Open
code-flow opened this Issue Jan 25, 2017 · 9 comments

code-flow commented Jan 25, 2017

Hey guys,
I hope you're doing well today. I have an interesting bug report for you.

It seems that the latest version (I'm using CometCache Pro version 161227) has problems when a website calls WordPress' internal antispambot() function on an e-mail address that contains umlauts.

Here's some example code you can use in your theme's functions.php:

add_action('init', function(){
	add_shortcode('testmail', function($atts, $content, $name){
		return sprintf('<a href="mailto:%s">Testmail</a>', antispambot('info@hägar.de'));
	});
});

Then enter the shortcode [testmail] into any blog post. You will see that CometCache returns an empty page. Unfortunately, this empty page gets cached.

The problem seems to be in the tokenizeGlobalExclusions() function of the \WebSharks\HtmlCompressor\Core class, specifically in its preg_replace_callback() call, which has used the "u" modifier since the last plugin update.

The preg_replace_callback() returns an empty string. I'm not sure if the lite version has the same issue.
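A minimal sketch of the failure mode described here, for readers who want to see it in isolation. The pattern and the sample markup are hypothetical and are not the compressor's actual code; the only behavior relied on is that PCRE functions called with the "u" modifier reject subjects that are not valid UTF-8. Strictly speaking, the call returns null in that case, which then behaves like an empty string once it is echoed or concatenated.

```php
<?php
// Hypothetical input: "ä" is the two bytes C3 A4 in UTF-8. Here the second
// byte has been swapped for an entity, leaving a lone (invalid) C3 byte.
$html = "<a href=\"mailto:info@h\xC3&#164;gar.de\">Testmail</a>";

// Any pattern using the "u" modifier requires a valid UTF-8 subject.
$result = preg_replace_callback('/<a\b[^>]*>/u', function (array $m) {
    return strtoupper($m[0]);
}, $html);

var_dump($result === null);                          // the call fails and yields null
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // the reason PCRE reports
var_dump((string) $result === '');                   // null becomes '' when used as a string
```

Checking preg_last_error() after the call is what distinguishes "nothing matched" from "the subject was rejected".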

Hope this helps make CometCache even better! ;)

Greetings

Contributor

raamdev commented Jan 27, 2017

@code-flow Thank you for the report! :-) However, I haven't been able to reproduce this issue following your instructions above. I'm using Comet Cache Pro v161227 + HTML Compressor, on WP v4.7.2 running the Twenty Fifteen theme on PHP v7.0.12 and Nginx v1.11.3.

The page with the shortcode gets cached properly and subsequent visits to that cached page do not show up as blank, but rather the cached page gets loaded as expected:

[Screenshot: 2017-01-26_21-37-15]

Any other ideas how I can reproduce this problem?

code-flow commented Jan 27, 2017

Hey @raamdev. You're welcome. Thanks for having a look into that. I have this issue on a customer website as well as on my local machine.

  • Live Site is PHP/7.0.13 on Apache/2.4.20.
  • Dev Site is PHP/7.0.1 on nginx/1.10.1.
  • CometCache-Version is always 161227.

I've tested again with the following:

  • WP 4.7.2
  • fresh installation
  • no plugins except Comet Cache 161227
  • Set Option "Yes, I want to compress HTML/CSS/JS" on settings page.
  • shortcode included in functions.php of TwentyFifteen

Same issue again: a blank page and no PHP errors. The only output I see when I view the source of the blank page is:
<!-- Comet Cache HTML Compressor took 0.00088 seconds (overall). -->

raamdev commented Jan 27, 2017

@code-flow Can you confirm that disabling the HTML Compressor resolves the issue? I.e., that this seems specifically related to the HTML Compressor?

code-flow commented Jan 30, 2017

Yes, when HTML-Compressor is off, everything works just fine.

code-flow commented Jan 30, 2017

I'm not sure if the following helps:

When using
antispambot('info@hägar.de', 0) // --> blank page
the output of the above shortcode is:
<a href="mailto:in&#102;&#111;&#64;hä&#103;&#97;&#114;.&#100;e">Testmail</a>
And I get a blank page.

The following works (hex-encoded). However, the e-mail would then be sent to "info@hägar.de".
antispambot('info@hägar.de', 1) // --> works
The output:
<a href="mailto:%69n%66&#111;%40%68�&#164;g%61%72%2ed%65">Testmail</a>

I guess that the umlaut is the problem. Unfortunately, I don't know why, or whether the problem lies in the compressor or in the antispambot() function.

Were you able to reproduce it?

raamdev commented Jan 30, 2017

I was able to reproduce this issue, yes. It looks like the blank page issue is not consistent; it only occurs about 50% of the time. I'm guessing that antispambot() sometimes emits a character that the HTML Compressor chokes on (both the specific characters and the part of the email address that gets encoded change from run to run).

When I got the blank cached page, here's what the cache file contained:

a:7:{i:0;s:12:"HTTP/1.1 200";i:1;s:38:"Expires: Wed, 11 Jan 1984 05:00:00 GMT";i:2;s:51:"Cache-Control: no-cache, must-revalidate, max-age=0";i:3;s:16:"Pragma: no-cache";i:4;s:38:"Content-Type: text/html; charset=UTF-8";i:5;s:69:"Link: ; rel="https://api.w.org/"";i:6;s:57:"Link: ; rel=shortlink";} 

So this definitely looks like an HTML Compressor bug. Thanks so much @code-flow for reporting this! We'll work on a bug fix.

@raamdev raamdev added this to the Future Release milestone Jan 30, 2017

@raamdev raamdev changed the title from "Latest version caches empty page when using antispambot() with umlauts." to "HTML Compressor + antispambot() with umlauts results in cached blank page" Jan 30, 2017

jaswrks pushed a commit to wpsharks/html-compressor that referenced this issue Apr 20, 2017

jaswrks pushed a commit to wpsharks/html-compressor that referenced this issue Apr 20, 2017

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Apr 20, 2017

jaswrks added a commit to wpsharks/comet-cache-pro that referenced this issue Apr 20, 2017

jaswrks commented Apr 20, 2017

My investigation shows that antispambot() uses zeroise() and at times it produces an invalid UTF-8 sequence; i.e., that function is buggy. Sometimes the output it generates is fine, other times it generates an invalid UTF-8 sequence. As a result, the invalid UTF-8 sequence is passed to the HTML Compressor and run through preg_replace(), which chokes on the invalid UTF-8 sequence and returns an empty string.
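To illustrate how a byte-wise obfuscator can end up with an invalid sequence, here is a simplified, deterministic sketch. It is not WordPress' antispambot(): as far as I understand it, the real function chooses an encoding per byte at random, whereas the hypothetical encode_bytes() below takes the choices as an argument so the result is repeatable. The helper is_valid_utf8() is likewise an illustrative name.

```php
<?php
// Simplified stand-in for a byte-wise obfuscator (hypothetical helper).
// Mode 0 emits a decimal entity for the byte; mode 1 leaves the byte as-is.
function encode_bytes(string $text, array $modes): string
{
    $out = '';
    for ($i = 0, $len = strlen($text); $i < $len; $i++) {
        $mode = $modes[$i % count($modes)];
        $out .= $mode === 0 ? '&#' . ord($text[$i]) . ';' : $text[$i];
    }
    return $out;
}

// preg_match() with the "u" modifier returns false for invalid UTF-8.
function is_valid_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$umlaut = "\xC3\xA4"; // "ä" as its two UTF-8 bytes

$bothRaw     = encode_bytes($umlaut, [1, 1]); // both bytes untouched
$bothEncoded = encode_bytes($umlaut, [0, 0]); // "&#195;&#164;" (pure ASCII)
$split       = encode_bytes($umlaut, [1, 0]); // raw C3 followed by "&#164;"

var_dump(is_valid_utf8($bothRaw));     // bool(true)
var_dump(is_valid_utf8($bothEncoded)); // bool(true)
var_dump(is_valid_utf8($split));       // bool(false)
```

Under the assumption that each byte is independently left raw or entity-encoded with equal probability, the two bytes of one umlaut are treated differently half of the time, which would be consistent with the roughly 50% failure rate reported earlier in the thread.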

To avoid this pitfall, the HTML Compressor now checks for invalid UTF-8 before it begins and will refuse to compress an HTML document that contains an invalid UTF-8 sequence. Not compressing is better than failing with the empty document in a case such as this.

Example Output in An Invalid UTF-8 Scenario

<!-- Comet Cache HTML Compressor did not run; HTML contains invalid UTF-8. -->

<!-- *´¨)
     ¸.•´¸.•*´¨) ¸.•*¨)
     (¸.•´ (¸.•` ¤ Comet Cache is Fully Functional ¤ ´¨) -->

<!-- Cache File User Token:         1 -->
<!-- Cache File Version Salt:       n/a -->

<!-- Cache File URL:                http://dev.jaswrks.com/test-page/ -->
<!-- Cache File Path:               /cache/comet-cache/cache/http/dev-jaswrks-com/test-page.u/1.html -->

<!-- Cache File Generated Via:      HTTP request -->
<!-- Cache File Generated On:       Apr 20th, 2017 @ 6:52 am UTC -->
<!-- Cache File Generated In:       0.15189 seconds -->

<!-- Cache File Expires On:         Apr 27th, 2017 @ 6:52 am UTC -->
<!-- Cache File Auto-Rebuild On:    Apr 27th, 2017 @ 6:52 am UTC -->

<!-- Loaded via Cache On:    Apr 20th, 2017 @ 6:53 am UTC -->
<!-- Loaded via Cache In:    0.02980 seconds -->
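The pre-check described in this comment can be sketched roughly as follows. This is not the HTML Compressor's actual implementation: compress_html() is a hypothetical placeholder that only collapses whitespace between tags, safe_compress() is an illustrative name, and the wording of the appended note is shortened from the example output above.

```php
<?php
// Hypothetical placeholder for the real compressor: collapses whitespace
// between adjacent tags using a pattern with the "u" modifier.
function compress_html(string $html): string
{
    return (string) preg_replace('/>\s+</u', '><', $html);
}

// Guard: skip compression entirely when the document is not valid UTF-8,
// so the caller never receives an empty document.
function safe_compress(string $html): string
{
    // preg_match() with the "u" modifier returns false for invalid UTF-8.
    if (preg_match('//u', $html) !== 1) {
        return $html . "\n<!-- HTML Compressor did not run; HTML contains invalid UTF-8. -->";
    }
    return compress_html($html);
}

$valid   = "<p>a</p>\n  <p>b</p>";
$invalid = "<p>h\xC3&#164;gar</p>\n  <p>b</p>"; // lone C3 byte

echo safe_compress($valid), "\n";
echo strlen(safe_compress($invalid)) > strlen($invalid) ? "skipped\n" : "compressed\n";
```

Returning the untouched document plus a note keeps the page usable and leaves a trace in the source for anyone debugging the cache.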

raamdev added a commit to wpsharks/comet-cache-pro that referenced this issue May 19, 2017

code-flow commented Jun 29, 2017

Has this fix been merged into the Pro version already? We updated today to version 170220, but the problem still exists :-(

raamdev commented Jun 29, 2017

@code-flow This issue has been resolved in the dev-branch and will go out with the next official release. We're hoping to get the next release out within a week or two. This GitHub issue will be updated once the release occurs.

Note: If you're interested in testing a beta release of Comet Cache before the next version comes out, please sign up to be a beta tester here or see Comet Cache → Plugin Updater → Beta Testers to automatically receive Release Candidate updates.

raamdev added a commit that referenced this issue Aug 8, 2017

Phing release of v170808-RC with the following changes:
- **New Feature: Memcached / RAM** (Pro): Comet Cache Pro now includes support for Memcached / AWS ElastiCache to serve the cache directly from RAM. This allows for a faster cache delivery than what is possible when serving the cache via disk. Memcached / AWS ElastiCache can be configured from **Dashboard → Comet Cache Pro → Plugin Options → RAM / Memcached**. See [Issue #47](#47)
- **Enhancement**: Added `Referrer-Policy` to whitelist for cachable HTTP headers. See [Issue #892](#892).
- **Bug Fix** (Pro): The Cache Statistics feature was broken when the PHP `disk_total_space()` and/or `disk_free_space()` functions were disabled by the PHP configuration. Comet Cache now handles this scenario gracefully by hiding disk-related stats when those functions are not allowed. See [Issue #775](#775)
- **Bug Fix** (Pro): The HTML Compressor was returning an empty string upon encountering an invalid UTF-8 sequence. See [Issue #871](#871) reported by a Comet Cache user.
- **Compatibility** (Pro): Many improvements to the Pro software update system, including changes to the API Endpoints and the Proxy Fallback endpoint. See [Issue #879](#879) and [Issue #315](wpsharks/comet-cache-pro#315) for full details.
- **Compatibility**: Fixed a WooCommerce compatibility issue that was generating a "Notice: id was called incorrectly. Product properties should not be accessed directly." Props @vestaxpdx. See [Issue #896](#896).