Very inefficient decoding of large chunked messages! #112
Requires a performance test to prevent regressions. Unsure how to achieve that besides using relative timing of the regex applied to some data, and a large chunk of text passed to this method.
I added a performance test, by crafting a "worst case" packet and measuring relative timing. Verified that test fails on old version:
Edit: Fixed the formatting.
src/Response.php
Outdated
while (true) {
    if (! preg_match("/^([\da-fA-F]+)[^\r\n]*\r\n/sm", $body, $m, 0, $offset)) {
        if (! empty(trim(substr($body, $offset)))) {
! empty seems to be useless here
src/Response.php
Outdated
if (! empty(trim(substr($body, $offset)))) {
    // Message was not consumed completely!
    throw new Exception\RuntimeException(
        "Error parsing body - doesn't seem to be a chunked message"
Can we use single quotes around the error message string above?
src/Response.php
Outdated
    throw new Exception\RuntimeException(
        "Error parsing body - doesn't seem to be a chunked message"
    );
} else {
I think we don't need the else statement here.
test/ResponseTest.php
Outdated
@@ -176,6 +177,54 @@ public function testChunkedResponseCaseInsensitiveZF5438()
        $this->assertEquals('c0cc9d44790fa2a58078059bab1902a9', md5($res->getContent()));
    }

    /**
     * @param number $chunksize the data size of the chunk to create
Please use int param type instead of number
test/ResponseTest.php
Outdated
    /**
     * @param Response $response
     * @return the time that calling the getBody function took on the response
Please add return type.
test/ResponseTest.php
Outdated
    private function makeChunk($chunksize)
    {
        $chunkdata = str_repeat("W", $chunksize);
        return "$chunksize\r\n$chunkdata\r\n";
Not sure here, but maybe sprintf and/or PHP_EOL?
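A minimal sketch of what the sprintf suggestion could look like, offered as an illustration rather than the code that was merged. Two details are assumptions drawn from the chunked transfer coding itself, not from this thread: the size line is hexadecimal (the decoder's regex above matches `[\da-fA-F]+`), and the line terminator is always CRLF, so PHP_EOL would not fit here. A decimal size only agrees with hex for values below 10; a size written as 1000 would be read as 0x1000, i.e. 4096.

```php
<?php

// Hypothetical variant of the makeChunk() test helper discussed above.
function makeChunk(int $chunkSize): string
{
    // %x renders the size in hexadecimal; "\r\n" is fixed by the protocol,
    // so the platform-dependent PHP_EOL is deliberately not used.
    return sprintf("%x\r\n%s\r\n", $chunkSize, str_repeat('W', $chunkSize));
}
```

For example, `makeChunk(16)` starts with the size line `10`, followed by sixteen `W` bytes.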
test/ResponseTest.php
Outdated
     */
    private function getTimeForGetBody(Response $response)
    {
        $time_start = microtime(true);
Please use camelCase variable names.
test/ResponseTest.php
Outdated
        $time2 = $this->getTimeForGetBody($response);

        // Make sure that the worst case packet will have an equal timing as the baseline
        $errMsg = "Chunked response is not parsing large packets efficiently: " . ($time2 / $time1);
Please use sprintf there and single quotes.
test/ResponseTest.php
Outdated
        // Make sure that the worst case packet will have an equal timing as the baseline
        $errMsg = "Chunked response is not parsing large packets efficiently: " . ($time2 / $time1);
        $this->assertTrue(2 > ($time2 / $time1), $errMsg);
Maybe we can use assertLessThan here?
Brackets around $time2 / $time1 are useless.
test/ResponseTest.php
Outdated
@@ -10,6 +10,7 @@
namespace ZendTest\Http;

use Zend\Http\Response;
use Zend\Http\Headers;
Please have all imports in alphabetical order.
Fixed formatting comments.
Hmm ... the test seems flaky; it failed under PHP 7 for the last build. I cannot think of a way to make the test more stable except by increasing its runtime (more data, increased ratio). Any ideas?
@domoran flakiness can probably be avoided by repeating the test, but indeed, it's a time-based test, so it will always be a risky one. Make sure the first iteration is removed from measurement to avoid autoloading and other possible performance issues there.
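One way the "repeat and drop the first iteration" advice could be sketched, assuming a standalone helper rather than the PR's actual test code. The function name and the choice to keep the fastest run are illustrative assumptions; the thread only asks for repetition and an unmeasured first call.

```php
<?php

/**
 * Hypothetical helper: call $fn once without measuring (warm-up), then time
 * it $runs times and return the fastest run in seconds.
 */
function fastestRunSeconds(callable $fn, int $runs = 5): float
{
    $fn(); // warm-up call: absorbs autoloading and first-use costs, not measured

    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $best = min($best, microtime(true) - $start);
    }

    return $best;
}
```

Keeping the minimum rather than the mean makes a single slow run on a busy CI worker less likely to skew the ratio, though a timing-based assertion remains inherently risky, as noted above.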
test/ResponseTest.php
Outdated
@@ -176,6 +177,54 @@ public function testChunkedResponseCaseInsensitiveZF5438()
        $this->assertEquals('c0cc9d44790fa2a58078059bab1902a9', md5($res->getContent()));
    }

    /**
     * @param number $chunksize the data size of the chunk to create
int, not number
test/ResponseTest.php
Outdated
        $headers = file_get_contents(__DIR__ . '/_files/response_chunked_head');
        $response->setHeaders(Headers::fromString($headers));

        // *** craft a special 'worst case' response, where 1000 1 Byte chunks are followed by a 1 MB Chunk ***
Should be in the test method docblock
test/ResponseTest.php
Outdated
    }

    /**
     * @small
Why @small?
@@ -0,0 +1,6 @@
Date: Sun, 25 Jun 2006 19:55:19 GMT
I would suggest moving this into the test method, via <<<'HEADER' (and remove all bits that aren't strictly needed)
Sorry for the previous commits - previous review from me was never submitted.
    }

    /**
     * @small
What's @small here?
Does the test fail with the previous code? (just asking, since can't try it on my machine atm)
On my machine the test does not fail at all, neither with PHP 7 nor with PHP 5.6. @small will be removed.
@domoran I was asking if the test fails when removing the patch from the chunk decoding block.
Yes, without the patch the ratios on my machine are 2000-5000 under PHP 7 and about 250 under PHP 5.6. The test runs for more than 60 seconds.
I can increase the ratio here to make the test pass, but there must be some other problem. I would not expect a ratio much larger than 1; after all, this is only one iteration more than before.
Alright, going up with those timings is not gonna help, so I suppose that a different approach needs to be taken.
Could you poke me on Friday? I can probably work on it while traveling.
     */
    private function makeChunk($chunksize)
    {
        $chunkdata = str_repeat('W', $chunksize);
Can be inlined
I don't get it. There must be a logic error in the test or a problem on PHP 7. How can one more regexp iteration inside decodeChunkedBody be a factor of 30-50(!) slower than the baseline? This happens only on travis-ci and only with PHP 7. On my machine I cannot reproduce it. Even with PHP 7 I get a ratio of 3-5, but with PHP 5.6 there is the expected runtime ratio of 1. Maybe an issue with the regexp implementation on PHP 7? Oh boy. Committing to Zend Framework is hard work.
Nah, you just found an annoyingly hard edge case. Did you check whether you are running with xdebug or not? Try running tests with If we can help out, let us know, we can certainly test on other environments as well. PHP 7 has indeed changed many things in the regex engine, btw.
…ent on large requests!) Instead: Use offset parameter on preg_match() to iterate efficiently over the raw body.
To test for performance problems caused by copying data around in memory, we craft a special worst case packet: 1000 chunks of 1 byte followed by 1 chunk of 1MB. With the old code, the 1MB chunk is copied around in memory a thousand times, leading to a large performance drop. We test that the worst case packet is parsed in about the same time as a "baseline" packet (without the large chunk at the end).
- Make test more stable by increasing expected performance ratio
Ratio for old version now > 200.
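To make the copying cost described in the commit message above concrete, here is a rough, deterministic sketch that counts bytes instead of measuring time. It assumes the old loop replaced the body with its remainder via substr after every chunk (the PR description mentions substr, but the old code is not shown on this page). The function and variable names are hypothetical.

```php
<?php

// Baseline packet: 1000 chunks of 1 byte each.
$baseline = str_repeat("1\r\nW\r\n", 1000);

// Worst case: the baseline followed by one 1 MB chunk (size line in hex).
$bigSize   = 1024 * 1024;
$worstCase = $baseline . dechex($bigSize) . "\r\n" . str_repeat('W', $bigSize) . "\r\n";

/**
 * Hypothetical estimate: bytes copied by a loop that, after each chunk,
 * copies the entire unparsed remainder of the body into a new string.
 */
function bytesCopiedByRemainderLoop(string $body): int
{
    $copied = 0;
    $offset = 0;
    $total  = strlen($body);

    // \G anchors the match at $offset, so chunk headers are read in place.
    while (preg_match('/\G([\da-fA-F]+)[^\r\n]*\r\n/', $body, $m, 0, $offset) === 1) {
        // Skip the size line, the chunk data and the CRLF that ends the chunk.
        $offset += strlen($m[0]) + (int) hexdec($m[1]) + 2;
        // The remainder after this chunk is what a substr-based loop would copy.
        $copied += max(0, $total - $offset);
    }

    return $copied;
}

$ratio = bytesCopiedByRemainderLoop($worstCase) / bytesCopiedByRemainderLoop($baseline);
```

Under that assumption the baseline copies about 3 MB in total and the worst case about 1 GB, which is consistent with the large timing ratios reported for the old version.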
I've rebased and pushed back to your branch; let's see if the tests present the same issue...
I have incorporated feedback in the merge.
Thanks, @domoran! We have updated the test suite recently, and those updates ensured your performance-based tests now pass reliably! Merged for next release.
In one of our legacy projects we download a ~200MB file into memory using chunked encoding. Calling getBody() on this response (which calls decodeChunkedBody()) takes more than 5 minutes, since it creates a very inefficient copy of the body with each chunk (substr).
To make this more efficient we can use the offset parameter of preg_match() to avoid copying the body during iteration.
Tests still passed.
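The offset-based iteration described above can be sketched roughly as follows. This is a simplified standalone function, not the library's decodeChunkedBody(): the name is hypothetical, it uses the `\G` anchor instead of the `^` with `/sm` seen in the reviewed diff, and it stops at the zero-size chunk without handling trailers.

```php
<?php

/**
 * Hypothetical sketch: decode a chunked body by advancing an offset into the
 * original string instead of repeatedly cutting the string with substr().
 */
function decodeChunkedSketch(string $body): string
{
    $decoded = '';
    $offset  = 0;

    while (true) {
        // \G makes the pattern match exactly at $offset; $body is never copied.
        if (preg_match('/\G([\da-fA-F]+)[^\r\n]*\r\n/', $body, $m, 0, $offset) !== 1) {
            if (trim((string) substr($body, $offset)) !== '') {
                // Message was not consumed completely.
                throw new RuntimeException('Error parsing body - doesn\'t seem to be a chunked message');
            }
            break;
        }

        $length  = (int) hexdec($m[1]); // chunk size is hexadecimal
        $offset += strlen($m[0]);       // move past the size line

        if ($length === 0) {
            break;                      // zero-size chunk marks the end
        }

        $decoded .= substr($body, $offset, $length); // copy only this chunk's data
        $offset  += $length + 2;                     // skip the data and its CRLF
    }

    return $decoded;
}
```

Each iteration copies only the current chunk's data, so the total work grows with the body size rather than with body size times chunk count.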