This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Very inefficient decoding of large chunked messages! #112

Closed
wants to merge 9 commits

Conversation

domoran
Contributor

@domoran domoran commented Feb 7, 2017

In one of our legacy projects we download a ~200 MB file into memory using chunked encoding. Calling getBody() on this (which calls decodeChunkedBody()) takes > 5 minutes to process the response, since it creates very inefficient copies of the body with each chunk (substr).

To make this more efficient we can use the offset parameter of preg_match(), which avoids copying the body during iteration.

Tests still passed.
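
For illustration, a minimal sketch of the offset-based idea (a simplification, not the library's actual decodeChunkedBody(); the decodeChunked name and the \G-anchored pattern are placeholders):

// Minimal sketch of offset-based chunk decoding, assuming a well-formed body.
// Instead of repeatedly shrinking $body with substr(), we advance $offset and
// let preg_match() continue matching from there.
function decodeChunked($body)
{
    $decoded = '';
    $offset  = 0;

    while (true) {
        // \G anchors the match at the current offset.
        if (! preg_match('/\G([\da-fA-F]+)[^\r\n]*\r\n/', $body, $m, 0, $offset)) {
            break;
        }

        $length = hexdec($m[1]);
        if ($length === 0) {
            break; // terminating zero-size chunk
        }

        $offset  += strlen($m[0]);
        $decoded .= substr($body, $offset, $length);
        $offset  += $length + 2; // skip the chunk data and its trailing CRLF
    }

    return $decoded;
}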

@Ocramius
Member

Ocramius commented Feb 7, 2017

Requires a performance test to prevent regressions. Unsure how to achieve that besides using relative timing of the regex applied to some data, and a large chunk of text passed to this method.

@domoran
Contributor Author

domoran commented Feb 7, 2017

I added a performance test by crafting a "worst case" packet and measuring relative timing.

Verified that the test fails on the old version:
git checkout e89dac7 src/Response.php

Edit: Fixed the formatting.
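
As a hedged sketch of the idea (the helper name and sizes are illustrative; the committed test builds the body via makeChunk() and compares timings against a baseline packet):

// Illustrative worst-case body: many tiny chunks followed by one very large
// chunk, terminated by the zero-size chunk. With the old substr() approach,
// the final 1 MB chunk gets copied once per preceding chunk.
function makeWorstCaseBody($smallChunks = 1000, $bigChunkSize = 1048576)
{
    $body  = str_repeat("1\r\nW\r\n", $smallChunks);             // 1000 one-byte chunks
    $body .= dechex($bigChunkSize) . "\r\n"
           . str_repeat('W', $bigChunkSize) . "\r\n";             // one 1 MB chunk

    return $body . "0\r\n\r\n";                                   // terminating chunk
}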

src/Response.php Outdated

while (true) {
    if (! preg_match("/^([\da-fA-F]+)[^\r\n]*\r\n/sm", $body, $m, 0, $offset)) {
        if (! empty(trim(substr($body, $offset)))) {
Member

! empty seems to be useless here

src/Response.php Outdated
if (! empty(trim(substr($body, $offset)))) {
    // Message was not consumed completely!
    throw new Exception\RuntimeException(
        "Error parsing body - doesn't seem to be a chunked message"
Member

Can we use single quotes around the error message string above?

src/Response.php Outdated
    throw new Exception\RuntimeException(
        "Error parsing body - doesn't seem to be a chunked message"
    );
} else {
Member

I think we don't need the else statement here.

@@ -176,6 +177,54 @@ public function testChunkedResponseCaseInsensitiveZF5438()
$this->assertEquals('c0cc9d44790fa2a58078059bab1902a9', md5($res->getContent()));
}

/**
* @param number $chunksize the data size of the chunk to create
Member

Please use int param type instead of number


/**
* @param Response $response
* @return the time that calling the getBody function took on the response
Member

Please add return type.
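
For example, the docblock on getTimeForGetBody() could be completed like this (float is an assumption, based on the microtime() difference the helper returns):

/**
 * @param Response $response
 * @return float the time in seconds that calling getBody() took on the response
 */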

private function makeChunk($chunksize)
{
    $chunkdata = str_repeat("W", $chunksize);
    return "$chunksize\r\n$chunkdata\r\n";
Member

Not sure here, but maybe sprintf and/or PHP_EOL?
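
A hedged sprintf variant could look like the snippet below; PHP_EOL would not fit here, since chunked transfer encoding requires CRLF line endings (and the size line is hexadecimal on the wire):

// Hypothetical sprintf-based variant of the helper; %x renders the chunk size
// in hexadecimal and the CRLF terminators stay explicit.
private function makeChunk($chunksize)
{
    return sprintf("%x\r\n%s\r\n", $chunksize, str_repeat('W', $chunksize));
}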

*/
private function getTimeForGetBody(Response $response)
{
    $time_start = microtime(true);
Member

Please use camelCase variable names.

$time2 = $this->getTimeForGetBody($response);

// Make sure that the worst case packet will have an equal timing as the baseline
$errMsg = "Chunked response is not parsing large packets efficiently: " . ($time2 / $time1);
Member

Please use sprintf there and single quotes.
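
Roughly:

// Single-quoted format string, with the ratio interpolated via sprintf().
$errMsg = sprintf('Chunked response is not parsing large packets efficiently: %s', $time2 / $time1);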


// Make sure that the worst case packet will have an equal timing as the baseline
$errMsg = "Chunked response is not parsing large packets efficiently: " . ($time2 / $time1);
$this->assertTrue(2 > ($time2 / $time1), $errMsg);
Member

Maybe we can use assertLessThan here?
Brackets around $time2 / $time1 are useless.
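
With assertLessThan (which asserts that the actual value is less than the expected one), the check would read:

$this->assertLessThan(2, $time2 / $time1, $errMsg);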

@@ -10,6 +10,7 @@
namespace ZendTest\Http;

use Zend\Http\Response;
use Zend\Http\Headers;
Member

Please have all imports in alphabetical order.
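
For this file that would simply be:

use Zend\Http\Headers;
use Zend\Http\Response;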

@domoran
Contributor Author

domoran commented Feb 7, 2017

Fixed formatting comments.

@domoran
Contributor Author

domoran commented Feb 7, 2017

Hmm ... the test seems flaky; it failed under PHP 7 for the last build. I cannot think of a way to make the test more stable except by increasing its runtime (more data, a higher ratio). Any ideas?

@Ocramius
Member

Ocramius commented Feb 8, 2017

@domoran flakiness can probably be avoided by repeating the test, but indeed, it's a time-based test, so it will always be a risky one. Make sure the first iteration is removed from measurement to avoid autoloading and other possible performance issues there.
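
A hedged sketch of that idea (timeGetBody is a hypothetical helper, and it assumes getBody() re-decodes the chunked body on every call):

// Warm up once so autoloading and other one-time costs don't land in the
// measured samples, then keep the best of a few repeated runs.
private function timeGetBody(Response $response, $runs = 3)
{
    $response->getBody(); // warm-up, not measured

    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $response->getBody();
        $best  = min($best, microtime(true) - $start);
    }

    return $best;
}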

@@ -176,6 +177,54 @@ public function testChunkedResponseCaseInsensitiveZF5438()
$this->assertEquals('c0cc9d44790fa2a58078059bab1902a9', md5($res->getContent()));
}

/**
* @param number $chunksize the data size of the chunk to create
Member

int, not number

$headers = file_get_contents(__DIR__ . '/_files/response_chunked_head');
$response->setHeaders(Headers::fromString($headers));

// *** craft a special 'worst case' response, where 1000 1 Byte chunks are followed by a 1 MB Chunk ***
Member

Should be in the test method docblock

}

/**
* @small
Member

Why @small?

@@ -0,0 +1,6 @@
Date: Sun, 25 Jun 2006 19:55:19 GMT
Member

I would suggest moving this into the test method, via <<<'HEADER' (and remove all bits that aren't strictly needed)

@Ocramius
Member

Ocramius commented Feb 8, 2017

Sorry for the previous commits - previous review from me was never submitted.

}

/**
* @small
Member

What's @small here?

Member

Does the test fail with the previous code? (just asking, since can't try it on my machine atm)

Contributor Author

On my machine the test does not fail at all, neither with PHP 7 nor PHP 5.6 ... @small will be removed.

Member

@Ocramius Ocramius Feb 8, 2017

@domoran I was asking if the test fails when removing the patch from the chunk decoding block.

Contributor Author

Yes, without the patch the ratios on my machine are 2000-5000 under PHP 7 and about 250 on PHP 5.6 ... the test runs > 60 secs.

I can increase the ratio here to make the test pass, but there must be some other problem. I would not expect a ratio much larger than 1; after all, this is only one iteration more than before.

Member

Alright, going up with those timings is not gonna help, so I suppose that a different approach needs to be taken.

Could you poke me on Friday? I can probably work on it while traveling.

*/
private function makeChunk($chunksize)
{
    $chunkdata = str_repeat('W', $chunksize);
Member

Can be inlined
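
I.e. roughly:

return "$chunksize\r\n" . str_repeat('W', $chunksize) . "\r\n";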

@domoran
Contributor Author

domoran commented Feb 8, 2017

I don't get it. There must be a logic error in the test or a problem on PHP 7.

How can one more regexp iteration inside decodeChunkedBody be a factor of 30-50(!) slower than the baseline? This happens only on travis-ci and only with PHP 7 ...

On my machine I cannot reproduce it. Even with PHP 7 I get a ratio of 3-5 ... but with PHP 5.6 there is the expected runtime ratio of 1 ... Maybe an issue with the regexp implementation in PHP 7?

Oh boy. Committing to Zend Framework is hard work.

@Ocramius
Member

Ocramius commented Feb 8, 2017

Oh boy. Committing to Zend Framework is hard work.

Nah, you just found an annoyingly hard edge case.

Did you check whether you are running with xdebug or not? Try running tests with php -n (will probably fail on some curl-based scenarios).

If we can help out, let us know, we can certainly test on other environments as well.

PHP 7 has indeed changed many things in the regex engine, btw.

…ent on large requests!)

Instead: Use offset parameter on preg_match() to iterate efficiently over the raw body.
To test the performance problem of copying data around in memory, we craft a special worst-case packet: 1000 chunks of 1 byte followed by 1 chunk of 1 MB. With the old code, the 1 MB chunk is copied around in memory a thousand times, leading to a large performance drop.

We test that the worst-case packet is parsed in roughly the same time as a "baseline" packet (without the large chunk at the end).
- Make test more stable by increasing expected performance ratio
@weierophinney
Member

I've rebased and pushed back to your branch; let's see if the tests present the same issue...

@weierophinney weierophinney dismissed Ocramius’s stale review April 26, 2018 15:51

I have incorporated feedback in the merge.

weierophinney added a commit that referenced this pull request Apr 26, 2018
@weierophinney
Member

Thanks, @domoran! We have updated the test suite recently, and those updates ensured your performance-based tests now pass reliably! Merged for next release.
