Add documentation about sorting param #1419

Closed

polarathene opened this issue Jun 12, 2019 · 5 comments

@polarathene

Latency Top 10:

| Language (Runtime) | Framework (Middleware) | Average | 50th percentile | 90th percentile | 99th percentile | 99.9th percentile | Standard deviation |
|---|---|---|---|---|---|---|---|
| rust (1.35) | nickel (0.11) | 0.07 ms | 0.06 ms | 0.10 ms | 0.12 ms | 0.99 ms | 24.33 |
| ruby (2.6) | roda (3.2) | 3.50 ms | 0.17 ms | 13.67 ms | 32.96 ms | 82.73 ms | 7372.33 |
| ruby (2.6) | rack-routing (0.0) | 4.66 ms | 0.22 ms | 18.59 ms | 41.10 ms | 107.99 ms | 9556.00 |
| rust (1.35) | iron (0.6) | 0.36 ms | 0.36 ms | 0.60 ms | 0.87 ms | 11.31 ms | 205.00 |
| php (7.3) | symfony (4.3) | 93.23 ms | 0.37 ms | 216.72 ms | 1956.34 ms | 6787.76 ms | 368518.33 |
| php (7.3) | laravel (5.8) | 154.07 ms | 0.39 ms | 330.01 ms | 3101.21 ms | 6937.61 ms | 549852.00 |
| php (7.3) | slim (3.12) | 127.39 ms | 0.39 ms | 266.58 ms | 2402.54 ms | 6879.72 ms | 458913.00 |
| php (7.3) | zend-expressive (3.2) | 170.55 ms | 0.39 ms | 304.31 ms | 3452.25 ms | 6962.37 ms | 626142.00 |
| ruby (2.6) | flame (4.18) | 7.22 ms | 0.39 ms | 25.05 ms | 52.38 ms | 138.96 ms | 12378.00 |
| php (7.3) | lumen (5.8) | 141.59 ms | 0.39 ms | 279.21 ms | 3045.51 ms | 6945.63 ms | 525800.67 |

The only column that seems to define the rank is the 50th percentile. Could the README mention that this is how the ranking is done?

iron definitely seems ahead of rack-routing. flame looks like it should be ahead of the 3 PHP results above it: they all tie at a 0.39 ms 50th percentile, but flame's standard deviation should give it the upper hand here. There are other cases like this further down the chart, such as agoo-c vs rocket:

| Language (Runtime) | Framework (Middleware) | Average | 50th percentile | 90th percentile | 99th percentile | 99.9th percentile | Standard deviation |
|---|---|---|---|---|---|---|---|
| rust (nightly) | rocket (0.4) | 106.24 ms | 1.58 ms | 71.40 ms | 2257.22 ms | 4945.40 ms | 441283.00 |
| c (11) | agoo-c (0.5) | 2.96 ms | 1.90 ms | 6.70 ms | 14.38 ms | 105.26 ms | 3145.33 |

Here rocket has a slight lead on the 50th percentile, but agoo-c seems better overall?

Would it be OK to use a second value with some weighting to get a better ranking? I'm not stats savvy, but looking over the table, if you weighted the 50th percentile at 90% against 10% of the Average, you'd get an order that seems more representative of the performance, rather than rewarding entries that lose consistency and skew quite poorly past the halfway mark. To make the idea concrete, here is a minimal sketch of that weighting (see the snippet below).
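
The sketch below is illustrative only, not code from this repo; it just applies the proposed 90/10 blend to the rocket and agoo-c rows from the table above:

```python
# Illustrative sketch of the proposed 90/10 weighting, not code from this repo.
# Inputs are the "50th percentile" (median) and "Average" (mean) columns, in ms.
def weighted_score(median_ms, mean_ms, median_weight=0.9):
    """Blend the median latency with the mean so inconsistent tails are penalised."""
    return round(median_weight * median_ms + (1 - median_weight) * mean_ms, 2)

# Numbers taken from the rocket vs agoo-c table above.
print(weighted_score(1.58, 106.24))  # rocket (0.4) -> 12.05
print(weighted_score(1.90, 2.96))    # agoo-c (0.5) -> 2.01, now ranks ahead of rocket
```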

@waghanza
Collaborator

Hi @polarathene,

You're right.

The rank (if we can say so) is computed from the 50th percentile.

The main idea was to get as close as we can to helping people decide which language/framework to use.

As my background is aws / eu-west-1, the 50th percentile figure seems to reflect real-world performance ;-)

BTW, any idea / recommendation is ❤️

PS: I will edit the title to reflect the main idea -> the sorting param (50th percentile) SHOULD be documented in the README

@waghanza changed the title from "Rank position seems off" to "Add documentation about sorting param" on Jun 12, 2019
@polarathene
Author

> As my background is aws / eu-west-1, the 50th percentile figure seems to reflect real-world performance ;-)

I was just referring to the table results listed, not in relation to performance/experiences elsewhere.

As noted above with iron, flame and agoo-c, ranking purely on the median value (the 50th percentile column) does not seem to ideally represent how they should be ranked/sorted regarding performance?

It's true that, on that metric alone, the top 50% of responses have better latency, but the second half of the results tells a very different story. I think the consistency/stability of low latency throughout should carry some weight in the scoring. The standard deviation shows how some of the current positions get away with having their slower half of responses ignored.


I have asked a statistics community for their input on a proper way to improve the scoring/ranking, and I will let you know if they make any suggestions. For now, assigning a small amount of weight to the Average (mean) value seems to cause no negative impact, but leads to a much more representative ranking of the results.

| Framework (Middleware) | Average | 50th percentile | Weighted Score |
|---|---|---|---|
| roda (3.2) | 3.50 ms | 0.17 ms | 0.50 ms |
| rack-routing (0.0) | 4.66 ms | 0.22 ms | 0.66 ms |
| iron (0.6) | 0.36 ms | 0.36 ms | 0.36 ms |
| rocket (0.4) | 106.24 ms | 1.58 ms | 12.05 ms |
| agoo-c (0.5) | 2.96 ms | 1.90 ms | 2.01 ms |

Where Weighted Score is 90% of the median (50th percentile) + 10% of the mean (Average), rounded to the nearest hundredth of a millisecond (2 decimal places). And now, if we rank by the Weighted Score value instead:

| Framework (Middleware) | Average | 50th percentile | Weighted Score |
|---|---|---|---|
| iron (0.6) | 0.36 ms | 0.36 ms | 0.36 ms |
| roda (3.2) | 3.50 ms | 0.17 ms | 0.50 ms |
| rack-routing (0.0) | 4.66 ms | 0.22 ms | 0.66 ms |
| agoo-c (0.5) | 2.96 ms | 1.90 ms | 2.01 ms |
| rocket (0.4) | 106.24 ms | 1.58 ms | 12.05 ms |

The ranking seems to better represent performance by giving a small amount of weight to the slower half of the latency results.

I did not include flame vs the PHP frameworks as that one should be self-explanatory: if only ranking by the median, you should also have a secondary sorting factor for when there are ties (a rough sketch follows below).
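
As a rough illustration of the tie-break idea (nothing from this repo's codebase; the field names are just mine), a sort keyed on the median with the standard deviation as a secondary key would already put flame ahead of the PHP rows that tie at 0.39 ms:

```python
# Illustrative only: primary key = median (50th percentile), secondary key =
# standard deviation, to break ties. Numbers are the 0.39 ms rows from the
# first table; the dict keys are assumed names, not this repo's schema.
rows = [
    {"framework": "laravel (5.8)",         "median_ms": 0.39, "std_dev": 549852.00},
    {"framework": "slim (3.12)",           "median_ms": 0.39, "std_dev": 458913.00},
    {"framework": "zend-expressive (3.2)", "median_ms": 0.39, "std_dev": 626142.00},
    {"framework": "flame (4.18)",          "median_ms": 0.39, "std_dev": 12378.00},
    {"framework": "lumen (5.8)",           "median_ms": 0.39, "std_dev": 525800.67},
]

# All five tie on the median, so the standard deviation decides the order,
# which puts flame first, then slim, lumen, laravel, zend-expressive.
ranked = sorted(rows, key=lambda r: (r["median_ms"], r["std_dev"]))
print([r["framework"] for r in ranked])
```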

@waghanza
Collaborator

> I have asked a statistics community for their input on a proper way to improve the scoring/ranking, and I will let you know if they make any suggestions

🎉 Thanks for this

However, be aware that the results are not very accurate. I mean, all of this is actually running on local docker, and docker messes up the results. After some documentation PRs and #1011, I will work on #632 so that the results are not messed up anymore.

@polarathene
Author

> However, be aware that the results are not very accurate. I mean, all of this is actually running on local docker, and docker messes up the results.

Yes, I understand; there is a clear warning at the top of the README pointing that out. But that would not change anything regarding how the results are sorted.

I do understand that the actual results themselves are not stable at present, as is evident from the test result history varying widely across past commits. I am just interested in more accurately representing how well a framework has performed based on the given data.

The weighted score suggestion above seems to work well?


Off-topic to the issue:

> running on local docker, and docker messes up the results

I don't quite follow how Docker messes up results here. Is it because of the different base images? Docker, if anything, should be a useful tool for getting consistency. On bare metal you're dealing with the distro environment and its own package manager; not all distros are the same, and there are many other factors that can impact results.

Users' systems likewise aren't likely to be at parity with where you run the tests. But the results provide some insight, and users can then verify on their own systems whether the results are similar (easier to do using the same Docker images, then adapting to their own needs/environment after confirmation).

On bare metal, you can do some things to better ensure consistency, such as pinning CPU cores to the processes involved (Docker would again be useful here, afaik), and you can also isolate those CPU cores so that nothing else on the system is permitted to use them for processing.

Once you involve an external network, that's a different variable that you might not have much control over and that lacks consistency. It's useful information to include and can still be achieved with Docker; the quality of the network is going to vary for users though, just like other parts of the environment, so local tests are still useful imo. You can also configure a network that has the characteristics of what you'd get from an external network.

@waghanza
Collaborator

Feel free to suggest any idea about how to rank/sort ❤️

I have taken ideas from #670 and #223, but I have no preference myself 😛


> running on local docker, and docker messes up the results

I mean that the metrics are computed in a way that prevents any framework from being pushed to its performance limits:

- because the docker engine is there (adding flexibility but decreasing raw performance)
- because of the local network
- because of parallelization: the sieger (wrk) targets multiple hosts at once, instead of one by one

My bad, this is NOT only about docker BUT more about local docker
