Load testing #7

Closed
4 of 8 tasks
rjackson opened this issue Nov 29, 2017 · 4 comments
rjackson commented Nov 29, 2017

  • What's the typical traffic we currently receive? (users/sec; insight from Google Analytics)
  • How many requests per second does that equate to?
  • What is our target per-pod traffic? Keeping this low for MediaWiki minimises the impact of individual failures. Keeping it low for Varnish, however, will result in additional backend hits. 🤔
  • What resources does a single pod handling that target traffic require? This will inform resource limits, for efficient bin-packing during heavy load.
  • Minimum replicas across the stack (redundant nodes + redundancy per node == 4?)
  • Maximum replicas? An order of magnitude over typical traffic? What do our historic traffic spikes look like?

The above will also provide the insight required to configure node autoscaling.

Once the above resource limits have been determined and set, we can load test the following scenarios, using Google Analytics to provide insight:

@rjackson rjackson self-assigned this Nov 29, 2017
rjackson commented Jan 9, 2018

WIP comment; updating as per findings

What's the typical traffic we currently receive (users/sec, insight from Google Analytics)

From Google Analytics' "Audience" data over the last 2 years, 2016-01-09 to 2018-01-09, converting to per-timeframe equivalents:

| Metric | Value | Per day | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- | --- |
| Sessions | 34,556,894 | 47,338 | 1,972 | 32 | 0.54 |
| Users | 14,372,905 | 19,689 | 820 | 13 | 0.21 |
| Page views | 119,635,151 | 163,884 | 6,829 | 114 | 1.9 |
| Pages/session | 3.46 | | | | |
| Avg. session duration | 00:04:26 | | | | |
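The per-timeframe conversions above can be sanity-checked with a few lines of Python; the 730-day window is an assumption based on the stated 2016-01-09 to 2018-01-09 range, and the figures below are re-derivations rather than Google Analytics output:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 730  # assumed window: 2016-01-09 to 2018-01-09

def rates(total):
    """Convert a two-year total into per-day/hour/minute/second rates."""
    per_day = total / DAYS
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
        "per_second": total / (DAYS * SECONDS_PER_DAY),
    }

sessions = rates(34_556_894)
print(round(sessions["per_day"]))  # → 47338, matching the table
```

The same function reproduces the page-view figures (e.g. 119,635,151 page views works out to roughly 114 per minute).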

(What we refer to as "users" in the load test maps to "sessions" in the above data.)

To mimic this average traffic in a load test, we would create a user which browses 3.5 pages every 4.5 minutes (roughly one page every 90 seconds):

from locust import HttpLocust  # Locust 0.x API, as used at the time

class AverageUser(HttpLocust):
    """Emulate an average user, according to Google Analytics data
    collected between 2016-01-09 and 2018-01-09."""

    task_set = Top10Pages  # TaskSet defined elsewhere in the load-test suite

    # Average of 90 seconds between page loads, with ±50% variance
    avg_wait = 90 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)

From the average session duration, we can also work out how many simultaneous visitors the website serves: 32 sessions per minute × a 4.5-minute average session duration ≈ 144 simultaneous sessions.
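That calculation is an application of Little's Law (concurrency = arrival rate × average time in the system); as a quick sketch:

```python
def concurrent_sessions(sessions_per_minute, avg_session_minutes):
    """Little's Law: L = λ × W (concurrency = arrival rate × duration)."""
    return sessions_per_minute * avg_session_minutes

print(concurrent_sessions(32, 4.5))  # → 144.0
```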

How many requests-per-second does that equate to?

Running a load test (tfwiki/load-tests) that models this average user behaviour with 144 simultaneous users, and with the load test also fetching page resources (images, stylesheets, JavaScript), we see that our typical traffic generates approximately 40 requests per second to the server.
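As a rough cross-check of that 40 req/s figure — the requests-per-page-load count below is a back-of-envelope assumption implied by the measurement, not a value taken from the load test itself:

```python
users = 144
seconds_per_page = 90  # each simulated user loads one page every 90 s
page_loads_per_second = users / seconds_per_page  # 1.6

# Hypothetical: the page itself plus ~24 assets per page load would
# account for the observed rate.
requests_per_page_load = 25

print(round(page_loads_per_second * requests_per_page_load))  # → 40
```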

This appears to be severely slowed down by the MediaWiki pods handling images: the pods are I/O bound rather than CPU bound. It may be worth re-evaluating without image handling to get a better idea of raw MediaWiki performance.

@rjackson rjackson added this to the MVP milestone Jan 10, 2018
What does a major update spike look like?

The Pyromania Update caused the largest single-day traffic spike in the Wiki's history on June 28, 2012. Let's create a "Major Update" traffic model based on traffic on that day:

| Metric | Value | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- |
| Sessions | 447,249 | 18,635.38 | 310.59 | 5.18 |
| Users | 250,774 | 10,448.92 | 174.15 | 2.9 |
| Page views | 3,257,921 | 135,746.71 | 2,262.45 | 37.71 |
| Pages/session | 7.28 | | | |
| Avg. session duration | 00:08:02 | | | |

(What we refer to as "users" in the load test maps to "sessions" in the above data.)

To mimic these users in a load test, we would create a user which browses 7.5 pages every 8 minutes (roughly one page every 64 seconds):

from locust import HttpLocust  # Locust 0.x API, as used at the time

class PyromaniacUser(HttpLocust):
    """Emulate Pyromania update users, according to behaviour observed
    on 2012-06-28."""

    task_set = PyromaniaTop10Pages  # TaskSet defined elsewhere in the suite

    # Average of 64 seconds between page loads, with ±50% variance
    avg_wait = 64 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)

From the average session duration, we can also work out how many simultaneous visitors the website served during this event: 311 sessions per minute × an 8-minute average session duration ≈ 2,488 simultaneous sessions.
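These two models also speak to the earlier "order of magnitude over typical traffic" question; comparing the spike against the two-year averages, using the figures from the tables above:

```python
# Figures taken from the tables earlier in this thread.
typical_concurrent = 144        # average-traffic model
spike_concurrent = 2488         # Pyromania-day model
typical_sessions_per_sec = 0.54
spike_sessions_per_sec = 5.18

print(round(spike_concurrent / typical_concurrent, 1))              # → 17.3
print(round(spike_sessions_per_sec / typical_sessions_per_sec, 1))  # → 9.6
```

So the worst historic spike is roughly one order of magnitude above typical traffic, which supports the earlier guess for maximum replicas.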


Yeeaah, the current setup with 4 Varnish instances (1 per server) handles a boatload of traffic perfectly fine – I've had it handle 10,000 simulated Pyromania users and it's not breaking a sweat. So no need to worry about getting resource limits perfect yet; we can handle that down the line.


@rjackson rjackson removed this from the MVP milestone Jan 10, 2018

rjackson commented Sep 2, 2018

Don't care about these reports any more. Live traffic never quite matched up, because the Wiki has a lot of pages and therefore had a lot of uncached content when we went live.

So the numbers seemed impressive, but weren't realistic.

With typical traffic nowadays, 4 MediaWiki containers just about manage the non-cached traffic. Clearing Varnish significantly increases load, and Kubernetes' horizontal pod autoscaler can be slow to react, leading to single-digit minutes of perceived downtime. In those scenarios, manually scaling MediaWiki up to 8 containers handles typical traffic well enough (likely more than needed), and Kubernetes will autoscale back down to the minimum of 4 containers once Varnish has taken over the brunt of the load.

@rjackson rjackson closed this as completed Sep 2, 2018