Load testing #7

Closed
4 of 8 tasks
rjackson opened this issue Nov 29, 2017 · 4 comments
rjackson commented Nov 29, 2017

  • What's the typical traffic we currently receive? (users/sec; insight from Google Analytics)
  • How many requests per second does that equate to?
  • What is our target per-pod traffic? Keeping this low for MediaWiki minimises the impact of individual failures. Keeping it low for Varnish, however, will result in additional backend hits. 🤔
  • What resources does a single pod handling that target traffic require? This will inform resource limits, for efficient bin-packing during heavy load.
  • Minimum replicas across the stack (redundant nodes + redundancy per node == 4?)
  • Maximum replicas? An order of magnitude over typical traffic? What do our historic traffic spikes look like?

The above will also provide the insight required to configure node autoscaling.

Once the above resource limits have been determined and set, we can load test the following scenarios, using Google Analytics to provide insight:

@rjackson rjackson self-assigned this Nov 29, 2017
rjackson commented Jan 9, 2018

WIP comment; updating as per findings

What's the typical traffic we currently receive (users/sec, insight from Google Analytics)

From Google Analytics' "Audience" data over the last 2 years, 2016-01-09 to 2018-01-09, converting to per-timeframe equivalents:

| Metric | Value | Per day | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- | --- |
| Sessions | 34,556,894 | 47,338 | 1,972 | 32 | 0.54 |
| Users | 14,372,905 | 19,689 | 820 | 13 | 0.21 |
| Page views | 119,635,151 | 163,884 | 6,829 | 114 | 1.9 |
| Pages/session | 3.46 | | | | |
| Avg. session duration | 00:04:26 | | | | |
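The per-timeframe conversions above can be sanity-checked with a few lines of Python; the 730-day window is an assumption based on the stated 2016-01-09 to 2018-01-09 range, and the figures below are re-derivations rather than Google Analytics output:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 730  # assumed window: 2016-01-09 to 2018-01-09

def rates(total):
    """Convert a two-year total into per-day/hour/minute/second rates."""
    per_day = total / DAYS
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
        "per_second": total / (DAYS * SECONDS_PER_DAY),
    }

sessions = rates(34_556_894)
print(round(sessions["per_day"]))  # → 47338, matching the table
```

The same function reproduces the page-view figures (e.g. 119,635,151 page views works out to roughly 114 per minute).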

(What we refer to as "users" in the load test maps to "sessions" in the above data.)

To mimic this average traffic in a load test, we would create a user which browses 3.5 pages every 4.5 minutes (roughly one page every 90 seconds):

from locust import HttpLocust  # Locust 0.x API, as used at the time

class AverageUser(HttpLocust):
    """Emulate an average user, according to Google Analytics data
    collected between 2016-01-09 and 2018-01-09."""

    task_set = Top10Pages  # TaskSet defined elsewhere in the load-test suite

    # Average of 90 seconds between page loads, with ±50% variance
    avg_wait = 90 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)

From the average session duration, we can also work out how many simultaneous visitors the website serves: 32 sessions per minute × a 4.5-minute average session duration ≈ 144 simultaneous sessions.
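That calculation is an application of Little's Law (concurrency = arrival rate × average time in the system); as a quick sketch:

```python
def concurrent_sessions(sessions_per_minute, avg_session_minutes):
    """Little's Law: L = λ × W (concurrency = arrival rate × duration)."""
    return sessions_per_minute * avg_session_minutes

print(concurrent_sessions(32, 4.5))  # → 144.0
```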

How many requests-per-second does that equate to?

Running a load test (tfwiki/load-tests) that models this average user behaviour with 144 simultaneous users, and with the load test also fetching page resources (images, stylesheets, JavaScript), we see that our typical traffic generates approximately 40 requests per second to the server.
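As a rough cross-check of that 40 req/s figure — the requests-per-page-load count below is a back-of-envelope assumption implied by the measurement, not a value taken from the load test itself:

```python
users = 144
seconds_per_page = 90  # each simulated user loads one page every 90 s
page_loads_per_second = users / seconds_per_page  # 1.6

# Hypothetical: the page itself plus ~24 assets per page load would
# account for the observed rate.
requests_per_page_load = 25

print(round(page_loads_per_second * requests_per_page_load))  # → 40
```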

This appears to be severely slowed down by the MediaWiki pods handling images: the pods are I/O bound rather than CPU bound. It may be worth re-evaluating without image handling to get a better idea of raw MediaWiki performance.

@rjackson rjackson added this to the MVP milestone Jan 10, 2018
What does a major update spike look like?

The Pyromania Update caused the largest single-day traffic spike in the Wiki's history on June 28, 2012. Let's create a "Major Update" traffic model based on traffic on that day:

| Metric | Value | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- |
| Sessions | 447,249 | 18,635.38 | 310.59 | 5.18 |
| Users | 250,774 | 10,448.92 | 174.15 | 2.9 |
| Page views | 3,257,921 | 135,746.71 | 2,262.45 | 37.71 |
| Pages/session | 7.28 | | | |
| Avg. session duration | 00:08:02 | | | |

(What we refer to as "users" in the load test maps to "sessions" in the above data.)

To mimic these users in a load test, we would create a user which browses 7.5 pages every 8 minutes (roughly one page every 64 seconds):

from locust import HttpLocust  # Locust 0.x API, as used at the time

class PyromaniacUser(HttpLocust):
    """Emulate Pyromania update users, according to behaviour observed
    on 2012-06-28."""

    task_set = PyromaniaTop10Pages  # TaskSet defined elsewhere in the suite

    # Average of 64 seconds between page loads, with ±50% variance
    avg_wait = 64 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)

From the average session duration, we can also work out how many simultaneous visitors the website served during this event: 311 sessions per minute × an 8-minute average session duration ≈ 2,488 simultaneous sessions.
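These two models also speak to the earlier "order of magnitude over typical traffic" question; comparing the spike against the two-year averages, using the figures from the tables above:

```python
# Figures taken from the tables earlier in this thread.
typical_concurrent = 144        # average-traffic model
spike_concurrent = 2488         # Pyromania-day model
typical_sessions_per_sec = 0.54
spike_sessions_per_sec = 5.18

print(round(spike_concurrent / typical_concurrent, 1))              # → 17.3
print(round(spike_sessions_per_sec / typical_sessions_per_sec, 1))  # → 9.6
```

So the worst historic spike is roughly one order of magnitude above typical traffic, which supports the earlier guess for maximum replicas.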


Yeeaah, the current setup with 4 Varnish instances (1 per server) handles a boatload of traffic perfectly fine – I've had it handle 10,000 simulated Pyromania users and it's not breaking a sweat. So no need to worry about getting resource limits perfect yet; we can handle that down the line.


@rjackson rjackson removed this from the MVP milestone Jan 10, 2018

rjackson commented Sep 2, 2018

Don't care about these reports any more. Live traffic never quite matched up, because the Wiki has a lot of pages and therefore had a lot of uncached content when we went live.

So the numbers seemed impressive, but weren't realistic.

With typical traffic nowadays, 4 MediaWiki containers just about manage the non-cached traffic. Clearing Varnish significantly increases load, and Kubernetes' horizontal pod autoscaler can be slow to react, leading to single-digit minutes of perceived downtime. In those scenarios, manually scaling MediaWiki up to 8 containers handles typical traffic well enough (likely more than needed), and Kubernetes will autoscale back down to the minimum of 4 containers once Varnish has taken over the brunt of the load.

@rjackson rjackson closed this as completed Sep 2, 2018