
How to understand the `Latencies` in the report? #278

Closed
cj1128 opened this issue Mar 12, 2018 · 9 comments

@cj1128 cj1128 commented Mar 12, 2018

  1. What version of the project are you using?

    go get from github repo, commit hash: f30b7da.

  2. What operating system and processor architecture are you using?

    Mac OS, 10.13.3.

  3. What did you do?

    setup a simple server and use vegeta to load test it.

    package main
    
    import "net/http"
    
    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("hello world"))
        })
        http.ListenAndServe(":8080", nil)
    }

    test command:

    echo "GET http://localhost:8080" | vegeta attack -rate RATE -duration=5s | vegeta report
  4. What did you expect to see?

    The higher the rate, the higher the latencies.

  5. What did you see instead?

    when the rate is 100:

    Latencies     [mean, 50, 95, 99, max]  1.6292ms, 1.588029ms, 2.887475ms, 3.105071ms, 3.886444ms
    

    when the rate is 1000:

    Latencies     [mean, 50, 95, 99, max]  431.97µs, 418.454µs, 586.022µs, 774.319µs, 4.518077ms
    

I think the latency means the time interval between a request being sent and its response being received. So when the request rate is higher, the server has more work to do and latency should increase, but the results show the opposite.

So, how should I understand the latencies?

@tsenart tsenart commented May 20, 2018

I think the latency means the time interval between a request being sent and its response being received. So when the request rate is higher, the server has more work to do and latency should increase, but the results show the opposite.

Oh, the wonderful world of micro-performance analysis :-) I can reproduce your results, and I'm led to believe there's more to this than meets the eye at first. Without any proof whatsoever, I have the feeling we're observing some sort of caching phenomenon that influences these results.

To be continued... I don't have a good answer to this yet.

@tsenart tsenart commented May 24, 2018

This happens on Linux too. Raw results files saved and uploaded here for future debugging.
results.tar.gz

$ echo "GET http://localhost:8080" | vegeta attack -rate=100 -duration=5s | tee 100.bin | vegeta report
Requests      [total, rate]            500, 100.20
Duration      [total, attack, wait]    4.99074455s, 4.989999756s, 744.794µs
Latencies     [mean, 50, 95, 99, max]  591.146µs, 586.039µs, 738.259µs, 960.467µs, 3.644614ms
Bytes In      [total, mean]            5500, 11.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:500  
Error Set:

$ echo "GET http://localhost:8080" | vegeta attack -rate=1000 -duration=5s | tee 1000.bin | vegeta report
Requests      [total, rate]            5000, 1000.20
Duration      [total, attack, wait]    4.999451257s, 4.998999759s, 451.498µs
Latencies     [mean, 50, 95, 99, max]  422.181µs, 419.69µs, 490.308µs, 543.746µs, 5.437834ms
Bytes In      [total, mean]            55000, 11.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:5000  
Error Set:

$ echo "GET http://localhost:8080" | vegeta attack -rate=5000 -duration=5s | tee 5000.bin | vegeta report
Requests      [total, rate]            25000, 5000.20
Duration      [total, attack, wait]    5.000013276s, 4.999799679s, 213.597µs
Latencies     [mean, 50, 95, 99, max]  217.194µs, 207.483µs, 236.087µs, 544.931µs, 6.657334ms
Bytes In      [total, mean]            275000, 11.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:25000  
Error Set:

$ echo "GET http://localhost:8080" | vegeta attack -rate=10000 -duration=5s | tee 10000.bin | vegeta report
Requests      [total, rate]            50000, 10000.20
Duration      [total, attack, wait]    5.000069802s, 4.999899817s, 169.985µs
Latencies     [mean, 50, 95, 99, max]  187.518µs, 168.72µs, 223.909µs, 788.912µs, 6.745903ms
Bytes In      [total, mean]            550000, 11.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:50000  
Error Set:

$ uname -a
Linux 4.16.10-1-ARCH #1 SMP PREEMPT Mon May 21 07:57:23 UTC 2018 x86_64 GNU/Linux

$ vegeta -version
Version: 7.0.3
Commit: f5e8a53b900b1509b8719aa94c30d4cf087a77e9
Runtime: go1.10.2 linux/amd64
Date: 2018-05-23T15:09:17Z
@tsenart tsenart commented May 25, 2018

I've modified the plot reporter to overlay multiple attacks' results with relative time offsets. The visualisation matches the text report. Download the archive for interactive analysis in the browser. plot.tar.gz

[vegeta-plot screenshot]

@seebs seebs commented Oct 23, 2018

So, CPU frequency changes. Hypothesis: If CPU is "bored" (in a lower-power state and running at a lower clock speed), it's not necessarily going to ramp up to a higher speed immediately upon getting work. If it's consistently handling all incoming requests and ending up with idle time, it might well stay in a lower-power state. It'd be nice to graph this against CPU frequency values, but my guess is that there's a pattern of the CPU spinning up once it gets close to saturated, which means that it will process requests faster when busy.
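One way to check this hypothesis (on Linux, and assuming the kernel exposes the standard cpufreq sysfs files; paths and availability vary by driver) is to sample per-core clock speeds while an attack runs in another terminal:

```shell
# Sample each core's current clock speed (kHz via sysfs, or MHz from
# /proc/cpuinfo as a fallback). Run a vegeta attack in another terminal
# and call this repeatedly to see whether the clocks ramp up under load.
sample_freqs() {
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq 2>/dev/null \
    || grep "cpu MHz" /proc/cpuinfo
}

sample_freqs
```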

@dgryski dgryski commented Oct 23, 2018

So disabling CPU throttling and rerunning the test should give different answers.
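A sketch of how that could be done on Linux (assuming the sysfs cpufreq interface and root access; the available governor names depend on the cpufreq driver):

```shell
# Pin every core to the "performance" governor so the clock stays at full
# speed between requests, then rerun the attacks and compare latencies.
# Requires root; restore the previous governor (often "powersave" or
# "ondemand") when done.
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$gov" >/dev/null
done

# Verify the setting took effect on one core.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```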

@tsenart tsenart commented Oct 30, 2018

@fate-lovely: The latest experiments in https://gist.github.com/tsenart/e90f11a47b84d8073b5254176f23e7f1 indicate that this is related to dynamic CPU clock frequency scaling, CPU caching effects, or a combination of both.

Load testing your server locally will reveal these effects whenever each request does very little work.

@tsenart tsenart closed this Oct 30, 2018
@cj1128 cj1128 commented Oct 31, 2018

@tsenart This makes sense, thank you!

@seebs seebs commented Oct 31, 2018

FWIW, I did some more testing with a variant: one program with long sleeps, and a second, unrelated program running flat out. The first program's reported latencies dropped sharply when the second program started, suggesting CPU frequency rather than caching.
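A minimal sketch of that kind of experiment (hypothetical code, not the programs actually used): time a small fixed computation once after an idle period, then again while a separate goroutine spins to keep the CPU out of low-power states.

```go
package main

import (
	"fmt"
	"time"
)

// timeWork measures how long a small, fixed computation takes.
func timeWork() time.Duration {
	start := time.Now()
	sum := 0
	for i := 0; i < 1_000_000; i++ {
		sum += i
	}
	_ = sum // keep the result live
	return time.Since(start)
}

func main() {
	// Sample after an idle period, when the CPU may have clocked down.
	time.Sleep(500 * time.Millisecond)
	idle := timeWork()

	// Spin on another goroutine to keep the CPU busy, then sample again.
	stop := make(chan struct{})
	go func() {
		for {
			select {
			case <-stop:
				return
			default:
			}
		}
	}()
	time.Sleep(500 * time.Millisecond) // let the clock ramp up
	busy := timeWork()
	close(stop)

	fmt.Println("sample after idle:       ", idle)
	fmt.Println("sample while CPU is busy:", busy)
	// On machines with aggressive frequency scaling, the second sample is
	// often the smaller one, matching the counter-intuitive vegeta results.
}
```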
