---
title: 'Plot benchmark results with Matplotlib'
uuid: 94d04fda-8ec1-4177-b3d0-5cf8532eec22
attachments:
"HTTPS_to_HTTP,_nginx,_4_workers,_AES128-SHA1.pdf": example of plot
tags:
- outdated
- programming-python
---
Over the past week, I ran many benchmarks with a
[Spirent Avalanche][avalanche], an appliance for performance testing
of network-related products (a load balancer, a router, a web server,
...). Its reporting module is not very flexible and its plots are not
the prettiest. Fortunately, the results are also exported as CSV.

[Matplotlib][mplib] is a Python plotting library which produces great
figures without making things hard when they should be easy. It is a
good replacement for [gnuplot][] and you don't need a lot of
Python knowledge to use it. The documentation includes a fine
[user's guide][users] that should get you started in less than 20
minutes.
!!! "Update (2017-04)" This article is quite old and covers an
outdated version of *Matplotlib*.
# Quick introduction
You can use *Matplotlib* from [IPython][ipython] to experiment:
::console
$ ipython -pylab
Python 2.6.7 (r267:88850, Jul 10 2011, 08:11:54)
Type "copyright", "credits" or "license" for more information.
Welcome to pylab, a matplotlib-based Python environment.
For more information, type 'help(pylab)'.
> In [1]: plot([1,1,4,5,10,11])
When you are ready, you can build a simple Python script:
::python2
#!/usr/bin/env python
from matplotlib.pylab import *
plot([1,1,4,5,10,11])
savefig("my-plot.pdf")
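
As the update note says, this article targets an old release. For readers on a current Matplotlib, here is a minimal sketch of the same script with explicit imports instead of the `pylab` wildcard import. The `Agg` backend and the temporary output directory are assumptions added so the example runs without a display; they are not part of the original script.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Create one figure with a single plotting area and draw the same series.
fig, ax = plt.subplots()
ax.plot([1, 1, 4, 5, 10, 11])

# Assumption: write into a throwaway directory instead of the current one.
output = os.path.join(tempfile.mkdtemp(), "my-plot.pdf")
fig.savefig(output)
plt.close(fig)
print(output)
```
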
# Grabbing results from CSV
The data is contained in a file named `realtime.csv`. Besides the raw
data, the file starts with some general information about the
benchmark (test name, description, parameters...), which we need to
skip. Fortunately, `csv2rec()` loads data from a CSV file into a
record array and can skip the first rows if necessary.
::python2
from matplotlib.pylab import *
import sys, os
import gzip
skip = 0
for line in gzip.open(sys.argv[1]):
    if line.startswith("Seconds Elapsed,"):
        break
    skip = skip + 1
ava = csv2rec(gzip.open(sys.argv[1]),skiprows=skip)
Now, the "Seconds Elapsed" column can be accessed with
`ava['seconds_elapsed']`.
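
`csv2rec()` appears to have been removed from recent Matplotlib releases. The sketch below reproduces the two steps with the standard library only: skipping the preamble and deriving names such as `seconds_elapsed` from the header row. The renaming rule (lowercase, spaces to underscores, other punctuation dropped) is an approximation inferred from the column names used in this article, and the sample header `Incoming Traffic (Kbps)` and preamble rows are guesses at the original file, not copies of it.

```python
import csv
import io
import re


def normalize(name):
    """Approximate the renaming: lowercase, spaces to underscores, drop punctuation."""
    name = name.strip().lower().replace(" ", "_")
    return re.sub(r"[^0-9a-z_]", "", name)


def load_columns(lines, marker="Seconds Elapsed,"):
    """Skip the preamble, then return {normalized column name: list of floats}."""
    lines = list(lines)
    # Index of the header row, i.e. the number of preamble rows to skip.
    start = next(i for i, line in enumerate(lines) if line.startswith(marker))
    reader = csv.reader(lines[start:])
    names = [normalize(n) for n in next(reader)]
    columns = {name: [] for name in names}
    for row in reader:
        for name, value in zip(names, row):
            columns[name].append(float(value))
    return columns


# Hypothetical file content: two preamble rows, a header row, two samples.
sample = io.StringIO(
    "Test name,demo\n"
    "Description,hypothetical preamble\n"
    "Seconds Elapsed,Incoming Traffic (Kbps)\n"
    "0,1200\n"
    "4,2400\n"
)
data = load_columns(sample)
print(data["seconds_elapsed"])
print(data["incoming_traffic_kbps"])
```

For a compressed export, the same function could be fed with `gzip.open(path, "rt")`, which yields text lines.
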
# General structure
I need 4 plots:
* successful, unsuccessful and attempted transactions per second,
* minimum, average and maximum response time per page,
* Avalanche CPU usage,
* incoming and outgoing bandwidth.
The most important plot is the first one. The second one is less
important and the last ones are here only to check we did not hit some
bottleneck during the benchmark. We want to produce a page like this:
![General structure][s1]
[s1]: [[!!images/matplotlib/structure.png]] "General plot layout"
*Matplotlib* allows us to plot subfigures. We create 4 subfigures sharing
the same X axis (which is the number of seconds elapsed since the
beginning of the benchmark). We save the result to PDF.
::python2
# Create the figure (A4 format)
figure(num=None, figsize=(8.27, 11.69), dpi=100)
ax1 = subplot2grid((4, 2), (0, 0), rowspan=2, colspan=2)
# […]
ax2 = subplot2grid((4, 2), (2, 0), colspan=2, sharex=ax1)
# […]
ax3 = subplot2grid((4, 2), (3, 0), sharex=ax1)
# […]
ax4 = subplot2grid((4, 2), (3, 1), sharex=ax1)
# […]
# Save to PDF
savefig("%s.pdf" % TITLE)
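
To make the grid arithmetic concrete, here is a plain-Python sketch (not Matplotlib's implementation) that lists which cells of the 4×2 grid each subplot occupies and checks that the four areas tile the page without overlapping. The helper `cells()` and the labels are made up for this illustration.

```python
def cells(loc, rowspan=1, colspan=1):
    """Return the set of (row, col) grid cells covered by one subplot."""
    row, col = loc
    return {(r, c)
            for r in range(row, row + rowspan)
            for c in range(col, col + colspan)}


# Same locations and spans as the subplot2grid() calls above.
layout = {
    "ax1 (transactions)": cells((0, 0), rowspan=2, colspan=2),
    "ax2 (response time)": cells((2, 0), colspan=2),
    "ax3 (CPU)": cells((3, 0)),
    "ax4 (bandwidth)": cells((3, 1)),
}

covered = set()
for name, area in layout.items():
    assert not (covered & area), "subplots overlap"
    covered |= area
    print(name, sorted(area))

full_grid = {(r, c) for r in range(4) for c in range(2)}
print("grid fully covered:", covered == full_grid)
```

The first plot takes the top half of the page, the second one a full-width row, and the last two share the bottom row.
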
# Bandwidth plot
Let's start with the easiest plot: bandwidth usage.
::python2
# Plot 4: Bandwidth
ax4 = subplot2grid((4, 2), (3, 1), sharex=ax1)
plot(ava['seconds_elapsed'], ava['incoming_traffic_kbps']/1000.,
'b-', label='Incoming traffic')
plot(ava['seconds_elapsed'], -ava['outgoing_traffic_kbps']/1000.,
'r-', label='Outgoing traffic')
grid(True, which="both", linestyle="dotted")
ylabel("Mbps", fontsize=7)
xticks(fontsize=7)
yticks(fontsize=7)
Here, in the subplot positioned at `(3,1)`, we plot the incoming
traffic against the number of seconds elapsed with a blue line
(`b-`). We also plot the outgoing traffic, negated so that it is
drawn below zero, with a red line (`r-`).
Here is the result:
![Bandwidth plot][s2]
[s2]: [[!!images/matplotlib/bandwidth.png]] "Bandwidth plot"
Most functions of *Matplotlib* are exposed both as a method of the
object they refer to and as a global function. In the latter case, the
function is applied to the most recently created figure or plotting
area. For example, `plot()` is called as a function and therefore
refers to the plotting area `ax4`. We could have written `ax4.plot()`
instead.
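
The difference between the two calling styles can be observed directly. The following is a small sketch, assuming a current Matplotlib and a headless `Agg` backend; the names `ax_top` and `ax_bottom` are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt

fig = plt.figure()
ax_top = plt.subplot2grid((2, 1), (0, 0))
ax_bottom = plt.subplot2grid((2, 1), (1, 0))

# The global function draws on the current axes: the last one created.
(line_a,) = plt.plot([1, 1, 4, 5, 10, 11])
# The method form names its target explicitly.
(line_b,) = ax_top.plot([11, 10, 5, 4, 1, 1])

current_is_bottom = plt.gca() is ax_bottom
global_went_to_bottom = line_a.axes is ax_bottom
method_went_to_top = line_b.axes is ax_top
print(current_is_bottom, global_went_to_bottom, method_went_to_top)
plt.close(fig)
```

The method form is less compact but does not depend on the order in which the plotting areas were created.
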
# CPU plot
The average CPU utilization data available in the CSV file needs to be
normalized. We assume the Avalanche to be mostly idle on start. The
plotting part is pretty similar to our previous case.
::python2
# CPU
max = np.max(ava['average_cpu_utilization'])
order = 10**np.floor(np.log10(max))
max = np.ceil(max/order)*order
cpu = (max - ava['average_cpu_utilization'])*100/max
# Plot 3: CPU
ax3 = subplot2grid((4, 2), (3, 0), sharex=ax1)
plot(ava['seconds_elapsed'],
cpu,
'r-', label="Avalanche CPU")
grid(True, which="both", linestyle="dotted")
ylabel("Avalanche CPU%", fontsize=7)
xticks(fontsize=7)
yticks(fontsize=7)
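
The normalization can be checked by hand on a few samples. This sketch mirrors the arithmetic above with the standard library only; the raw values are made up, since the article does not state their unit.

```python
import math


def normalize_cpu(samples):
    """Mirror the arithmetic above on a plain list of raw samples."""
    peak = max(samples)
    # Power of ten just below the peak, e.g. 1000 for a peak of 9500.
    order = 10 ** math.floor(math.log10(peak))
    # Round the peak up to its leading digit: 9500 becomes 10000.
    ceiling = math.ceil(peak / order) * order
    # Each sample becomes its distance below the ceiling, as a percentage.
    return [(ceiling - value) * 100 / ceiling for value in samples]


samples = [9500, 7000, 1200]  # hypothetical raw values
cpu_percent = normalize_cpu(samples)
print(cpu_percent)  # [5.0, 30.0, 88.0]
```

The largest raw value, expected near the idle start of the run, maps to a low percentage, while smaller raw values map to higher usage.
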
# Response time
We would like to plot response time. We have three corresponding
metrics: minimum, average and maximum response time. Because these
metrics range from 0 ms to several seconds, a logarithmic scale is
used:
::python2
# Plot 2: response time
ax2 = subplot2grid((4, 2), (2, 0), colspan=2, sharex=ax1)
plot(ava['seconds_elapsed'], ava['minimum_response_time_per_page_msec'],
'b-', label="Minimum response time")
plot(ava['seconds_elapsed'], ava['maximum_response_time_per_page_msec'],
'r-', label="Maximum response time")
plot(ava['seconds_elapsed'], ava['average_response_time_per_page_msec'],
'g-', linewidth=2, label="Average response time")
legend(loc='upper left', fancybox=True, shadow=True, prop=dict(size=8))
grid(True, which="major", linestyle="dotted")
yscale("log")
ylabel("Response time (msec)", fontsize=9)
xticks(fontsize=9)
yticks(fontsize=9)
This is also the first graphic with a legend.
![Response time plot][s3]
[s3]: [[!!images/matplotlib/response.png]] "Response time plot"
# Transactions per second
The most important plot is the number of transactions per second.
::python2
# Plot 1: TPS
ax1 = subplot2grid((4, 2), (0, 0), rowspan=2, colspan=2)
plot(ava['seconds_elapsed'], ava['desired_load_transactionssec'],
'-', color='0.7', label="Desired Load")
plot(ava['seconds_elapsed'], ava['successful_transactionssecond'],
'g:', label="Successful")
plot(ava['seconds_elapsed'], smooth(ava['successful_transactionssecond']),
'g-', linewidth=2)
plot(ava['seconds_elapsed'], ava['attempted_transactionssecond'],
'b-', label="Attempted")
plot(ava['seconds_elapsed'], ava['aborted_transactionssecond'],
'k-', label="Aborted")
plot(ava['seconds_elapsed'][:-1], ava['unsuccessful_transactionssecond'][:-1],
'r-', label="Unsuccessful")
legend(loc='upper left', fancybox=True, shadow=True, prop=dict(size=10))
grid(True, which="both", linestyle="dotted")
ylabel("Transactions/s")
The number of successful transactions is plotted twice: when the
benchmarked equipment becomes overloaded, we get a lot of noise in
this metric and it can be difficult to read. Therefore, we plot a
smoothed version with the help of [NumPy][numpy]:
::python2
import numpy as np
def smooth(x, win=4):
    s = np.r_[x[win-1:0:-1],x,x[-1:-win:-1]]
    w = np.ones(win, 'd')
    y = np.convolve(w/w.sum(),s,mode='valid')
    # Integer division keeps the slice bounds integral
    return y[(win-1)//2:-(win-1)//2]
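
A quick way to convince yourself that the padding and the final slice cancel out is to run the function on two made-up series. This sketch repeats the function so it is self-contained and assumes a current NumPy; it uses floor division so that the slice bounds stay integers.

```python
import numpy as np


def smooth(x, win=4):
    # Mirror win-1 samples at each end, then apply a moving average.
    s = np.r_[x[win - 1:0:-1], x, x[-1:-win:-1]]
    w = np.ones(win, "d")
    y = np.convolve(w / w.sum(), s, mode="valid")
    # Trim the extra samples introduced by the padding.
    return y[(win - 1) // 2:-(win - 1) // 2]


flat = np.full(8, 5.0)             # constant series (hypothetical data)
ramp = np.arange(10, dtype=float)  # 0, 1, ..., 9

smoothed_flat = smooth(flat)
smoothed_ramp = smooth(ramp)
print(len(smoothed_ramp) == len(ramp))  # same length as the input
print(smoothed_flat)                    # a constant series is unchanged
print(smoothed_ramp[2])                 # mean of 0, 1, 2, 3
```

Keeping the output the same length as the input matters because the smoothed series is plotted against the unmodified `seconds_elapsed` column.
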
`np.r_[]` is just here to extend our data at both ends, by mirroring
the samples closest to each edge. `np.ones()` builds a vector of the
size of the window which, once divided by its sum, gives the weights.
If the window is 4, we get `[0.25, 0.25, 0.25, 0.25]`. We use
this vector to apply a [convolution][convolution] to the original
data. Here is the result:
![Transactions plot][s4]
[s4]: [[!!images/matplotlib/tps.png]] "TPS plot"
The original data is a dotted green line while the smoothed version is
a thick green line. What about the three annotations? *Matplotlib*
allows us to put annotations on a figure. Here is how this is done:
::python2
# Noticeable points
count = 0
def highlight(index, reason):
    global count
    if index and index > 0:
        x,y = (ava['seconds_elapsed'][index],
               smooth(ava['successful_transactionssecond'])[index])
        plot([x], [y], 'ko')
        annotate('%d TPS\n(%s)' % (y,reason), xy=(x,y),
                 xytext=(20, -(count+4.7)*22), textcoords='axes points',
                 arrowprops=dict(arrowstyle="-",
                                 connectionstyle="angle,angleA=0,angleB=80,rad=10"),
                 horizontalalignment='left',
                 verticalalignment='bottom',
                 fontsize=8)
        count = count + 1
highlight(np.argmax(smooth(ava['successful_transactionssecond'])), "Max TPS")
highlight(np.argmax(cpu > 99), "CPU>99%")
highlight(np.argmax(ava['average_response_time_per_page_msec'] > 500), ">500ms")
highlight(np.argmax(ava['average_response_time_per_page_msec'] > 100), ">100ms")
`np.argmax()` returns the index of the first maximum value. The trick
here is that when I write `ava['average_response_time_per_page_msec'] > 100`,
I get a boolean array which is true when the value is more than 100
and false otherwise. Therefore, `np.argmax()` will return the first
index where the value is greater than 100 ms.
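
Here is the trick on a handful of made-up response times, assuming a current NumPy. It also shows the corner case where the threshold is never crossed: `np.argmax()` then returns 0, an index that `highlight()` ignores through its `index > 0` test.

```python
import numpy as np

response_ms = np.array([80, 95, 120, 90, 130])  # hypothetical samples

mask = response_ms > 100
print(mask)  # boolean array, true where the threshold is exceeded

# Index of the first true value, since true compares greater than false.
first_over_100 = int(np.argmax(mask))
print(first_over_100)  # 2

# No sample exceeds 1000: every entry is false and argmax falls back to 0.
never_over_1000 = int(np.argmax(response_ms > 1000))
print(never_over_1000)  # 0
```
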
The `highlight()` function will add a point (`plot([x], [y], 'ko')`)
on the smoothed successful transactions per second plot and add an
annotation with a fancy arrow.
Look at this [benchmark of nginx as TLS termination][pdf] for a
complete output of this script.
*[CSV]: Comma-separated values
*[PDF]: Portable Document Format
[pdf]: [[!!files/HTTPS_to_HTTP,_nginx,_4_workers,_AES128-SHA1.pdf]]
[numpy]: https://numpy.org/
[mplib]: https://matplotlib.org/
[users]: https://matplotlib.org/users/index.html
[gnuplot]: http://www.gnuplot.info/
[ipython]: https://ipython.org/
[avalanche]: https://web.archive.org/web/2011/https://www.spirent.com/products/avalanche.aspx
[convolution]: https://en.wikipedia.org/wiki/Convolution