Performance Tool

Michael Behlendorf edited this page Jan 16, 2017 · 9 revisions


This wiki intends to document how to use the new benchmark tool for Voldemort. This tool should help potential users to get performance numbers and be able to judge whether Voldemort fits their requirement.

The tool currently supports running in either of the following two modes – remote and local. The local mode allows the user to do a pure storage engine test without incurring the network / routing cost. This mode would particularly be useful when new storage engines are plugged into Voldemort and need to be compared with existing ones. The remote mode allows the user to test against an existing running cluster of nodes with running Voldemort servers.

The execution of each of the above mentioned modes is broken up into 2 phases – warm-up and benchmark phase. During the warmup phase, we insert arbitrary records into the storage engine / cluster. The warm-up phase is optional and can be ignored if the cluster already has data. We can then run multiple iterations of benchmark phase wherein different mix of operations (delete, read, write, update, plugin operation) can be performed depending on the target application’s requirements.

Local Mode

Let us start by explaining options available for the local mode only

Option                            Description
------                            -----------
--storage-configuration-class     The class of the storage engine configuration to use.
--keyType                         The serialization format of the key (string, json-int, json-string, 
                                  identity). We generate arbitrary keys during the warm-up and 
                                  benchmark phase.

Remote Mode

The following commands allow us to run our tests against an existing store on a remote Voldemort cluster.

Option               Description
------               -----------
--url                The Voldemort server url (Example : tcp://
--store-name         The name of the store (Duh!).
--handshake          Performs some basic tests against the store to check if it is 
                     running and we have the correct key/value serialization format.

(Warning: the performance tool currently requires “string”-serialized values, unless the —plugin switch is used. Also, —handshake is not supported by 0.96.)


The following options are applicable to both remote and local mode.

a) Basic features

Option               Description
------               -----------
--ops-count <no>     The total number of operations (delete, read, write, update) 
                     to perform during the benchmark phase
--record-count <no>  The total number of records to insert during the warm-up phase. 
                     If we have the data pre-loaded in the cluster, we can set this 
                     value to 0 thereby ignoring the warm-up phase completely.
--iterations <no>    While the warm-up phase can be run atmost one time, the 
                     benchmark phase can be repeated multiple times. Default is 1
--threads <no>       This represents the number of client threads we use during the 
                     complete test.
--value-size <no>    The size of the value in bytes. We use this during the warm-up 
                     phase to generate random values and also during write operations 
                     of the benchmark phase. Default is set to 1024 bytes.

b) Operations

Option               Description
------               -----------
-d <percent>         execute delete operation [<percent> : 0 - 100]
-m <percent>         execute update (read+write) operation  [<percent> : 0 - 100]
-r <percent>         execute read operations [<percent> : 0 - 100]
-w <percent>         execute write operations [<percent> : 0 - 100]

The sum of all the above numbers () should be 100. This is the percentage of the —ops-count highlighted in (a). For example if the target application is read intensive (like photo tagging), we can set the -r 95 -m 5. If the application is write intensive, we can set -w 95 -r 5.

c) Record selection

Option               Description
------               -----------
--record-selection   Selection of the key during benchmark phase can follow a certain 
                     distribution. We support zipfian, latest (zipfian except that the 
                     most recently inserted records are in the head of the distribution) 
                     and uniform <default>.
--start-key-index    Index to start inserting from during the warm-up phase.
--request-file       This is a file containing a list of keys (one key per line) on which 
                     we would like to run the benchmark phase. The benchmark phase 
                     generates its ops-count number of operations from list of keys only. 
                     Setting this file overrides the --record-selection parameter.

For example, if we are using Voldemort for storing all the status updates of a social network, the record selection would match the ‘latest’ distribution since people tend to read the latest statuses. Also since these apps tend to be read heavy the parameters would be -r 95 -w 5 —record-selection latest

d) Monitoring and Results

Option               Description
------               -----------
--interval <sec>     Since both the warm-up and benchmark phase tend to be time 
                     consuming, we can get status information (number of operations 
                     completed and current throughput) at intervals of <sec> seconds.
--ignore-nulls       Certain reads may result in null values since they don't exist. We 
                     can neglect them from the final results calculation by setting 
-v                   Verbose - Most of the exceptions are curbed by the tool, unless 
                     -v is mentioned
--verify             Verify the read values - This works only when the warm-up phase 
                     was run by us. If the value is wrong we log it as an error
--metric-type        The final results at the end of benchmark can be dumped as a 
                     latency histogram or only summary (mean, median, 95th, 99th 
                     latency). In the histogram mode we also display the number of 
                     errors found during verification of reads (--verify). 
                     Options - [histogram | summary]

e) Extra

Option                    Description
------                    -----------
--percent-cached <no>     We can make some percentage of the requests to come from 
                          a previously requested key whose value would be 
                          cached in the tool itself. This can help in simulating applicat-
                          -ions with a caching layer. The value can range form 0-100 
                          [Default - 0 (no caching)]
--target-throughput <no>  This specifies the target number of operations per second. 
                          By default, the tool will try to do as many operations as it 
                          can. However, you can throttle the target number of operations 
                          per second. For example, to generate a latency versus 
                          throughput curve, you can try different target throughputs, and 
                          measure the resulting latency for each.
--prop-file               All the above mentioned properties can be mentioned in a single 
                          property file. The format of the file is 
                          "key1=value1 \n key2=value2 ..."

f) Plugins

Option                    Description
------                    -----------
--plugin-class            The name of the class which implements the "WorkloadPlugin"
                          interface. Providing an implementation allows you to take 
                          control of the operations during the benchmark phase and 
                          also the records inserted during warm-up phase.

This functionality was added to the tool to allow complex single store operations which go beyond read, write, update or delete. For example, we can now do a read, followed by manipulation of the value as one single operation.


No documentation is complete without an example showing the tool in action. For this example say we intend to benchmark Voldemort for storing profiles of users. A profile is roughly 10 KB and production throughput has been observed to be around 100 profiles/sec.

a) In the warm-up phase we’ll insert 500,000 profiles (—record-count 500000) with value size 10KB. ( \—value-size 10240 )
b) In the benchmark phase we’ll run a million operations ( \—ops-count 1000000 ). In this scenario we would expect a read heavy workload and a small proportion of updates (since profiles are rarely updated but read multiple times) (-r 90 \-m 10)
c) We want to find the latency figures while keeping the throughput fixed ( \—target-throughput 100)
d) And since this is run against a production cluster with an existing store … ( \—url tcp://prod:6666 —store-name profiles)

The overall command would be

./bin/ --record-count 500000
                                    --value-size 10240
                                    --ops-count 1000000
                                    --target-throughput 100
                                    --url tcp://prod:6666
                                    --store-name profiles
                                    -r 90 -m 10

The result would look as follows. Lines starting with \[reads\] are all figures related the 900000 (90% of million operations) read operation, while \[transactions\] are the ones related to updates.

[reads]	Operations: 900000
[reads]	Average(ms): 19.95947426067908
[reads]	Min(ms): 0
[reads]	Max(ms): 292
[reads]	Median(ms): 2
[reads]	95th(ms): 256
[reads]	99th(ms): 276
[transactions]	Operations: 100000
[transactions]	Average(ms): 21.689655172413794
[transactions]	Min(ms): 1
[transactions]	Max(ms): 312
[transactions]	Median(ms): 6
[transactions]	95th(ms): 48
[transactions]	99th(ms): 312

In Code

Some simulations tend to be feature heavy and it gets difficult to maintain the command line options. The solution to this is to incorporate the tool in other programs where we can change the parameters programmatically. This can also help if we intend to run multiple simulations.

The following example helps us draw a plot of median latency versus number of client threads (for read operations only):

Benchmark benchmark = new Benchmark();

// Write to file "latencyVsThreads.txt" which we can use to plot 
PrintStream outputStream = new PrintStream(new File("latencyVsThreads.txt")); 

Props workLoadProps = new Props();
workLoadProps.put("record-count", 1000000);  // Insert million records during warm-up phase
workLoadProps.put("ops-count", 1000000); // Run million operations during benchmark phase
workLoadProps.put("r", 95); // Read intensive program with 95% read ...
workLoadProps.put("w", 5); // ...and 5% writes ...
workLoadProps.put("record-selection", "uniform"); // ...with keys being selected using uniform distribution

// Run tool on Voldemort server running on localhost with store-name "read-intensive"
workLoadProps.put("url", "tcp://localhost:6666");
workLoadProps.put("store-name", "read-intensive");

// Initialize benchmark
benchmark.initializeStore(workLoadProps.with("threads", 100));

// Run the warm-up phase
long warmUpRunTime = benchmark.runTests(false);

// Change the number of client threads and capture the median latency
for(int threads = 10; threads <= 100; threads += 10) {
    benchmark.initializeStore(workLoadProps.with("threads", threads));

    long benchmarkRunTime = benchmark.runTests(true);
    HashMap<String, Results> resultsMap = Metrics.getInstance().getResults();
    if(resultsMap.containsKey(VoldemortWrapper.Operations.Read.getOpString())) {
	Results result = resultsMap.get(VoldemortWrapper.Operations.Read.getOpString());
	outputStream.println(VoldemortWrapper.Operations.Read.getOpString() + "\t" + String.valueOf(threads)
			     + "\t" + result.medianLatency);

// Close the benchmark


  • The tool currently supports working on a single store only. This may not be helpful in scenarios wherein the target application does multi-store transactions.