Performance Tool


Introduction

This wiki page documents how to use the new benchmark tool for Voldemort. The tool should help potential users obtain performance numbers and judge whether Voldemort fits their requirements.

The tool currently supports running in either of two modes – remote and local. The local mode allows the user to run a pure storage-engine test without incurring the network / routing cost. This mode is particularly useful when new storage engines are plugged into Voldemort and need to be compared with existing ones. The remote mode allows the user to test against an existing cluster of running Voldemort servers.

The execution of each of the above mentioned modes is broken up into two phases – warm-up and benchmark. During the warm-up phase, we insert arbitrary records into the storage engine / cluster. The warm-up phase is optional and can be skipped if the cluster already has data. We can then run multiple iterations of the benchmark phase, in which a different mix of operations (delete, read, write, update, plugin operation) can be performed depending on the target application’s requirements.

Local Mode

Let us start with the options available only in local mode.


Option               Description
------               -----------
--storage-engine     The storage engine we would like to benchmark (mysql, bdb, memory).
--keyType            The serialization format of the key (string, json-int, json-string, identity). We generate arbitrary keys during the warm-up and benchmark phases.
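
For instance, a minimal local-mode run might look like the following sketch (the engine and key type are arbitrary picks, the --record-count / --ops-count / -r options are described in the sections below, and we assume local mode is selected by supplying --storage-engine instead of --url):

./bin/voldemort-performance-tool.sh --storage-engine bdb \
                                    --keyType string \
                                    --record-count 100000 \
                                    --ops-count 100000 \
                                    -r 100
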
Remote Mode

The following options allow us to run our tests against an existing store on a remote Voldemort cluster.


Option               Description
------               -----------
--url                The Voldemort server URL (Example: tcp://hogwards.edu:6666)
--store-name         The name of the store (Duh!)
--handshake          Performs some basic tests against the store to check that it is running and that we are using the correct key/value serialization format.
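
As a quick sanity check against a remote store, a run along these lines might be used (the host and store name are placeholders; the remaining options are described below):

./bin/voldemort-performance-tool.sh --url tcp://hogwards.edu:6666 \
                                    --store-name test-store \
                                    --handshake \
                                    --ops-count 1000 \
                                    -r 100
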
Others

The following options are applicable to both remote and local modes.

a) Basic features


Option               Description
------               -----------
--ops-count <no>     The total number of operations (delete, read, write, update) to perform during the benchmark phase.
--record-count <no>  The total number of records to insert during the warm-up phase. If the data is already loaded in the cluster, we can set this value to 0,
                     thereby skipping the warm-up phase completely.
--iterations <no>    While the warm-up phase runs at most once, the benchmark phase can be repeated multiple times. Default is 1 iteration.
--threads <no>       The number of client threads used throughout the test.
--value-size <no>    The size of the value in bytes. We use this during the warm-up phase to generate random values and also during write operations of the benchmark
                     phase. Default is 1024 bytes.
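
As an illustration, skipping the warm-up against pre-loaded data and running two iterations of a 500,000-operation benchmark with 50 client threads might look like this (the URL and store name are placeholders, and the -r / -w mix is explained next):

./bin/voldemort-performance-tool.sh --record-count 0 \
                                    --ops-count 500000 \
                                    --iterations 2 \
                                    --threads 50 \
                                    --value-size 2048 \
                                    -r 90 -w 10 \
                                    --url tcp://hogwards.edu:6666 \
                                    --store-name test-store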

b) Operations


Option               Description
------               -----------
-d <percent>         Execute delete operations [<percent> : 0 - 100]
-m <percent>         Execute update (read+write) operations [<percent> : 0 - 100]
-r <percent>         Execute read operations [<percent> : 0 - 100]
-w <percent>         Execute write operations [<percent> : 0 - 100]

The sum of all the above numbers (-d, -m, -r and -w) should be 100; together they partition the --ops-count highlighted in (a). For example, if the target application is read intensive (like photo tagging), we can set -r 95 -m 5. If the application is write intensive, we can set -w 95 -r 5.
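
For a workload that exercises all four operations, the flags might be combined as in this sketch (the split is purely illustrative and simply has to sum to 100; the URL and store name are placeholders):

./bin/voldemort-performance-tool.sh --ops-count 1000000 \
                                    -r 80 -w 10 -m 5 -d 5 \
                                    --url tcp://hogwards.edu:6666 \
                                    --store-name test-store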

c) Record selection


Option               Description
------               -----------
--record-selection   Selection of the key during the benchmark phase can follow a certain distribution. We support zipfian, latest (zipfian, except that the most recently
                     inserted records are at the head of the distribution) and uniform (default).
--start-key-index    Index to start inserting from during the warm-up phase.
--request-file       A file containing a list of keys (one key per line) on which we would like to run the benchmark phase. The benchmark phase then generates its
                     ops-count operations from this list of keys only. Setting this file overrides the --record-selection parameter.

For example, if we are using Voldemort for storing all the status updates of a social network, the record selection would match the ‘latest’ distribution, since people tend to read the latest statuses. Also, since these applications tend to be read heavy, the parameters would be -r 95 -w 5 --record-selection latest.
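
Alternatively, if we already know the exact keys the application will hit, we can benchmark against just those via --request-file. A rough sketch, assuming a hypothetical keys.txt with arbitrary string keys (one per line, matching the store's key serialization):

user:1001
user:1002
user:1003

./bin/voldemort-performance-tool.sh --request-file keys.txt \
                                    --ops-count 100000 \
                                    -r 100 \
                                    --url tcp://hogwards.edu:6666 \
                                    --store-name test-store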

d) Monitoring and Results


Option               Description
------               -----------
--interval <sec>     Since both the warm-up and benchmark phases tend to be time consuming, we can print status information (number of operations completed and current throughput)
                     at intervals of <sec> seconds.
--ignore-nulls       Certain reads may return null values because the keys don't exist. We can exclude them from the final results calculation by setting --ignore-nulls.
-v                   Verbose - most exceptions are suppressed by the tool unless -v is specified.
--verify             Verify the read values - this works only when the warm-up phase was run by us. If a value is wrong, we log it as an error.
--metric-type        The final results at the end of the benchmark can be dumped as a latency histogram or as a summary only (mean, median, 95th, 99th percentile latency). In
                     histogram mode we also display the number of errors found during verification of reads (--verify). Options - [histogram | summary]
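
For a long run, printing progress every 30 seconds and dumping a latency histogram with read verification might look like this (verification requires that we ran the warm-up ourselves; the URL and store name are placeholders):

./bin/voldemort-performance-tool.sh --interval 30 \
                                    --metric-type histogram \
                                    --verify \
                                    --record-count 500000 \
                                    --ops-count 1000000 \
                                    -r 100 \
                                    --url tcp://hogwards.edu:6666 \
                                    --store-name test-store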

e) Extra


Option                    Description
------                    -----------
--percent-cached <no>     We can make some percentage of the requests use a previously requested key whose value is cached in the tool itself. This can help in
                          simulating applications with a caching layer. The value can range from 0-100 [Default - 0 (no caching)]
--target-throughput <no>  This specifies the target number of operations per second. By default, the tool will try to do as many operations as it can. For example, if each
                          operation takes 100 milliseconds on average, the tool will do about 10 operations per second per worker thread. However, you can throttle the target
                          number of operations per second. For example, to generate a latency versus throughput curve, you can try different target throughputs and measure the
                          resulting latency for each.
--prop-file               All the above mentioned properties can be listed in a single property file, with one key=value pair per line.
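
A sketch of such a property file (the file name is arbitrary, and we assume the property keys simply mirror the option names, as in the in-code example further below):

record-count=500000
value-size=10240
ops-count=1000000
target-throughput=100
url=tcp://prod:6666
store-name=profiles
r=90
m=10

./bin/voldemort-performance-tool.sh --prop-file profiles-benchmark.props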

f) Plugins


Option                    Description
------                    -----------
--plugin-class            The name of the class which implements the "WorkloadPlugin" interface. Providing an implementation allows you to take control of the operations
                          performed during the benchmark phase and also of the records inserted during the warm-up phase.

This functionality was added to the tool to allow complex single-store operations which go beyond read, write, update or delete. For example, we can now do a read followed by a manipulation of the value as one single operation.
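
On the command line the plugin is wired in by class name. A sketch, assuming a made-up com.example.ReadModifyWritePlugin that implements WorkloadPlugin and is available on the tool's classpath (the remaining workload options are supplied as usual):

./bin/voldemort-performance-tool.sh --plugin-class com.example.ReadModifyWritePlugin \
                                    --ops-count 100000 \
                                    --url tcp://hogwards.edu:6666 \
                                    --store-name test-store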

Example

No documentation is complete without an example showing the tool in action. For this example, say we intend to benchmark Voldemort for storing user profiles. A profile is roughly 10 KB, and production throughput has been observed to be around 100 profiles/sec.

a) In the warm-up phase we’ll insert 500,000 profiles with value size 10 KB (--record-count 500000 --value-size 10240).
b) In the benchmark phase we’ll run a million operations (--ops-count 1000000). In this scenario we would expect a read-heavy workload with a small proportion of updates, since profiles are rarely updated but read multiple times (-r 90 -m 10).
c) We want to find the latency figures while keeping the throughput fixed (--target-throughput 100).
d) And since this is run against a production cluster with an existing store … (--url tcp://prod:6666 --store-name profiles)

The overall command would be


./bin/voldemort-performance-tool.sh --record-count 500000 \
                                    --value-size 10240 \
                                    --ops-count 1000000 \
                                    --target-throughput 100 \
                                    --url tcp://prod:6666 \
                                    --store-name profiles \
                                    -r 90 -m 10

The result would look as follows. Lines starting with [reads] are the figures for the 900,000 read operations (90% of a million operations), while lines starting with [transactions] are the ones related to updates.

[reads]	Operations: 900000
[reads]	Average(ms): 19.95947426067908
[reads]	Min(ms): 0
[reads]	Max(ms): 292
[reads]	Median(ms): 2
[reads]	95th(ms): 256
[reads]	99th(ms): 276
[transactions]	Operations: 100000
[transactions]	Average(ms): 21.689655172413794
[transactions]	Min(ms): 1
[transactions]	Max(ms): 312
[transactions]	Median(ms): 6
[transactions]	95th(ms): 48
[transactions]	99th(ms): 312
In Code

Some simulations tend to be feature heavy, and it becomes difficult to maintain the command line options. The solution is to embed the tool in other programs, where we can change the parameters programmatically. This also helps if we intend to run multiple simulations.

The following example helps us draw a plot of median latency versus number of client threads (for read operations only):


// Import paths are assumed to follow the Voldemort source layout
// (voldemort.performance.benchmark and voldemort.utils).
import java.io.File;
import java.io.PrintStream;
import java.util.HashMap;

import voldemort.performance.benchmark.Benchmark;
import voldemort.performance.benchmark.Metrics;
import voldemort.performance.benchmark.Results;
import voldemort.performance.benchmark.VoldemortWrapper;
import voldemort.utils.Props;

// A minimal wrapper class (the name is arbitrary) so the snippet can run standalone
public class LatencyVsThreads {

    public static void main(String[] args) throws Exception {
        Benchmark benchmark = new Benchmark();
        Props workLoadProps = new Props();

        // Write to file "latencyVsThreads.txt" which we can use to plot
        PrintStream outputStream = new PrintStream(new File("latencyVsThreads.txt"));

        // Insert a million records during the warm-up phase
        workLoadProps.put("record-count", 1000000);

        // Run a million operations during the benchmark phase
        workLoadProps.put("ops-count", 1000000);

        // Read intensive workload: 95% reads, 5% writes, keys selected using a uniform distribution
        workLoadProps.put("r", 95);
        workLoadProps.put("w", 5);
        workLoadProps.put("record-selection", "uniform");

        // Run the tool against a Voldemort server on localhost with store name "read-intensive"
        workLoadProps.put("url", "tcp://localhost:6666");
        workLoadProps.put("store-name", "read-intensive");

        // Initialize the benchmark
        benchmark.initializeStore(workLoadProps.with(Benchmark.THREADS, 100));
        benchmark.initializeWorkload(workLoadProps);

        // Run the warm-up phase
        long warmUpRunTime = benchmark.runTests(false);

        // Change the number of client threads and capture the median read latency
        for(int threads = 10; threads <= 100; threads += 10) {
            benchmark.initializeStore(workLoadProps.with(Benchmark.THREADS, threads));
            benchmark.initializeWorkload(workLoadProps);

            long benchmarkRunTime = benchmark.runTests(true);
            HashMap<String, Results> resultsMap = Metrics.getInstance().getResults();
            if(resultsMap.containsKey(VoldemortWrapper.READS_STRING)) {
                Results result = resultsMap.get(VoldemortWrapper.READS_STRING);
                outputStream.println(VoldemortWrapper.READS_STRING + "\t" + String.valueOf(threads)
                                     + "\t" + result.medianLatency);
            }
        }

        // Close the benchmark and the output file
        benchmark.close();
        outputStream.close();
    }
}

Restrictions
  • The tool currently supports working on a single store only. This may not be helpful in scenarios wherein the target application does multi-store transactions.

Todo

  • Show code snippet for Plugin