cassandra-stress is modeled after the stress.py script in Apache Cassandra's source distribution.
cassandra-stress is built on top of Hector, a well-tested and widely deployed Java client for Apache Cassandra. The benefits of using Hector for a tool like this are many:
- JMX hooks into Hector for visibility into runtime statistics
- Extensive configurability of which node(s) to target
- Verbose logging output available
- Hooks into performance counters to correllate client and server performance
cassandra-stress is a currently just a command line app which supports the following commands (syntactically identical to stress.py where possible):
usage: stress [options]... url1,[[url2],[url3],...] -b,--batch-size <arg> The number of rows in the batch_mutate call -c,--columns <arg> The number of columsn to create per key -o,--operation <arg> One of insert, read, rangeslice, multiget -t,--threads <arg> The number of client threads to create -h,--help Print this help message and exit -D,--discovery-delay <arg> The amount of time to wait between runs of Auto host discovery. Providing a value enables this service -h,--help Print this help message and exit -L,--consistency-levels <arg> Defaults to QUORUM for R+W, specified in the form of [read]:[write] eg. '-L ONE:ONE' -M,--max-wait <arg> The Maximum time to wait on aquiring a connection from the pool (maxWaitTimeWhenExhausted). Default is forever. -m,--unframed Disable use of TFramedTransport -n,--num-keys <arg> The number of keys to create -o,--operation <arg> The type of operation: insert or select -R,--retry-delay <arg> The amount of time to wait between runs of Downed host retry delay execution. 30 seconds by default. -S,--skip-retry-delay Disable downed host retry service execution. -T,--thrift-timeout <arg> The ThriftSocketTimeout value. -t,--threads <arg> The number of client threads we will create -w,--colwidth <arg> The widht of the column in bytes. Default is 16
This project now runs as a stand-alone binary via the Maven app-assempbler plugin. You must first clone the source of the project, then using maven, execute the install goal:
Move to the directory where the artiffacts are assembled:
Now you can execute the script directly as needed:
sh bin/stress -o insert -b 50 -n 12000 localhost:9160
This will perform the operation then drop you into a shell in which you can perform additional operations. The command syntax is similar to the initial call, except that the operation is the only argument:
[cassandra-stress] usage: operation [options] operation can be one of: insert, read, rangeslice, multiget, replay [N] -o,--operation <arg> One of insert, read, rangeslice, multiget, verify_last_insert (1) -b,--batch-size <arg> The number of rows in the batch_mutate call -t,--threads <arg> The number of client threads we will create -c,--columns <arg> The number of columsn to create per key -C,--clients <arg> The number of pooled clients to use
To re-run the operation again, just type the operation name. So from the initial example above, re-running the read operation with the same parameters would be:
(1) Must run single thread (-t 1) and QUORUM:QUORUM. Intended to test in a cluster of at least 3 nodes RF = 3.
The main benefit of going into a shell is that it keeps the JVM and connection pool machinery warm, so additional test runs will not have the overhead of spinning everything up. This has the added benefit of providing conitnuous access to Hector's JMX stats as well.
Todo (in rough priority)
- Create ColumnFamily specifically for test
- Add JMX innards for MBean driven command/control
- Add replay N times functionality from shell
- Add (much) better key distribution (gaussian, stdev argument)
- SuperColumn support
Offered under an MIT license (see LICENSE in the top level directory). As with any halfway decent load testing tool, you can generate a lot of load and put a system under duress or even cause it to fail. You are solely responsible for any issues experienced the execution of this tool may cause. Use at your own risk.