
EC2 Testing Infrastructure


For the 0.60 release of Voldemort we included a testing framework called the “EC2 Testing Infrastructure” project. This introduces the ability to test Voldemort against a set of ad hoc remote nodes using EC2 (or your own environment as we’ll discuss later).

As with any distributed system, it’s difficult to approximate a “real world” environment, for both the Voldemort developers and their users. In many cases testing is done on a single node (e.g. the user’s system) or perhaps a handful of spare machines cobbled together. Neither of these scenarios is particularly suitable for gauging the actual characteristics that will be seen once the system moves to production. What is needed is a means to test Voldemort in an environment that more closely resembles production.

We outlined the requirements for a testing infrastructure that would allow us to:

  • Bring up and down an arbitrary number of nodes on demand
  • Remove dependency on internal IT for physical/virtual machine allocation
  • Use nodes that are exactly the same or intentionally dissimilar
  • Seed configuration and data on all nodes
  • Develop and execute tests from both developer and testing environments
  • Start and stop nodes (some or all) on demand
  • Be as cost effective as possible

With the criteria above, it was pretty easy to see why a solution such as EC2 was selected. Much of the actual heavy lifting is already done by EC2 and we can focus on hooking up Voldemort to work with it.

Requirements

The first question we need to address is, “Do I have to use EC2?” Surprisingly, the answer is no. Despite the name, this feature was designed to work in non-EC2-based environments as well. Obviously the code to create and terminate EC2 instances is EC2-specific, but the rest of the code is agnostic to the environment in which it runs. Nothing prevents a user from running the tests in an internal testing lab. The hope is that, as the need arises, other IaaS providers’ APIs can be used to hook into their instance lifecycle mechanisms.

Regardless of the underlying infrastructure, the server and client node systems must meet certain requirements. As mentioned before, while the project is geared toward using the EC2 infrastructure for testing, we want the utilities/library to be as agnostic as possible, so we define a minimal set of requirements.

Prerequisites for server nodes:

  • 2.6-based distribution of Linux
  • ulimit set appropriately for the number of sockets involved in the test
  • Key-based SSH access (i.e. does not require a password)
  • Java 5 installed with JAVA_HOME environment variable set
  • Required ports open for cluster connectivity

Prerequisites for client nodes:

  • 2.6-based distribution of Linux
  • ulimit set appropriately for the number of sockets involved in the test
  • Key-based SSH access (i.e. does not require a password)
  • Java 5 installed with JAVA_HOME environment variable set

Note also that a given system can function as a Voldemort server node, a client, or both.

Host File Format

Most of the command line tools require a file that contains a listing of the host names that are used by that operation. For example, when copying data to the test systems we need to provide the host names of those systems in order for the data to be copied.

The host names of the machines we are using for testing (regardless of whether they serve as clients or servers) need to be stored in a specific format expected by the various scripts.

While the format is quite simple, it differs based on the network configuration. If our test machines are on a separate logical network from the machine executing the commands (i.e. the “local” machine), the name for those systems from the perspective of the local machine may differ from the name those systems use on their internal network. This is the case with EC2: each EC2 instance has two host names, an external host name and an internal host name. In EC2’s documentation these are known as the instance’s “public” and “private” DNS names, respectively.

When we access an EC2 instance (via HTTP, SSH, etc.) from the local machine, we use the instance’s external host name. The external host name usually looks something like “ec2-xx-xx-xx-xx.compute-1.amazonaws.com” while the internal host name is often of the form “domU-xx-xx-xx-xx-xx-xx.compute-1.internal” (sometimes an internal IP format is used rather than a MAC address). If the test systems are on the same logical network, a given system often has only one host name, which makes things easier.

The host file format is very simple. In the case of separate external/internal host names, the host file is a newline-delimited list, with each line giving the external host name and internal host name separated by an equals sign, of the general form:

external host name 1=internal host name 1
external host name 2=internal host name 2

For example, if you’re running on EC2, your host file will look similar to:

ec2-75-101-207-45.compute-1.amazonaws.com=domU-12-31-39-07-CD-57.compute-1.internal
ec2-75-432-54-543.compute-1.amazonaws.com=domU-342-312-81-34-CD-57.compute-1.internal

Even if you’re not using EC2, your network may be logically separated, which would likewise require a mapping file, e.g.:

test01.example.com=joker.qa
test02.example.com=batman.qa

If our test machines are on the same logical network, the host file format is even simpler, being a line-delimited list of the host names:

test01.example.com
test02.example.com

If you’re using EC2, the EC2 instance creation script will generate the host information for you in the correct format; you simply need to save the output to a file. If you’re using something other than EC2, you’ll need to create the files manually. Also, if you want some nodes to act only as servers and others only as clients, you’ll need to make two separate files with the appropriate nodes in each and pass them to the various commands as appropriate. More details will be provided as needed.
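If you ever need just one side of the mapping (for example, to SSH into each instance by its external name), the file is easy to split with standard tools. A minimal sketch, assuming the external=internal format above and a host file named /tmp/servers as in the examples below:

cut -d'=' -f1 /tmp/servers > /tmp/servers-external   # names reachable from the local machine
cut -d'=' -f2 /tmp/servers > /tmp/servers-internal   # names used within the cluster's network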

Usage Overview

In general terms, there are two ways to use the EC2 testing infrastructure:

  1. Via command-line scripts
  2. Via Java (often via JUnit)

The operations that are supported are:

  • EC2 instance creation
  • EC2 instance termination
  • Server node data clean up
  • Server cluster descriptor generation
  • Remote client test invocation
  • Cluster start up
  • Cluster shut down
  • Deployment (data, configuration, binaries)

All of the code, scripts, and configuration come standard in the main Voldemort distribution.

The scripts must be run from the Voldemort root directory, i.e. the main Voldemort directory under which the sub-directories (bin, contrib, src, etc.) live. This directory on the local machine is hereafter referred to as VOLDEMORT_ROOT.

The command-line scripts are located at:

$VOLDEMORT_ROOT/contrib/ec2-testing/bin

The main source files are located under:

$VOLDEMORT_ROOT/contrib/ec2-testing/src/java
$VOLDEMORT_ROOT/contrib/ec2-testing/test

In the following sections we outline these operations one by one; each relies on the host file format described above.

EC2 Instance Creation

Script:

voldemort-ec2instancecreator.sh

Classes:

voldemort.utils.Ec2RemoteTestUtils
voldemort.utils.Ec2Connection
voldemort.utils.TypicaEc2Connection

This script and these classes initialize EC2 instances for use as Voldemort servers, clients, or both. This step is independent of Voldemort proper and is simply a means to provision instances that meet the prerequisites above. One could instead use Amazon’s stock command-line tools to provision the servers.

As is required by EC2, to use either the script or the Java API, you’ll need to have access to a valid AWS access ID and secret key. Additionally, you’ll need the ID of an AMI that meets the environmental requirements above. Usually that will mean that a user has created a custom AMI that meets those requirements.

It’s also possible to specify an EC2 instance size when creating the instances. Note that the available instance sizes depend on the architecture of the AMI: if the AMI is 32-bit, a “large” EC2 instance cannot be started for it; conversely, if the AMI is 64-bit, a “small” instance cannot be allocated. These are constraints of EC2, not of this framework.

The key advantage of using this tool over Amazon’s command-line tools is that the voldemort-ec2instancecreator.sh script generates a listing of the created instances’ host names in the format that the rest of the operations require. Once the instances have booted, the script outputs the external and internal host names in the format described in the section “Host File Format”. Simply running this script and redirecting the output to a file is all that is needed to generate the hosts file.

Depending on your test setup, you may need to run the voldemort-ec2instancecreator.sh utility more than once. For example, if you want your servers and clients to reside on separate instances, you’d run voldemort-ec2instancecreator.sh twice: once to create a list of servers and again to create a list of clients, saving the output of each invocation to a separate file.

$ ./contrib/ec2-testing/bin/voldemort-ec2instancecreator.sh <options> > /tmp/servers
$ ./contrib/ec2-testing/bin/voldemort-ec2instancecreator.sh <options> > /tmp/clients
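Since every subsequent operation consumes these files, it’s worth a quick sanity check that each line has the expected external=internal form; a minimal sketch:

grep -vc '=' /tmp/servers   # counts lines lacking a mapping; should print 0
wc -l /tmp/servers          # should match the number of instances requested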

EC2 Instance Termination

Script:

voldemort-ec2instanceterminator.sh

Classes:

voldemort.utils.Ec2RemoteTestUtils
voldemort.utils.Ec2Connection
voldemort.utils.TypicaEc2Connection

This utility will terminate EC2 instances. This step is also independent of Voldemort and is simply a means to de-provision instances previously created.

In addition to the Amazon access ID and secret key, a file containing either a list of host names or a list of instance IDs is required; the corresponding instances are then terminated.

When invoking termination from Java code, it’s a good idea to put the EC2 instance termination code somewhere it is certain to be executed: in a shutdown hook, in a test’s tear-down method, or similar. However, there are cases where the termination code may never run, for example if the tests hang or the JVM crashes. It is therefore recommended that the IDs of any instances created be persisted outside the JVM for use by automated clean-up scripts or even manual intervention.
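The same precaution applies when driving everything from the shell: a trap ensures the terminator runs even if the test script fails or is interrupted. A minimal sketch; as elsewhere, <options> stands in for your AWS credentials and the other arguments:

#!/bin/bash
# create the instances, persisting the host list outside the test process
./contrib/ec2-testing/bin/voldemort-ec2instancecreator.sh <options> > /tmp/servers

# terminate the instances on any exit, normal or otherwise
trap "./contrib/ec2-testing/bin/voldemort-ec2instanceterminator.sh <options>" EXIT

# ... deploy, start the cluster, and run tests against /tmp/servers here ...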

Cluster Descriptor Generation

Script:

voldemort-clustergenerator.sh

Classes:

voldemort.utils.ClusterGenerator
voldemort.utils.RemoteTestUtils

This is a simple script/Java API that creates a cluster.xml file dynamically, based on the host names provided. It implements the generate_cluster_xml.py script in Java for use both as a command-line utility and from within JUnit. The number of partitions is presently fixed to be the same for all nodes; the partition IDs are randomly generated and assigned to each node.
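For reference, the output follows Voldemort’s usual cluster.xml layout. An abbreviated, illustrative two-node example (the cluster name, ports, and partition assignments shown here are made up):

<cluster>
  <name>mycluster</name>
  <server>
    <id>0</id>
    <host>domU-12-31-39-07-CD-57.compute-1.internal</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <partitions>0, 3, 5</partitions>
  </server>
  <server>
    <id>1</id>
    <host>domU-342-312-81-34-CD-57.compute-1.internal</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <partitions>1, 2, 4</partitions>
  </server>
</cluster>

Note that the nodes are listed by their internal host names, since the servers address one another over the internal network.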

Deployment

Script:

voldemort-deployer.sh

Classes:

voldemort.utils.Deployer
voldemort.utils.RsyncDeployer
voldemort.utils.RemoteTestUtils

The deployer script/library API is used to deploy the Voldemort directory structure to a set of remote machines. These machines can be either servers or clients, or a mixture of both.

As can be gleaned from the class name, this uses rsync under the covers. This means that repeated transfers to an instance synchronize only the changes, speeding up test setup. The remote hosts are updated in parallel.
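Conceptually, the transfer to each host resembles a plain rsync-over-SSH copy along these lines (the exact rsync flags RsyncDeployer passes are an assumption here):

rsync -az -e ssh $VOLDEMORT_ROOT user@ec2-75-101-207-45.compute-1.amazonaws.com:<parent>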

The two most important parameters are the parent directory on the remote host under which data will be copied and the source directory on the local machine. These can be whatever is needed to copy the binaries, system data, and so forth. For example:

$ ./contrib/ec2-testing/bin/voldemort-deployer.sh --hostnames servers --source `pwd` --parent .
$ ./contrib/ec2-testing/bin/voldemort-deployer.sh --hostnames clients --source /tmp/test/bin --parent bin
$ ./contrib/ec2-testing/bin/voldemort-deployer.sh --hostnames clients --source /tmp/test/config --parent config

In the above example we first copy the Voldemort source directory (assuming we’re in $VOLDEMORT_ROOT) to the remote servers’ home directories. Next we copy our own custom test classes, scripts, etc. to the “bin” directory under each remote test client’s home directory. Finally, we copy the configuration to the “config” directory under each remote test client’s home directory.

Cluster Starter

Script:

voldemort-clusterstarter.sh 

Classes:

voldemort.utils.ClusterStarter
voldemort.utils.SshClusterStarter
voldemort.utils.RemoteTestUtils

The Voldemort server runner utility/library API is used to start the Voldemort servers on a set of remote machines. The heavy lifting is performed by SSH, to gain access to each machine, and by voldemort-start.sh, to start the server.

For each server, we determine the mapping of external server host name to node ID using cluster.xml and the server name mapping. After SSH-ing into each server node we export VOLDEMORT_HOME as the remote machine’s configuration directory and VOLDEMORT_NODE_ID as the system’s unique node ID. We then invoke voldemort-start.sh and monitor the output from each server node.

VoldemortConfig already includes logic to look for the node ID in the VOLDEMORT_NODE_ID environment variable if it’s not present in server.properties. This means the node.id property must first be removed from server.properties before the server node starts; the framework handles this automatically to prevent conflicts.
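In shell terms, what the framework does on each server node amounts roughly to the following (the user name, paths, and node ID are illustrative):

ssh user@ec2-75-101-207-45.compute-1.amazonaws.com \
    "export VOLDEMORT_HOME=~/voldemort/config ; \
     export VOLDEMORT_NODE_ID=0 ; \
     ~/voldemort/bin/voldemort-start.sh"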

Cluster Stopper

Script:

voldemort-clusterstopper.sh

Classes:

voldemort.utils.ClusterStopper
voldemort.utils.SshClusterStopper
voldemort.utils.RemoteTestUtils

The Voldemort server stopper utility/library API is used to stop the Voldemort servers on a set of remote machines. It does not halt the operating system, just the Voldemort process. The heavy lifting is performed by SSH, to gain access to each machine, and by voldemort-stop.sh, to stop the server.
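Per node, this is roughly equivalent to the following (user name and path illustrative):

ssh user@ec2-75-101-207-45.compute-1.amazonaws.com "~/voldemort/bin/voldemort-stop.sh"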

Cluster Cleaner

Script:

voldemort-clustercleaner.sh

Classes:

voldemort.utils.ClusterCleaner
voldemort.utils.SshClusterCleaner

This will SSH into each of the server nodes and delete the “data” directory under the given $VOLDEMORT_HOME configuration directory.
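Per node, the effect is roughly the following (user name and path illustrative; substitute the configuration directory you use as $VOLDEMORT_HOME):

ssh user@ec2-75-101-207-45.compute-1.amazonaws.com "rm -rf ~/voldemort/config/data"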

Remote Test Client Runner

Script:

voldemort-clusterremotetest.sh

Classes:

voldemort.utils.RemoteTest
voldemort.utils.SshRemoteTest
voldemort.utils.RemoteTestUtils

The Voldemort test client runner utility/library API is used to start the Voldemort test clients on a set of remote machines. The heavy lifting is performed by SSH, which gains access to each machine and runs the external command.

Usage of the remote test script is probably the most difficult because it intentionally doesn’t do much: it simply executes the commands provided in a commands file on each remote machine via SSH. The output of the remote commands, for all of the test clients, is then echoed to standard output. This can get pretty ugly, so it’s best to save the result and write a script that parses the data per host name. The commands file format is shown below.
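The commands file uses the same external-host-name=value layout as the host file, with the value being the shell command to run on that host. An illustrative two-client example (the store name “test” and the bootstrap URL are placeholders):

ec2-75-101-207-45.compute-1.amazonaws.com=cd voldemort ; ./bin/voldemort-remote-test.sh tcp://domU-12-31-39-07-CD-57.compute-1.internal:6666 test 100
ec2-75-432-54-543.compute-1.amazonaws.com=cd voldemort ; sleep 30 ; ./bin/voldemort-remote-test.sh tcp://domU-12-31-39-07-CD-57.compute-1.internal:6666 test 100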

While the remote test functionality sets a high bar for entry, the data that is returned is intended to be generic enough that arbitrary shell scripts and tools can make use of these results for aggregation, trending, graphing, etc. We also didn’t want to presume any particular set of tests that should be executed, and instead made the mechanism general enough that, hopefully, any test can be executed within the infrastructure.

Here is an example that shows how to build up, execute, and parse the results from voldemort-clusterremotetest.sh:

# number of requests each client issues; --start-key-index uses it to give
# each client its own non-overlapping key range
numRequests=100
# bootstrap URL of one of the Voldemort servers (substitute a real internal
# host name from your servers file)
url=tcp://domU-12-31-39-07-CD-57.compute-1.internal:6666
counter=0
sleep=30
rm -f /tmp/commands   # start with a fresh commands file
cat /tmp/clients |
while read line
do
  # stagger the clients so each one starts $sleep seconds after the previous
  rampSeconds=$(($sleep * $counter))
  idx=$(($numRequests * $counter))
  counter=$(($counter + 1))
  # the part before '=' in the host file is the external host name
  externalHost=`echo $line | cut -d'=' -f1`
  cmd="cd client/$(basename `pwd`)"
  cmd="$cmd ; sleep $rampSeconds"
  cmd="$cmd ; ./bin/voldemort-remote-test.sh --start-key-index $idx $url test 100"
  echo "$externalHost=$cmd" >> /tmp/commands
done
./contrib/ec2-testing/bin/voldemort-clusterremotetest.sh \
    --hostnames /tmp/clients \
    --commands /tmp/commands > /tmp/results
./contrib/ec2-testing/examples/remotetest/remotetestparser.scala /tmp/results

Conclusion

The EC2 testing infrastructure has been a great way to test different scenarios, new features, and performance for Voldemort developers. However, users of Voldemort can use this same framework to get a much more accurate idea of how their particular application will fare under differing loads, configurations, and so forth.

For Voldemort we run performance tests on a regular basis in order to catch any performance regressions. These tests output raw data which is then trended and turned into graphs.

Please check out the source code, scripts, and examples and let us know what works and what doesn’t. Feel free to ask questions, submit requests and patches, or send comments and criticisms via the mailing list.