Application instrumentation is one of those things, like test driven development, which I've come to consider as mission critical in writing good software. And, just like test driven development, it seems to always end up getting short-cutted, de-prioritized, or otherwise man-handled when crunch time comes.
I've been around long enough now to know that when you need to do something like instrumenation, you just need to draw a line in the sand and execute. I did this recently at Apigee on the Usergrid platform, and I decided to write up instructions in a hope that folks will realize how easy this is to do now and how valueable it is in understanding how to scale and manage your applications.
So stop what you are working on, create a new branch, and get ready to follow along at home. To get immediate return on time spent should not take more than an afternoon of hackery. Really the biggest bottleneck I found, was scoring the time from our very good by already overworked ops team to pull the trigger getting on a monitoring solution in place for us.
A quick peek at the history of the term instrumentation will be helpful for the context of understanding the provided functionality: http://en.wikipedia.org/wiki/Instrumentation.
Graphing these concepts onto the application behavior at the code level is fitting - what we deal with as programmers is input and output. Measuring what happens in between provides a critical feedback loop to both application developers and operations personnel about the runtime health of a system.
When coupled with a monitoring system which stores historical data, code instrumentation provides valuable information into application performance over time. With this information, it becomes easier to spot performance regressions and application hotspots as load and traffic patterns change (assuming those are monitored as well).
Getting Started With Instrumentation
As a first step, it's best to start with common dispatch mechanisms or system entry points, instrumenting them for execution time. For example, lets assume we have just instrumented the "authenticate" method of a UserService. This hypothetical UserService encapsulates the interactions with several distinct systems, retrieving user information from a data store and verifying a certificate against a key store.
Over time, we will build up a fairly reliable idea of the mean execution time and standard deviation of this method, 5ms with occasional spikes into 11ms let's say. We can now use this information to trigger events in our monitoring system. If we saw a gradual degradation which brought the average outside of standard deviation, we could make an assumption that there is a problem lurking under the hood - perhaps a missing index on a database lookup where the table has continued to grow. A warning can be raised to the monitoring system.
Additionally, if there were an immediate spike in execution time, say to 90ms, we can assume given this anomalous behavior of an order of magnitude execution time increase, there is a partial failure of one or more systems involved. We can trigger an alarm condition on our monitoring system and begin immediate diagnosis.
With exception metering, we gain powerful insights into the behavior of the system in failure situations and can respond pro-actively to services which are in a failure mode. For example, if the class returning a web resource begin to throw 500 errors, we could again raise immediate alarms, and have essentially real-time notification of component failure.
Along these same lines, the immediate presence of a large number UserPasswordMismatchExceptions thrown from our UserService might mean a brute force password attack was underway. The monitoring system could notify operations that evasive action was needed.
Metrics was selected for the following reasons:
- modular, well-structured API
- annotation-driven instrumentation
- plug-able interface for common monitoring systems (including graphite and munin)
Instrumentation Example Project
A sample project has been created to demonstrate the most common and immediately useful features of instrumentation. The project consists of a main class that executes some hashing operations repeatedly to generate execution data. The operations are encapsulated into two different classes: one which uses the method-level annotations from the metrics api and the other which use the metrics API classes directly.
The only requirements for running the application are a JVM, git and a recent version of maven installed. The project is available on github at the following URL: [project URL]
Running the example application
Clone the git repository
git clone https://github.com/zznate/instrumentation-example
cd into the directory
Execute the sample application via maven with the following command:
mvn -e exec:java -Dexec.mainClass="com.apigee.instrumentation.example.InstrumentedBeanCli"
ctrl+c to kill the program (but only when you are satisfied with the results of the analysis of the application as described below).
Analysing Instrumentation Output
For demonstration purposes, we are using JMX via one of the console utilities in the JDK to go over instrumentation output. In an actual deployment environment, one would typically configure a visualization system - we use graphite at Apigee - to store the output and collect data for trend analysis.
Analysing this data in pre-production environments (testing, staging, etc.) also makes it easier to spot performance regressions or degradations before a production deploy occurs. We have graphite servers in both production and staging environments, for example.
An ideal place to be (we are not there yet unfortunately) would be to have performance information from the nightly smoke testing going into a dedicated graphite server as well. I'd be nice to see this in the morning and look for potential hotspots before they get any farther.
Unfortunatley, setting up graphite is outside the scope of this doc and is uneccessary to get your head around the fundamentals. We'll use Java Visual VM to get started.
Using Java VisualVM
Java VisualVM (jvisualvm )should be available via command line with any recent JDK download. To use jvisualvm, type the following on the command line:
The MBean inspector is not present by default so we have to configure its plugin before we can use it:
- In the Tools menu select Plugins
- Go to the Available Plugins tab
- Select VisualVM-MBeans (though we won’t use it in this tutorial, Visual GC is handy as well)
- Restart jvisualvm
Once jvisualvm is running with the VisiaulVM-MBeans plugin installed, start the demo application and attach jvisualvm to the demo process by:
- Selecting the "Launcher" process from the list in the Applications pane
- You can verify we found the right process as it will display the arguments we invoked to start the demo [pic]
Examining The Demo Application Process
Now that we are attached to the demo process, let's look at the output of the annotations through the MBean interface (automatically created for us by wiring in the JmxReporter in our applcation context). In the jvisualvm window:
- Select the MBeans tab
- Expand the selection list for the Annotated MBean in the MBeans pane
- Expand the AnnotatedInstrumentedBean to reveal the output from the Timed and ExceptionMetered annotations
- Similarly, expand the someType bean to reveal the output of the Metered annotation
Note that we were able to slot the Metered output into a different bean by adding a type of "someType" when we declared the annotation. To have the output from Metered grouped with the other two annotations, remove the type argument.
Now that you have a feel for how the annotations present themselves via MBean, click around and explore the output (hitting refresh to update the statistics). The details present for each type of annotation will be explained in further detail below.
Using Metrics With Your Code
As previously mentioned, the Metrics API can be annotation driven. Dropping this into crucial parts of your code via such will therefore be straightforward. It really should not take you more than an hour to tackle the most critical paths of your code base. There are currently four different annotations avaialble in the Metrics API:
- Timed (contains metered)
- Counter (not shown in our example, but simple enough)
The actual metric types available from the Metrics API are slightly different than the annotations. I'm not going into detail on them because they are already well documented and I'm of the school of thought that instrumentation should be as minimially invasive as possible. Thus our focus on the annotations.
A Note for CDI/IoC Users
For folks using IoC containers - specifically Spring Framework or Google Guice - those packages have been moved out of the Metrics code base and are now being maintained by third parties. We'll focus on SpringFramework for the purposes of this tutorial since it's what we use on Usergrid.
- Spring IoC injected (can use Guice as well)
- Deps required
- metrics namespace in context file