Debug messages for initial Voldemort profiling #83

Merged
merged 1 commit into from Jul 10, 2012

Conversation

Projects
None yet
3 participants
Contributor

pbailis commented Jul 9, 2012

Modification Summary

This is a simple patch that prints log4j DEBUG messages for use in profiling Voldemort in production. The overhead when DEBUG is off should be negligible, and the main overhead when it is on will be due to logging. I don't propose this patch be merged into trunk--Jay and I have talked about how to do this "right", but it will be a first step towards better understanding Voldemort behavior.

To turn it on, set log4j to run at the DEBUG level, ideally with timestamp logging at the millisecond level.

Client-side modifications

  • PipelineRoutedStore:
    • end-to-end latencies for GET, GETALL, GETVERSIONS, PUT, DELETE
    • for PUT: ParallelPutRequests, SerialPutRequests
    • for GET, GETVERSIONS: PerformParallelRequests, PerformSerialRequests
    • for GETALL: PerformParallelGetAllRequests,PerformSerialGetAllRequests
  • SocketStore:
    • client-server WARS analysis
    • opaque re: what kind of request
    • logs remote and local ports, start/end time

Server-side modifications:

  • AsyncRequestHandler:
    • lowest level at which we have a socket handle
    • logs completion time of server request with local+client IP, port
  • VoldemortNativeRequestHandler:
    • detailed logging for GET, GETALL, GETVERSIONS, PUT, DELETE
      • keys, value hashes, time, versions
  • BdbStorageEngine:
    • storage-specific GET, GETALL, GETVERSIONS, PUT, DELETE logging

Other

  • AbstractReadRepair:
    • basic logging for measuring how often RR fires and how long it takes
  • HintedHandoff:
    • basic logging for measuring how often and how long hintedhandoff happens/takes

How do we correlate debug messages across method invocations?

There is no notion of a request context in Voldemort, so we can't easily correlate log messages produced by different methods. This is problematic: on the client, given a log message in a SocketStore, how do we know which request it belongs to? (The key is probably sufficient, but in the event of concurrent requests to the same key,we cannot disambiguate.) Across clients and servers we have a similar problem.

We cannot change the method signatures, so instead we correlate at method boundaries. In general, the same (ByteArray key) instance is used up and down the client stack (from the serializing class to the socket) and another (ByteArray key) instance is used similarly in the server stack. Using System.identityHashCode()` to get a (non-unique but close enough to unique) handle to track exactly which request we're operating on up and down the stack. Across system boundaries, we have some other messiness, which I outline here:

CLIENT:CLIENT    PipelineRoutedStore -> SocketStore : identityHashCode(key)
CLIENT:SERVER    SocketStore -> AsyncRequestHandler : clientId:clientPort
SERVER:SERVER    AsyncRequestHandler -> VoldemortNativeRequestHandler : identityHashCode(handler)
SERVER:SERVER    VoldemortNativeRequestHandler -> BdbStorageEngine : identityHashCode(key)

This means, for example, to find the corresponding debug messages in BdbStorageEngine:

  1. Find the hashcode of the request key ByteArray in the PipelineRoutedStore log
  2. Find the corresponding client socket in the SocketStore logs
  3. Find the request that came in on the client socket in the AsyncRequestHandler logs
  4. Find the server-side request key ByteArray in the AsyncRequestHandler logs
  5. Find the debug message in BdbStorageEngine for the corresponding request key

It's pretty nasty, but I can't think of a better way to do it without modifying the API. This is effectively a five-way join between the logs.

Caveat: While it would be nice, this ad-hoc method-by-method parameter-based correlation means that we can't rely on sampling to limit tracing overheads (as in Dapper). This is potentially going to be a problem: the logging overhead may be onerous. If each method output, say, 1 in 1000 DEBUG messages, it would decrease overheads. However, we cannot ensure that that all classes choose the same requests to output. For example, PipelineRoutedStore may output a debug message for a given request while SocketStore does not. Statistically, there will be intersection, but this is likely rare (roughly p^4). There is possibly a one-to-one correlation of debug messages between some classes, but more likely we can't synchronize the counts between classes (if we had a means of doing so, we wouldn't have this problem, and because the join key changes between classes, it's not likely that there's an easy common scheme to do so).

Contributor

pbailis commented Jul 9, 2012

Backwards reference to old patch against release:
#82

@vinothchandar vinothchandar and 1 other commented on an outdated diff Jul 9, 2012

...rver/protocol/vold/VoldemortNativeRequestHandler.java
@@ -321,18 +362,67 @@ private void handleGetAll(DataInputStream inputStream,
// write back the results
outputStream.writeInt(results.size());
+
+ logger.debug("GETALL start");
@vinothchandar

vinothchandar Jul 9, 2012

Collaborator

can we place this and the following GETALL end , inside if debugEnabled block?

@pbailis

pbailis Jul 10, 2012

Contributor

Fixed--thanks!

@vinothchandar vinothchandar and 1 other commented on an outdated diff Jul 9, 2012

.../store/routed/action/PerformSerialGetAllRequests.java
@@ -79,6 +79,10 @@ public void execute(Pipeline pipeline) {
boolean zoneRequirement = false;
MutableInt successCount = pipelineData.getSuccessCount(key);
+ logger.debug("GETALL for key " + key + " (keyRef: " + System.identityHashCode(key)
@vinothchandar

vinothchandar Jul 9, 2012

Collaborator

if check here too..

@pbailis

pbailis Jul 10, 2012

Contributor

Fixed--thanks!

@vinothchandar vinothchandar and 1 other commented on an outdated diff Jul 9, 2012

src/java/voldemort/store/slop/HintedHandoff.java
@@ -127,6 +129,12 @@ public void requestComplete(Object result, long requestTime) {
failedNodes.add(node);
if(response.getValue() instanceof UnreachableStoreException) {
UnreachableStoreException use = (UnreachableStoreException) response.getValue();
+
+ logger.debug("Write of key " + slop.getKey() + " for "
@vinothchandar

vinothchandar Jul 9, 2012

Collaborator

if check here too

@pbailis

pbailis Jul 10, 2012

Contributor

Fixed--thanks!

zhongjiewu merged commit b687f02 into voldemort:master Jul 10, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment