JMX Monitoring

rsumbaly edited this page Jul 14, 2011 · 2 revisions

This wiki page lists all the MBeans that Voldemort exposes, along with the attributes and operations available on each.

  • voldemort.cluster
    • Cluster
      • getName() - Gets the name of the cluster as specified in the cluster metadata
      • getNumberOfNodes() - Gets the number of nodes as specified in the cluster metadata
  • voldemort.cluster.failuredetector ( Failure detector, run on both the client and the server side )
    • BannagePeriodFailureDetector ( Bans a node immediately after getting one timeout )
      • getUnavailableNodesBannageExpiration() - List of unavailable nodes and their respective bannage expiration
      • getAvailableNodes() - Gets the available node ids list as a string
      • getUnavailableNodes() - Gets the unavailable node ids list as a string
      • getAvailableNodeCount() - Gets the number of nodes which are available
      • getNodeCount() - Gets the total number of nodes
    • ThresholdFailureDetector ( Bans a node only after passing a minimum threshold of failures )
      • getNodeThresholdStats() - Each node is listed with its status (available/unavailable) and success percentage
      • getAvailableNodes() - Gets the available node ids list as a string
      • getUnavailableNodes() - Gets the unavailable node ids list as a string
      • getAvailableNodeCount() - Gets the number of nodes which are available
      • getNodeCount() - Gets the total number of nodes
  • voldemort.server
    • VoldemortServer
      • restoreDataFromReplication( parallelism ) - If this node went down and lost all of its data, run this operation to restore the data from other nodes. The parameter is the amount of parallelism to use during the restore
      • start() - Start Voldemort server
      • stop() - Stop Voldemort server
      • isStarted() - Boolean indicating if Voldemort is running
  • voldemort.server.niosocket
    • nio-socket-server ( The actual service answering the normal requests )
      • getPort() - The port on which it is answering requests
    • admin-server ( The admin service answering admin requests )
      • getPort() - The port on which it is answering admin requests
  • voldemort.server.protocol.admin
    • AsyncOperationStatus ( Status of the asynchronous jobs we run, for example during rebalancing or restoring from replication )
      • getStatus( job_id ) - Get the status string of the job id specified
      • getAllAsyncOperations() - Get the list of all async operations
      • stopAsyncOperation( job_id ) - Stop a particular job id
  • voldemort.server.rebalance
    • RebalancerService ( The service running rebalancing )
  • voldemort.server.scheduler
    • SchedulerService ( The service which runs scheduled jobs - clean up job, slop pusher job, etc )
      • getScheduledJobs() - Gets a list of scheduled jobs
      • enable( job_id ) - Enable the job id if disabled
      • disable( job_id ) - Disable the job id if enabled
  • voldemort.server.storage
    • StorageService
      • forceCleanupOldData( store_name ) - Starts the clean up on the store name by deleting data older than 'retention_days' specified in store metadata
      • forceCleanupOldDataThrottled( store_name, number_of_entries_per_sec ) - Same as above, but throttled to the given number of entries per second
      • logStoreStats( store_name ) - Calculates statistics for the store ( tuple count, etc ) and logs them
      • logStoreStats() - Same as above, but for all stores
  • voldemort.server.bdb
    • bdbStorageConfiguration
      • cleanLogs() - Forceful cleaning of BDB logs
      • getEnvStatsAsString( store_name ) - For a BDB store name get all its environment stats
    • [store_name]
      • getBdbStats() - Get store level statistics
  • voldemort.store.metadata
    • metadata
      • cleanAllRebalancingState() - Clean the rebalancing state
  • voldemort.store.readonly
    • [store_name]
      • rollback( directory_path ) - Rollback to a directory path
      • swapFiles( directory_path ) - Swap to a new directory path
      • getChunkIdToNumChunks() - Get the underlying statistics about chunk id to number of chunks
      • getLastSwapped() - Get the timestamp of the last swap
  • voldemort.store.rebalancing
    • [store_name] - Triggered when we have redirecting stores on
      • getIsRedirectingStoreEnabled() - Check if we want redirections to take place
      • setIsRedirectingStoreEnabled() - Manual override to stop redirections from taking place
  • voldemort.store.slop
    • slop
      • getOutstandingTotal() - Total number of slops yet to be pushed
      • getOutstandingByNode() - Get total number of slops by node
      • getOutstandingByZone() - Get total number of slops by zone
  • voldemort.store.stats [ All calculated in a sliding window of 5 minutes ]
    • admin-streaming
      • getStreamOperationIds() - All the streaming job ids
      • getAllStreamOperations() - Get status of all stream operations.
      • getStreamOperation( handle_id ) - Get the status of a stream operation with specified id
      • clearFinished() - Manually clear out finished tasks.
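      • getAvgFetchKeysDiskTimeMs() - Average disk time in ms for fetch keys operations
      • getAvgFetchEntriesDiskTimeMs() - Average disk time in ms for fetch entries operations
      • getAvgFetchFileDiskTimeMs() - Average disk time in ms for fetch file operations
      • getAvgUpdateDiskTimeMs() - Average disk time in ms for update operations
      • getAvgSlopDiskTimeMs() - Average disk time in ms for slop operations
      • getAvgFetchKeysNetworkTimeMs() - Average network time in ms for fetch keys operations
      • getAvgFetchEntriesNetworkTimeMs() - Average network time in ms for fetch entries operations
      • getAvgFetchFileNetworkTimeMs() - Average network time in ms for fetch file operations
      • getAvgUpdateNetworkTimeMs() - Average network time in ms for update operations
      • getAvgSlopNetworkTimeMs() - Average network time in ms for slop operations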
    • [store_name]
      • getNumberOfCallsToGetAll()
      • getAverageGetAllCompletionTimeInMs()
      • getGetAllThroughput()
      • getNumberOfCallsToGet()
      • getAverageGetCompletionTimeInMs()
      • getGetThroughput()
      • getNumberOfCallsToPut()
      • getAveragePutCompletionTimeInMs()
      • getPutThroughput()
      • getNumberOfCallsToDelete()
      • getAverageDeleteCompletionTimeInMs()
      • getDeleteThroughput()
      • getNumberOfObsoleteVersions()
      • getNumberOfExceptions()
      • getAvgOperationCompletionTimeInMs()
      • getOperationThroughput()
      • getPercentGetReturningEmptyResponse()
      • getPercentGetAllReturningEmptyResponse()
      • getMaxPutLatency()
      • getMaxGetLatency()
      • getMaxGetAllLatency()
      • getMaxDeleteLatency()
      • getMaxPutSizeInBytes()
      • getMaxGetAllSizeInBytes()
      • getMaxGetSizeInBytes()
      • getAverageGetSizeInBytes()
      • getAverageGetAllSizeInBytes()
      • getAveragePutSizeInBytes()
  • voldemort.store.stats.aggregate [ All calculated in a sliding window of 5 minutes ]
    • aggregate-perf
      • getNumberOfCallsToGetAll()
      • getAverageGetAllCompletionTimeInMs()
      • getGetAllThroughput()
      • getNumberOfCallsToGet()
      • getAverageGetCompletionTimeInMs()
      • getGetThroughput()
      • getNumberOfCallsToPut()
      • getAveragePutCompletionTimeInMs()
      • getPutThroughput()
      • getNumberOfCallsToDelete()
      • getAverageDeleteCompletionTimeInMs()
      • getDeleteThroughput()
      • getNumberOfObsoleteVersions()
      • getNumberOfExceptions()
      • getAvgOperationCompletionTimeInMs()
      • getOperationThroughput()
      • getPercentGetReturningEmptyResponse()
      • getPercentGetAllReturningEmptyResponse()
      • getMaxPutLatency()
      • getMaxGetLatency()
      • getMaxGetAllLatency()
      • getMaxDeleteLatency()
      • getMaxPutSizeInBytes()
      • getMaxGetAllSizeInBytes()
      • getMaxGetSizeInBytes()
      • getAverageGetSizeInBytes()
      • getAverageGetAllSizeInBytes()
      • getAveragePutSizeInBytes()
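
The MBeans listed above can be read and driven from any JMX client. The following is a minimal sketch, not the project's own tooling: it queries every MBean in a domain starting with `voldemort.`, reads its readable attributes, and invokes a one-argument operation such as enable( job_id ). Several things here are assumptions: that object names take the form `<package>:type=<name>`, that attribute names follow the getter names (check the actual names in JConsole or with getMBeanInfo), and that the job id is passed as a String. `StubScheduler` and the job id "slop-pusher" are hypothetical stand-ins so that the sketch runs without a Voldemort server.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class VoldemortJmxSketch {

    // Hypothetical stand-in for SchedulerService; the real MBean lives inside the server.
    public interface StubSchedulerMBean {
        String getScheduledJobs();
        void enable(String jobId);
        void disable(String jobId);
    }

    public static class StubScheduler implements StubSchedulerMBean {
        private final Set<String> enabled = new TreeSet<>();
        public synchronized String getScheduledJobs() { return enabled.toString(); }
        public synchronized void enable(String jobId) { enabled.add(jobId); }
        public synchronized void disable(String jobId) { enabled.remove(jobId); }
    }

    // Reads every readable attribute of every MBean whose domain starts with "voldemort.".
    static Map<String, Object> readAll(MBeanServerConnection conn) throws Exception {
        Map<String, Object> out = new TreeMap<>();
        for (ObjectName name : conn.queryNames(new ObjectName("voldemort.*:*"), null)) {
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                if (attr.isReadable()) {
                    out.put(name + "/" + attr.getName(), conn.getAttribute(name, attr.getName()));
                }
            }
        }
        return out;
    }

    // Invokes a one-argument operation; the String parameter type is an assumption.
    static void invokeWithJobId(MBeanServerConnection conn, ObjectName name,
                                String operation, String jobId) throws Exception {
        conn.invoke(name, operation,
                    new Object[] { jobId },
                    new String[] { String.class.getName() });
    }

    // Registers the stub in a private MBean server, enables one job, and dumps the attributes.
    static Map<String, Object> runDemo() {
        try {
            MBeanServer mbs = MBeanServerFactory.newMBeanServer();
            // Assumed object name; verify it against a running server.
            ObjectName name = new ObjectName("voldemort.server.scheduler:type=SchedulerService");
            mbs.registerMBean(new StubScheduler(), name);
            invokeWithJobId(mbs, name, "enable", "slop-pusher");
            return readAll(mbs);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        runDemo().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because `readAll` and `invokeWithJobId` only depend on `MBeanServerConnection`, the same calls should work unchanged against a connection to a live server.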

Client side

  • voldemort.client
    • DefaultStoreClient.[store_name]
      • bootStrap() - Re-bootstrap the client with new metadata
  • voldemort.cluster.failuredetector - Same as on server side
  • voldemort.store.socket.clientrequest
    • ClientRequestExecutorPool
      • getNumberSocketsCreated() - Number of sockets created by the pool
      • getNumberSocketsDestroyed() - Number of sockets destroyed by the pool
      • getNumberOfActiveConnections() - Number of active connections as of now
      • getNumberOfCheckedInConnections() - Number of connections checked back in to the resource pool of connections that we maintain
      • getAvgWaitTimeMs() - Average time in ms taken to check out a connection from the resource pool
      • setMonitoringInterval( num_checkouts ) - Set the number of checkouts over which performance statistics are calculated
  • voldemort.store.stats [ Same as server ]
  • voldemort.store.stats.aggregate [ Same as server ]
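
For the client-side connection pool, several counters can be fetched in a single round trip. The sketch below is one possible way to do that with `getAttributes`, under the same assumptions as any generic JMX client: the object name `voldemort.store.socket.clientrequest:type=ClientRequestExecutorPool` and the attribute names derived from the getters are guesses to verify against a running client, and `StubPool` with its fixed values is a hypothetical stand-in. The `connect` helper uses the standard JMX RMI URL form and only applies if the client JVM was started with remote JMX enabled; the host and port are whatever that JVM was configured with.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class ClientPoolStatsSketch {

    // Hypothetical stand-in for ClientRequestExecutorPool with fixed values.
    public interface StubPoolMBean {
        int getNumberSocketsCreated();
        int getNumberSocketsDestroyed();
        int getNumberOfActiveConnections();
    }

    public static class StubPool implements StubPoolMBean {
        public int getNumberSocketsCreated() { return 12; }
        public int getNumberSocketsDestroyed() { return 2; }
        public int getNumberOfActiveConnections() { return 5; }
    }

    // Fetches the named attributes in one call; names the MBean does not expose are silently omitted.
    static Map<String, Object> snapshot(MBeanServerConnection conn, ObjectName pool,
                                        String... attributeNames) throws Exception {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Attribute a : conn.getAttributes(pool, attributeNames).asList()) {
            out.put(a.getName(), a.getValue());
        }
        return out;
    }

    // Opens a remote JMX connection; requires remote JMX to be enabled on the target JVM.
    static JMXConnector connect(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        return JMXConnectorFactory.connect(url);
    }

    // Registers the stub in a private MBean server and takes one snapshot.
    static Map<String, Object> runDemo() {
        try {
            MBeanServer mbs = MBeanServerFactory.newMBeanServer();
            // Assumed object name; verify it against a running client.
            ObjectName pool = new ObjectName(
                    "voldemort.store.socket.clientrequest:type=ClientRequestExecutorPool");
            mbs.registerMBean(new StubPool(), pool);
            return snapshot(mbs, pool,
                    "NumberSocketsCreated", "NumberSocketsDestroyed", "NumberOfActiveConnections");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        runDemo().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Against a real client, `connect(host, port).getMBeanServerConnection()` would replace the private MBean server used in `runDemo`.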