finagle-docs: Fixes for metrics section of the user guide
A wide ranging cleanup, mostly in terms of formatting, links, and
consistency.

RB_ID=779711
kevinoliver authored and jenkins committed Dec 21, 2015
1 parent b3fc368 commit 4acff04
Showing 16 changed files with 242 additions and 201 deletions.
71 changes: 38 additions & 33 deletions doc/src/sphinx/Metrics.rst
@@ -1,16 +1,18 @@
Metrics
=======

This section aims to be a comprehensive list of all of the stats that finagle
exposes. The stats are organized by layer and then by class.
This section aims to be a comprehensive list of all of the metrics that Finagle
exposes. The metrics are organized by layer and then by class.

Some of the stats are only for clients, some only for servers, and some are for both.
Some stats are only visible when certain optional classes are used.

NB: Finagle uses RollupStatsReceiver internally, which will take stats like
"failures/twitter/TimeoutException" and roll them up, aggregating into "failures/twitter"
and also "failures". For example, if there are 3 "failures/twitter/TimeoutException" counted,
and 4 "failures/twitter/ConnectTimeoutException", then it will cound 7 "failures/twitter".
NB: Finagle sometimes uses a ``RollupStatsReceiver`` internally, which will take
stats like "failures/twitter/TimeoutException" and roll them up, aggregating
into "failures/twitter" and also "failures". For example, if there are 3
"failures/twitter/TimeoutException" counted, and 4
"failures/twitter/ConnectTimeoutException", then it will count 7 for
"failures/twitter".
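
The roll-up arithmetic described in the note can be sketched in a few lines. This is a language-neutral model written in Python, not Finagle's implementation: the helper names ``rollup_names`` and ``record`` are hypothetical, and the sketch assumes that every "/"-separated prefix of a metric name receives the same increment as the full name.

```python
from collections import Counter


def rollup_names(name, sep="/"):
    """Return the metric name plus every ancestor prefix, longest first."""
    parts = name.split(sep)
    return [sep.join(parts[:i]) for i in range(len(parts), 0, -1)]


def record(counters, name, delta=1):
    """Increment the named counter and each of its rolled-up ancestors."""
    for rolled in rollup_names(name):
        counters[rolled] += delta


# Hypothetical usage mirroring the example in the note above.
counters = Counter()
record(counters, "failures/twitter/TimeoutException", 3)
record(counters, "failures/twitter/ConnectTimeoutException", 4)
```

Under these assumptions, both ``counters["failures/twitter"]`` and ``counters["failures"]`` hold the sum of the two leaf counters, while each leaf keeps its own count.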

Public
------
@@ -28,7 +30,7 @@ Construction

.. _construction_stats:

These stats are about setting up services in finagle, and expose whether you are
These stats are about setting up services in Finagle, and expose whether you are
having trouble making services.

.. include:: metrics/Construction.rst
@@ -42,15 +44,6 @@ These metrics track various Finagle internals.

.. include:: metrics/Finagle.rst

Service Discovery
-----------------

.. _service_discovery:

These metrics track the state of name resolution and service discovery.

.. include:: metrics/ServiceDiscovery.rst

Load Balancing
--------------

@@ -68,7 +61,7 @@ Fail Fast
.. _fail_fast_stats:

The client stats under the `failfast` scope give insight into how
finagle handles services where it can't make a connection.
Finagle handles services where it can't establish a connection.

.. include:: metrics/FailFast.rst

@@ -77,10 +70,11 @@ Failure Accrual

.. _failure_accrual_stats:

The client stats under the `failure_accrual` scope track how `FailureAccrualFactory`
The client stats under the `failure_accrual` scope track how
:src:`FailureAccrualFactory <com/twitter/finagle/service/FailureAccrualFactory.scala>`
manages failures.

.. include: metrics/FailureAccrual.rst
.. include:: metrics/FailureAccrual.rst

Idle Apoptosis
--------------
@@ -136,17 +130,6 @@ queueing rules.
a gauge used by pipelining dispatchers that represents how many
pipelined requests are currently outstanding.

Transport
---------

.. _transport_stats:

These metrics pertain to where the Finagle abstraction ends and the bytes are sent over the wire.
Understanding these stats often requires deep knowledge of the protocol, or individual transport
(e.g. Netty) internals.

.. include:: metrics/Transport.rst

Admission Control
-----------------

@@ -162,7 +145,7 @@ Mux

.. _mux_stats:

These stats pertain to :ref:`Mux <mux>`.
These stats pertain to the :ref:`Mux <mux>` protocol.

.. include:: metrics/Mux.rst

@@ -171,7 +154,29 @@ Threshold Failure Detector

.. _failure_detector:

The client metrics under the `mux/failuredetector` scope track the behavior of out-of-band ping
based failure detection. They only apply to the mux protocol.
The client metrics under the `mux/failuredetector` scope track the behavior of
out-of-band RTT-based failure detection. They only apply to the mux
protocol.

.. include:: metrics/FailureDetector.rst

Transport
---------

.. _transport_stats:

These metrics pertain to where the Finagle abstraction ends and the bytes are sent over the wire.
Understanding these stats often requires deep knowledge of the protocol, or individual transport
(e.g. Netty) internals.

.. include:: metrics/Transport.rst


Service Discovery
-----------------

.. _service_discovery:

These metrics track the state of name resolution and service discovery.

.. include:: metrics/ServiceDiscovery.rst
6 changes: 3 additions & 3 deletions doc/src/sphinx/metrics/AdmissionControl.rst
@@ -3,11 +3,11 @@ Deadline Admission Control

**admission_control/deadline/exceeded**
A counter of the number of requests whose deadline has expired, where the
elapsed time since expiry is within the configured tolerance
elapsed time since expiry is within the configured tolerance.

**admission_control/deadline/exceeded_beyond_tolerance**
A counter of the number of requests whose deadline has expired, where the
elapsed time since expiry is beyond the configured tolerance
elapsed time since expiry is beyond the configured tolerance.

**admission_control/deadline/rejected**
A counter of the number of requests rejected for being past their deadline
A counter of the number of requests rejected for being past their deadline.
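
The distinction between the first two deadline counters can be illustrated with a small sketch. It is a model in Python, not Finagle's code: the function ``classify_deadline`` and its millisecond parameters are hypothetical, and treating an elapsed time exactly equal to the tolerance as "within tolerance" is an assumption. The ``rejected`` counter is not modeled because its exact trigger is not described here.

```python
def classify_deadline(now_ms, deadline_ms, tolerance_ms):
    """Return the counter name an expired request would fall under.

    Returns None when the deadline has not yet expired.
    """
    if now_ms <= deadline_ms:
        return None  # deadline still in the future; nothing to count
    elapsed_since_expiry = now_ms - deadline_ms
    # Assumption: the boundary case counts as within tolerance.
    if elapsed_since_expiry <= tolerance_ms:
        return "admission_control/deadline/exceeded"
    return "admission_control/deadline/exceeded_beyond_tolerance"
```

For instance, with a tolerance of 100 ms, a request seen 50 ms after its deadline maps to ``exceeded``, and one seen 200 ms after maps to ``exceeded_beyond_tolerance``.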
9 changes: 5 additions & 4 deletions doc/src/sphinx/metrics/Construction.rst
@@ -2,12 +2,13 @@ ClientBuilder
<<<<<<<<<<<<<

**codec_connection_preparation_latency_ms**
a histogram of the length of time it takes to prepare a connection and get back a service,
regardless of success or failure
A histogram of the length of time it takes to prepare a connection and get
back a service, regardless of success or failure.

StatsServiceFactory
<<<<<<<<<<<<<<<<<<<

**available**
a gauge of whether the underlying factory is available (1) or not (0). finagle uses this
primarily to decide whether a host is available to make new connections to in the load balancer.
A gauge of whether the underlying factory is available (1) or not (0).
Finagle uses this primarily to decide whether a host is eligible for new
connections in the load balancer.
9 changes: 6 additions & 3 deletions doc/src/sphinx/metrics/FailFast.rst
@@ -2,10 +2,13 @@ FailFastFactory
<<<<<<<<<<<<<<<

**marked_dead**
a counter of how many times the `FailFastFactory` has been marked dead
A counter of how many times the host has been marked dead due to connection
problems.

**unhealthy_for_ms**
a gauge of how long the `FailFastFactory` has been retrying for this failure
A gauge of how long, in milliseconds, Finagle has been trying to reestablish
a connection.

**unhealthy_num_tries**
a gauge of the number of times the `FailFastFactory` has retried for this failure
A gauge of the number of times the factory has tried to reestablish a
connection.
4 changes: 2 additions & 2 deletions doc/src/sphinx/metrics/FailureAccrual.rst
@@ -2,12 +2,12 @@ FailureAccrualFactory
<<<<<<<<<<<<<<<<<<<<<

**removals**
a count of how many times any host has been removed due to failure
A count of how many times any host has been removed due to failure
accrual. Note that there is no specificity on which host in the
cluster has been removed, so a high value here could be one
problem-child or aggregate problems across all hosts.

**revivals**
a count of how many times a previously-removed host has been
A count of how many times a previously-removed host has been
reactivated after the penalty period has elapsed.

10 changes: 5 additions & 5 deletions doc/src/sphinx/metrics/FailureDetector.rst
@@ -2,16 +2,16 @@ ThresholdFailureDetector
<<<<<<<<<<<<<<<<<<<<<<<<

**ping**
a counter of the number of pings sent to remote peers
A counter of the number of pings sent to remote peers.

**ping_latency_us**
a stat of round trip ping latencies in microseconds
A stat of round trip ping latencies in microseconds.

**marked_busy**
a counter of the number of times the endpoints are marked busy
A counter of the number of times the endpoints are marked busy.

**revivals**
a counter of the number of times the endpoints revive
A counter of the number of times the endpoints revive.

**close**
a counter of the number of endpoints that are closed
A counter of the number of endpoints that are closed.
11 changes: 7 additions & 4 deletions doc/src/sphinx/metrics/Finagle.rst
@@ -2,14 +2,17 @@ Timer
<<<<<

**finagle/timer/pending_tasks**
a stat of the number of pending tasks to run for ``DefaultTimer.twitter``
A stat of the number of pending tasks to run for
:src:`HashedWheelTimer.Default <com/twitter/finagle/util/HashedWheelTimer.scala>`.

**finagle/timer/deviation_ms**
a stat of the deviation in milliseconds of tasks scheduled on
``DefaultTimer.twitter`` from their expected time.
A stat of the deviation in milliseconds of tasks scheduled on
:src:`HashedWheelTimer.Default <com/twitter/finagle/util/HashedWheelTimer.scala>`
from their expected time.

ClientRegistry
<<<<<<<<<<<<<<

**finagle/clientregistry/size**
a gauge of the current number of clients in the ``ClientRegistry``
A gauge of the current number of clients registered in the
:src:`ClientRegistry <com/twitter/finagle/client/ClientRegistry.scala>`.
7 changes: 4 additions & 3 deletions doc/src/sphinx/metrics/IdleApoptosis.rst
@@ -2,8 +2,9 @@ ExpiringService
<<<<<<<<<<<<<<<

**idle**
a counter of the number of times the service has expired from staying idle for too long
in between requests
A counter of the number of times the service has expired from staying idle
for too long in between requests.

**lifetime**
a counter of the number of times the service has exceeded its lifetime expiration duration
A counter of the number of times the service has exceeded its lifetime
expiration duration.
14 changes: 7 additions & 7 deletions doc/src/sphinx/metrics/LoadBalancing.rst
@@ -2,7 +2,7 @@ All Balancers
<<<<<<<<<<<<<

**size**
A gauge of the number of nodes being balanced across
A gauge of the number of nodes being balanced across.

**available**
A gauge of the number of *available* nodes as seen by the load balancer.
@@ -17,22 +17,22 @@ All Balancers
These nodes will never be available for service.

**load**
A gauge of the total load over all nodes being balanced across
A gauge of the total load over all nodes being balanced across.

**meanweight**
A gauge tracking the arithmetic mean of the weights of the endpoints
being load-balanced across. Does not apply to
`com.twitter.finagle.loadbalancer.HeapBalancer`.
:src:`HeapBalancer <com/twitter/finagle/loadbalancer/HeapBalancer.scala>`.

**adds**
A counter of the number of hosts added to the loadbalancer
A counter of the number of hosts added to the load balancer.

**removes**
A counter of the number of hosts removed from the loadbalancer
A counter of the number of hosts removed from the load balancer.

**max_effort_exhausted**
A counter of the number of times a balancer failed to find a node that was
`Status.Open` within `com.twitter.finagle.loadbalancer.Balancer.maxEffort`
``Status.Open`` within ``com.twitter.finagle.loadbalancer.Balancer.maxEffort``
attempts. When this occurs, a non-open node may be selected for that
request.

@@ -41,4 +41,4 @@ ApertureLoadBandBalancer

**aperture**
A gauge of the width of the window over which endpoints are
load-balanced
load-balanced.
19 changes: 11 additions & 8 deletions doc/src/sphinx/metrics/Mux.rst
@@ -1,18 +1,21 @@
**<server_label>/mux/draining**
the number of times the server has initiated session draining
A counter of the number of times the server has initiated session draining.

**<server_label>/mux/drained**
the number of times the server has successfully completed the draining protocol within its allotted time
A counter of the number of times the server has successfully completed the
draining protocol within its allotted time.

**<client_label>/mux/draining**
the number of times a server initiated session draining
A counter of the number of times a server initiated session draining.

**<client_label>/mux/drained**
the number of times server-initiated draining completed successfully

A counter of the number of times server-initiated draining completed
successfully.

**clienthangup**
the number of times sessions have been abruptly terminated by the client
A counter of the number of times sessions have been abruptly terminated by
the client.

**serverhangup**
the number of times sessions have been abruptly terminated by the server

A counter of the number of times sessions have been abruptly terminated by
the server.
19 changes: 12 additions & 7 deletions doc/src/sphinx/metrics/Pooling.rst
@@ -2,28 +2,33 @@ CachingPool
<<<<<<<<<<<

**pool_cached**
a gauge of the number of connections cached in the idle Cache
A gauge of the number of connections cached.

WatermarkPool
<<<<<<<<<<<<<

**pool_waiters**
a gauge of the number of clients waiting on connections
A gauge of the number of clients waiting on connections.

**pool_size**
a gauge of the number of connections that are currently alive, either in use or not
A gauge of the number of connections that are currently alive, either in use
or not.

**pool_num_waited**
a counter of the number of times there were no connections immediately available and the client waited for a connection
A counter of the number of times there were no connections immediately
available and the client waited for a connection.

**pool_num_too_many_waiters**
a counter of the number of times there were no connections immediately available and there were already too many waiters
A counter of the number of times there were no connections immediately
available and there were already too many waiters.

SingletonPool
<<<<<<<<<<<<<

**conn/fail**
a counter of the number of times the connection could not be established and must be retried
A counter of the number of times the connection could not be established and
must be retried.

**conn/dead**
a counter of the number of times the connection succeeded once, but later died and must be retried
A counter of the number of times the connection succeeded once, but later
died and must be retried.
