Anton Matvienko edited this page Jan 10, 2024 · 9 revisions

Health Monitor

TempestaFW can monitor the health of backend servers in terms of HTTP availability. If health monitoring is enabled for a server and that server returns more "bad" responses (responses with undesirable HTTP statuses) than a configured limit within a configured time frame, the server is excluded from the scheduling of client requests until new "good" responses (defined by separately specified conditions) are received from it. Limits, time frames, and "bad" HTTP statuses are set with a dedicated directive. The syntax is as follows:

server_failover_http <status> <count> <timeout>;
  • <status> is the HTTP status (or wildcard pattern) of a response.
  • <count> is the limit on the number of responses with that status.
  • <timeout> is the time frame in seconds within which the responses are counted.

This directive applies to all servers for which a health monitor is enabled. The directive may be repeated to monitor several HTTP statuses.
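
The threshold rule can be pictured with a small model. The Python sketch below is not TempestaFW's implementation. It assumes a sliding window, and it assumes that a server counts as unhealthy once the number of matching responses within the last <timeout> seconds exceeds <count>. The class and method names (FailoverRule, record, is_unhealthy) are hypothetical, and wildcard statuses are left out for brevity.

```python
from collections import deque


class FailoverRule:
    """Toy model of one `server_failover_http <status> <count> <timeout>` rule."""

    def __init__(self, status: int, count: int, timeout: float):
        self.status = status
        self.count = count
        self.timeout = timeout
        self._hits = deque()  # timestamps of matching "bad" responses

    def record(self, status: int, now: float) -> None:
        """Remember a response if its status matches this rule."""
        if status == self.status:
            self._hits.append(now)

    def is_unhealthy(self, now: float) -> bool:
        """True if more than `count` matching responses fall in the window."""
        # Drop timestamps that are older than the window (assumed sliding).
        while self._hits and now - self._hits[0] > self.timeout:
            self._hits.popleft()
        return len(self._hits) > self.count


# Mirrors `server_failover_http 502 100 5;` with synthetic timestamps.
rule = FailoverRule(status=502, count=100, timeout=5)
for i in range(101):
    rule.record(502, now=i * 0.01)   # 101 responses within about one second
print(rule.is_unhealthy(now=1.0))    # limit exceeded inside the window
print(rule.is_unhealthy(now=10.0))   # old responses have left the window
```

In this model each repeated server_failover_http directive would be a separate FailoverRule instance evaluated per server.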

The health monitor itself is configured via the following section:

health_check <name> {
	request		<req_string>;
	request_url	<url_string>;
	resp_code	<codes>;
	resp_crc32	<crc32>;
	timeout		<timeout>;
}
  • <name> is the unique identifier of the health monitor.
  • <req_string> is the health monitoring request string; the default value is "GET / HTTP/1.0\r\n\r\n".
  • <url_string> is a URL string; client requests to this URL are used as health monitoring requests; the default value is "/".
  • <codes> is a space-separated list of HTTP statuses.
  • <crc32> is a hex number: the precomputed CRC32 checksum of the expected response body. This value, together with the <codes> value, defines the conditions for "good" responses, which signal that the server is alive. The keyword auto can be specified instead of a hex number, meaning that no CRC32 verification is required (the same as omitting the resp_crc32 directive). The checksum can be generated with the Linux utility crc32, which is part of the libarchive-zip-perl package.
  • <timeout> is the interval in seconds after which a new health monitoring request (specified in the request directive) is sent to the backend server, if no client request matching the request_url condition has been seen in the meantime.
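
As an alternative to the crc32 utility, the checksum for resp_crc32 can be computed with a few lines of Python. This is a sketch under two assumptions: that TempestaFW expects the same standard CRC-32 that zlib and the crc32 utility compute, and that the checksum covers exactly the bytes of the response body. The function name and the sample body are hypothetical.

```python
import zlib


def resp_crc32_value(body: bytes) -> str:
    """Return the CRC-32 of a response body formatted as a hex literal."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return f"0x{zlib.crc32(body):08x}"


# Hypothetical response body; use the exact bytes your backend returns.
body = b"<html><body>OK</body></html>\n"
print(f"resp_crc32\t{resp_crc32_value(body)};")
```

The body must match the backend response byte for byte, including trailing newlines, or the checksum will differ.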

The administrator must configure resp_code, resp_crc32, or both with explicit values (not auto). The health_check section may be repeated to configure several health monitors in TempestaFW. A default health monitor named auto is always present in TempestaFW. Its configuration is given below:

health_check auto {
	request		"GET / HTTP/1.0\r\n\r\n";
	request_url	"/";
	resp_code	200;
	resp_crc32	auto;
	timeout		10;
}

The auto monitor can be explicitly redefined (under the name auto) by the administrator with custom settings; in that case the default auto monitor is not created. Note that the keyword auto in the resp_crc32 directive has a special meaning for the auto monitor (whether implicitly or explicitly defined): the CRC32 value is generated on the fly from the first received response and then used to verify the CRC32 values of subsequent responses.

A health monitor is assigned per server group (explicit or implicit): a monitor with a specific ID is enabled for the group. For servers in such a group, all server_failover_http directives and the health_check section with the corresponding ID are applied. To assign a particular health monitor to a server group, use the following directive inside the srv_group section:

health <id>;
  • <id> is the identifier of a health monitor (the <name> of a health_check section).
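
The special meaning of auto in resp_crc32 for the auto monitor (learn the checksum from the first response, then verify later ones against it) can be modeled in Python as follows. This is only an illustration, not TempestaFW code: the class name AutoCrc32Check is hypothetical, and treating the first response itself as "good" is an assumption.

```python
import zlib
from typing import Optional


class AutoCrc32Check:
    """Toy model: learn the CRC-32 from the first response, then compare."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # unknown until the first response

    def is_good(self, body: bytes) -> bool:
        crc = zlib.crc32(body)
        if self.expected is None:
            # The first response defines the reference checksum (assumption:
            # it is itself treated as a "good" response).
            self.expected = crc
            return True
        return crc == self.expected


check = AutoCrc32Check()
print(check.is_good(b"alive"))   # first response sets the reference
print(check.is_good(b"alive"))   # same body, same checksum
print(check.is_good(b"error"))   # different body, checksum mismatch
```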

The following example applies health monitor h_monitor1 to server group main and monitors several HTTP statuses:

server_failover_http 404 300 15;
server_failover_http 500 300 10;
server_failover_http 502 100 5;

health_check h_monitor1 {
	request		"GET / HTTP/1.0\r\n\r\n";
	request_url	"/root/";
	resp_code	200;
	resp_crc32	0x71f21b41;
	timeout		10;
}

srv_group main {
	server 10.10.0.1:8080;
	server 10.10.0.2:8080;

	health h_monitor1;
}

Health statistics directives

health_stat <statuses>;
  • <statuses> is a space-separated list of HTTP statuses (or wildcard patterns).

Example:

health_stat 400 5*;

This directive counts the total number of responses sent by Tempesta for each specified HTTP status, including responses that come directly from servers as well as responses served from the cache. The counters are displayed in Performance statistics.

health_stat_server <statuses>;
  • <statuses> is a space-separated list of HTTP statuses (or wildcard patterns).

Example:

health_stat_server 400 5*;

This directive counts the total number of responses received from servers for each specified HTTP status. Only responses that come directly from servers are counted; responses served from the cache are ignored. The counters are displayed in Servers statistics.

The 200 HTTP status is always monitored, regardless of whether it is specified in the directive.
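
The counting rules above can be illustrated with a short Python sketch. It is an approximation, not TempestaFW's logic: wildcard matching is borrowed from Python's fnmatch (so 5* matches any status starting with 5), and a wildcard is reported as one aggregate counter, which may differ from how TempestaFW presents it. The function name count_statuses is hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase


def count_statuses(patterns, statuses):
    """Count responses per configured pattern; 200 is always counted."""
    tracked = ["200"] + [p for p in patterns if p != "200"]
    counts = Counter({p: 0 for p in tracked})
    for status in statuses:
        for pattern in tracked:
            # Assumption: shell-style matching stands in for Tempesta wildcards.
            if fnmatchcase(str(status), pattern):
                counts[pattern] += 1
    return dict(counts)


# Mirrors `health_stat 400 5*;` with a synthetic list of response statuses.
print(count_statuses(["400", "5*"], [200, 200, 400, 404, 500, 502, 503]))
```

In this sample the 404 response is not counted, because no configured pattern matches it.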

Also note that the server_failover_http directive automatically enables counting of server responses for the HTTP statuses it covers. Therefore, there is no need to also specify health_stat_server for those statuses.
