Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## Added
- Options to build static dashboards

+## Changed
+- Set default Prometheus job to `tarantool`
+- Set default InfluxDB measurement to `tarantool_http`
+

## [1.3.0] - 2022-06-29
Grafana revisions: [InfluxDB revision 13](https://grafana.com/api/dashboards/12567/revisions/13/download), [Prometheus revision 13](https://grafana.com/api/dashboards/13054/revisions/13/download), [InfluxDB TDG revision 2](https://grafana.com/api/dashboards/16405/revisions/2/download), [Prometheus TDG revision 2](https://grafana.com/api/dashboards/16406/revisions/2/download).
12 changes: 4 additions & 8 deletions Makefile
@@ -1,5 +1,7 @@
+JOB ?= tarantool
RATE_TIME_RANGE ?= 2m
POLICY ?= autogen
+MEASUREMENT ?= tarantool_http
OUTPUT_STATIC_DASHBOARD ?= dashboard.json

.PHONY: build-deps
@@ -14,10 +16,7 @@ ifndef DATASOURCE
@echo 1>&2 "DATASOURCE must be set"
false
endif
-ifndef JOB
-@echo 1>&2 "JOB must be set"
-false
-endif
+# JOB is optional, default is "tarantool"
# RATE_TIME_RANGE is optional, default is "2m"
jsonnet -J ./vendor -J . \
--ext-str DATASOURCE=${DATASOURCE} \
@@ -40,10 +39,7 @@ ifndef DATASOURCE
false
endif
# POLICY is optional, default is "autogen"
-ifndef MEASUREMENT
-@echo 1>&2 "MEASUREMENT must be set"
-false
-endif
+# MEASUREMENT is optional, default is "tarantool_http"
jsonnet -J ./vendor -J . \
--ext-str DATASOURCE=${DATASOURCE} \
--ext-str POLICY=${POLICY} \
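With `JOB ?= tarantool` and `MEASUREMENT ?= tarantool_http` in place, the removed `ifndef` guards are no longer needed: `?=` assigns a value only when the variable is not already defined. A rough shell analogue of that fallback is sketched below using `${VAR-default}` expansion. The helper names `resolve_job` and `resolve_measurement` are hypothetical and exist only for this illustration; they are not part of the repository.

```shell
# Hypothetical helpers mimicking the Makefile's `?=` defaults.
# They are illustrations only, not part of the repository.
resolve_job() {
    # ${JOB-tarantool}: use $JOB if it is set, otherwise fall back to "tarantool"
    printf '%s\n' "${JOB-tarantool}"
}

resolve_measurement() {
    # Same fallback for the InfluxDB measurement name
    printf '%s\n' "${MEASUREMENT-tarantool_http}"
}

unset JOB MEASUREMENT
resolve_job                  # no JOB given: the default is used
( JOB=MyApp; resolve_job )   # an explicit value wins over the default
resolve_measurement          # no MEASUREMENT given: the default is used
```

In the same way, `make DATASOURCE=Prometheus build-static-prometheus` should now build with the `tarantool` job, while passing `JOB=MyApp` on the command line still overrides the default.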
10 changes: 5 additions & 5 deletions README.md
@@ -76,12 +76,12 @@ You can also interact with Prometheus at [localhost:9090](http://localhost:9090/

To set up an InfluxDB dashboard for monitoring example app, use the following variables:

-- `Measurement`: `tarantool_app_http`;
+- `Measurement`: `tarantool_http`;
- `Policy`: `default`.

To set up an Prometheus dashboard for monitoring example app, use the following variables:

-- `Job`: `tarantool_app`;
+- `Job`: `tarantool`;
- `Rate time range`: `2m`.

### Monitoring local app
@@ -116,7 +116,7 @@ to install build dependencies and dependencies that are required to run tests lo

To build a static dashboard with no input and dynamic variables, run `make` commands.
```bash
-make DATASOURCE=MyPrometheus JOB=MyApp \
+make DATASOURCE=Prometheus JOB=tarantool \
OUTPUT_STATIC_DASHBOARD=mydashboard.json build-static-prometheus
```
Following targets are available:
@@ -127,14 +127,14 @@ Following targets are available:

Variables for Prometheus targets:
- `DATASOURCE`: name of a Prometheus data source;
-- `JOB`: name of a Prometheus job collecting your application metrics;
+- `JOB` (optional, default `tarantool`): name of a Prometheus job collecting your application metrics;
- `RATE_TIME_RANGE` (optional, default `2m`): rps computation rate time range;
- `OUTPUT_STATIC_DASHBOARD` (optional, default `dashboard.json`): compiled dashboard file.

Variables for InfluxDB targets:
- `DATASOURCE`: name of a InfluxDB data source;
- `POLICY` (optional, default `autogen`): InfluxDB metrics retention policy;
-- `MEASUREMENT`: name of a InfluxDB measurement with your application metrics;
+- `MEASUREMENT` (optional, default `tarantool_http`): name of a InfluxDB measurement with your application metrics;
- `OUTPUT_STATIC_DASHBOARD` (optional, default `dashboard.json`): compiled dashboard file.

You can also compile configurable Prometheus dashboard template (the same we publish to
1 change: 1 addition & 0 deletions dashboard/build/influxdb/dashboard.libsonnet
@@ -18,6 +18,7 @@ dashboard_raw(
name='INFLUXDB_MEASUREMENT',
label='Measurement',
type='constant',
+value='tarantool_http',
description='InfluxDB Tarantool metrics measurement'
).addInput(
name='INFLUXDB_POLICY',
1 change: 1 addition & 0 deletions dashboard/build/influxdb/tdg_dashboard.libsonnet
@@ -18,6 +18,7 @@ tdg_dashboard_raw(
name='INFLUXDB_MEASUREMENT',
label='Measurement',
type='constant',
+value='tarantool_http',
description='InfluxDB Tarantool metrics measurement'
).addInput(
name='INFLUXDB_POLICY',
3 changes: 1 addition & 2 deletions dashboard/build/prometheus/dashboard.libsonnet
@@ -18,8 +18,7 @@ dashboard_raw(
name='PROMETHEUS_JOB',
label='Job',
type='constant',
-pluginId=null,
-pluginName=null,
+value='tarantool',

Review comment (Contributor):
As far as I understand, this is the default value for a user input? Does it make sense to change it too, along with the JOB value?

Reply (Member Author):
Templating dashboards are built as part of tests; they do not have separate make tests and are built with libsonnet module imports.

description='Prometheus Tarantool metrics job'
).addInput(
name='PROMETHEUS_RATE_TIME_RANGE',
3 changes: 1 addition & 2 deletions dashboard/build/prometheus/tdg_dashboard.libsonnet
@@ -18,8 +18,7 @@ tdg_dashboard_raw(
name='PROMETHEUS_JOB',
label='Job',
type='constant',
-pluginId=null,
-pluginName=null,
+value='tarantool',
description='Prometheus Tarantool metrics job'
).addInput(
name='PROMETHEUS_RATE_TIME_RANGE',
8 changes: 4 additions & 4 deletions doc/monitoring/grafana_dashboard.rst
@@ -79,7 +79,7 @@ metrics path as it was configured on Tarantool instances:
.. code-block:: yaml

scrape_configs:
-- job_name: "example_project"
+- job_name: tarantool
static_configs:
- targets:
- "example_project:8081"
@@ -123,7 +123,7 @@ to Telegraf configuration including each Tarantool instance metrics URL:
insecure_skip_verify = true
interval = "10s"
data_format = "json"
-name_prefix = "example_project_"
+name_prefix = "tarantool_"
fieldpass = ["value"]

Be sure to include each label key as ``label_pairs_<key>`` so it will be
@@ -172,11 +172,11 @@ For TDG dashboard, please use
insecure_skip_verify = true
interval = "10s"
data_format = "json"
-name_prefix = "example_project_"
+name_prefix = "tarantool_"
fieldpass = ["value"]

If you connect Telegraf instance to InfluxDB storage, metrics will be stored
-with ``"<name_prefix>http"`` measurement (``"example_project_http"`` in our example).
+with ``"<name_prefix>http"`` measurement (``"tarantool_http"`` in our example).

.. _monitoring-grafana_dashboard-import:

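The measurement naming described in the documentation change above (`<name_prefix>http`) amounts to plain string concatenation, which is why switching `name_prefix` to `tarantool_` lines up with the new `tarantool_http` default. A minimal sketch follows; the helper `measurement_name` is hypothetical and only restates what the documentation text says.

```shell
# Hypothetical helper (not part of the repository): build the InfluxDB
# measurement name from a Telegraf name_prefix, assuming the input's own
# measurement name is "http", as the documentation text describes.
measurement_name() {
    printf '%shttp\n' "$1"
}

measurement_name "tarantool_"         # prefix introduced by this change
measurement_name "example_project_"   # prefix used before this change
```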
26 changes: 13 additions & 13 deletions example_cluster/prometheus/alerts.yml
@@ -179,7 +179,7 @@ groups:
rules:
# Alert for CRUD module request errors.
- alert: CRUDHighErrorRate
-expr: rate(tnt_crud_stats_count{ job="tarantool_app", status="error" }[5m]) > 0.1
+expr: rate(tnt_crud_stats_count{ job="tarantool", status="error" }[5m]) > 0.1
for: 1m
labels:
severity: critical
@@ -190,7 +190,7 @@ groups:

# Warning for CRUD module requests too long responses.
- alert: CRUDHighLatency
-expr: tnt_crud_stats{ job="tarantool_app", quantile="0.99" } > 0.1
+expr: tnt_crud_stats{ job="tarantool", quantile="0.99" } > 0.1
for: 1m
labels:
severity: warning
@@ -201,7 +201,7 @@ groups:

# Warning for too many map reduce CRUD module requests.
- alert: CRUDHighMapReduceRate
-expr: rate(tnt_crud_map_reduces{ job="tarantool_app" }[5m]) > 0.1
+expr: rate(tnt_crud_map_reduces{ job="tarantool" }[5m]) > 0.1
for: 1m
labels:
severity: warning
@@ -214,12 +214,12 @@ groups:

- name: tarantool-business
rules:
-# Warning for any endpoint of an instance in tarantool_app job that responds too long.
+# Warning for any endpoint of an instance in tarantool job that responds too long.
# Beware that metric name depends on name of the collector you use in HTTP metrics middleware
# and request depends on type of this collector.
# This example based on summary collector with default name.
- alert: HTTPHighLatency
-expr: http_server_request_latency{ job="tarantool_app", quantile="0.99" } > 0.1
+expr: http_server_request_latency{ job="tarantool", quantile="0.99" } > 0.1
for: 5m
labels:
severity: warning
@@ -228,12 +228,12 @@ groups:
description: "Some {{ $labels.method }} requests to {{ $labels.path }} path with {{ $labels.status }} response status
on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' are processed too long."

-# Warning for any endpoint of an instance in tarantool_app job that sends too much 4xx responses.
+# Warning for any endpoint of an instance in tarantool job that sends too much 4xx responses.
# Beware that metric name depends on name of the collector you use in HTTP metrics middleware
# and request depends on type of this collector.
# This example based on summary collector with default name.
- alert: HTTPHighClientErrorRateInstance
-expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool_app", status=~"^4\\d{2}$" }[5m])) > 10
+expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool", status=~"^4\\d{2}$" }[5m])) > 10
for: 1m
labels:
severity: warning
@@ -242,12 +242,12 @@ groups:
description: "Too many {{ $labels.method }} requests to {{ $labels.path }} path
on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' get client error (4xx) responses."

-# Warning for any endpoint in tarantool_app job that sends too much 4xx responses (cluster overall).
+# Warning for any endpoint in tarantool job that sends too much 4xx responses (cluster overall).
# Beware that metric name depends on name of the collector you use in HTTP metrics middleware
# and request depends on type of this collector.
# This example based on summary collector with default name.
- alert: HTTPHighClientErrorRate
-expr: sum by (job, method, path) (rate(http_server_request_latency_count{ job="tarantool_app", status=~"^4\\d{2}$" }[5m])) > 20
+expr: sum by (job, method, path) (rate(http_server_request_latency_count{ job="tarantool", status=~"^4\\d{2}$" }[5m])) > 20
for: 1m
labels:
severity: warning
@@ -256,12 +256,12 @@ groups:
description: "Too many {{ $labels.method }} requests to {{ $labels.path }} path
on instances of job '{{ $labels.job }}' get client error (4xx) responses."

-# Warning for any endpoint of an instance in tarantool_app job that sends 5xx responses.
+# Warning for any endpoint of an instance in tarantool job that sends 5xx responses.
# Beware that metric name depends on name of the collector you use in HTTP metrics middleware
# and request depends on type of this collector.
# This example based on summary collector with default name.
- alert: HTTPServerErrors
-expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool_app", status=~"^5\\d{2}$" }[5m])) > 0
+expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool", status=~"^5\\d{2}$" }[5m])) > 0
for: 1m
labels:
severity: warning
@@ -270,12 +270,12 @@ groups:
description: "Some {{ $labels.method }} requests to {{ $labels.path }} path
on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' get server error (5xx) responses."

-# Warning for any endpoint of a router instance (with "router" in alias) in tarantool_app job that gets too little requests.
+# Warning for any endpoint of a router instance (with "router" in alias) in tarantool job that gets too little requests.
# Beware that metric name depends on name of the collector you use in HTTP metrics middleware
# and request depends on type of this collector.
# This example based on summary collector with default name.
- alert: HTTPLowRequestRateRouter
-expr: sum by (job, instance, alias) (rate(http_server_request_latency_count{ job="tarantool_app", alias=~"^.*router.*$" }[5m])) < 10
+expr: sum by (job, instance, alias) (rate(http_server_request_latency_count{ job="tarantool", alias=~"^.*router.*$" }[5m])) < 10
for: 5m
labels:
severity: warning
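Anyone who copied these example rules with the old `job="tarantool_app"` selector would need the same rename in their own files. A minimal sketch using `sed` is shown below. The function name `rename_job` is hypothetical, and the sketch assumes the selector is written exactly as `job="tarantool_app"` (same quoting, no extra spaces); comments that mention the old job name without that form are left untouched.

```shell
# Hypothetical helper (not part of the repository): rewrite the old job
# selector to the new default. Reads rule text on stdin, writes to stdout.
rename_job() {
    sed 's/job="tarantool_app"/job="tarantool"/g'
}

# Example: pipe one of the alert expressions through the helper.
printf '%s\n' 'expr: rate(tnt_crud_map_reduces{ job="tarantool_app" }[5m]) > 0.1' | rename_job
```

Lines that already use `job="tarantool"` do not match the pattern, so running the helper twice should leave the output unchanged.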
2 changes: 1 addition & 1 deletion example_cluster/prometheus/prometheus.localapp.yml
@@ -27,7 +27,7 @@ scrape_configs:
static_configs:
- targets: ["localhost:9090"]

-- job_name: "tarantool_app"
+- job_name: "tarantool"
static_configs:
- targets:
- "host.docker.internal:8081"
2 changes: 1 addition & 1 deletion example_cluster/prometheus/prometheus.tdg.yml
@@ -27,7 +27,7 @@ scrape_configs:
static_configs:
- targets: ["localhost:9090"]

-- job_name: "tarantool_app"
+- job_name: "tarantool"
static_configs:
- targets:
- "tdg:8080"
2 changes: 1 addition & 1 deletion example_cluster/prometheus/prometheus.yml
@@ -27,7 +27,7 @@ scrape_configs:
static_configs:
- targets: ["localhost:9090"]

-- job_name: "tarantool_app"
+- job_name: "tarantool"
static_configs:
- targets:
- "app:8081"