This repository is the result of some learning and investigation into technologies that were mostly new to me.
The focus is on Prometheus and Grafana working in combination with each other to display metrics in dashboards.
The code in this project is meant to be deployed onto a Docker Swarm mode cluster that was previously deployed using tonyskidmore/docker-swarm. Although that Docker Swarm deployment is hosted on a Windows 10 system running Vagrant and a few other requirements, the code in this repo can also be deployed directly from a Linux system. While not covered in detail here, it was originally created and tested in a purely Linux environment.
In this lab-style environment each service is deployed to the Docker Swarm cluster as a stack using a series of separate Docker Compose files. These are layered so that the different applications can be reviewed one step at a time as they are deployed onto the cluster. The order in which they are intended to be deployed is:
- Consul
- RabbitMQ
- HAProxy
- Java app
- Prometheus and Grafana
- Messaging publishing service
- Messaging consumer service
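As an illustrative sketch only (the Start-SwarmApps.ps1 script described later is the driven route), the ordered deployment above could be scripted in bash from a Linux host. The stack names map onto the docker-compose-*.yml files used in the sections that follow:

```shell
# Sketch: deploy each stack in the documented order using the
# docker-compose-*.yml files in this repo. Assumes DOCKER_HOST points at the
# Swarm manager and the "test" overlay network already exists.
deploy_stacks() {
  for stack in consul rabbitmq haproxy java monitoring publisher consumer; do
    docker stack deploy -c "docker-compose-${stack}.yml" "$stack" || return 1
  done
}
```

Running `deploy_stacks` is equivalent to issuing the individual `docker stack deploy` commands shown in the sections below, without the pauses for review.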
Content published by Ahmet Vehbi Olgaç and Marcel Dempers, and courses on Pluralsight by Elton Stoneman, formed much of the basis of this repo (see the References section).
Note:
None of the deployments are meant to describe how these products should be deployed in a Production or any other type of environment. The aim is just to show from a high-level perspective how these products work and how some of them can be monitored.
The layout above represents a high-level view of the services that will be deployed and the externally accessible ports from the Windows host system. It also shows the monitoring scraping communication lines from Prometheus to: Docker Swarm nodes, RabbitMQ cluster nodes and the Java application. Grafana is shown using Prometheus as a data source.
The first step is to ensure that the Docker Swarm mode cluster has been deployed as per the tonyskidmore/docker-swarm project. A differently deployed Docker swarm cluster can be used if desired but references to the specifics of that cluster will need to be adjusted.
If you are following along with the initial cluster deployment and then moving here, perform the following steps on your Windows 10 host:
cd \vagrant
git clone https://github.com/tonyskidmore/docker-swarm-monitoring.git
cd docker-swarm-monitoring
The default Docker Swarm mode cluster has been deployed in such a way that access to it from the Windows host is made via unencrypted communication to the Docker Swarm manager node (192.168.217.133:2375). It is necessary to set environment variables in any PowerShell session so this works:
$env:DOCKER_HOST="192.168.217.133:2375"
$env:DOCKER_TLS_VERIFY=""
docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
qmh74hjo5djwwzqa7jkyknisb * docker-swarm-01 Ready Active Leader 19.03.13
fu8kqvcxfuficfi1ld0owy0l6 docker-swarm-02 Ready Active 19.03.13
ydqqjvpshir1nhss6surd11kr docker-swarm-03 Ready Active 19.03.13
If your output is similar to the above then you are good to go.
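For reference, when driving the cluster directly from a Linux host instead, the equivalent environment setup in bash would be something like the following (the tcp:// scheme is how the Docker CLI addresses a remote unencrypted endpoint; the address is the manager node used throughout this README):

```shell
# bash equivalent of the PowerShell session variables above.
export DOCKER_HOST="tcp://192.168.217.133:2375"
unset DOCKER_TLS_VERIFY   # no TLS on this lab cluster
# docker node ls          # should list the three swarm nodes as before
```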
The Start-SwarmApps.ps1 PowerShell script can be used interactively or run from a PowerShell prompt on the Windows 10 host to drive all of the deployments. The script needs some refactoring but for now it has worked in testing. Further below are the individual, more generic steps that can be performed to deploy the application stacks, with some tidbits and further descriptions along the way. Be sure to review the docker-compose-*.yml files being deployed along with any associated product configuration files.
If you wanted to just deploy everything in one go then you can execute the following:
cd c:\vagrant\docker-swarm-monitoring
.\Start-SwarmApps.ps1
If there are no issues then the script should deploy all of the services shown in the Services layout section, opening a browser to each as it goes. Click on the script window to refocus on the script. You can run with the -OpenBrowser $false parameter to avoid the browser functionality if desired.
To deploy only a subset of services the script can be run with the -DeployList parameter, passing a comma-separated list of services. For example, to deploy only as far as HAProxy you could run:
cd c:\vagrant\docker-swarm-monitoring
.\Start-SwarmApps.ps1 -DeployList "consul,rabbitmq,haproxy"
Note:
PowerShell script execution has to be allowed on your system. If it is not already enabled, run PowerShell as Administrator and execute the following before running the instructions above:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Force
The services being deployed in this repo target a network called test, so this needs to be created first:
docker network create --driver=overlay --attachable test
c6bzyhnfpcapbrnjtydznmsvc
docker network ls
NETWORK ID NAME DRIVER SCOPE
0a65fa8b26c5 bridge bridge local
d8191f32e336 docker_gwbridge bridge local
28c6d06a3dc9 host host local
vfy3m29793sq ingress overlay swarm
3bce52d11586 none null local
c6bzyhnfpcap test overlay swarm
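Since docker network create fails if the network already exists, a small idempotent wrapper (a sketch, not part of the repo) can make re-runs harmless:

```shell
# Create an attachable overlay network only if it does not already exist.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create --driver=overlay --attachable "$1"
  fi
}
```

`ensure_network test` behaves like the command above on first run and is a no-op afterwards.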
The first service to be deployed is Consul. This will provide the service discovery for the RabbitMQ cluster.
Assuming that the host you are running the docker commands on has the required binaries installed (e.g. the Windows host with Docker Toolbox installed) and the required environment variables set (see Windows host PowerShell access), change to the location of this cloned repository and deploy the Consul stack:
docker stack deploy -c docker-compose-consul.yml consul
After deployment, check that the service is running by confirming that the commands below produce something similar to the example output:
docker stack services consul
ID NAME MODE REPLICAS IMAGE PORTS
2wamsz17spvs consul_consul global 3/3 consul:1.7.9 *:8400->8400/tcp, *:8500->8500/tcp, *:8600->8600/tcp
docker stack ps consul
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
qviwseufg9f6 consul_consul.fu8kqvcxfuficfi1ld0owy0l6 consul:1.7.9 docker-swarm-02 Running Running about a minute ago
tr79t27j23rj consul_consul.qmh74hjo5djwwzqa7jkyknisb consul:1.7.9 docker-swarm-01 Running Running about a minute ago
yl6xbs12px2l consul_consul.ydqqjvpshir1nhss6surd11kr consul:1.7.9 docker-swarm-03 Running Running about a minute ago
You can validate the status of the Consul cluster from the command line:
# check status of leader election
(Invoke-WebRequest -Uri 192.168.217.133:8500/v1/status/leader -UseBasicParsing).Content
"10.0.2.4:8300"
# verify active peers
(Invoke-WebRequest -Uri 192.168.217.133:8500/v1/status/peers -UseBasicParsing).Content
["10.0.2.5:8300","10.0.2.4:8300","10.0.2.3:8300"]
or in bash:
# check status of leader election
curl 192.168.217.133:8500/v1/status/leader
"10.0.2.4:8300"
# verify active peers
curl 192.168.217.133:8500/v1/status/peers
["10.0.2.4:8300","10.0.2.5:8300","10.0.2.3:8300"]
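On a fresh deployment leader election can take a few seconds, so rather than eyeballing the output it can be handy to poll the endpoint until a leader appears. A hedged sketch (the retry count and interval are arbitrary choices):

```shell
# Poll the Consul leader endpoint until a leader is reported.
# An empty body or "" means no leader has been elected yet.
wait_for_leader() {
  local url="$1" tries=0 leader
  while [ "$tries" -lt 30 ]; do
    leader="$(curl -fsS "$url/v1/status/leader" 2>/dev/null || true)"
    if [ -n "$leader" ] && [ "$leader" != '""' ]; then
      echo "leader elected: $leader"
      return 0
    fi
    tries=$((tries + 1))
    sleep 2
  done
  echo "no leader elected" >&2
  return 1
}
```

For this lab the call would be `wait_for_leader http://192.168.217.133:8500`.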
You should now be able to access the Consul UI from the Windows host via http://192.168.217.133:8500/. The Consul Service should appear under the Services tab, all with successful Health Checks. Under Nodes all nodes should be listed under Healthy Nodes, with a star badge indicating the leader.
Before moving on to deploying the RabbitMQ cluster it is worth mentioning Monitoring with Prometheus & Grafana. This is excellent information detailing all aspects of monitoring RabbitMQ with Prometheus. The Quick Start is highly recommended to run through to see what is possible with RabbitMQ, Prometheus and Grafana. We have taken the RabbitMQ-Overview dashboard from that repository and included it as part of this project to demonstrate how a dashboard can be automatically deployed. I would also point you in the direction of the session hosted by Gerhard Lazu & Michal Kuratczyk from the RabbitMQ Summit 2019 (see References).
Following on from the previous Consul deployment we will now deploy the RabbitMQ cluster. To begin with we will label the Docker Swarm nodes so that placement occurs as desired. For example, the rabbitmq-01 configuration will be deployed to the node with the label rabbitmq1. We can label the nodes and validate that the labels have been applied as follows:
# label each node for placement rule
1..3 | % { & docker node update --label-add "rabbitmq$_=true" "docker-swarm-0$_" }
docker-swarm-01
docker-swarm-02
docker-swarm-03
# check that labels have been applied as expected, in order
1..3 | % { (docker inspect "docker-swarm-0$_" | ConvertFrom-Json).Spec.Labels | ConvertTo-Json }
{
"rabbitmq1": "true"
}
{
"rabbitmq2": "true"
}
{
"rabbitmq3": "true"
}
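The same labelling can be done from a Linux host in bash (node names as per the cluster above):

```shell
# Apply the rabbitmqN=true placement label to each swarm node.
label_nodes() {
  for i in 1 2 3; do
    docker node update --label-add "rabbitmq${i}=true" "docker-swarm-0${i}" || return 1
  done
}
```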
Now that the labels have been allocated for placement we deploy the RabbitMQ stack:
docker stack deploy -c docker-compose-rabbitmq.yml rabbitmq
We can validate the deployment. It can take a few moments for the expected 1/1 REPLICAS to be displayed, so retry the command until that state is in effect:
docker stack services rabbitmq
ID NAME MODE REPLICAS IMAGE PORTS
998kdxiqwt4z rabbitmq_rabbitmq-02 global 1/1 rabbitmq:3.8.9-management
rg4ffk5syvxm rabbitmq_rabbitmq-01 global 1/1 rabbitmq:3.8.9-management
sljwr1n8abfj rabbitmq_rabbitmq-03 global 1/1 rabbitmq:3.8.9-management
docker stack ps rabbitmq
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
5l1afiv1o1uu rabbitmq_rabbitmq-02.fu8kqvcxfuficfi1ld0owy0l6 rabbitmq:3.8.9-management docker-swarm-02 Running Running about a minute ago
2khcfvfivmz0 rabbitmq_rabbitmq-01.qmh74hjo5djwwzqa7jkyknisb rabbitmq:3.8.9-management docker-swarm-01 Running Running about a minute ago
vujetjx6muqf rabbitmq_rabbitmq-03.ydqqjvpshir1nhss6surd11kr rabbitmq:3.8.9-management docker-swarm-03 Running Running about a minute ago
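Instead of manually retrying docker stack services, the wait can be scripted. A sketch in bash, with the REPLICAS column position assumed from the output format shown above:

```shell
# Poll until every service in the stack reports the desired REPLICAS value
# (4th column of `docker stack services` output).
wait_for_replicas() {
  local stack="$1" want="$2" tries=0 pending
  while [ "$tries" -lt 30 ]; do
    pending="$(docker stack services "$stack" \
      | awk -v want="$want" 'NR > 1 && $4 != want' | wc -l | tr -d ' ')"
    if [ "$pending" -eq 0 ]; then
      echo "stack $stack at $want"
      return 0
    fi
    tries=$((tries + 1))
    sleep 5
  done
  return 1
}
```

For this stack the call would be `wait_for_replicas rabbitmq 1/1`; the same helper works for the later stacks with their own desired replica counts.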
The RabbitMQ cluster has now been deployed but because it is placed behind HAProxy we cannot access it until that service has been deployed.
HAProxy can be deployed in a similar fashion to the previous service stacks:
docker stack deploy -c docker-compose-haproxy.yml haproxy
Validate the deployment, waiting for REPLICAS to be in the desired state of 3/3:
docker stack services haproxy
ID NAME MODE REPLICAS IMAGE PORTS
eio9bx9mtser haproxy_haproxy global 3/3 haproxy:2.3.0-alpine *:1936->1936/tcp, *:5672->5672/tcp, *:15672->15672/tcp
docker stack ps haproxy
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
u0dsfxxp09qq haproxy_haproxy.ydqqjvpshir1nhss6surd11kr haproxy:2.3.0-alpine docker-swarm-03 Running Running about a minute ago
mabdfp8x2qfe haproxy_haproxy.fu8kqvcxfuficfi1ld0owy0l6 haproxy:2.3.0-alpine docker-swarm-02 Running Running about a minute ago
zfalsvyx5p39 haproxy_haproxy.qmh74hjo5djwwzqa7jkyknisb haproxy:2.3.0-alpine docker-swarm-01 Running Running about a minute ago
We can validate that RabbitMQ is accessible via HAProxy with a simple PowerShell command, which should return StatusCode: 200:
Invoke-WebRequest -Uri "http://192.168.217.133:15672" -UseBasicParsing
StatusCode : 200
StatusDescription : OK
Content : <!DOCTYPE html>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>RabbitMQ Management</ti...
RawContent : HTTP/1.1 200 OK
content-security-policy: script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'self'
vary: origin
Content-Length: 2884
Content-Type: text/html
Date: Wed, 18 Nov 2020 20:15:...
Forms :
Headers : {[content-security-policy, script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'self'], [vary, origin], [Content-Length, 2884], [Content-Type, text/html]...}
Images : {}
InputFields : {}
Links : {}
ParsedHtml :
RawContentLength : 2884
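The same check from bash; a small helper (a sketch) that succeeds only on HTTP 200:

```shell
# Return success only if the URL answers with HTTP 200.
check_http_200() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' "$1")"
  [ "$code" = "200" ]
}
```

For example: `check_http_200 http://192.168.217.133:15672 && echo "RabbitMQ UI up"`.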
We should now be able to access the RabbitMQ UI using the URL http://192.168.217.133:15672 from the Windows host. The default credentials are Username: guest and Password: guest. Once logged in, the 3 nodes should appear in the Overview tab under the Nodes section.
In an attempt to demonstrate gathering some very basic Java virtual machine (JVM) and Tomcat metrics we will deploy a simple Java application. The demo app is a bit flaky but it serves a purpose. I have been looking to add something a bit more interesting but have not got round to it just yet; something along the lines of Monitoring Spring Boot Apps with Micrometer, Prometheus, and Grafana or Monitoring Java Spring Boot applications with Prometheus: Part 1 maybe.
Anyway for now we will deploy what we have:
docker stack deploy -c docker-compose-java.yml java
Wait for the service to come up:
docker stack services java
ID NAME MODE REPLICAS IMAGE PORTS
x06apyuc7sm7 java_java replicated 3/3 tonyskidmore/java:v1 *:8080->8080/tcp
docker stack ps java
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6ss6vlr5dyhu java_java.1 tonyskidmore/java:v1 docker-swarm-03 Running Running about a minute ago
fwbcbj0f7q0w java_java.2 tonyskidmore/java:v1 docker-swarm-02 Running Running about a minute ago
6ns2vhs11tl8 java_java.3 tonyskidmore/java:v1 docker-swarm-01 Running Running about a minute ago
We can throw a little traffic at the Tomcat server so there is some activity to see when we get onto monitoring:
for ($i=1 ;$i -lt 40 ; $i++) { Invoke-WebRequest -Uri http://192.168.217.133:8080/ -UseBasicParsing | Out-Null }
or in bash:
for ((i=1;i<=40;i++)); do curl -s http://192.168.217.133:8080/ > /dev/null; done
Access the application in a browser using the URL http://192.168.217.133:8080 and click the Order more stock button a few times. If an error is displayed just refresh the base URL and try again. This generates some application-level metrics.
Now that the basics of what we want to monitor have been deployed we can move onto the focus of this project, which is the Prometheus and Grafana deployment:
docker stack deploy -c docker-compose-monitoring.yml monitoring
We are only deploying a single instance of each, so we just need to check that they come up ok:
docker stack services monitoring
ID NAME MODE REPLICAS IMAGE PORTS
65kg5tif1lru monitoring_grafana replicated 1/1 grafana/grafana:6.7.2 *:3000->3000/tcp
lprl4h6bt0kv monitoring_prometheus replicated 1/1 prom/prometheus:v2.17.2 *:9090->9090/tcp
docker stack ps monitoring
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
4uirwo4bhljv monitoring_prometheus.1 prom/prometheus:v2.17.2 docker-swarm-03 Running Running about a minute ago
6gjkyp3lm5db monitoring_grafana.1 grafana/grafana:6.7.2 docker-swarm-01 Running Running about a minute ago
First let's check that our defined targets are up in Prometheus at http://192.168.217.133:9090/targets. We should see that our docker-managers, docker-workers, java-app, java-tomcat and rabbitmq-server targets are all up.
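The same information is available from the Prometheus HTTP API, which is handy for scripting the check instead of using the UI. A sketch that counts healthy targets without needing jq:

```shell
# Count scrape targets the Prometheus API reports as healthy.
count_up_targets() {
  curl -s "$1/api/v1/targets" | grep -o '"health":"up"' | wc -l | tr -d ' '
}
```

`count_up_targets http://192.168.217.133:9090` should print the same number of targets shown as up on the targets page.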
If we navigate to the Configuration at http://192.168.217.133:9090/config we will see the config that was passed in as part of the monitoring stack configuration file.
If we explore the Graph tab in Prometheus at http://192.168.217.133:9090/graph we can click insert metric at cursor (or type the name of a metric into the query text box if we know it) to see all the available metrics that are being scraped. For example, rabbitmq_build_info:
It is these PromQL queries that form the foundation of the dashboard panels that we will see in Grafana.
Having explored Prometheus, let's open up Grafana at http://192.168.217.133:3000. The default credentials are Username: admin and Password: admin (you can Skip changing the password on first login).
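Grafana also exposes an unauthenticated health endpoint, which can confirm the service is up before logging in. A sketch:

```shell
# Check Grafana's /api/health endpoint; the database field reads "ok" when healthy.
grafana_healthy() {
  curl -s "$1/api/health" | grep -q '"database": *"ok"'
}
```

For example: `grafana_healthy http://192.168.217.133:3000 && echo "Grafana up"`.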
We can see that some configuration has been automatically applied as part of the deployment, and we see a couple of pre-defined dashboards: Runtime Dashboard and RabbitMQ-Overview, the latter taken directly from the RabbitMQ examples that can be seen as part of the Quick Start tutorial.
If we open up the RabbitMQ-Overview dashboard we should see something like the below:
This includes the information that we queried directly in Prometheus, i.e. rabbitmq_build_info. However, if we edit the panel we can see that the PromQL is a bit more involved so that the data is displayed in the required format:
If we go back to the Grafana Home page and open Runtime Dashboard we will see a dashboard that was custom made, exported to Runtime-Dashboard.json and automatically imported as part of the monitoring service deployment.
In this dashboard we can see some application level metrics (including the stock orders we created during the java app deployment) as well as metrics from the JVM and Tomcat. There are also Docker node metrics (not shown above).
Now that we have the monitoring service up and running it would be useful to trigger some activity in RabbitMQ. A good option to investigate further would be RabbitMQ PerfTest for load generation. However, in this example we are going to do something simpler by deploying publisher and consumer services.
To deploy the publisher service:
docker stack deploy -c docker-compose-publisher.yml publisher
and then check the details:
docker stack services publisher
ID NAME MODE REPLICAS IMAGE PORTS
rkq3dqg7tme4 publisher_publisher replicated 1/1 tonyskidmore/publisher:0.0.1 *:80->80/tcp
docker stack ps publisher
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
z5c3epp9g8wc publisher_publisher.1 tonyskidmore/publisher:0.0.1 docker-swarm-02 Running Running 31 seconds ago
Once the publisher service is up and running we can send some messages via the application's HTTP interface:
1..20 | % { Invoke-RestMethod -Method POST -Uri "http://192.168.217.133:80/publish/message$_" -UseBasicParsing }
or from bash:
for ((i=1;i<=20;i++)); do curl -X POST "http://192.168.217.133:80/publish/message$i"; done
We should now see the message activity generated above:
We can then run the consumer service to pull any messages in the queue. This gives us a simple mechanism to allow us to push and pull messages to maybe develop and test a dashboard and alerting based on message patterns.
docker stack deploy -c docker-compose-consumer.yml consumer
and then check the details:
docker stack services consumer
ID NAME MODE REPLICAS IMAGE PORTS
8ipmjax1iiid consumer_consumer replicated 1/1 tonyskidmore/consumer:0.0.1
docker stack ps consumer
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
r449h1jq1cyp consumer_consumer.1 tonyskidmore/consumer:0.0.1 docker-swarm-02 Running Running 46 seconds ago
If we then go back to the RabbitMQ-Overview dashboard we should see that Ready messages is now 0, along with some further consumer-based activity.
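This can also be verified from the command line via the RabbitMQ management API (same default guest/guest credentials as above). A sketch that sums ready messages across all queues:

```shell
# Sum messages_ready across all queues reported by the management API.
ready_messages() {
  curl -s -u guest:guest "$1/api/queues" \
    | grep -o '"messages_ready":[0-9]*' \
    | awk -F: '{ s += $2 } END { print s + 0 }'
}
```

`ready_messages http://192.168.217.133:15672` should print 0 once the consumer has drained the queue.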
- Monitoring with Prometheus & Grafana
- Implementing Highly Available RabbitMQ Cluster on Docker Swarm using Consul-based Discovery by Ahmet Vehbi Olgaç
- RabbitMQ : Message Queues for beginners by Marcel Dempers
- RabbitMQ : How to setup a RabbitMQ cluster - for beginners by Marcel Dempers
- Observe and understand RabbitMQ by Gerhard Lazu & Michal Kuratczyk
- Getting Started with Prometheus (Pluralsight) by Elton Stoneman
- Monitoring Containerized Application Health with Docker (Pluralsight) by Elton Stoneman