Monitoring guide #2407

Merged 1 commit on Oct 12, 2023
1 change: 1 addition & 0 deletions guides/common/attributes-base.adoc
@@ -12,6 +12,7 @@
:ManagingHostsDocURL: {BaseURL}Managing_Hosts/{BaseFilenameURL}#
:ManagingOrganizationsLocationsDocURL: {BaseURL}Managing_Organizations_and_Locations/{BaseFilenameURL}#
:ManagingSecurityDocURL: {BaseURL}Managing_Security_Compliance/{BaseFilenameURL}#
:MonitoringDocURL: {BaseURL}Monitoring_Project/{BaseFilenameURL}#
:PlanningDocURL: {BaseURL}Planning_for_Project/{BaseFilenameURL}#
:ProvisioningDocURL: {BaseURL}Provisioning_Hosts/{BaseFilenameURL}#
:ReleaseNotesDocURL: {BaseURL}Release_Notes/{BaseFilenameURL}#
1 change: 1 addition & 0 deletions guides/common/attributes-satellite.adoc
@@ -18,6 +18,7 @@
:ManagingConfigurationsPuppetDocURL: {BaseURL}managing_configurations_using_puppet_integration_in_red_hat_satellite/index#
:ManagingHostsDocURL: {BaseURL}managing_hosts/index#
:ManagingSecurityDocURL: {BaseURL}managing_security_compliance/index#
:MonitoringDocURL: {BaseURL}monitoring_red_hat_satellite/index#
:PlanningDocURL: {BaseURL}satellite_overview_concepts_and_deployment_considerations/index#
:ProvisioningDocURL: {BaseURL}provisioning_hosts/index#
:TuningDocURL: {BaseURL}tuning_performance_of_red_hat_satellite/index#
1 change: 1 addition & 0 deletions guides/common/attributes-titles.adoc
@@ -18,6 +18,7 @@
:ManagingHostsDocTitle: Managing Hosts
:ManagingOrganizationsLocationsDocTitle: Managing Organizations and Locations in {ProjectName}
:ManagingSecurityDocTitle: Managing Security Compliance
:MonitoringDocTitle: Monitoring {ProjectName}
:PlanningDocTitle: Planning for {ProjectName}
:ProvisioningDocTitle: Provisioning Hosts
:QuickstartDocTitle: Quickstart Guide for {Project} on {install-on-os}
13 changes: 13 additions & 0 deletions guides/common/modules/con_metrics-data-retention.adoc
@@ -0,0 +1,13 @@
[id='metrics-data-retention_{context}']
= Metrics Data Retention

The storage capacity required by PCP data logging is determined by the following factors:

* The metrics being logged.
* The logging interval.
* The retention policy.

The default logging (sampling) interval is 60 seconds.

The default retention policy is to keep archives for the last 14 days, compressing archives older than one day.
PCP archive logs are stored in the `/var/log/pcp/pmlogger/_{foreman-example-com}_` directory.
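
As a rough illustration of how these factors combine, the following Python sketch estimates the disk space needed for uncompressed archives from the bytes written per sample, the logging interval, and the number of days retained. It assumes a constant sample size and ignores compression, so for the default policy it gives an upper bound rather than actual usage. The function name is hypothetical and is not part of PCP.

```python
SECONDS_PER_DAY = 86_400
MIB = 1024 * 1024  # 1 MiB in bytes


def estimate_storage_mib(bytes_per_sample: int, interval_s: int = 60,
                         retention_days: int = 14) -> float:
    """Estimate uncompressed archive size in MiB for the whole retention period.

    Assumes every sample has the same size and ignores compression.
    """
    samples_per_day = SECONDS_PER_DAY / interval_s
    bytes_per_day = bytes_per_sample * samples_per_day
    return bytes_per_day * retention_days / MIB


# Example: 61752 bytes per sample, default 60-second interval, 14-day retention
print(f"{estimate_storage_mib(61752):.1f} MiB")
```

Halving the interval to 30 seconds doubles the estimate, because twice as many samples are written per day.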
19 changes: 19 additions & 0 deletions guides/common/modules/con_metrics-overview.adoc
@@ -0,0 +1,19 @@
[id='metrics-overview_{context}']
= Metrics Overview

Obtaining metrics from {Project} is useful for troubleshooting current issues and for capacity planning.
This guide describes how to collect live metrics and archive them for a fixed period of time.
ifdef::satellite[]
If you need to raise a support case with {Team} to resolve a performance issue, the archived data provides valuable insight.
Note that {Team} Support can only access the archived data if you upload it to a support case.
endif::[]

You can collect the following metrics from {Project}:

* Basic statistics from the operating system, including system load, memory utilization, and input/output operations.
* Process statistics, including memory and CPU utilization.
* Apache HTTP Server activity statistics.
* PostgreSQL activity statistics.
* {Project} application statistics.

Use Performance Co-Pilot (PCP) to collect and archive {Project} metrics.
35 changes: 35 additions & 0 deletions guides/common/modules/con_pcp-metrics.adoc
@@ -0,0 +1,35 @@
[id='pcp-metrics_{context}']
= PCP Metrics

Metrics are stored in a tree-like structure.
For example, all network metrics are stored in a node named `network`.
Each metric can be a single value or a list of values, known as instances.
For example, kernel load has three instances: the 1-minute, 5-minute, and 15-minute averages.
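
The following Python sketch illustrates the tree-like namespace. It nests dotted metric names, for example names printed by `pminfo`, into dictionaries and lists the leaf metrics below one node. The helper functions are hypothetical, and the input is a short illustrative sample rather than a complete namespace. Instances are not modeled here: a metric such as `kernel.all.load` is a single leaf that holds several instance values.

```python
from typing import Any


def build_tree(names: list[str]) -> dict[str, Any]:
    """Nest dotted metric names such as 'network.interface.in.bytes'."""
    tree: dict[str, Any] = {}
    for name in names:
        node = tree
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


def leaves(tree: dict[str, Any], prefix: str = "") -> list[str]:
    """Return the full names of all leaf metrics below a node, sorted."""
    result: list[str] = []
    for key, child in sorted(tree.items()):
        full = f"{prefix}.{key}" if prefix else key
        if child:  # non-empty dict: an inner node of the tree
            result.extend(leaves(child, full))
        else:      # empty dict: a leaf metric
            result.append(full)
    return result


names = [
    "kernel.all.load",
    "network.interface.in.bytes",
    "network.interface.out.bytes",
    "mem.freemem",
]
tree = build_tree(names)
print(leaves(tree["network"], "network"))
```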

For every metric entry, PCP stores both its data and metadata.
This includes the metric's description, data type, units, and dimensions.
For example, the metadata enables PCP to output multiple metrics with different dimensions.

The value of a counter metric only increases.
For example, a count of disk write operations on a specific device only increases.
When you query the value of a counter metric, PCP converts it into a rate by default.
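
The conversion from counter to rate amounts to dividing the change in value by the time elapsed between two samples. The following Python sketch shows only that arithmetic for a hypothetical disk write counter. It is not PCP's implementation, and it simply rejects edge cases such as a counter that was reset.

```python
def counter_rate(v1: float, t1: float, v2: float, t2: float) -> float:
    """Per-second rate between two samples of a counter that only increases.

    v1, v2: counter values; t1, t2: sample timestamps in seconds.
    """
    if t2 <= t1:
        raise ValueError("samples must be in chronological order")
    if v2 < v1:
        raise ValueError("counter decreased; it may have been reset")
    return (v2 - v1) / (t2 - t1)


# Hypothetical samples of a disk write counter taken 60 seconds apart
print(counter_rate(12_000, 0, 15_000, 60))  # 50.0 writes per second
```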

In addition to system metrics such as CPU, memory, kernel, XFS, disk, and network, the following metrics are configured:

[%header,cols=2*]
|===
|Metric
|Description

|hotproc.*
|Basic metrics of key {Project} processes

|apache.*
|Apache HTTP Server metrics

|postgresql.*
|Basic PostgreSQL statistics

|openmetrics.foreman.fm_rails_*
|{Project} metrics
|===
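
The patterns in the table above are wildcards over the metric namespace. As a sketch, the following Python function sorts metric names, for example the output of `pminfo`, into the table's groups by using shell-style matching from the standard `fnmatch` module. The sample metric names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns and descriptions copied from the table above
GROUPS = {
    "hotproc.*": "Basic metrics of key processes",
    "apache.*": "Apache HTTP Server metrics",
    "postgresql.*": "Basic PostgreSQL statistics",
    "openmetrics.foreman.fm_rails_*": "Application metrics",
}


def group_metrics(names: list[str]) -> dict[str, list[str]]:
    """Map each table pattern to the metric names it matches (case-sensitive)."""
    grouped: dict[str, list[str]] = {pattern: [] for pattern in GROUPS}
    for name in names:
        for pattern in GROUPS:
            # In fnmatch, '*' also matches '.', so 'apache.*' covers the whole subtree
            if fnmatchcase(name, pattern):
                grouped[pattern].append(name)
    return grouped


# Hypothetical metric names, for example taken from pminfo output
sample = [
    "hotproc.psinfo.rss",
    "apache.busyservers",
    "openmetrics.foreman.fm_rails_http_requests",
    "kernel.all.load",
]
for pattern, matches in group_metrics(sample).items():
    print(f"{pattern:35} {GROUPS[pattern]}: {matches}")
```

Names that match no pattern, such as `kernel.all.load`, belong to the system metrics that PCP collects by default.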
6 changes: 6 additions & 0 deletions guides/common/modules/con_performance-co-pilot.adoc
@@ -0,0 +1,6 @@
[id='performance-co-pilot_{context}']
= Performance Co-Pilot

Performance Co-Pilot (PCP) is a suite of tools and libraries for acquiring, storing, and analyzing system-level performance measurements.
PCP can be used to analyze live and historical metrics.
You can retrieve and view metrics through a CLI or a web UI.
@@ -0,0 +1,5 @@
[id='performance-metric-domain-agents_{context}']
= Performance Metric Domain Agents

A Performance Metric Domain Agent (PMDA) is a PCP add-on that provides access to the metrics of an application or service.
To gather all metrics relevant to {Project}, you must install PMDAs for Apache HTTP Server and PostgreSQL.
@@ -0,0 +1,4 @@
[id='retrieving-metrics-using-the-cli_{context}']
= Retrieving Metrics using the CLI

Using the CLI tools provided with PCP, you can retrieve metrics either live or from an archive file.
8 changes: 8 additions & 0 deletions guides/common/modules/con_retrieving-metrics.adoc
@@ -0,0 +1,8 @@
[id='retrieving-metrics_{context}']
= Retrieving Metrics

You can retrieve metrics from PCP using the CLI or the web UI.
PCP provides a number of CLI tools, which can output either live data or data from archives.
ifndef::foreman-deb[]
The web UI is provided by the Grafana web application.
endif::[]
19 changes: 19 additions & 0 deletions guides/common/modules/proc_changing-data-retention-policy.adoc
@@ -0,0 +1,19 @@
[id='changing-data-retention-policy_{context}']
= Changing Data Retention Policy

This procedure describes how to change the data retention policy.

.Procedure
ifndef::foreman-deb[]
. Edit the `/etc/sysconfig/pmlogger_timers` file.
endif::[]
ifdef::foreman-deb[]
. Edit the `/etc/default/pmlogger_timers` file.
endif::[]
. Find the line containing `PMLOGGER_DAILY_PARAMS`.
. If the line is commented out, uncomment it.
. Ensure the default parameter `-E` is present.
. Add the `-x` parameter with the number of days after which data is compressed.
. Add the `-k` parameter with the number of days after which data is deleted.
+
For example, the parameters `-x 4 -k 7` specify that data will be compressed after 4 days, and deleted after 7 days.
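
The effect of `-x` and `-k` can be sketched in Python as a function of an archive's age. This only illustrates the policy described in this procedure and is not the logic of the `pmlogger_daily` script; in particular, treating the thresholds as exclusive is an assumption. The helper names are hypothetical, and the example parameter string is the one used above.

```python
from typing import Optional, Tuple


def parse_retention(params: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the -x (compress) and -k (delete) day counts from a parameter string.

    Expects the unquoted value of PMLOGGER_DAILY_PARAMS, e.g. "-E -x 4 -k 7".
    """
    args = params.split()
    compress_after: Optional[int] = None
    delete_after: Optional[int] = None
    # Look at each option together with the token that follows it
    for flag, value in zip(args, args[1:]):
        if flag == "-x":
            compress_after = int(value)
        elif flag == "-k":
            delete_after = int(value)
    return compress_after, delete_after


def archive_action(age_days: float, compress_after: int, delete_after: int) -> str:
    """Classify an archive by age; thresholds are treated as exclusive (an assumption)."""
    if age_days > delete_after:
        return "delete"
    if age_days > compress_after:
        return "compress"
    return "keep"


x_days, k_days = parse_retention("-E -x 4 -k 7")
for age in (1, 5, 8):
    print(age, archive_action(age, x_days, k_days))
```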
@@ -0,0 +1,9 @@
[id='changing-default-logging-interval_{context}']
= Changing Default Logging Interval

This procedure describes how to change the default logging interval.

.Procedure
. Edit the `/etc/pcp/pmlogger/control.d/local` configuration file.
. On the line that starts with `LOCALHOSTNAME`, append `-t __XX__s`, where _XX_ is the desired time interval in seconds.
. Restart the `pmlogger` service.
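
The edit in this procedure can be sketched in Python as a text transformation: find the uncommented line that starts with `LOCALHOSTNAME` and append `-t __XX__s`, replacing any separate `-t` option already on that line. The sketch assumes the fields are whitespace-separated, does not handle a value attached to the flag (such as `-t30s`), and the sample control text is hypothetical and abbreviated. Review the result before writing it back to the file.

```python
def set_logging_interval(control_text: str, seconds: int) -> str:
    """Return control_text with '-t Ns' set on the LOCALHOSTNAME line.

    Only lines that start with LOCALHOSTNAME are changed; comment lines start
    with '#' and are left alone. Whitespace on the changed line is normalized.
    """
    result = []
    for line in control_text.splitlines():
        if line.startswith("LOCALHOSTNAME"):
            kept = []
            skip_next = False
            for token in line.split():
                if skip_next:       # value of a previous "-t" option
                    skip_next = False
                    continue
                if token == "-t":   # drop an existing interval setting
                    skip_next = True
                    continue
                kept.append(token)
            line = " ".join(kept + ["-t", f"{seconds}s"])
        result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical, abbreviated control file content
sample = "# comment\nLOCALHOSTNAME y n PCP_ARCHIVE_DIR/LOCALHOSTNAME -r -c config.default\n"
print(set_logging_interval(sample, 30), end="")
```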
70 changes: 70 additions & 0 deletions guides/common/modules/proc_configuring-pcp-data-collection.adoc
@@ -0,0 +1,70 @@
[id='configuring-pcp-data-collection_{context}']
= Configuring PCP Data Collection

This procedure describes how to configure PCP to collect metrics about processes, {Project}, Apache HTTP Server, and PostgreSQL.

.Procedure
. To configure PCP to collect data about {Project} processes, configure the process monitoring PMDA to use the {Project}-specific configuration:
+
----
# ln -s /etc/pcp/proc/foreman-hotproc.conf /var/lib/pcp/pmdas/proc/hotproc.conf
----
+
By default, PCP collects basic system metrics.
This step enables detailed metrics about the following {Project} processes:
+
* Java
* PostgreSQL
* Redis
* Dynflow
* Puma
ifndef::foreman-el,foreman-deb[]
* Pulpcore
endif::[]

. Install the process monitoring PMDA.
+
----
# cd /var/lib/pcp/pmdas/proc
# ./Install
----

. Configure PCP to collect metrics from Apache HTTP Server.
.. Enable the Apache HTTP Server extended status module.
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# {foreman-installer} --enable-apache-mod-status
----
.. Enable the Apache HTTP Server PMDA.
+
----
# cd /var/lib/pcp/pmdas/apache
# ./Install
----
. Configure PCP to collect metrics from PostgreSQL:
+
----
# cd /var/lib/pcp/pmdas/postgresql
# ./Install
----
. Enable telemetry functionality in {Project}:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# {foreman-installer} --foreman-telemetry-prometheus-enabled true
----
. Configure PCP to collect data from {Project}.
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# cd /var/lib/pcp/pmdas/openmetrics
# echo "https://_{foreman-example-com}_/metrics" > config.d/foreman.url
# ./Install
----
. Restart PCP to begin data collection:
+
----
# systemctl restart pmcd pmlogger
----
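
Before relying on the OpenMetrics PMDA, you might want to confirm that the `/metrics` endpoint returns {Project} metrics. The following Python sketch fetches the endpoint with the standard library and extracts the names of metrics with the `fm_rails_` prefix from the Prometheus text format. It assumes the server certificate is trusted by the local system; the function names are hypothetical.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download the Prometheus text exposition from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def foreman_metric_names(payload: str, prefix: str = "fm_rails_") -> list[str]:
    """Return the sorted, unique metric names that start with prefix.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. A sample line
    has the form: metric_name, optional labels in braces, then the value.
    """
    names = set()
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or space (value)
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)
```

For example, `foreman_metric_names(fetch_metrics(url))`, where `url` is the address written to `config.d/foreman.url`, returns an empty list if telemetry is not enabled or exposes no `fm_rails_` metrics.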
17 changes: 17 additions & 0 deletions guides/common/modules/proc_confirming-data-storage-usage.adoc
@@ -0,0 +1,17 @@
[id='confirming-data-storage-usage_{context}']
= Confirming Data Storage Usage

To confirm data storage usage, enter the following command:

[options="nowrap", subs="verbatim,quotes,attributes"]
----
# less /var/log/pcp/pmlogger/_{foreman-example-com}_/pmlogger.log
----

The log file lists all metrics that `pmlogger` is logging, grouped by the frequency at which they are logged.
For each group, it also lists the storage required per day to store those metrics.

.Example storage statistics
----
logged every 60 sec: 61752 bytes or 84.80 Mbytes/day
----
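
The per-day figure is the per-sample size multiplied by the number of samples per day. In the example, 61752 bytes multiplied by 1440 samples per day is 88,922,880 bytes, which matches 84.80 Mbytes/day if a megabyte is taken as 1,048,576 bytes. The following Python sketch parses such a line and repeats that calculation. The regular expression is based only on the example line shown and might need adjusting for other `pmlogger.log` content.

```python
import re

# Matches lines such as: "logged every 60 sec: 61752 bytes or 84.80 Mbytes/day"
LINE_RE = re.compile(
    r"logged every (?P<interval>\d+) sec: (?P<bytes>\d+) bytes or "
    r"(?P<mbytes>[\d.]+) Mbytes/day"
)


def daily_storage(line: str) -> tuple[int, int, float]:
    """Return (interval in seconds, bytes per sample, reported Mbytes per day)."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a storage statistics line: {line!r}")
    return int(m["interval"]), int(m["bytes"]), float(m["mbytes"])


def mbytes_per_day(interval_s: int, bytes_per_sample: int) -> float:
    """Recompute the daily figure, assuming 1 Mbyte = 1,048,576 bytes."""
    return bytes_per_sample * (86_400 / interval_s) / 2**20


interval, size, reported = daily_storage(
    "logged every 60 sec: 61752 bytes or 84.80 Mbytes/day")
print(reported, round(mbytes_per_day(interval, size), 2))
```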
@@ -0,0 +1,26 @@
[id='enabling-access-to-telemery-data-using-the-web-ui_{context}']
= Enabling Access to Metrics using the Web UI

This procedure describes how to access metrics collected by PCP using the web UI.

.Procedure
. Install Grafana and the Grafana PCP plug-in on your {ProjectServer}:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# {package-install-project} grafana grafana-pcp
----
. Start and enable the Grafana web service and the PCP proxy service:
+
----
# systemctl enable --now pmproxy grafana-server
----
. Open the firewall port to allow access to the Grafana web interface:
+
----
# firewall-cmd --permanent --add-service=grafana
# firewall-cmd --reload
----
. Complete the following procedures in _Setting up graphical representation of PCP metrics_ to access the Grafana web UI, enable the PCP plug-in, and configure the PCP Redis data source:
.. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-up-graphical-representation-of-pcp-metrics_monitoring-and-managing-system-status-and-performance#accessing-the-grafana-web-ui_setting-up-graphical-representation-of-pcp-metrics[Accessing the Grafana web UI]
.. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-up-graphical-representation-of-pcp-metrics_monitoring-and-managing-system-status-and-performance#configuring-pcp-redis_setting-up-graphical-representation-of-pcp-metrics[Configuring PCP Redis]
32 changes: 32 additions & 0 deletions guides/common/modules/proc_identifying-available-metrics.adoc
@@ -0,0 +1,32 @@
[id='identifying-available-metrics_{context}']
= Identifying Available Metrics

* To list all metrics available via PCP, enter the following command:
+
----
# pminfo
----
* To list all {Project} metrics and their descriptions, enter the following command:
+
----
# foreman-rake telemetry:metrics
----
* To list the archived metrics, enter the following command:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# less /var/log/pcp/pmlogger/_{foreman-example-com}_/pmlogger.log
----
* The pmlogger daemon archives data as it is received, according to its configuration.
To confirm the active archive file, enter the following command:
+
----
# pcp | grep logger
----
+
The output includes the file name of the active archive file, for example:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
/var/log/pcp/pmlogger/_{foreman-example-com}_/_20230831.00.10_
----
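
Archive file names such as `20230831.00.10` appear to encode the date and time at which the archive was started. Assuming the `YYYYMMDD.HH.MM` pattern shown in the example, the following Python sketch converts an archive name into a `datetime` and picks the most recent archive started before a given time, which can help you choose the file to pass to `--archive`. Suffixes of the individual archive files, such as `.0`, `.meta`, or `.index`, are ignored. The function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_start(path: str) -> datetime:
    """Parse the start time from an archive name like '20230831.00.10'."""
    name = Path(path).name
    # Keep only the first three dot-separated fields: date, hour, minute
    stamp = ".".join(name.split(".")[:3])
    return datetime.strptime(stamp, "%Y%m%d.%H.%M")


def latest_archive_before(paths: list[str], when: datetime) -> Optional[str]:
    """Return the archive with the latest start time not after 'when', if any."""
    candidates = [p for p in paths if archive_start(p) <= when]
    return max(candidates, key=archive_start, default=None)
```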
32 changes: 32 additions & 0 deletions guides/common/modules/proc_installing-pcp-packages.adoc
@@ -0,0 +1,32 @@
[id='installing-pcp-packages_{context}']
= Installing PCP Packages

Use this procedure to install the PCP packages on your {ProjectServer}.

.Prerequisites
* Ensure you have a minimum of 20 GB of space available in the `/var/log/pcp` directory.
+
PCP's default data retention policy is to retain data collected within the last 14 days.
Daily data storage is estimated to be between 100 MB and 500 MB in most cases, but it can reach several gigabytes.
For more information, see xref:changing-data-retention-policy_{context}[].

.Procedure
. Install the PCP packages:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# {package-install-project} pcp \
ifndef::foreman-deb[]
pcp-pmda-apache \
pcp-pmda-openmetrics \
pcp-pmda-postgresql \
pcp-pmda-redis \
pcp-system-tools \
endif::[]
foreman-pcp
----
. Enable and start the Performance Metrics Collector daemon and the Performance Metrics Logger daemon:
+
----
# systemctl enable --now pmcd pmlogger
----
@@ -0,0 +1,60 @@
[id='retrieving-archived-metrics-using-cli_{context}']
= Retrieving Archived Metrics using the CLI

You can use the PCP CLI tools to retrieve metrics from an archive file.
To do that, add the `--archive` parameter and specify the archive file.

* To list all metrics stored in an archive file, enter the following command:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pminfo --archive _archive_file_
----

* To confirm the host and time period covered by an archive file, enter the following command:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pmdumplog -l _archive_file_
----

.Examples
* To list disk write operations for each partition over the time period covered by the archive file:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pmval --archive /var/log/pcp/pmlogger/_{foreman-example-com}_/_20230831.00.10_ \
-f 1 disk.partitions.write
----

* To list disk write operations for each partition at a two-second interval between 14:00 and 14:15:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pmval --archive /var/log/pcp/pmlogger/_{foreman-example-com}_/_20230831.00.10_ \
-d -t 2sec \
-f 3 disk.partitions.write \
-S @14:00 -T @14:15
----

* To list the average values of the `disk.partitions.write` and `mem.freemem` metrics between 14:00 and 14:30, including the minimum and maximum values and the times at which they occurred, in tabular format:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pmlogsummary /var/log/pcp/pmlogger/_{foreman-example-com}_/_20230831.00.10_ \
-HlfiImM \
-S @14:00 \
-T @14:30 \
disk.partitions.write \
mem.freemem
----

* To list system metrics stored in an archive, starting from 14:00, in a format similar to the `top` tool:
+
[options="nowrap", subs="verbatim,quotes,attributes"]
----
# pcp --archive /var/log/pcp/pmlogger/_{foreman-example-com}_/_20230831.00.10_ \
-S @14:00 \
atop
----
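
If you run the same archive queries repeatedly, you might wrap them in a script. The following Python sketch assembles a `pmval` argument list from the options used in the examples in this section (`--archive`, `-t`, `-f`, `-S`, `-T`) and runs it with `subprocess`. It assumes `pmval` is on the `PATH`; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def pmval_command(archive: str, metric: str, interval: Optional[str] = None,
                  precision: Optional[int] = None, start: Optional[str] = None,
                  end: Optional[str] = None) -> list[str]:
    """Build a pmval argument list using only options shown in this section."""
    cmd = ["pmval", "--archive", archive]
    if interval is not None:
        cmd += ["-t", interval]          # sampling interval, e.g. "2sec"
    if precision is not None:
        cmd += ["-f", str(precision)]    # fixed-point output precision
    if start is not None:
        cmd += ["-S", f"@{start}"]       # start time, e.g. "14:00"
    if end is not None:
        cmd += ["-T", f"@{end}"]         # end time, e.g. "14:15"
    cmd.append(metric)
    return cmd


def run_pmval(cmd: list[str]) -> str:
    """Run pmval and return its standard output; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_pmval(pmval_command(archive, "disk.partitions.write", interval="2sec", precision=3, start="14:00", end="14:15"))`, where `archive` is the path of your archive file, runs a query similar to the second example.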