VMware has ended active development of this project; this repository will no longer be updated.

# Introduction
* this repo provides a (hopefully) simple approach to logging & analysis of system load on a GreenplumDB or HAWQ MPP cluster
* it is a current work in progress and I will update as more components complete

- README
  - each directory contains a README with specific instructions
- sar
  - db: data collection and db load scripts (shell/sql)
  - charts: charting scripts (R)
- pidstat
  - cron: local setup
  - db: data collection and db load scripts (shell/sql)
  - charts: charting scripts (R)
- pg_log
  - db: database load scripts (shell/sql)
  - charts: charting scripts (R)
- combo-analysis
  - combined charts and analyses across sar/pidstat/pg_log data

# Other Ramblings

## Concepts

GPDB and HAWQ are MPP clusters: a master host combined with multiple segment servers.
All performance issues on the cluster result from over-utilization of a given resource, either (a) across the entire cluster or (b) on a single machine (usually a segment server).
This TK (toolkit) is intended to provide views of resource utilization for both cases:
- in the case of (a), we provide a set of area charts which illustrate average utilization for the entire cluster
- in the case of (b), we provide a set of xy-scatter charts which illustrate specific utilization cases, classified using k-means
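
The two views above can be illustrated with a minimal Python sketch. This is not the repo's code (its scripts are shell/SQL/R); the sample tuples, host names, and the functions `cluster_average` and `host_skew` are hypothetical, and the skew ratio is just one simple way to flag a hot machine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (timestamp, host, cpu_busy_pct). Values are made up.
SAMPLES = [
    ("10:00", "sdw1", 40.0), ("10:00", "sdw2", 42.0), ("10:00", "sdw3", 38.0),
    ("10:10", "sdw1", 45.0), ("10:10", "sdw2", 95.0), ("10:10", "sdw3", 43.0),
]

def cluster_average(samples):
    """Case (a): mean utilization across all hosts for each timestamp."""
    by_time = defaultdict(list)
    for ts, _host, value in samples:
        by_time[ts].append(value)
    return {ts: mean(values) for ts, values in sorted(by_time.items())}

def host_skew(samples):
    """Case (b): each host's mean utilization relative to the mean of all host means."""
    by_host = defaultdict(list)
    for _ts, host, value in samples:
        by_host[host].append(value)
    host_means = {host: mean(values) for host, values in by_host.items()}
    overall = mean(list(host_means.values()))
    return {host: host_mean / overall for host, host_mean in host_means.items()}

averages = cluster_average(SAMPLES)
skew = host_skew(SAMPLES)
hottest = max(skew, key=skew.get)  # host with the largest skew ratio
print(averages)
print(hottest, round(skew[hottest], 2))
```

A ratio well above 1.0 for one host while the cluster average stays moderate points at case (b); a high cluster average with ratios near 1.0 points at case (a).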
All the charts draw on a database of performance data: the sysstat TK collects the data locally on each host, the collected data is then loaded into a database, and the R charts obtain their data from that database.
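
As a rough sketch of that collect, load, and query flow, the Python below uses an in-memory SQLite database as a stand-in for the GPDB/HAWQ/PGSQL repository. The table `sar_cpu`, its columns, and the function `chart_rows` are assumptions made for illustration; the real schema is whatever the load scripts in each `db` directory define.

```python
import sqlite3

# In-memory SQLite stands in for the repository database.
conn = sqlite3.connect(":memory:")
# Hypothetical table and columns, not the repo's actual schema.
conn.execute("CREATE TABLE sar_cpu (ts TEXT, host TEXT, busy_pct REAL)")

# Rows as a collection step might deliver them after parsing (made-up values).
collected = [
    ("2020-01-01 10:00:00", "sdw1", 40.0),
    ("2020-01-01 10:00:00", "sdw2", 42.0),
    ("2020-01-01 10:10:00", "sdw1", 45.0),
    ("2020-01-01 10:10:00", "sdw2", 95.0),
]
conn.executemany("INSERT INTO sar_cpu VALUES (?, ?, ?)", collected)

def chart_rows(conn, start, end):
    """Fetch the rows for a time window, the way a charting script might."""
    sql = """
        SELECT ts, host, busy_pct
        FROM sar_cpu
        WHERE ts >= ? AND ts < ?
        ORDER BY ts, host
    """
    return conn.execute(sql, (start, end)).fetchall()

all_rows = chart_rows(conn, "2020-01-01 10:00:00", "2020-01-01 11:00:00")
late_rows = chart_rows(conn, "2020-01-01 10:05:00", "2020-01-01 11:00:00")
print(len(all_rows), len(late_rows))
```

Keeping the data in a database rather than in flat files means each chart only has to express a time window and a grouping, which is why the charts can be thin scripts.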
X-AXIS (independent variable)
- the x-axis for cluster-average area charts is time
- the x-axis for xy-scatterplot charts is plist ("sar -q" provides a count of active processes, which reflects active query requests in GPDB/HAWQ clusters)

Y-AXIS (dependent variable)
- depends on the resource, and is scaled to provide a best fit on the chart; to be honest, R does a pretty good job of y-scaling, so I pretty much let R do it for me
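
To make the k-means classification of the xy-scatter points more concrete, here is a small pure-Python sketch of Lloyd's algorithm over (plist, utilization) pairs. The repo's charts do this in R; the function `kmeans`, the sample points, and the choice of k=2 are assumptions for illustration, and the sketch does no feature scaling, which a real analysis would likely want.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical (plist, cpu_busy_pct) observations: a quiet group and a busy group.
POINTS = [(100, 10.0), (110, 12.0), (105, 11.0), (400, 80.0), (420, 85.0), (410, 90.0)]
centroids, labels = kmeans(POINTS, k=2)
print(labels)
```

Each label can then drive the point color in the scatter chart, so distinct utilization regimes stand out at a glance.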

## Uses for this TK

In general, this TK is intended for quick historical analysis of specific system events.
- while it may be possible to use this in near-real-time, at the moment it is best suited to historical analysis of the period 2-5 days prior to now

Types of investigation for which it is useful:
- identify hot servers
  - collect sar statistics on the master (or the standby) host
  - load them into a repository database (GPDB/HAWQ/PGSQL)
  - use the charts to identify skewed segment servers
- identify hot processes
  - set up a pidstat cron job on all the servers
  - the cron job will snap all system processes every 10 seconds
  - collect the pidstat cron output on the master/standby
  - load the pidstat data into a repository database (GPDB/HAWQ/PGSQL)
  - use the charts to identify skewed processes and other process-related metrics
- identify queries
  - collect pg_log information and load it into the repo database
  - join the pg_log and pidstat tables to determine the SQL being run by the hot processes
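
The last step, joining pg_log to pidstat to find the SQL behind hot processes, might look roughly like the sketch below. It uses in-memory SQLite in place of the repository database, and every table name, column name, threshold, and row is hypothetical. Matching on host and PID alone is naive, since PIDs are reused over time, so a real join would also need to bound the time window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas standing in for the loaded pidstat and pg_log tables.
conn.executescript("""
    CREATE TABLE pidstat (ts TEXT, host TEXT, pid INTEGER, cpu_pct REAL);
    CREATE TABLE pg_log  (logtime TEXT, host TEXT, pid INTEGER, message TEXT);
""")
conn.executemany("INSERT INTO pidstat VALUES (?, ?, ?, ?)", [
    ("2020-01-01 10:00:10", "sdw2", 4242, 97.0),
    ("2020-01-01 10:00:20", "sdw2", 4242, 99.0),
    ("2020-01-01 10:00:10", "sdw2", 5150, 3.0),
    ("2020-01-01 10:00:10", "sdw1", 4242, 5.0),
])
conn.executemany("INSERT INTO pg_log VALUES (?, ?, ?, ?)", [
    ("2020-01-01 10:00:05", "sdw2", 4242, "statement: SELECT count(*) FROM big_table"),
    ("2020-01-01 10:00:07", "sdw2", 5150, "statement: SELECT 1"),
])

# Average CPU per (host, pid), joined to log entries written before each sample,
# keeping only processes at or above the given CPU threshold.
HOT_SQL = """
    SELECT p.host, p.pid, AVG(p.cpu_pct) AS avg_cpu, l.message
    FROM pidstat AS p
    JOIN pg_log AS l
      ON l.host = p.host AND l.pid = p.pid AND l.logtime <= p.ts
    GROUP BY p.host, p.pid, l.message
    HAVING AVG(p.cpu_pct) >= ?
    ORDER BY avg_cpu DESC
"""
hot = conn.execute(HOT_SQL, (50.0,)).fetchall()
for row in hot:
    print(row)
```

The same shape of query should carry over to GPDB/HAWQ/PGSQL with the real table and column names substituted.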
