SUPReMM Processing Tools

Our team at the Center for Computational Research, University at Buffalo, develops and supports a range of tools for the comprehensive management of advanced cyberinfrastructure (CI) resources, including high performance computing (HPC). Part of comprehensive CI management is the monitoring and analysis of users' HPC jobs. The suite of tools to support job-level performance analysis was originally developed under a project called "SUPReMM".

The SUPReMM architecture comprises three major components:

- Software that runs directly on HPC compute nodes and periodically collects performance information.
- Software that uses the node-level performance data to generate job-level data.
- An Open XDMoD module that enables the job-level information to be viewed and analyzed.

This repository contains the software that combines the node-level performance data to generate job-level summary data.

Full details of the SUPReMM project are available on the SUPReMM overview page in the Open XDMoD documentation.

This work was sponsored by NSF under grant numbers ACI 1203560, ACI 1025159 and ACI 1445806 for the XD Metrics Service (XMS).

For more information, questions, feedback or bug reports, send email to ccr-xdmod-help at buffalo.edu.

Want to be notified about SUPReMM package releases and news? Subscribe to the XDMoD mailing list.

Software Build Requirements

This section provides instructions on how to create RPM or source packages for software development or debugging. The instructions for installing the released packages are available on the main website.

Rocky Linux 8

Install the EPEL repository configuration:

yum install epel-release

Enable the PowerTools repository (for Cython dependencies):

sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/Rocky-PowerTools.repo

Install the build dependencies:

yum install -y \
    gcc \
    python3-numpy \
    python3-scipy \
    python36-devel \
    python3-Cython \
    python3-pymongo \
    python3-PyMySQL \
    python3-pytest \
    python3-pytest-cov \
    python3-mock \
    python3-pexpect \
    python3-pylint \
    python3-pcp \
    pcp-devel

Installation

This project uses Python setuptools for package creation; the setup script is known to work with setuptools version 36.4.0 or later. To install in a conda environment:

conda create -n supremm python=3.6 cython numpy scipy
source activate supremm
python3 setup.py install
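Before running the setup script, it can be useful to confirm that the active environment meets the setuptools version quoted above. The following is a minimal sketch, not part of this package: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the comparison only looks at the leading numeric components of the version string.

```python
import re

MINIMUM_SETUPTOOLS = "36.4.0"  # minimum version quoted in this README


def version_tuple(version):
    """Convert a dotted version string such as '36.4.0' to a tuple of ints.

    Only the leading numeric components are used; suffixes such as
    'post1' are ignored. Short versions are padded with zeros so that
    '36.4' compares equal to '36.4.0'.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_SETUPTOOLS):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed")
    else:
        status = "OK" if meets_minimum(setuptools.__version__) else "too old"
        print("setuptools", setuptools.__version__, status)
```

Run as a script inside the activated environment, this prints the detected setuptools version and whether it satisfies the stated minimum.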

RPM packages are created using:

python3 setup.py bdist_rpm

Contributing

We accept contributions via standard GitHub pull requests.

This project is under active development with new features planned. Please contact us via the ccr-xdmod-help at buffalo.edu email address before you get started so that we can co-operate and avoid duplication of effort.

Overview (for developers)

Full details of how to install and use the software are available on the SUPReMM overview page in the Open XDMoD documentation. This section gives a very brief overview of the summarization software for software developers. As always, the definitive reference is the source code itself.

The summarization software processing flow is approximately as follows:

- Perform initial setup, including parsing configuration files, opening database connections, etc.
- Query an accounting database to get the list of jobs to process.
- For each job:
  - retrieve performance data that cover the time period the job ran;
  - extract the relevant datapoints per timestep;
  - run the data through the preprocessors;
  - run the data through the plugins;
  - collect the output of the preprocessors and plugins and store it in an output database.
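The processing flow above can be sketched in a few lines of Python. This is an illustration only and not the code in summarize_jobs.py: every name here (`summarize_jobs`, `TimestepCounter`, `MeanLoad`, the `load` metric and the shape of the job record) is a hypothetical stand-in, and the data source and output database are replaced by plain callables.

```python
def summarize_jobs(jobs, fetch_datapoints, preprocessors, plugins, store):
    """Run every job through the preprocessors and then the plugins.

    All arguments are hypothetical stand-ins:
      jobs:             iterable of job records from the accounting database
      fetch_datapoints: callable returning (timestamp, data) pairs for a job
      preprocessors:    classes constructed with the job record
      plugins:          classes constructed with the job and preprocessor output
      store:            callable that writes one summary record
    """
    for job in jobs:
        datapoints = list(fetch_datapoints(job))

        # Preprocessors see the data first.
        active_pre = [make(job) for make in preprocessors]
        for timestamp, data in datapoints:
            for pre in active_pre:
                pre.process(timestamp, data)
        pre_results = {pre.name: pre.results() for pre in active_pre}

        # Plugins run afterwards and can read the preprocessor output.
        active_plugins = [make(job, pre_results) for make in plugins]
        for timestamp, data in datapoints:
            for plugin in active_plugins:
                plugin.process(timestamp, data)

        # Collect everything into one record and hand it to the output.
        summary = {"job_id": job["id"]}
        summary.update(pre_results)
        summary.update({p.name: p.results() for p in active_plugins})
        store(summary)


class TimestepCounter:
    """Toy preprocessor: counts the timesteps seen for the job."""
    name = "timesteps"

    def __init__(self, job):
        self.count = 0

    def process(self, timestamp, data):
        self.count += 1

    def results(self):
        return self.count


class MeanLoad:
    """Toy plugin: uses the preprocessor's timestep count to form a mean."""
    name = "mean_load"

    def __init__(self, job, pre_results):
        self.nsteps = pre_results["timesteps"]
        self.total = 0.0

    def process(self, timestamp, data):
        self.total += data["load"]

    def results(self):
        return self.total / self.nsteps if self.nsteps else None


records = []
summarize_jobs(
    jobs=[{"id": 1}],
    fetch_datapoints=lambda job: [
        (0, {"load": 1.0}), (10, {"load": 3.0}), (20, {"load": 2.0}),
    ],
    preprocessors=[TimestepCounter],
    plugins=[MeanLoad],
    store=records.append,
)
print(records)
```

Passing the data source and the output as callables keeps the loop independent of any particular database, which matches the configurable input and output described in this overview.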

Preprocessors and plugins are both Python modules that implement a defined interface. The main difference between a preprocessor and a plugin is that the preprocessors run first and their output is available to the plugin code.

Each plugin is typically responsible for generating a job-level summary for one or more performance metrics. Each module defines:

- an identifier for the output data;
- a list of required performance metrics;
- a mode of operation (either only process the first and last datapoints or process all data);
- an implementation of a processing function that will be called by the framework with the requested datapoints;
- an implementation of a function that will be called at the end to return the results of the analysis.
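As a rough illustration of the five items listed above, the sketch below defines a hypothetical base class and one plugin that uses the first-and-last mode. It does not reproduce the framework's actual interface: the class names, attribute names, mode constants, the `run_plugin` driver and the metric name `example.net.bytes_in` are all assumptions made for this example. See the existing plugins in the source tree for the real definitions.

```python
from abc import ABC, abstractmethod

FIRST_LAST = "firstlast"  # only the first and last datapoints are processed
ALL = "all"               # every datapoint is processed


class Plugin(ABC):
    """Hypothetical base class with one member per item in the list above."""

    name = None             # identifier for the output data
    required_metrics = []   # performance metrics the plugin needs
    mode = ALL              # mode of operation

    @abstractmethod
    def process(self, timestamp, data):
        """Called by the framework with the requested datapoints."""

    @abstractmethod
    def results(self):
        """Called once at the end to return the results of the analysis."""


class BytesReceived(Plugin):
    """Change in a monotonically increasing counter over the job."""

    name = "bytes_received"
    required_metrics = ["example.net.bytes_in"]  # illustrative metric name
    mode = FIRST_LAST

    def __init__(self):
        self.first = None
        self.last = None

    def process(self, timestamp, data):
        value = data[self.required_metrics[0]]
        if self.first is None:
            self.first = value
        self.last = value

    def results(self):
        if self.first is None:
            return {"error": "no data"}
        return {"total": self.last - self.first}


def run_plugin(plugin, datapoints):
    """Minimal driver that honours the plugin's mode of operation."""
    if plugin.mode == FIRST_LAST and len(datapoints) > 2:
        datapoints = [datapoints[0], datapoints[-1]]
    for timestamp, data in datapoints:
        # Pass only the metrics the plugin asked for.
        wanted = {metric: data[metric] for metric in plugin.required_metrics}
        plugin.process(timestamp, wanted)
    return {plugin.name: plugin.results()}


datapoints = [
    (0, {"example.net.bytes_in": 100, "other": 1}),
    (30, {"example.net.bytes_in": 250, "other": 2}),
    (60, {"example.net.bytes_in": 900, "other": 3}),
]
result = run_plugin(BytesReceived(), datapoints)
print(result)
```

For counters that only ever increase, the first-and-last mode is sufficient because the job-level total is the difference between the two end values, so the intermediate datapoints never need to be read.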

An example of a plugin is one that records the mean and maximum memory usage for the job. Another example is a plugin that checks the temporal variance of the L1D cache load rate to determine if the job failed prematurely.
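The memory usage example could look roughly like the following. This is a simplified sketch rather than the plugin shipped with the package: the class name, the metric name `example.mem.used` and the use of an unweighted arithmetic mean over the datapoints are assumptions made for illustration.

```python
class MemoryUsageSummary:
    """Hypothetical plugin that processes all datapoints for a job."""

    name = "memory_used"
    required_metrics = ["example.mem.used"]  # illustrative metric name

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = None

    def process(self, timestamp, data):
        """Update the running statistics with one datapoint."""
        value = data["example.mem.used"]
        self.count += 1
        self.total += value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    def results(self):
        """Return the job-level mean and maximum."""
        if self.count == 0:
            return {"error": "no data"}
        return {"mean": self.total / self.count, "max": self.maximum}


summary = MemoryUsageSummary()
for timestamp, used in [(0, 2.0), (30, 4.0), (60, 3.0)]:
    summary.process(timestamp, {"example.mem.used": used})
print(summary.results())
```

Keeping only a count, a running total and a running maximum means the plugin's memory footprint does not grow with the number of datapoints in the job.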

The software that retrieves the job information from the accounting database and writes to the output database is configurable. So, for example, you can set up the software to write the job summary records to stdout for testing purposes. The accounting database interface supports multiple accounting databases (Open XDMoD being the main one).
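One way to picture a configurable output is a small factory that selects a writer from a configuration setting. The sketch below is an assumption-laden illustration, not the package's configuration format: the `type` key, the `OUTPUTS` table and the class names are hypothetical, and an in-memory list stands in for a real database.

```python
import io
import json
import sys


class StdoutOutput:
    """Writes each job summary as one line of JSON, useful for testing."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout

    def write(self, summary):
        self.stream.write(json.dumps(summary, sort_keys=True) + "\n")


class ListOutput:
    """Stand-in for a database: keeps the summaries in memory."""

    def __init__(self):
        self.records = []

    def write(self, summary):
        self.records.append(summary)


# Hypothetical mapping from a configuration value to an output class.
OUTPUTS = {"stdout": StdoutOutput, "list": ListOutput}


def make_output(config):
    """Create the output writer named by the (hypothetical) 'type' key."""
    kind = config["type"]
    if kind not in OUTPUTS:
        raise ValueError("unknown output type: %s" % kind)
    return OUTPUTS[kind]()


# Demonstration: capture the stdout-style output in a buffer.
buffer = io.StringIO()
StdoutOutput(stream=buffer).write({"job_id": 1, "mean_load": 2.0})
print(buffer.getvalue(), end="")

writer = make_output({"type": "list"})
writer.write({"job_id": 2})
```

Because both writers expose the same `write` method, the summarization loop does not need to know which one is in use.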

If you are interested in doing plugin development, then a suggested starting point is to look at some of the existing plugins. The simplest plugins, such as the block device plugin (supremm/plugins/Block.py), use the framework-provided implementation. A more complex example is the cgroup memory processor (supremm/plugins/CgroupMemory.py), which contains logic to selectively ignore certain datapoints and to do some non-trivial statistics on the data.

If you are interested in understanding the full processing workflow, then the starting point is the main() function in the summarize_jobs.py script.

License

The SUPReMM processing tools package is an open source project released under the GNU Lesser General Public License ("LGPL") Version 3.0.

