GitHub - wandnz/nntsc: Collects, stores and examines network time series data

wandnz / nntsc Public
Notifications You must be signed in to change notification settings
Fork 0
Star 3
Collects, stores and examines network time series data
Notifications
Name		Name	Last commit message	Last commit date
Latest commit History 745 Commits
.github/workflows		.github/workflows
conf		conf
debian		debian
initscripts		initscripts
libnntsc		libnntsc
man		man
tests		tests
COPYING		COPYING
MANIFEST.in		MANIFEST.in
README		README
build_nntsc_db		build_nntsc_db
nntsc		nntsc
setup.py		setup.py
Repository files navigation

NNTSC 2.28

---------------------------------------------------------------------------
Copyright (c) 2016, 2017 The University of Waikato, Hamilton, New Zealand.
All rights reserved.

This code has been developed by the University of Waikato WAND
research group. For further information please see http://www.wand.net.nz/.
---------------------------------------------------------------------------

NNTSC is a system designed for collecting, storing and examining network time
series data. The main feature of NNTSC is that it is flexible and can work
with just about time series data, so you can unify your measurement storage
and analysis systems around a single data management API.

NNTSC also provides a client API that allows other programs to connect to
the measurement database and either ask for historical data or register
interest in upcoming live data.

If you have any questions, comments or suggestions about NNTSC, please
email us at contact@wand.net.nz.


Quickstart Instructions for Debian Users
========================================
 1. Install the NNTSC Debian package
 2. Create a Postgresql database for NNTSC:
 	createdb nntsc
 3. Create an Influx database for NNTSC:
        influx -execute 'CREATE DATABASE nntsc'
 4. Configure NNTSC (see 'Configuration' section below for more details)
	4.1. Create an RRD List file (if needed).
	4.2. Edit /etc/nntsc/nntsc.conf to point to your new database and
             RRD list.
 5. Run:
 	build_nntsc_db -C /etc/nntsc/nntsc.conf
 6. Run:
 	/etc/init.d/nntsc start



Installation from Source
========================
NNTSC uses Python setuptools to install both itself and its dependencies.

In version 2.3 there has been a significant change to the installation
process -- some of the NNTSC dependencies are now distributed and installed
separately, namely libnntscclient.

If you have checked out a copy of NNTSC from our git repository, you will also
need to get libnntscclient from the repository as well. Install it by
changing into the directory and running 'python setup.py install' prior to
attempting to install NNTSC itself.

Once these are installed, simply run the following to install NNTSC:
	python setup.py install

By default, NNTSC installs into your system's dist-packages directory but you
will need root privileges to install it there. You can change the installation
location using the --prefix option. Run 'python setup.py --help install' for
more details about how you can customise the install process.


Supported Network Measurement Data Sources
==========================================
At present, NNTSC only supports a handful of network measurement methods, but
we are working to extend this as time goes on.

NNTSC has support for the following data sources:

  AMP -- the Active Measurement Project
  Smokeping RRDs


Glossary
========
Here are a few important terms that we use when talking about NNTSC.

collection: A collection is a group of network measurement time series' (which
we call streams) which use the same measurement approach and result structure.
For example, each AMP test would result in a separate collection.

dataparser: The Python code that processes incoming network measurements and
inserts them into the NNTSC database. Each collection has its own dataparser.

metric: A term that describes what the data value for a time series actually
represents. For instance, the ping values stored by the NNTSC smokeping
collection and the rtt values stored by the AMP ICMP collection both measure
latency, so the metric for these is 'latency'. A traceroute path length is
measured in hops, so the metric in this case is 'hops'.

module: Refers to the input source for a collection. The name 'module' is used
because many collections share the same basic input source, e.g. all the AMP
test collections deal with the AMP message broker, so the dataparsers for those
collections all end up using the same Python module for receiving and processing
messages from the shared source.

modsubtype: Used to differentiate between collections that share the same
input source. Short for 'module subtype', where the subtype can be used to
determine which dataparser should be used to process and store the
received measurements correctly.

resolution: When discussing RRDs, resolution refers to the frequency at which
measurements are taken. NNTSC only stores measurements taken at the highest
available resolution and will not store RRD data that has been aggregated.

stream: A stream is a single network measurement time series within NNTSC. Each
stream has a unique ID number and has a series of properties that describe how
the measurements were taken.


Configuration
=============
Before you start using NNTSC, you'll need to create a config file that
defines where your time series data is going to be coming from and where you
want to store it. An example configuration file is included in conf/nntsc.conf
which you can use as a starting point. If you've installed NNTSC from a Debian
package, this same file will be installed to /etc/nntsc/nntsc.conf and you
should edit that file to add your configuration.

The configuration options use the standard python ConfigParser format, where
options are grouped into sections. The sections currently supported by NNTSC
and their options are as follows:

[database]
  Options relating to the Postgresql database where the relational data will
  be stored.

  Available options:
  	host - the host where the database is located. Leave blank if the
	       database is on the local machine.
	database - the name of the NNTSC database.
	username - the username to use when connecting to the database. Leave
	           blank to use the name of the user running NNTSC.
	password - the password to use when connecting to the database.	Leave
	           blank if no password required.

[influx]
  Options relating to the Influx database where the time series data will
  (mostly) be stored.

  Available options:
    useinflux - if 'no', all time series data will be stored in the Postgresql
                database instead.
    database - the name of the Influx database.
    username - the username to use when connecting to the database. Leave
               blank to use the name of the user running NNTSC.
    password - the password to use when connecting to the database. Leave
               blank if no password required.
    host - the host where the database is located. Leave blank if the
           database is on the local machine.
    port - the port that the Influx daemon is listening on. Use 8086 unless
           you've configured Influx to use another port.
    keepdata - the length of time to keep data in the database. For example,
               '30d' will keep data for 30 days.

[liveexport]
  Options relating to the Rabbit queue that funnels newly collected data to
  the live exporter.

  Available options:
        queueid - the name of the message queue.
        port - the port that the local Rabbit instance is listening on.
        username - the username to use when connecting to Rabbit.
        password - the password to use when connecting to Rabbit.

[modules]
  These options are used to enable or disable NNTSC dataparsers. To enable
  a module, set the appropriate option to "yes". To disable a module, set
  the option to "no".

  Available options:
        amp - controls whether NNTSC accepts data from AMP.
        rrd - controls whether NNTSC should scrape data from RRDs.

[amp]
  Options relating to connecting to an AMP message queue and receiving AMP data

  Available options:
  	username - the username to use when connecting to AMP's message queue.
	password - the password to use when connecting to AMP's message queue.
	host - the host where the AMP message broker is running.
	port - the port to connect to on the AMP message broker host.
	ssl - if True, encrypt all messages using SSL. If False, use plain-text.
	queue - the name of the message queue to read AMP data from.
        commitfreq - the number of AMP messages to process before committing.

[rrd]
  Options relating to scraping data from existing RRD files. See "RRD
  Scraping" below for more details on how to configure RRD scraping.

  Available options:
        rrdlist - the full path to a file describing the RRDs file to scrape.


RRD Scraping
============
NNTSC includes the ability to harvest measurements from the RRD files
created by other measurement tools. This allows users to deploy NNTSC
alongside their existing monitoring setup. At present, NNTSC only supports
scraping the RRD files produced by Smokeping.

In the Smokeping case, the Smokeping tool runs as per usual, writing its
results to an RRD file. NNTSC will then read the most recent measurements
from the RRD file and store them in its own database for later analysis.
Unlike the RRD, though, NNTSC will keep the measurements in their original
format for as long as they remain in the database, rather than aggregating
them once they reach a certain age.

The RRD list file is used to tell NNTSC which RRD files it should read and
what is being measured by each file. A documented example RRD list file is
provided in conf/rrd.examples which should be sufficient to help you write
your own RRD list file.

When you run the script to build your NNTSC database, the RRD list file
will be parsed and appropriate streams will be created for each RRD that
is described in the file. Once NNTSC itself is running, the specified
RRDs will be periodically checked for new datapoints. Should you decide
you add new RRDs to your RRD list, you must re-run the database build script
for NNTSC to recognise them.


Building the Database
=====================

Before you can run NNTSC itself, you will need to create and initialise the
database that will be used to store the time series data that you are
collecting. NNTSC uses a postgresql database for its storage. First, decide
on a name for your database -- in the following examples, I am going to use
'nntsc' as the database name but you can choose anything you like.

Next, create a Postgresql database and ensure that it is owned by the user
that you will be ultimately connecting to the database as. In the simplest
case, I will be connecting using my own username so I can create the nntsc
database using:

	createdb nntsc

If you're using Influx to store the time series data, you'll also need to
create an Influx database. Easiest way to do this via the InfluxQL shell.

        influx -execute 'CREATE DATABASE nntsc'

The next step is run the build_nntsc_db script that should have been installed
on your system when you installed NNTSC. This script will create all of the
required tables for storing the time series data. In the case of some input
sources, e.g. RRDs, it will also create 'streams' for each of time series
that will be collected. Make sure the database section of your config file
now refers to your new database and then simply run the script as follows:

	build_nntsc_db -C <your config file>

If you update your configuration, e.g. you wish to enable a data source that
you had earlier disabled or you have added new RRDs that you want to collect
data for, you can re-run build_nntsc_db and it will add new tables and streams
as needed without overwriting any existing ones.

If you really do want to start over fresh, you can pass a -F flag to
build_nntsc_db.

Be warned, this will drop ALL of your existing tables and any time series data
and streams that are stored within them. Don't do this unless you are really
sure you want to lose all that data!

If you explore your new Postgresql database with the psql tool, you should have
a 'collections' table and a number of tables begining with 'stream_' and
'data_', one per collection.


Starting NNTSC
============================

Once you've created and initialised your database, you can start NNTSC in
several ways.

Firstly, if you have installed NNTSC via a Debian package, there will be an
init script installed in /etc/init.d/ that you can use to start and stop a
NNTSC daemon. This is by far the best available option. Using the script,
you can start NNTSC by running the following:

	/etc/init.d/nntsc start

The init script also accepts 'stop' and 'restart' as commands, so it can be
used to easily stop a running instance of NNTSC. The init script will use
/etc/nntsc/nntsc.conf as the NNTSC configuration file so either make sure that
file contains your desired config or edit the init script to point to your own
config file.

Alternatively, you may manually start an NNTSC daemon by running the following:

	nntsc -b -C <your config file> [-p port] [-P pidfile]

Finally, you may run NNTSC in the foreground by omitting the '-b' option from
the manual command given above. In this case, we strongly recommend running
NNTSC within screen or some other persistant terminal environment.

By default, the nntsc exporter will listen on port 61234 for incoming client
connections. This can be changed using the '-p' option.

Note that in all cases, the same config file must be used by both NNTSC and the
build_nntsc_db script.

If run as a daemon, NNTSC will write any error or log messages to
/tmp/nntsc.log, otherwise it will report them directly to standard error.


Querying the NNTSC Database
===========================

While running, nntsc also listens on TCP port 61234 for client
connections. Connected clients may query NNTSC to access any data currently
stored in the time series database or register an interest in live data as it
arrives from the various input sources.

There is a Python API for writing a client that can connect to and query NNTSC,
without the programmer needing to know the underlying communication protocol.
The code for this API is located in the libnntscclient repository -- it's not
very well-documented at the moment but hopefully we can improve that in future
releases.

To use the client API, just add the following imports to your Python program:

	from libnntscclient.protocol import *
	from libnntscclient.nntscclient import NNTSCClient

There are three types of messages that a client can send to the NNTSC collector:

  NNTSC_REQUEST -- requests information about the current collections or streams
  NNTSC_SUBSCRIBE -- requests measurement data for a set of streams
  NNTSC_AGGREGATE -- requests aggregated measurement data for a set of streams


NNTSC_REQUEST messages must also specify a request type:
  NNTSC_REQ_COLLECTION -- request a list of all available collections
  NNTSC_REQ_STREAMS -- request a list of streams for a given collection
  NNTSC_REQ_SCHEMA -- request the list of columns in the stream and data tables
                      for a given collection

NNTSC_SUBSCRIBE messages can be used to get all of the available measurements
for some set of streams over a given time period. If the time period includes
historical data, any available historical data will be immediately returned in
response. If the time period ends in the future, new measurements will be
exported to the client as they arrive until the end time is passed. If the end
time is set to 0, then live data will be exported until the client disconnects.

NNTSC_AGGREGATE messages work much the same as NNTSC_SUBSCRIBE messages,
except that the measurements are aggregated into bins of a given size.
NNTSC_AGGREGATE can only be used with historical data -- if the end time is in
the future, then you will only get data up to the current time. You must also
provide an aggregator function and a list of columns from the data table that
you want to apply the aggregation to.

Supported aggregation functions are: max, min, sum, avg and count. They should
be fairly self-explanatory as to how they will aggregate the data within a bin.