OPeNDAP PROV Module

Tim L edited this page Mar 3, 2014 · 67 revisions

What is first

What we'll cover

This page introduces a new OPeNDAP module, prov_module, that records provenance of OPeNDAP requests.

Let's get to it.

Installing prov_module

Right now, the prov_module source code lives only in our branch of OPeNDAP, deployed at http://opendap.tw.rpi.edu/tomcat/opendap/. As the module makes its way back into the mainstream OPeNDAP SVN, it will show up first in the tetherless-world branch and then, hopefully, in their production trunk. The design of the configuration variables is tracked in issue 24.

Configuration

opendap/prov_module/prov.conf.in is the default template for the "prov.conf" that gets installed for any particular OPeNDAP instance. For example, our prov.conf appears at /opt/opendap/branch/etc/bes/modules/prov.conf. The design of most of the configuration variables follows Prizms' SDV Organization principles.

Your prov.conf starts out looking like:

#-----------------------------------------------------------------------#
# OPeNDAP NetCDF Data Handler BES Module Configuration file             #
#-----------------------------------------------------------------------#

#-----------------------------------------------------------------------#
# Any requirements?
#-----------------------------------------------------------------------#
#BES.Include=dap.conf

#-----------------------------------------------------------------------#
# modules to load, includes data modules and command modules            #
#-----------------------------------------------------------------------#

BES.modules+=prov
BES.module.prov=/opt/opendap/branch/lib/bes/libprov_module.so

#-----------------------------------------------------------------------#
# Prov handler specific parameters
#-----------------------------------------------------------------------#
Prov.PlaceToStoreShtuff=/tmp
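
To make the key syntax in the listing above concrete, here is a minimal sketch of a reader for it. It is written in Python purely for illustration (prov_module itself is C++ and reads its keys through TheBESKeys), and it rests on assumptions: lines starting with # are comments, key=value sets a key, and key+=value appends to it. The function name parse_bes_conf is hypothetical, and include directives are not followed.

```python
def parse_bes_conf(text):
    """Parse BES-style configuration text into {key: [values]}.

    Assumptions: '#' starts a comment line, 'key=value' sets a key, and
    'key+=value' appends to it. Include directives are not followed.
    """
    keys = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        if name.endswith("+"):
            # 'key+=value': append to any values already recorded
            keys.setdefault(name[:-1].strip(), []).append(value.strip())
        else:
            # 'key=value': replace whatever was there before
            keys[name.strip()] = [value.strip()]
    return keys


sample = """\
#BES.Include=dap.conf
BES.modules+=prov
BES.module.prov=/opt/opendap/branch/lib/bes/libprov_module.so
Prov.PlaceToStoreShtuff=/tmp
"""
conf = parse_bes_conf(sample)
print(conf["BES.modules"])
```

Every value is kept as a list so that keys set with = and keys extended with += can be handled the same way.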

Base URI

The Base URI is the web domain name under which instances of PROV-O will be asserted when recording provenance. This variable is the same as Prizms' CSV2RDF4LOD_BASE_URI environment variable and the conversion:base_uri property.

A Prizms-aligned configuration.

Prov.cr_base_uri=http://opendap.tw.rpi.edu

Data Root

The Data Root is the absolute directory path of a Prizms dataset collection (many source organizations, many datasets, many versions). This variable is the same as Prizms' CSV2RDF4LOD_CONVERT_DATA_ROOT environment variable.

These are Prizms-aligned configurations.

Prov.cr_data_root=/home/prizms/prizms/opendap/data/source

Provenance Records "SDV" source identifier

Prov.cr_source_id=us

Provenance Records "SDV" dataset identifier

Prov.cr_dataset_id=opendap-prov

Dataset directory (combines the three above)

Alternatively, the three variables above (data root, source identifier, and dataset identifier) can be expressed all in one:

Prov.cr_dataset_dir=/home/prizms/prizms/opendap/data/source/us/opendap-prov

The default values for Prov.cr_source_id and Prov.cr_dataset_id are us and opendap-prov, respectively.
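
As a rough illustration of how the dataset directory relates to the data root, source identifier, and dataset identifier, the sketch below resolves it from a dictionary of keys. It is Python for illustration only, not the module's C++ code. The function name resolve_dataset_dir is hypothetical, and the precedence rule (an explicit Prov.cr_dataset_dir wins over the three separate variables) is an assumption; this page does not say which one takes priority.

```python
import posixpath

# Defaults stated on this page for the source and dataset identifiers.
DEFAULT_SOURCE_ID = "us"
DEFAULT_DATASET_ID = "opendap-prov"


def resolve_dataset_dir(keys):
    """Return the dataset directory implied by a dict of Prov.* keys.

    Assumption: an explicit Prov.cr_dataset_dir takes precedence over the
    three separate variables.
    """
    explicit = keys.get("Prov.cr_dataset_dir")
    if explicit:
        return explicit
    data_root = keys["Prov.cr_data_root"]  # no default is documented
    source_id = keys.get("Prov.cr_source_id", DEFAULT_SOURCE_ID)
    dataset_id = keys.get("Prov.cr_dataset_id", DEFAULT_DATASET_ID)
    return posixpath.join(data_root, source_id, dataset_id)


print(resolve_dataset_dir(
    {"Prov.cr_data_root": "/home/prizms/prizms/opendap/data/source"}
))
```

With only the data root set, the defaults fill in the rest and the result matches the Prov.cr_dataset_dir value shown above.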

Request Name Template

How should each request be named? The name becomes the "version" of the "opendap-prov" dataset.

For now, the value is fixed and ignored:

Prov.request_name_template=yyyyMMdd-s-uuid[4]
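
Since the template is currently ignored, its exact meaning is not pinned down on this page. The sketch below shows one possible reading, and every part of that reading is an assumption: yyyyMMdd as the UTC date, s as seconds since the Unix epoch, and uuid[4] as the first four hex characters of a random UUID. It is Python for illustration only; request_name is a hypothetical helper, not part of prov_module.

```python
import uuid
from datetime import datetime, timezone


def request_name(now=None, request_uuid=None):
    """Build a version name under one assumed reading of yyyyMMdd-s-uuid[4]."""
    now = now or datetime.now(timezone.utc)
    request_uuid = request_uuid or uuid.uuid4()
    return "{}-{}-{}".format(
        now.strftime("%Y%m%d"),   # yyyyMMdd: calendar date
        int(now.timestamp()),     # s: seconds since the epoch (assumed)
        request_uuid.hex[:4],     # uuid[4]: first four hex characters (assumed)
    )


print(request_name())
```

Passing now and request_uuid explicitly makes the name reproducible, which is convenient when testing.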

Implementation

prov_module has been created and builds. It includes:

  • ProvModule class (.h / .cc), which is where things are registered with the BES Software Framework for our prov module.
  • ProvReporter class (.h / .cc).
    • The report method (.h / .cc) is what we'll use.
    • You can take a look at the dhi (BESDataHandlerInterface) parameter to the report method to see the kinds of things we can report.
    • Configuration information will come from the TheBESKeys singleton (.h / .cc)
    • Logging is done against the TheBESLog singleton (.h / .cc).
      • Should we encode TIC into the same log file that OPeNDAP already uses?

The ProvModule class registers the reporter, so the reporter gets called for any and all request documents. We just need to figure out which ones we want to handle (specifically, those that contain a GET request).
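
The "called for everything, handle only some" pattern can be sketched as follows. This is Python for illustration only, not the module's C++ code. FakeDhi is a hypothetical stand-in for the dhi parameter, not the real BESDataHandlerInterface, and the idea that GET-style requests can be recognized by an action string starting with get. is an assumption to verify against the BES sources.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDhi:
    """Hypothetical stand-in for the dhi parameter; not the BES class."""
    action: str
    data: dict = field(default_factory=dict)


class SketchReporter:
    """Collects the requests a provenance reporter would choose to record."""

    def __init__(self):
        self.recorded = []

    def report(self, dhi):
        # Called for every request; only GET-style actions are kept.
        if dhi.action.startswith("get."):
            self.recorded.append(dhi.action)


reporter = SketchReporter()
for action in ["set.container", "get.dds", "show.version", "get.dods"]:
    reporter.report(FakeDhi(action))
print(reporter.recorded)
```

Filtering inside report keeps the registration simple: the framework calls the reporter unconditionally, and the reporter decides what is worth recording.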

Building prov_module

To build, cd into prov_module and do the following (as per issue 25):

% cd ~/prizms/opendap/opendap/prov_module
% git pull
% source /opt/opendap/branch/opendap.ksh # This sets OPENDAP_ROOT
% autoreconf --force --install
% ./configure --prefix=$OPENDAP_ROOT
% make
% make install

Then, restart OPeNDAP:

% sudo /opt/opendap/branch/bin/besctl stop
% sudo /opt/opendap/branch/bin/besctl start -d "/opt/opendap/branch/var/bes.debug,all"

After you've run autoreconf and ./configure once, you need only run make ; make install.

The latest log can be found with:

$ cd /home/prizms/prizms/opendap/data/source/us/opendap-prov/version
$ ls -t | grep 2014 | head -1 | xargs -I {} cat {}/source/opendap-provenance.ttl

Mapping TheBESKeys+TheBESLog to PROV

Discussed on 2014 Jan 28.

On 2014 Feb 18, we decided that we do NOT need to model the individual data-dds entities that result from each BESContainer load. It is adequate to model the final data-dds that results from all loads. The extra detail could be added later if the need arises.
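
To illustrate the shape of that decision, here is a minimal sketch that writes one PROV-O activity per request and a single entity for the final data-dds, with no per-container data-dds entities. It is Python for illustration only and is not the module's actual output. The URIs are hypothetical (only the base http://opendap.tw.rpi.edu comes from this page), the function name request_to_turtle is made up, and linking the activity to its containers with prov:used is an assumption about the mapping.

```python
def request_to_turtle(request_uri, container_uris, result_uri):
    """Describe one request as Turtle, with one entity for the final data-dds."""
    activity = ["<{}> a prov:Activity".format(request_uri)]
    # Assumed mapping: each loaded container is something the request used.
    activity += ["    prov:used <{}>".format(c) for c in container_uris]
    return "\n".join([
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        " ;\n".join(activity) + " .",
        "",
        # Only the final data-dds is modeled, not one entity per load.
        "<{}> a prov:Entity ;".format(result_uri),
        "    prov:wasGeneratedBy <{}> .".format(request_uri),
        "",
    ])


ttl = request_to_turtle(
    "http://opendap.tw.rpi.edu/id/request/1",
    [
        "http://opendap.tw.rpi.edu/id/container/a",
        "http://opendap.tw.rpi.edu/id/container/b",
    ],
    "http://opendap.tw.rpi.edu/id/request/1/data-dds",
)
print(ttl)
```

If per-load detail is needed later, intermediate entities could be added between the containers and the final data-dds without changing the request activity.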