Welcome to the yw-idcc-17 website. This demo consists of examples of YW provenance queries highlighted in the IDCC'17 presentation, paper, and demo.
The purpose of this demo is to show how YesWorkflow (YW) queries can combine the prospective provenance created by YW with retrospective provenance into hybrid provenance, in order to answer queries that cannot be answered by prospective provenance or retrospective provenance alone.
The prospective provenance in this demo is created by YW, which models conventional scripts and programs as scientific workflows. YW can provide a number of the benefits of using a scientific workflow management system without having to rewrite scripts and other scientific software. A YW user simply adds special YW comments to existing scripts. These comments declare how data is used and how results are produced, step by step, by the script. YW then interprets these comments and produces graphical output that reveals the stages of computation and the flow of data in the script.
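As a minimal sketch of what such annotations can look like, the shell script below wraps a two-step computation in YW comments. The `@begin`/`@end`, `@in`, `@out`, and `@uri` keywords are YW annotations; the block names, port names, and file paths are hypothetical and are not taken from the demo scripts, and annotating a shell script (rather than one of the demo's Python, MATLAB, or R scripts) is an assumption made for illustration.

```sh
#!/bin/sh
# Hypothetical script: count the lines of an input file and save the count.

# @begin count_lines
# @in input_file @uri file:data/{sample}.txt
# @out count_file @uri file:results/{sample}_count.txt
count_lines() {
    input_file=$1
    count_file=$2

    # @begin read_and_count
    # @in input_file
    # @out line_count
    line_count=$(wc -l < "$input_file" | tr -d ' ')
    # @end read_and_count

    # @begin write_count
    # @in line_count
    # @out count_file
    printf '%s\n' "$line_count" > "$count_file"
    # @end write_count
}
# @end count_lines
```

Because the annotations are ordinary comments, the script runs unchanged: `count_lines data/a.txt results/a_count.txt` writes the line count of the first file into the second.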
There are various approaches to capturing retrospective provenance. Retrospective provenance observables, e.g., from DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level), yield only isolated fragments of the overall data lineage and processing history. In this demo, two types of retrospective provenance observables are used: yw-recon and DataONE RunManager. yw-recon searches the file system for files that match the URI templates declared for @IN and @OUT ports in the script. DataONE RunManager, on the other hand, records a list of input and output files for a script run.
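The file-matching idea behind yw-recon can be sketched in a few lines of shell. This is not the yw-recon implementation; it only illustrates how a URI template with `{variable}` placeholders can be turned into a glob pattern and matched against files on disk. The function names and the example template are hypothetical.

```sh
#!/bin/sh
# Turn a URI template such as "run/{sample}_count.txt" into a glob
# pattern by replacing every {placeholder} with "*".
template_to_glob() {
    printf '%s\n' "$1" | sed 's/{[^}]*}/*/g'
}

# List the files under a root directory whose path matches the template.
# Note: with find's -path test, "*" also matches "/" characters.
find_matches() {
    root=$1
    pattern=$(template_to_glob "$2")
    find "$root" -type f -path "$root/$pattern" | sort
}
```

For example, `find_matches . 'run/{sample}_count.txt'` lists every file under `./run/` whose name ends in `_count.txt`, which is the kind of evidence a reconstruction step could then attach to the corresponding port.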
The following tools are used in our demo project:
Our system demonstration illustrates the variety of provenance information that we are able to capture, query, and visualize using a combination of tools for exposing both prospective and retrospective provenance. We show how prospective provenance can be declared using YesWorkflow (YW) annotations that reveal the fine-grained (variable-level) dataflow graph implicit in scripts, and how this prospective provenance can be integrated with the coarse-grained (file-level) retrospective provenance recorded by the DataONE Run Managers for MATLAB and R, the fine-grained retrospective provenance captured by noWorkflow, and user-exported log files at any level of granularity. We demonstrate the usefulness of integrating prospective and retrospective provenance in this way with the following kinds of queries:
- Prospective provenance queries in the context of a single script: these expose and test data dependencies at the workflow level.
- Retrospective provenance queries in the context of a single run of a single script: these capture the actual input and output files of a script run and other runtime observables.
- Hybrid provenance queries in the context of a single script and a single run: these blend retrospective and prospective provenance, yielding new knowledge artefacts.
- Provenance queries in the context of multiple scripts and multiple runs: these query and visualize data dependencies across multiple script runs.
Our demonstration queries and provenance reports thus yield a more complete and comprehensible picture of data provenance from multiple script runs.
Please read the Query README in the demo repo.
Sample provenance query results
- YesWorkflow Graph for C3C4 Example
- Hybrid Graph for C3C4 Example
- YesWorkflow Graph for LIGO Example
- Hybrid Graph for LIGO Example
- noWorkflow Filtered Graph for LIGO Example
- YesWorkflow Graph for Kurator Example
- Hybrid Graph for Kurator Example
- YesWorkflow Graph for Twitter Example
- Hybrid Graph for Twitter Example
- Multiple_runs_Multiple_scripts_Graph for OHIBC Example
Repository Layout
| Folder | Description |
|---|---|
| examples/ | Contains examples demonstrating the queries in the queries folder. |
| queries/ | Stores the scripts for the nine demo queries. |
| rules/ | Contains a set of Prolog rules for generating prospective YesWorkflow views. |
| OHIBC_Howe_Sound_project/ | An R workflow project. |
| docker/ | Contains a Docker image that helps users reproduce the demonstrated provenance queries. |
| yw_jar/ | Contains two versions of the YesWorkflow Java library. |
| poster_template/ | Contains the poster and other publications. |
| SQLiteToYaml/ | Contains a Java program that converts an SQLite database into a YAML file that YesWorkflow can query. |
The example subfolders also follow a typical folder structure. All `<my_example>` folders contain:

| Item | Description |
|---|---|
| script/ | the example script or scripts that make up `<my_example>` |
| facts/ | the YW facts for `<my_example>`, generated by running YW on the example script(s) |
| views/ | materialized views for `<my_example>` |
| recon/ | reconstructed provenance used for `<my_example>` |
| results/ | all artifacts generated by make.sh |
| supplementary/ | a folder with supplementary files and information about the example |
| clean.sh | removes generated demo artifacts for `<my_example>` |
| make.sh | creates demo artifacts for `<my_example>` |
|Note: after running

    simulate_data_collection/
    ├── clean.sh
    ├── facts
    │   ├── yw_extract_facts.P
    │   └── yw_model_facts.P
    ├── make.sh
    ├── results
    ├── script
    │   ├── calibration.img
    │   ├── cassette_q55_spreadsheet.csv
    │   └── simulate_data_collection.py
    └── views
        └── yw_views.P
Installing, Browsing, and Running the Demo
The following free software is required to run this demo.
Java: please install Java SE Development Kit 8 by navigating to http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html to view the JDK downloads. Accept the default installation configuration. Confirm that Java is available by typing the command below. If it is not, locate the directory containing the JDK executables (`C:\Program Files\Java\jdk1.8.0_121\bin`) and add it to your Windows PATH variable.

    my_home$ java -version
    java version "1.8.0_91"
    Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
    my_home$
XSB: a Logic Programming and Deductive Database system for Unix and Windows ([XSB homepage](http://xsb.sourceforge.net)). The download and installation page for XSB is [here](http://xsb.sourceforge.net/downloads/downloads.html); alternatively, navigate to https://sourceforge.net/projects/xsb/files/xsb/. At the time of writing, version 3.7 is the newest version.
Install XSB on Mac/Linux: download the XSB tar package (XSB 3.6 (Linux/Mac/*nixes)) from the download page above. Then unpack the tarball in some directory. This should create a subdirectory called `XSB`, which contains the XSB sources. In the terminal, type

    my_home$ tar xvf XSB.tar
    my_home$ cd XSB/build
    my_home$ ./configure
    my_home$ ./makexsb
    my_home$ /Users/my_home/XSB/bin/xsb
Next, you might add the directory of the XSB executable (`/Users/my_home/XSB/bin/xsb`) to the `PATH` variable. For example, add this line to your `~/.bashrc` file:

```sh
export PATH="/Users/my_home/XSB/bin:$PATH"
```

Then, in a terminal, type these commands:

```sh
my_home$ source ~/.bashrc
my_home$ which xsb
/Users/my_home/XSB/bin/xsb
```
Install XSB on Windows: download the XSB executable `xsb-3.6.0.exe` for the Windows platform. Run the downloaded installer and accept all default configuration. Windows users need a few extra steps. First, determine which of these directories contains the XSB executable that works for your computer:

    C:\Program Files (x86)\XSB\config\x64-pc-windows\bin
    C:\Program Files (x86)\XSB\config\x86-pc-windows\bin

Then, add the path to the XSB executable to your Windows path variable via `Control Panel -> System and Security -> System -> Advanced System Settings -> Environment Variables -> Path`. Type `xsb` in a command console to confirm that XSB can run from the command prompt.

    C:\Users\my_home> xsb
    [xsb_configuration loaded]
    [sysinitrc loaded]
    [xsbbrat loaded]
    XSB Version 3.6. (Gazpatcho) of April 22, 2015
    [x64-pc-windows; mode: optimal; engine: slg-wam; scheduling: local]
    [Build date: 2015-04-22]
    | ?- halt.
    End XSB (cputime 0.05 secs, elapsetime 4.22 secs)
Graphviz (provides the `dot` command). For Mac/Linux, please click "Agree" to accept the agreement. You are then directed to a download webpage, where you should choose the proper install package. For example, on Mac, we use the version graphviz-2.38.0.pkg. When the package has been downloaded to your local computer, right-click "graphviz-2.38.0.pkg"; a window pops up and asks whether you want to open it, so choose "Open". Then follow the installation procedure and accept all default configurations. When the installation is complete, you might check the `dot` command in a terminal by typing

    my_home$ which dot
    /usr/local/bin/dot

For Windows, please download the `graphviz-2.38.msi` installer package and start the installer. You might accept all default configurations. Confirm that the `dot` command is available by typing the command below. If it is not, as in the output shown, first determine the directory containing the dot.exe binary (`C:\Program Files (x86)\Graphviz2.38\bin`) and add that directory to your Windows PATH variable.

    C:\Users\my_home> dot
    'dot' is not recognized as an internal or external command,
    operable program or batch file.
Installing Git for Mac
- The easiest way is to use the graphical Git installer, which you can download from the SourceForge page.
- If you have `MacPorts` installed, install Git via

      $ sudo port install git

- If you have `Homebrew` installed, install Git via

      $ brew install git

Installing Git for Linux: if you want to install Git on Linux via a binary installer, you can generally do so through the basic package-management tool that comes with your distribution. If you’re on Fedora, you can use

    $ yum install git

Or if you’re on a Debian-based distribution like Ubuntu, try apt-get:

    $ apt-get install git
Install Git for Windows: please download `Git` for Windows from https://git-for-windows.github.io/. Run the downloaded `Git-2.11.1-64-bit.exe`, accept the default configuration, and finish the installation. Check the `git` command in the command shell by typing `git --version`. Next, you might add the path to the bash executable included with "Git for Windows" (`C:\Program Files\Git\bin`) to your Windows `path` variable so that bash scripts can run from the command prompt directly.

    C:\Users\my_home> git --version
    git version 2.11.1.windows.1
SQLite: a high-reliability, embedded, zero-configuration, public-domain SQL database engine. It is available at the SQLite homepage.
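Once everything is installed, a short script can confirm that the tools are reachable from the command line. This sketch assumes the command names `java`, `xsb`, `dot`, `git`, and `sqlite3`; the `sqlite3` name in particular is an assumption, since the text above does not say which SQLite executable the demo calls. The `missing_tools` helper is hypothetical.

```sh
#!/bin/sh
# Print the name of every given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the demo prerequisites and report the result.
missing=$(missing_tools java xsb dot git sqlite3)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    printf 'All demo prerequisites were found on PATH.\n'
fi
```

On Windows, the same script can be run from the bash shell that ships with Git for Windows.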
Clone the yw-idcc-17 repository to your local computer

Clone the yw-idcc-17 git repo to your local machine by running the following command from the terminal (Mac/Linux) or the command shell (Windows).

    git clone https://github.com/yesworkflow-org/yw-idcc-17.git
Running the Demo
Run the demo from the command shell. Windows users can either run it from `Git shell`, which contains the `bash` command, or run it from the command prompt directly. The bash scripts have been tested on the Mac and Windows platforms.
Go to the `examples/` folder. Two types of examples are demonstrated: single scripts implemented in various programming languages, and an R workflow project. The examples provided are:
- Type I: Single script in various programming languages: a MATLAB example (`C3C4/`) and four Python examples
- Type II: A real-life R workflow project
Go to one of the above examples. First, run the cleaning script by calling `./clean.sh`.
Run the demo example by calling `./make.sh`. For Windows users, please refer to the example below. Note that in some cases you may need to add `C:\Program Files\Git` to the `Path` variable and use the `git-cmd` command instead of the `bash` command. This way, the demo works both with `bash` in `Git shell` and with `git-cmd` in the command shell.
- For Mac/Linux platform,

      my_home$ ls
      OHIBC_Howe_Sound_project    docker             queries
      README.md                   examples           rules
      SQLiteToYaml                poster_template    yw_jar
      my_home$ cd examples/C3C4/
      my_home$ ls
      clean.sh  facts  make.sh  recon  results  script  supplementary  views
      my_home$ bash clean.sh
      my_home$ bash make.sh

- For Windows platform,

      C:\Users\my_home\Desktop\yw-idcc-17>cd examples\C3C4
      C:\Users\my_home\Desktop\yw-idcc-17\examples\C3C4>dir
       Volume in drive C is Windows8_OS
       Volume Serial Number is 6473-FB35

       Directory of C:\Users\my_home\Desktop\yw-idcc-17\examples\C3C4

      02/20/2017  10:39 AM    <DIR>          .
      02/20/2017  10:39 AM    <DIR>          ..
      02/18/2017  12:47 PM               132 clean.sh
      02/18/2017  02:14 PM    <DIR>          facts
      02/18/2017  12:47 PM             8,546 make.sh
      02/18/2017  12:47 PM    <DIR>          recon
      02/18/2017  02:14 PM    <DIR>          results
      02/18/2017  12:47 PM    <DIR>          script
      02/18/2017  12:47 PM    <DIR>          supplementary
      02/18/2017  02:14 PM    <DIR>          views
                     2 File(s)          8,678 bytes
                     8 Dir(s)  77,619,445,760 bytes free
      C:\Users\my_home\Desktop\yw-idcc-17\examples\C3C4>bash make.sh
- Go to the `results/` folder and check the generated provenance query results. Mac users can use the `open` command to view the PDF files, while Windows users can use the `start` command.
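The clean, make, and inspect steps above can be chained in a small helper. This is a sketch that assumes each example folder contains `clean.sh`, `make.sh`, and a `results/` folder as described earlier; the `run_example` function name is hypothetical.

```sh
#!/bin/sh
# Rebuild one example and list the files it generated.
run_example() {
    example_dir=$1
    (
        cd "$example_dir" || exit 1
        bash clean.sh && bash make.sh || exit 1
        ls results
    )
}

# Example call (uncomment to run from the repository root):
# run_example examples/C3C4
```

The subshell keeps the `cd` local to the helper, so the caller's working directory is unchanged, and a failure in either script stops the helper before the results are listed.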
Developing your own Demo
- Copy your example folder into the `examples/` folder.
- Reorganize the directory layout of your example to match `simulate_data_collection`. Create a `recon/` folder for your reconstructed provenance.
- Copy the two script files (`clean.sh` and `make.sh`) from `simulate_data_collection`, one of the existing three examples, to your own example folder.
- Edit `make.sh` and customize the script name, output file name, and parameter data object name for your example.
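The folder setup described above can be sketched as a small helper. It assumes that the two script files to copy are `clean.sh` and `make.sh`, and that the new example should get the standard subfolders listed earlier; the `scaffold_example` function name and its arguments are hypothetical. Customizing `make.sh` for the new example remains a manual step.

```sh
#!/bin/sh
# Create the standard example layout and copy the helper scripts
# from an existing example (assumed to be clean.sh and make.sh).
scaffold_example() {
    examples_dir=$1
    name=$2
    template=${3:-simulate_data_collection}
    new_dir="$examples_dir/$name"

    for sub in script facts views recon results supplementary; do
        mkdir -p "$new_dir/$sub" || return 1
    done
    cp "$examples_dir/$template/clean.sh" \
       "$examples_dir/$template/make.sh" "$new_dir/"
}

# Example call (uncomment to run from the repository root):
# scaffold_example examples my_example
```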
How to run the Demo using Docker
We have created a Docker image (`yesworkflow/provenance-demo`) to help readers explore the provenance queries demonstrated with YesWorkflow. In the `yesworkflow/provenance-demo` image, XSB, Graphviz, YesWorkflow, noWorkflow, and DataONE demo queries are installed. Using this image, users can boot up a Docker container and run the demo provenance queries within seconds, without having to install packages manually.
Here are instructions for each OS:
As part of this installation process, you’ll need to use a shell prompt. A special version of the shell comes pre-configured for Docker commands, and Docker commands need to be typed into that shell prompt. Here is how to open it:
- Mac OS – launch the `Docker Quickstart Terminal` application from Launchpad.
- Linux – launch any bash shell prompt, and `docker` will already be available.
- Windows – click the `Docker Quickstart Terminal` icon on your desktop.
Downloading Docker image
Users can download the image from Docker Hub, which is similar to GitHub but hosts Docker images. The command syntax is `docker pull IMAGE_NAME`. The name of our current provenance query image is `yesworkflow/provenance-demo`, so users can type the following command into a shell prompt.

    docker pull yesworkflow/provenance-demo

This downloads the image from Docker Hub.
Running a container from a Docker image
Once the image is downloaded, users can run it using the `docker run` command. Executing `docker run` creates a Docker container, which is isolated from the user's local computer. Here are some configuration options for `docker run`:
- `-i`: interactive session
- `-t`: allocate a pseudo-terminal (combined with `-i` as `-it` in the command below)
- `-v H:C`: mount the host path `H` on your computer at the path `C` inside the Docker container.

The full command to run the provenance query looks like:

    docker run -it -v $HOME:$HOME yesworkflow/provenance-demo
Then, users can go to ... to check the query results.
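To expose only a local clone of the repository instead of the whole home directory, the `-v H:C` option can be filled in with narrower paths. The sketch below only prints the resulting command so that it can be reviewed before it is run; the host path, the container path `/demo`, and the `demo_run_command` helper are hypothetical.

```sh
#!/bin/sh
# Build a "docker run" command that mounts host directory H at path C
# inside a container started from the demo image.
demo_run_command() {
    host_dir=$1
    container_dir=$2
    printf 'docker run -it -v "%s:%s" yesworkflow/provenance-demo\n' \
        "$host_dir" "$container_dir"
}

# Print the command for a clone located in the home directory.
demo_run_command "$HOME/yw-idcc-17" /demo
```

Files written under the mounted container path are then also visible in the host directory after the container exits.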
- Q Zhang, Y Cao, Q Wang, D Vu, P Thavasimani, T McPhillips, P Missier, B Ludäscher. Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance. IDCC 2017 (Practice Paper track).
- Y Cao, P Slaughter, C Jones, MB Jones, Q Wang, D Vu, P Thavasimani, Q Zhang, T McPhillips, P Missier, L Walker, D Vieglais, B Ludäscher. Demonstrating Hybrid Provenance Queries from Script Runs. IDCC 2017 (Demo).
- BS Halpern, C Longo, D Hardy, KL McLeod, JF Samhouri, SK Katona, et al. (2012) An index to assess the health and benefits of the global ocean. Nature. 2012;488: 615–620. doi:10.1038/nature11397.
- Y Wei, S Liu, D Huntzinger, A Michalak, N Viovy, W Post, C Schwalm, K Schaefer, A Jacobson, C Lu, H Tian, D Ricciuto, R Cook, J Mao, X Shi. (2014) NACP MsTMIP: Global and North American Driver Data for Multi-Model Intercomparison. http://dx.doi.org/10.3334/ORNLDAAC/1220
- LIGO Open Science Center: Signal Processing with GW150914 Open Data. https://losc.ligo.org/events/GW150914/