Installing Prizms

Tim L edited this page Jul 6, 2015 · 93 revisions

What is first

What we will cover

This page describes how to install Prizms. It uses http://ieeevis.tw.rpi.edu and http://lod.melagrid.org as examples.

Let's get to it!

What you need before getting started:

  • An address to a git repository (e.g., git@github.com:timrdf/ieeevis.git).
    • You can find the address in the lower-right of your GitHub project page (e.g. https://github.com/timrdf/ieeevis).
    • Make sure the repository has been initialized and has at least one file committed.
    • This repo will be used to version-control all dataset access and enhancement metadata.
    • It will coordinate between your development and production environments.
    • The repository isn't used to store data -- only its access metadata. This keeps the repository trim and enables others to reproduce your work.
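
One way to confirm the prerequisite above, that the repository already holds at least one commit, is to ask git for its branch heads: an empty repository reports none. The sketch below is not part of Prizms. The function name repo_has_commits is made up for illustration, and it assumes that git is installed and that you can read the repository.

```shell
# Hypothetical helper (not part of Prizms): succeed if the repository at $1
# (a URL or a local path) has at least one branch head, i.e. at least one commit.
repo_has_commits() {
  # `git ls-remote --heads` lists refs/heads/*; an empty repository lists nothing.
  [ -n "$(git ls-remote --heads "$1" 2>/dev/null)" ]
}

# Usage (commented out because it needs network access and your SSH key):
# repo_has_commits git@github.com:timrdf/ieeevis.git && echo "ready for Prizms"
```

Note that an unreachable repository also produces no output, so a failure here means either "empty" or "could not connect".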

What would be good to have before getting started:

You don't need these, but you get other freebies if you have them.

(1/3) First: Bootstrap Prizms

Do one of the two:

Run the following bootstrap command in bash; it performs Steps 1 and 2 (listed below) in a single command:

  • cd; bash < <(curl -sL http://purl.org/twc/install/prizms | grep -v "^#..bin/bash$")
    • (if you don't have curl installed, do so with sudo apt-get update; sudo apt-get install curl)
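
To see what the bootstrap command does before running it, the same pattern can be tried offline. This is a minimal sketch: fake_download stands in for the curl call, and the two-line script it prints is invented for the example, not the real installer.

```shell
# Stand-in for `curl -sL http://purl.org/twc/install/prizms`.
# The two-line script it prints is made up for this sketch.
fake_download() {
  printf '%s\n' '#!/bin/bash' 'echo "bootstrap would run here"'
}

# The same filter as in the bootstrap command: drop any line that is "#",
# two arbitrary characters, then "bin/bash" (in practice the "#!/bin/bash" line).
strip_shebang() {
  grep -v "^#..bin/bash$"
}

# Feed the filtered script to bash on standard input. The bootstrap command
# does this with `bash < <(...)`; a plain pipe is used here instead.
fake_download | strip_shebang | bash
```

Replacing the final `| bash` with `| less` is a simple way to read a downloaded script before executing it.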

OR:

  • Step 1: Choose where you'd like to install Prizms. The location should be within your user space, but not ~/prizms, because that is where the installer puts the development environment for each Prizms project. When in doubt, just use ~/opt.
    • e.g. mkdir -p opt; cd opt
  • Step 2: Clone Prizms into that directory.
    • e.g. git clone git://github.com/timrdf/prizms.git
    • If you don't have git, install it with sudo apt-get install git-core
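
Steps 1 and 2 can be sketched as a small guard around the clone. The helper name pick_install_dir is hypothetical, and treating "within your user space" as "under $HOME" is an assumption of this sketch rather than something the installer enforces.

```shell
# Hypothetical helper (not part of Prizms): print the install directory to use.
# $1 is the desired directory; it defaults to ~/opt as suggested in Step 1.
pick_install_dir() {
  dir="${1:-$HOME/opt}"
  dir="${dir%/}"                      # drop one trailing slash, if any
  case "$dir" in
    "$HOME/prizms")
      echo "refusing $dir: the installer keeps development environments there" >&2
      return 1 ;;
    "$HOME"/*)
      printf '%s\n' "$dir" ;;
    *)
      echo "refusing $dir: not inside $HOME" >&2
      return 1 ;;
  esac
}

# Usage (the clone is commented out because it needs network access):
# dir=$(pick_install_dir) && mkdir -p "$dir" && cd "$dir" &&
#   git clone git://github.com/timrdf/prizms.git
```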

(2/3) Next: Review, Understand, and Set the Eight Configuration Parameters

  • Step 3: Run the installer's --help for an overview of the configuration parameters
    • e.g. lebot@melagrid:~/opt$ prizms/bin/install.sh --help

(Note: the syntax highlighting is unintentional and meaningless.)

usage: install.sh [--me <your-URI>] [--my-email <your-email>] 
                  [--proj-user <user>] [--proj-home <dir>] [--repos <code-repo>] 
                  [--upstream-ckan <ckan>] [--our-base-uri <uri>] [--our-source-id <source-id>]
                  [--our-datahub-id <datahub-id>]

This script will determine and use the following parameters to install an instance of Prizms:
  (these arguments must be provided in the order listed)

 --me             | [optional] the project developer's  URI                              (e.g. http://jsmith.me/foaf#me)

 --as             | [optional] the project developer's user name                         (e.g. jsmith)

 --my-email       |            the project developer's  email address                    (e.g. me@jsmith.me)
                  : This email will be used to create an SSH key (if none exists; with your confirmation)
                  : This email will be set as git's user.email setting (with your confirmation)

 --proj-user      | the project's                       user name                        (e.g. melagrid)

 --proj-home      | the project's                       user directory home              (e.g. /data/home)

 --repos          | the project's code repository                                        (e.g. git@github.com:timrdf/ieeevis.git)

 --upstream-ckan  | [optional] the URL of a CKAN from which to pull dataset listings     (e.g. http://data.melagrid.org)
                  : see https://github.com/jimmccusker/twc-healthdata/wiki/Retrieving-CKAN%27s-Dataset-Distribution-Files

 --our-base-uri   | the HTTP namespace for all datasets in the Prizms that we are making (e.g. http://lod.melagrid.org)
                  : see https://github.com/timrdf/csv2rdf4lod-automation/wiki/Conversion-process-phase%3A-name

 --our-source-id  | the identifier for *us* as an organization that produces datasets.   (e.g. melagrid-org)
                  : see https://github.com/timrdf/csv2rdf4lod-automation/wiki/Conversion-process-phase%3A-name

 --our-datahub-id | datahub.io's CKAN identifier for this dataset.                       (e.g. melagrid)
                  : This id is for use within the namespace http://datahub.io/dataset/<our-datahub-id>.
                  : see https://github.com/jimmccusker/twc-healthdata/wiki/Listing-twc-healthdata-as-a-LOD-Cloud-Bubble

If the required parameters are not provided, the script will ask for them interactively before installing anything.
The installer will ask permission before performing each install step, so that you know what it's doing.

https://github.com/timrdf/prizms/wiki
https://github.com/timrdf/prizms/wiki/Installing-Prizms

The following figure illustrates the configuration parameters.

  • --me is your foaf:Person URI.
  • --my-email is used to create an SSH key (if you don't have one in .ssh) and is set in your local git configuration (as user.email).
  • --proj-user is a unix user name that will be created (if it doesn't exist) to publish your Linked Data. It will share permissions in your htdocs directory and own the cronjob that it runs to analyze your Linked Data.
  • --repos is a git repository URL, traditionally hosted on GitHub. Prizms will use (or create, if they're not there) some directories such as data/source and commit them back into the repository. Your unix account is a Prizms development environment, and you push to your production environment via this git repository. --proj-user will then pull the same repository and react to the metadata that it finds to retrieve, convert, and publish new datasets that are in your project.
  • --upstream-ckan is an OPTIONAL parameter. If your Prizms node is focused on converting the datasets listed in an existing CKAN site (such as http://catalog.data.gov or http://data.gov.uk or http://data.melagrid.org), then setting this parameter will tell Prizms to pull the appropriate access metadata and start converting it.
  • --our-base-uri is the URI namespace for your data. For example, http://logd.tw.rpi.edu, http://healthdata.tw.rpi.edu, or http://ieeevis.tw.rpi.edu. Prizms will use this namespace to create URIs for all entities that it identifies, e.g. http://ieeevis.tw.rpi.edu/id/venue/vast/2006
  • --our-source-id is the short string identifier for your own organization, the one that is creating this Prizms data site. When aggregating others' datasets into your data site, Prizms uses short string identifiers for the organizations from which you retrieved the data, e.g. 'epa-gov' or 'twitter-com-timrdf'. With --our-source-id, Prizms learns which source organization it should file the datasets that you create yourself under. You are just another organization that's providing data, and Prizms needs to know what you call yourself. Some reasonable values are:
      1. a trimmed and tidied form of your namespace (e.g. ieeevis-tw-rpi-edu),
      2. the same value as --proj-user (e.g. ieeevis),
      3. a short reference to you (e.g. us as in "we").
  • --our-datahub-id is used to construct the URL for your datahub.io dataset entry. Listing your Prizms data site on http://datahub.io will let others find your data. The value of this parameter is appended to the string http://datahub.io/dataset/, and metadata specific to the LOD Cloud Diagram are automatically submitted to that dataset entry. If you're creating a new dataset, be sure that the entry does not already exist. If it doesn't exist, then Prizms will create the CKAN entry for you.
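
Two of the values above follow mechanically from other settings, which the sketch below illustrates. Both function names are hypothetical and Prizms does not provide them. The tidying rule (drop the scheme and any path, then turn every non-alphanumeric character into a hyphen) is one assumed reading of "a trimmed and tidied form of your namespace".

```shell
# Hypothetical helper: derive a candidate --our-source-id from --our-base-uri.
# Assumed rule: strip the scheme and any path, replace non-alphanumerics
# with "-", and lowercase the result.
source_id_from_base_uri() {
  printf '%s\n' "$1" |
    sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|/.*$||' \
        -e 's|[^A-Za-z0-9]|-|g' |
    tr 'A-Z' 'a-z'
}

# Build the datahub.io entry URL by appending --our-datahub-id to the fixed
# prefix named in the text above.
datahub_url() {
  printf 'http://datahub.io/dataset/%s\n' "$1"
}

source_id_from_base_uri http://ieeevis.tw.rpi.edu   # prints ieeevis-tw-rpi-edu
datahub_url melagrid                                # prints http://datahub.io/dataset/melagrid
```

The derived identifier is only a starting point: the melagrid example on this page uses melagrid-org, not the lod-melagrid-org that this rule would produce from http://lod.melagrid.org.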

(3/3) Finally: Install with the Eight Configuration Parameters

  • Step 4: Run the installer
    • If you want to be walked through each setting, omit the parameters, e.g. lebot@melagrid:~/opt$ prizms/bin/install.sh
    • If you know some of the settings already, specify them to avoid a lengthy interview. For example:
lebot@melagrid:~/opt$ prizms/bin/install.sh --me http://tw.rpi.edu/instances/TimLebo --proj-user melagrid --repos git@github.com:jimmccusker/melagrid.git --upstream-ckan http://data.melagrid.org --our-base-uri http://lod.melagrid.org --our-source-id melagrid-org --our-datahub-id melagrid

Running as a second developer, only the URI of the person changes:

jimmccusker@melagrid:~/opt$ prizms/bin/install.sh --me http://tw.rpi.edu/instances/JamesMcCusker --proj-user melagrid --repos git@github.com:jimmccusker/melagrid.git --upstream-ckan http://data.melagrid.org --our-base-uri http://lod.melagrid.org --our-source-id melagrid-org --our-datahub-id melagrid
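
Because the two invocations above differ only in --me, a team could keep the shared settings in one place and generate each developer's command from them. This is only a sketch: the variable names and the install_command_for helper are hypothetical, and the function prints the command rather than running the installer.

```shell
# Settings shared by every developer on the project (values from the examples above).
PROJ_USER=melagrid
REPOS=git@github.com:jimmccusker/melagrid.git
UPSTREAM_CKAN=http://data.melagrid.org
OUR_BASE_URI=http://lod.melagrid.org
OUR_SOURCE_ID=melagrid-org
OUR_DATAHUB_ID=melagrid

# Hypothetical helper: print (not run) the installer command for the developer
# whose URI is $1.
install_command_for() {
  printf '%s\n' "prizms/bin/install.sh --me $1 --proj-user $PROJ_USER --repos $REPOS --upstream-ckan $UPSTREAM_CKAN --our-base-uri $OUR_BASE_URI --our-source-id $OUR_SOURCE_ID --our-datahub-id $OUR_DATAHUB_ID"
}

install_command_for http://tw.rpi.edu/instances/TimLebo
install_command_for http://tw.rpi.edu/instances/JamesMcCusker
```

Each printed line can be reviewed and then pasted into a shell in the directory that contains prizms/.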

Examples

Melagrid

What is next