SETLr Basics Tutorial

SETLr is a powerful tool for creating RDF from tabular sources. This page will teach you the fundamentals of using SETLr to create semantic extract, transform, and load (SETL) workflows. We will start with a simple example of a spreadsheet containing just a few rows and columns, gradually introducing new SETL concepts and ideas as we work with more columns. By the end you will know the principles of SETLr and how to write your own SETL scripts.

Installation

To start, optionally create a Python virtual environment and install SETLr from PyPI using pip (the source code is available on GitHub):

# Optional, but recommended.
virtualenv --no-site-packages venv
source venv/bin/activate

pip install setlr

Sample Data

To follow along with this tutorial, copy and paste the table in Sample Data into a spreadsheet program like Excel and save it as a CSV file called social.csv in an empty directory.

| ID      | Name          | MarriedTo | Knows          | DOB        |
|---------|---------------|-----------|----------------|------------|
| Alice   | Alice Smith   | Bob       | Bob; Charles   | 1/12/1983  |
| Bob     | Bob Smith     | Alice     | Alice; Charles | 3/23/1985  |
| Charles | Charles Brown |           | Alice; Bob     | 12/15/1955 |
| Dave    | Dave Jones    |           |                | 4/25/1967  |
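
Saved as a CSV, social.csv should look roughly like this (exact quoting may vary by spreadsheet program; empty cells stay empty):

ID,Name,MarriedTo,Knows,DOB
Alice,Alice Smith,Bob,Bob; Charles,1/12/1983
Bob,Bob Smith,Alice,Alice; Charles,3/23/1985
Charles,Charles Brown,,Alice; Bob,12/15/1955
Dave,Dave Jones,,,4/25/1967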

Starting Your SETL file

We are writing this SETL file in Turtle, which lets us define some convenient prefixes to make it easier to refer to certain vocabularies. Add the following prefixes to the beginning of your SETL file, which you should name social.setl.ttl:

@prefix prov:          <http://www.w3.org/ns/prov#> .
@prefix dcat:          <http://www.w3.org/ns/dcat#> .
@prefix dcterms:       <http://purl.org/dc/terms/> .
@prefix void:          <http://rdfs.org/ns/void#> .
@prefix setl:          <http://purl.org/twc/vocab/setl/> .
@prefix csvw:          <http://www.w3.org/ns/csvw#> .
@prefix pv:            <http://purl.org/net/provenance/ns#> .
@prefix :              <http://example.com/setl/> .

Extracting Data

A SETL file is an RDF file that uses the PROV Ontology to describe activities (extracts, transforms, and loads) that use and generate entities (tables and graphs). Extracting data is fairly straightforward. The following describes a process where a setl:Table entity, called :table, is generated by a setl:Extract activity that uses the file social.csv. Add it to your file to load social.csv into the resource :table:

:table a csvw:Table, setl:Table;
  csvw:delimiter ",";
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <social.csv>;
  ].

The type csvw:Table tells SETLr that the table is to be interpreted as a CSV table, using the CSV on the Web vocabulary. SETLr can be told the delimiter used in the file (using csvw:delimiter) and the number of initial rows to skip (using csvw:skipRows); an illustrative example appears after the table below. Internally, setl:Table entities are parsed into a data frame object using Pandas, and directly extracting RDF files is also supported. SETLr supports extracting the following data types:

| Type | Format | Options | Parsed Type |
|------|--------|---------|-------------|
| csvw:Table, setl:Table | Comma (or other) Separated Value (CSV, TSV, etc.) | csvw:delimiter, csvw:skipRows | Data Frame |
| setl:XPORT, setl:Table | SAS Transport (XPORT) file format | | Data Frame |
| setl:SAS7BDAT, setl:Table | SAS Dataset file format | | Data Frame |
| setl:Excel, setl:Table | XLS or XLSX file format | | Data Frame |
| owl:Ontology | OWL Ontology file in RDF | | RDF Graph |
| void:Dataset | RDF File | | RDF Graph |
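
For illustration only (not part of this tutorial's workflow), a tab-separated file whose first row should be skipped might be extracted like this; the file name data.tsv and the resource :tsv_table are hypothetical:

:tsv_table a csvw:Table, setl:Table;
  csvw:delimiter "\t";       # tab-separated values
  csvw:skipRows 1;           # skip one initial row before the header
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <data.tsv>;
  ].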

We will use :table in the transformation process to generate some RDF. For more on Extract activities, see the Extract page.

Transforming tables into RDF using JSLDT

The transformation process is easily the most complex of the three to write. JSLDT, or JSON-LD Templates, builds on the design of JSON-LD to provide a flexible templating system for RDF. We start by describing the transformation with a very simple template:

<http://example.com/social> a void:Dataset;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    setl:hasContext '''{
  "foaf" : "http://xmlns.com/foaf/0.1/"
}''';
    prov:value '''[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]'''].

Note that the dataset was generated by a setl:Transform that is also a setl:JSLDT, which tells SETLr how to process that transform. The property setl:hasContext supplies a JSON-LD context for all of the JSON-LD generated by this transform, but a context can also be provided inside the JSLDT directly (see the sketch after the example output below). The prov:value of the transform is the template itself:

[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]

This template is evaluated for each row in the table, and every JSON key and value is run through the Jinja templating engine. When the JSLDT is processed on the first row, it produces the following RDF in JSON-LD:

[{
  "@id": "https://example.com/social/Alice",
  "@type": "foaf:Person",
  "foaf:name": "Alice Smith"
}]

These individual JSON-LD graphs are then aggregated together into the final graph, here serialized into Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.com/social/Alice> a foaf:Person ;
    foaf:name "Alice Smith" .

<https://example.com/social/Bob> a foaf:Person ;
    foaf:name "Bob Smith" .

<https://example.com/social/Charles> a foaf:Person ;
    foaf:name "Charles Brown" .

<https://example.com/social/Dave> a foaf:Person ;
    foaf:name "Dave Jones" .

There is a lot more to learn about using JSLDT that will help you create exactly the RDF that you want. The JSLDT Template Language wiki page has the full tutorial on it.

Loading RDF Data

SETLr supports two types of loading: to a file on disk or to a SPARQL endpoint. Loading to a file is fairly straightforward:

<social.ttl> a pv:File;
    dcterms:format "text/turtle";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].

SETLr supports the following formats:

  • RDF/XML:
    • default
    • application/rdf+xml
    • text/rdf
  • Turtle:
    • text/turtle
    • application/turtle
    • application/x-turtle
  • N-Triples: text/plain
  • N3: text/n3
  • TriG: application/trig
  • JSON-LD: application/json
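
For example, to write the same dataset out as JSON-LD instead, only the dcterms:format changes (the output file name social.jsonld here is just a suggestion):

<social.jsonld> a pv:File;
    dcterms:format "application/json";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].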

SETLr loads data into a triple store if the generated entity has the type sd:Service and an sd:endpoint value:

@prefix sd: <http://www.w3.org/ns/sparql-service-description#>.
:sparql_load a setl:Load, sd:Service;
  sd:endpoint <http://example.com/sparql>.

Running Your SETL Script

You can run your SETL script using the setlr command:

$ setlr social.setl.ttl

It will create a file called social.ttl that matches the example output above.
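
For reference, here is the complete social.setl.ttl, assembled from the snippets above:

@prefix prov:          <http://www.w3.org/ns/prov#> .
@prefix dcat:          <http://www.w3.org/ns/dcat#> .
@prefix dcterms:       <http://purl.org/dc/terms/> .
@prefix void:          <http://rdfs.org/ns/void#> .
@prefix setl:          <http://purl.org/twc/vocab/setl/> .
@prefix csvw:          <http://www.w3.org/ns/csvw#> .
@prefix pv:            <http://purl.org/net/provenance/ns#> .
@prefix :              <http://example.com/setl/> .

:table a csvw:Table, setl:Table;
  csvw:delimiter ",";
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <social.csv>;
  ].

<http://example.com/social> a void:Dataset;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    setl:hasContext '''{
  "foaf" : "http://xmlns.com/foaf/0.1/"
}''';
    prov:value '''[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]'''].

<social.ttl> a pv:File;
    dcterms:format "text/turtle";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].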