SETLr Basics Tutorial

SETLr is a powerful tool for creating RDF from tabular sources. This page will teach you the fundamentals of using SETLr to create semantic extract, transform, and load (SETL) workflows. We will start with a simple example of a spreadsheet containing just a few rows and columns, gradually introducing new SETL concepts and ideas as we work with more columns. By the end you will know the principles of SETLr and how to write your own SETL scripts.

Installation

To start, optionally create a Python virtual environment and install SETLr from PyPI using pip (the source code is available on GitHub):

# Optional, but recommended.
virtualenv --no-site-packages venv
source venv/bin/activate

pip install setlr

Sample Data

To follow along with this tutorial, copy and paste the table in Sample Data into a spreadsheet program like Excel and save it as a CSV file called social.csv in an empty directory.

| ID      | Name          | MarriedTo | Knows          | DOB        |
|---------|---------------|-----------|----------------|------------|
| Alice   | Alice Smith   | Bob       | Bob; Charles   | 1/12/1983  |
| Bob     | Bob Smith     | Alice     | Alice; Charles | 3/23/1985  |
| Charles | Charles Brown |           | Alice; Bob     | 12/15/1955 |
| Dave    | Dave Jones    |           |                | 4/25/1967  |
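
Saved as a CSV, social.csv should look roughly like this (exact quoting may vary by spreadsheet program; empty cells stay empty):

ID,Name,MarriedTo,Knows,DOB
Alice,Alice Smith,Bob,Bob; Charles,1/12/1983
Bob,Bob Smith,Alice,Alice; Charles,3/23/1985
Charles,Charles Brown,,Alice; Bob,12/15/1955
Dave,Dave Jones,,,4/25/1967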

Starting Your SETL file

We are writing this SETL file in Turtle, which lets us define some convenient prefixes to make it easier to refer to certain vocabularies. Add the following prefixes to the beginning of your SETL file, which you should name social.setl.ttl:

@prefix prov:          <http://www.w3.org/ns/prov#> .
@prefix dcat:          <http://www.w3.org/ns/dcat#> .
@prefix dcterms:       <http://purl.org/dc/terms/> .
@prefix void:          <http://rdfs.org/ns/void#> .
@prefix setl:          <http://purl.org/twc/vocab/setl/> .
@prefix csvw:          <http://www.w3.org/ns/csvw#> .
@prefix pv:            <http://purl.org/net/provenance/ns#> .
@prefix :              <http://example.com/setl/> .

Extracting Data

A SETL file is an RDF file that uses the PROV Ontology to describe activities (extracts, transforms, and loads) that use and generate entities (tables and graphs). Extracting data is fairly straightforward. The following describes a process where a setl:Table entity, called :table, is generated by a setl:Extract activity that uses the file social.csv. Add it to your file to load social.csv into the resource :table:

:table a csvw:Table, setl:Table;
  csvw:delimiter ",";
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <social.csv>;
  ].

The type csvw:Table tells SETLr that the table is to be interpreted as a CSV table, using the CSV on the Web vocabulary. SETLr can be told the delimiter used in the file (using csvw:delimiter) and the number of initial rows to skip (using csvw:skipRows); an illustrative example appears after the table below. Internally, setl:Table entities are parsed into a data frame object using Pandas, and directly extracting RDF files is also supported. SETLr supports extracting the following data types:

| Type | Format | Options | Parsed Type |
|------|--------|---------|-------------|
| csvw:Table, setl:Table | Comma (or other) Separated Value (CSV, TSV, etc.) | csvw:delimiter, csvw:skipRows | Data Frame |
| setl:XPORT, setl:Table | SAS Transport (XPORT) file format | | Data Frame |
| setl:SAS7BDAT, setl:Table | SAS Dataset file format | | Data Frame |
| setl:Excel, setl:Table | XLS or XLSX file format | | Data Frame |
| owl:Ontology | OWL Ontology file in RDF | | RDF Graph |
| void:Dataset | RDF File | | RDF Graph |
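
For illustration only (not part of this tutorial's workflow), a tab-separated file whose first row should be skipped might be extracted like this; the file name data.tsv and the resource :tsv_table are hypothetical:

:tsv_table a csvw:Table, setl:Table;
  csvw:delimiter "\t";       # tab-separated values
  csvw:skipRows 1;           # skip one initial row before the header
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <data.tsv>;
  ].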

We will use :table in the transformation process to generate some RDF. For more on Extract activities, see the Extract page.

Transforming tables into RDF using JSLDT

The transformation process is easily the most complex of the three to write. JSLDT, or JSON-LD Templates, builds on the design of JSON-LD to provide a flexible templating system for RDF. We start by describing the transformation with a very simple template:

<http://example.com/social> a void:Dataset;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    setl:hasContext '''{
  "foaf" : "http://xmlns.com/foaf/0.1/"
}''';
    prov:value '''[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]'''].

Note that the dataset was generated by a setl:Transform that is also a setl:JSLDT, which tells SETLr how to process that transform. The property setl:hasContext supplies a JSON-LD context for all of the JSON-LD generated by this transform, but a context can also be provided inside the JSLDT directly (see the sketch after the example output below). The prov:value of the transform is the template itself:

[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]

This template is evaluated for each row in the table, and every JSON key and value is run through the Jinja templating engine. When the JSLDT is processed on the first row, it produces the following RDF in JSON-LD:

[{
  "@id": "https://example.com/social/Alice",
  "@type": "foaf:Person",
  "foaf:name": "Alice Smith"
}]

These individual JSON-LD graphs are then aggregated together into the final graph, here serialized into Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.com/social/Alice> a foaf:Person ;
    foaf:name "Alice Smith" .

<https://example.com/social/Bob> a foaf:Person ;
    foaf:name "Bob Smith" .

<https://example.com/social/Charles> a foaf:Person ;
    foaf:name "Charles Brown" .

<https://example.com/social/Dave> a foaf:Person ;
    foaf:name "Dave Jones" .

There is a lot more to learn about using JSLDT that will help you create exactly the RDF that you want. The JSLDT Template Language wiki page has the full tutorial on it.

Loading RDF Data

SETLr supports two types of loading: to a file on disk or to a SPARQL endpoint. Loading to a file is fairly straightforward:

<social.ttl> a pv:File;
    dcterms:format "text/turtle";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].

SETLr supports the following formats:

  • RDF/XML:
    • default
    • application/rdf+xml
    • text/rdf
  • Turtle:
    • text/turtle
    • application/turtle
    • application/x-turtle
  • N-Triples: text/plain
  • N3: text/n3
  • TriG: application/trig
  • JSON-LD: application/json
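
For example, to write the same dataset out as JSON-LD instead, only the dcterms:format changes (the output file name social.jsonld here is just a suggestion):

<social.jsonld> a pv:File;
    dcterms:format "application/json";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].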

SETLr loads data into a triple store if the generated entity has the type sd:Service and an sd:endpoint value:

@prefix sd: <http://www.w3.org/ns/sparql-service-description#>.
:sparql_load a setl:Load, sd:Service;
  sd:endpoint <http://example.com/sparql>.

Running Your SETL Script

You can run your SETL script using the setlr command:

$ setlr social.setl.ttl

It will create a file called social.ttl that matches the example output above.
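
For reference, here is the complete social.setl.ttl, assembled from the snippets above:

@prefix prov:          <http://www.w3.org/ns/prov#> .
@prefix dcat:          <http://www.w3.org/ns/dcat#> .
@prefix dcterms:       <http://purl.org/dc/terms/> .
@prefix void:          <http://rdfs.org/ns/void#> .
@prefix setl:          <http://purl.org/twc/vocab/setl/> .
@prefix csvw:          <http://www.w3.org/ns/csvw#> .
@prefix pv:            <http://purl.org/net/provenance/ns#> .
@prefix :              <http://example.com/setl/> .

:table a csvw:Table, setl:Table;
  csvw:delimiter ",";
  prov:wasGeneratedBy [
    a setl:Extract;
    prov:used <social.csv>;
  ].

<http://example.com/social> a void:Dataset;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    setl:hasContext '''{
  "foaf" : "http://xmlns.com/foaf/0.1/"
}''';
    prov:value '''[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]'''].

<social.ttl> a pv:File;
    dcterms:format "text/turtle";
    prov:wasGeneratedBy [
      a setl:Load;
      prov:used <http://example.com/social> ;
    ].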