Batch Mode for JSON LD Generation

dkapoor edited this page Sep 8, 2016 · 23 revisions

Karma can be used in batch mode to generate JSON-LD for large datasets. This can be done either with the command-line utility OfflineRdfGenerator or with the Karma JSON-LD generation API.

OfflineRdfGenerator

This command-line utility loads a model and a source, and then generates RDF and JSON-LD. The source can be a JSON, XML, or CSV file, or a database table. For a database source, rows are loaded 10,000 at a time. The Karma home setting KARMA_USER_HOME must be set appropriately; see Configuration.

Building the karma-offline JAR

To build the offline JAR, go to the karma-offline subdirectory and execute the following:

cd karma-offline
mvn install -P shaded

This builds a standalone JAR, karma-offline-0.0.1-SNAPSHOT-shaded.jar, in the target sub-folder of karma-offline. It can be used to generate RDF and JSON-LD in batch mode.

Generating JSON-LD using karma-offline

To generate JSON-LD when the source is a file, go to the karma-offline/target sub-directory of Karma and execute the following command:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype <sourcetype> \
--filepath <filepath> \
--modelfilepath <modelfilepath> \
--sourcename <sourcename> \
--outputfile <rdf-outputfile> \
--jsonoutputfile <json-outputfile> \
[--contextfile <contextfile> | --contexturl <contextUrl>] \
[--selection <selectionName>] \
[--root <rootClassIDForJsonLD>] \
[--killtriplemap <triplemapid to stop from expansion> ] \
[--stoptriplemap <stop the rdf generation from this triplemapid onwards> ] 

Example invocation for a JSON file:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype JSON \
--filepath "/files/data/wikipedia.json" \
--modelfilepath "/files/models/model-wikipedia.ttl" \
--sourcename wikipedia \
--outputfile wikipedia-rdf.n3 \
--contextfile wiki-context.json \
--root "http://schema.org/Document1" \
--jsonoutputfile wikipedia.json
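
When the same invocation has to be issued for many files, it can help to assemble the argument list in code rather than by hand. The following is a minimal sketch and not part of Karma: the class OfflineCommandBuilder and its methods are hypothetical names introduced here, while the JAR name, main class, and option names are copied from the commands above. It only builds the argument list, and it rejects a command that sets both --contextfile and --contexturl, which the synopsis lists as alternatives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Karma class): assembles the command line shown above.
class OfflineCommandBuilder {
    // JAR name and main class as they appear in the invocations on this page.
    static final String JAR = "karma-offline-0.0.1-SNAPSHOT-shaded.jar";
    static final String MAIN_CLASS = "edu.isi.karma.rdf.OfflineRdfGenerator";

    // Insertion-ordered so the generated command mirrors the order of the calls.
    private final Map<String, String> options = new LinkedHashMap<>();

    // Records one "--name value" pair; the name is given without the leading dashes.
    OfflineCommandBuilder option(String name, String value) {
        options.put(name, value);
        return this;
    }

    // Returns the full argument list, starting with "java -cp <jar> <main class>".
    List<String> build() {
        if (options.containsKey("contextfile") && options.containsKey("contexturl")) {
            throw new IllegalArgumentException(
                    "Use either --contextfile or --contexturl, not both");
        }
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(JAR);
        command.add(MAIN_CLASS);
        for (Map.Entry<String, String> entry : options.entrySet()) {
            command.add("--" + entry.getKey());
            command.add(entry.getValue());
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> command = new OfflineCommandBuilder()
                .option("sourcetype", "JSON")
                .option("filepath", "/files/data/wikipedia.json")
                .option("modelfilepath", "/files/models/model-wikipedia.ttl")
                .option("sourcename", "wikipedia")
                .option("outputfile", "wikipedia-rdf.n3")
                .option("contextfile", "wiki-context.json")
                .option("root", "http://schema.org/Document1")
                .option("jsonoutputfile", "wikipedia.json")
                .build();
        // Print the command for inspection instead of running it.
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder to launch the generator from the karma-offline/target directory.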

For a CSV file, you can specify additional parameters: the delimiter, the text qualifier, the header start index, and the data start index. Example invocation for a CSV file with tab as the delimiter and double quotes as the text qualifier:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype CSV \
--filepath "/files/data/wikipedia.csv" \
--delimiter TAB \
--textqualifier '\\\"' \
--headerindex 1 \
--dataindex 2 \
--modelfilepath "/files/models/model-wikipedia.ttl" \
--sourcename wikipedia \
--outputfile wikipedia-rdf.n3 \
--contextfile wiki-context.json \
--root "http://schema.org/Document1" \
--jsonoutputfile wikipedia.json

To generate JSON-LD from a database table, go to the karma-offline/target sub-directory of Karma and run the following command from a terminal:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB \
--modelfilepath <modelfilepath> \
--outputfile <outputfile> \
--jsonoutputfile <json-outputfile> \
[--contextfile <contextfile> | --contexturl <contextUrl>] \
[--selection <selectionName>] \
[--root <rootClassIDForJsonLD>] \
[--killtriplemap <triplemapid to stop from expansion> ] \
[--stoptriplemap <stop the rdf generation from this triplemapid onwards> ] \
--dbtype <dbtype> \
--hostname <hostname> \
--username <username> \
--password <password> \
--portnumber <portnumber> \
--dbname <dbname> \
--tablename <tablename>

Valid values for --dbtype are Oracle, MySQL, SQLServer, PostGIS, and Sybase.

Example invocation:

java -cp mysql-connector-java-5.0.8-bin.jar:karma-offline-0.0.1-SNAPSHOT-shaded.jar \
edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB \
--dbtype MySQL \
--hostname localhost \
--username root \
--password mypassword \
--portnumber 3306 \
--dbname karma \
--tablename offlineUsers \
--modelfilepath "/Users/dipsy/karma-projects/offlineUsers-model.ttl" \
--outputfile offlineUsers-rdf.n3 \
--contextfile person-context.json \
--jsonoutputfile offlineUsers-jsonld.json \
--root "http://schema.org/Person1"
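
The example above joins the JDBC driver JAR and the Karma JAR with a colon, which is the classpath separator on Unix-like systems; on Windows the separator is a semicolon. As a small sketch, the classpath can be assembled portably with java.io.File.pathSeparator. The class name ClasspathBuilder is a hypothetical name used only for this illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins JAR names with the platform's classpath separator.
class ClasspathBuilder {
    // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
    static String classpath(List<String> jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        // JAR names taken from the MySQL example above.
        String cp = classpath(Arrays.asList(
                "mysql-connector-java-5.0.8-bin.jar",
                "karma-offline-0.0.1-SNAPSHOT-shaded.jar"));
        System.out.println("java -cp " + cp + " edu.isi.karma.rdf.OfflineRdfGenerator");
    }
}
```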

Using Selection Feature in Offline Mode

If the model requires a selection, the selection name 'DEFAULT_TEST' needs to be passed to OfflineRdfGenerator with the command-line argument --selection. This makes it possible to execute the same model with or without the selection in offline mode. Example invocation:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB --dbtype SQLServer \
--hostname example.com --username root --password secret \
--portnumber 1433 --dbname Employees --tablename Person \
--modelfilepath "/files/models/db-r2rml-model.ttl" \
--outputfile db-rdf.n3 \
--contextfile db-context.json \
--root "http://schema.org/Person1" \
--sourcename wikipedia \
--selection "DEFAULT_TEST" \
--jsonoutputfile db.json

Generating Context from Model

To generate the context from the model on the command line, use the following utility:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.GenerateContextFromModel \
--modelpath <path-to-model-file> \
[--outputfile <output-file-name>]

Example:

java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.GenerateContextFromModel \
--modelpath language-model-1.txt \
--outputfile language-context.json

Sample output

{"@context": {
    "a": "@type",
    "prefLabel": {"@id": "http://www.w3.org/2008/05/skos#prefLabel"},
    "Concept": {
        "@type": "@id",
        "@id": "http://www.w3.org/2008/05/skos#Concept"
    },
    "url": "@id"
}}
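
To make the role of the context concrete: each key in the sample output is a short term that a JSON-LD processor replaces with a keyword or a full IRI when it reads a document. The sketch below is only an illustration of that lookup using a plain map; it is not the full JSON-LD expansion algorithm and not Karma code. The class name ContextLookup is hypothetical, and the term definitions are copied from the sample output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: resolves the short terms of the sample context.
class ContextLookup {
    // Term -> keyword or IRI, flattened from the sample context above.
    static final Map<String, String> TERMS = new LinkedHashMap<>();
    static {
        TERMS.put("a", "@type");   // "a" is an alias for the @type keyword
        TERMS.put("prefLabel", "http://www.w3.org/2008/05/skos#prefLabel");
        TERMS.put("Concept", "http://www.w3.org/2008/05/skos#Concept");
        TERMS.put("url", "@id");   // "url" is an alias for the @id keyword
    }

    // Returns the keyword or IRI for a known term, or the term itself if it is not defined.
    static String expand(String term) {
        return TERMS.getOrDefault(term, term);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"a", "url", "prefLabel", "Concept"}) {
            System.out.println(term + " -> " + expand(term));
        }
    }
}
```

With this context, a generated document can use the compact keys a, url, and prefLabel instead of repeating the keywords and the full SKOS IRIs.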

GenericRDFGenerator

This API is meant for repeated RDF/JSON-LD generation from the same model. In this setting, the models are loaded once at the beginning, and each subsequent request uses an already loaded model to generate RDF. The input can be JSON, CSV, or XML, supplied as a File, a String, or an InputStream.

edu.isi.karma.rdf.GenericRDFGenerator

API to add a model to the RDF Generator

// modelIdentifier : Provides a name and location of the model file
void addModel(R2RMLMappingIdentifier modelIdentifier); 

API to generate the JSON-LD for a request

// request: Provides all details of the inputs to the RDF generator, such as the input data and the provenance setting
void generateRDF(RDFGeneratorRequest request);

edu.isi.karma.rdf.RDFGeneratorRequest

API to set the input data

//inputData : Input Data as String
public void setInputData(String inputData)

//inputStream: Input data as a Stream
public void setInputStream(InputStream inputStream)

//inputFile: Input data file
public void setInputFile(File inputFile)

API to set the input data type

//dataType: Valid values: CSV,JSON,XML,AVRO
public void setDataType(InputType dataType)

Setting to generate provenance information

//addProvenance -> flag to indicate if provenance information should be added to the RDF
public void setAddProvenance(boolean addProvenance) 

The writer for RDF

//writer -> Writer for the output. For JSON-LD generation, this should be JSONKR2RMLRDFWriter
public void addWriter(KR2RMLRDFWriter writer)

Example use:

GenericRDFGenerator rdfGenerator = new GenericRDFGenerator();

// Construct an R2RMLMappingIdentifier that provides a name for the model and the location of the
// model file, then add the model to the GenericRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
				"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);

String filename = "files/data/people.json";
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
JSONKR2RMLRDFWriter writer = new JSONKR2RMLRDFWriter(pw);
RDFGeneratorRequest request = new RDFGeneratorRequest("people-model", filename);
request.setInputFile(new File(filename));
request.setAddProvenance(true);
request.setDataType(InputType.JSON);
request.addWriter(writer);
rdfGenerator.generateRDF(request);
String jsonld = sw.toString();
System.out.println("Generated JSON-LD: " + jsonld);

Using Selection Feature in the API

If the model requires a selection, GenericRDFGenerator provides a constructor that takes the selection name 'DEFAULT_TEST' as its argument.

Example use:

GenericRDFGenerator rdfGenerator = new GenericRDFGenerator("DEFAULT_TEST");

// Construct an R2RMLMappingIdentifier that provides a name for the model and the location of the
// model file, then add the model to the GenericRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
				"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);

String filename = "files/data/people.json";
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
JSONKR2RMLRDFWriter writer = new JSONKR2RMLRDFWriter(pw);
RDFGeneratorRequest request = new RDFGeneratorRequest("people-model", filename);
request.setInputFile(new File(filename));
request.setAddProvenance(true);
request.setDataType(InputType.JSON);
request.addWriter(writer);
rdfGenerator.generateRDF(request);
String jsonld = sw.toString();
System.out.println("Generated JSON-LD: " + jsonld);