graphkit

GraphKit is a pipeline processing tool for graph-based data extraction, transformation and analysis. The tool's graph model is based on annotated JSON schema definitions.

A typical pipeline might extract data from a set of CSV files or database tables, translate them to JSON using a given schema, combine them into an RDF graph, perform de-duplication and data integration, and eventually run a set of queries on the resulting graph.

Stages

The following stages / operations should be supported in the graph processing pipeline:

csv:read: Generate an iterator from a CSV file.
readtable: Generate an iterator from a SQL database table.
json:map: Apply a JSON schema mapping to the data coming from a source.
rdf:load: Import the data from a JSON stream into a triple store.
rdf:dedupe: Apply sameAs mappings based on some external mapping file.
rdf:sparql: Run a SPARQL query.
mql:query: Run an MQL query.
rdf:dump: Export RDF data to a file.
json:unmap: Apply a JSON schema mapping to convert objects to a flat table.
csv:write: Export data to a CSV file.

To link flat data structures to nested object graphs matching JSON schema definitions, jsonmapping is used.

Tests

The test suite will usually be executed in it's own virtualenv and perform a coverage check as well as the tests. To execute on a system with virtualenv and make installed, type:

$ make test

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
examples/everypol		examples/everypol
graphkit		graphkit
tests		tests
.gitignore		.gitignore
DESIGN.md		DESIGN.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples/everypol

examples/everypol

graphkit

graphkit

tests

tests

.gitignore

.gitignore

DESIGN.md

DESIGN.md

LICENSE

LICENSE

MANIFEST.in

MANIFEST.in

Makefile

Makefile

README.md

README.md

setup.py

setup.py

Repository files navigation

graphkit

Stages

Tests

About

Releases

Packages

Languages

License

tmtmtmtm/graphkit

Folders and files

Latest commit

History

Repository files navigation

graphkit

Stages

Tests

About

Resources

License

Stars

Watchers

Forks

Languages