Skip to content

yasarkhangithub/PolyWeb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PolyWeb

PolyWeb is a query federation approach which can federate SPARQL queries over different data models (RDF and non-RDF data models) simultaneously.

Experimental Setup

The experimental setup (i.e. code, datasets, setting, queries) for evaluation of PolyWeb is described here.

Code

The PolyWeb source code can be checkedout from PolyWeb GitHub Page.

Datasets

There are three different types of datasets used for PolyWeb evaluation experiments, i.e. RDF, RDB and TSV. RDF dataset is available in the form of Virtuoso db file, RDB dataset is available in the form of MySQL database dump file and TSV dataset is available in the form of TSV file.

These datasets can be downloaded from PolyWeb Datasets.

Settings

Each RDF dataset was loaded into a different Virtuoso (Open Source v.7.2.4.2) SPARQL endpoint SPARQL endpoint on separate physical machines. Relational data set is loaded into MySQL database (version 5.7.14) and CSV dataset into Apache Drill (version 1.8.0). All experiments are carried out on a local network, so that network cost remains negligible. The machines used for experiments have a 2.60 GHz Core i7 processor, 8 GB of RAM and 500 GB hard disk running a 64-bit Windows 7 OS. The database configuration parameters are used as default configuration for MySQL database. Default configurations are also used for Apache Drill. The configuration parameters are listed in Table 1 below.

Table 1: Datasets Configuration

Dataset Port URL Config Parameters
COSMIC-CNV 8899 {System-IP}:8899/sparql NoB=680000, MDF=500000, MQM=8G
TCGA-OV-CNV 8085 jdbc:mysql://{System-IP}:8085/tcga_ov_cnv Default configuration
CNVD-CNV 8047 {System-IP}:8047/query.json Default configuration
  • NoB = NumberOfBuffers
  • MDF = MaxDirtyBuffers
  • MQM = MaxQueryMem

Queries

A total of 10 queries are designed to evaluate and compare the query federation performance of PolyWeb against FedX and HiBISCuS based on the metrics defined.

Queries used in evaluation experiments of PolyWeb can be downloaded from PolyWeb Queries.

Comparison Metrics

For each query type we measured (1) the number of sources selected; (2) the average source selection time; (3) the average query execution time; and (4) the number of results returned to assess result completeness relatively. The performance of PolyWeb, FedX and HiBISCuS are compared based on these metrics.

Team

Yasar Khan

Antoine Zimmermann

Alokkumar Jha

Dietrich Rebholz-Schuhmann

Ratnesh Sahay

About

PolyWeb: Querying Web Polystores

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages