Stream Reasoning with DatalogMTL

This repository contains code, datasets and other related resources of our paper titled "Stream Reasoning with DatalogMTL" (Under Review).

1. Introduction

In the paper, we study stream reasoning in DatalogMTL, an extension of Datalog with metric temporal operators. In particular, we propose a sound and complete stream reasoning algorithm that is applicable to forward-propagating DatalogMTL programs, in which propagation of derived information towards past time points is precluded. Memory consumption in our generic algorithm depends both on the properties of the rule set and on the input data stream; in particular, it depends on the distances between timestamps occurring in the data. This may be undesirable in certain practical scenarios, since these distances can be very small, in which case the algorithm may require large amounts of memory. To address this issue, we propose a second algorithm, where the size of the required memory becomes independent of the timestamps in the data, at the expense of disallowing punctual intervals in the rule set.

We have implemented our stream reasoning algorithms as an extension of the DatalogMTL reasoner MeTeoR and tested them experimentally. The obtained results support the feasibility of our approach in practice.
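For intuition, in a forward-propagating program, rule bodies may refer only to the present and the past, so a derived fact never forces updates at earlier time points. A purely illustrative rule of this kind, written in MeTeoR-style ASCII notation (it is not one of our experimental rules; those are in the programs folder), is

Warning(X) :- Diamondminus[0,5]Stop(X)

which derives Warning(X) at time t whenever Stop(X) held at some point within the last 5 time units, so the derivation only propagates information forward.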


2. Programs

The programs folder contains the three programs used in our experiments: short_stop.txt, meteor_nonrecursive.txt, and meteor_recursive.txt correspond to \Prog, \Prog_{nr}, and \Prog_{r} in the paper, respectively.

3. Datasets

Due to GitHub's storage limits, we are unable to upload to this repository the four datasets D1 (5 million), D2 (10 million), D3 (15 million), and D4 (20 million) generated by LUBM. To allow users to replicate our experiments, we have placed the four datasets on Google Drive; they can be downloaded via this publicly available link. The other two Hackathon datasets used in our experiments, S1 and S2, are included in the demo folder of this repository.

All of the above datasets are synthetic, so we also describe the automatic generation process in the following two subsections in case users want to generate new synthetic datasets themselves.

3.1 Lehigh University Benchmark (LUBM)
3.1.1 Download the LUBM data generator

You can download the data generator (UBA) from the SWAT Projects - Lehigh University Benchmark (LUBM) website. In particular, we used UBA1.7.

After downloading the UBA1.7 package, you need to add the package's path to CLASSPATH. For example,

export CLASSPATH="$CLASSPATH:your package path"
3.1.2 Generate the owl files
==================
USAGES
==================

command:
   edu.lehigh.swat.bench.uba.Generator
      [-univ <univ_num>]
      [-index <starting_index>]
      [-seed <seed>]
      [-daml]
      -onto <ontology_url>

options:
   -univ   number of universities to generate; 1 by default
   -index  starting index of the universities; 0 by default
   -seed   seed used for random data generation; 0 by default
   -daml   generate DAML+OIL data; OWL data by default
   -onto   url of the univ-bench ontology
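For example, assuming CLASSPATH has been set as above, generating the data for a single university would look like the following (the -onto URL below is the standard univ-bench ontology location; adjust it if you host the ontology elsewhere),

java edu.lehigh.swat.bench.uba.Generator -univ 1 -onto http://swat.cse.lehigh.edu/onto/univ-bench.owl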

We found some naming and storage issues when using the above command from the official documentation. To provide a more user-friendly alternative, we wrote a script that can be used directly to generate the required owl files by passing a few simple arguments. An example is shown as follows,

from meteor_reasoner.datagenerator import generate_owl

univ_num = 1  # the number of universities you want to generate
dir_name = "./data"  # the directory path for the generated owl files

generate_owl.generate(univ_num, dir_name)

In ./data, you should obtain a series of owl files like the ones below,

University0_0.owl 
University0_12.owl  
University0_1.owl
University0_4.owl
.....

Then, we need to convert the owl files into datalog-like facts. We also provide a script that can be used directly to do the conversion.

from meteor_reasoner.datagenerator import generate_datalog

owl_path = "owl_data"  # the directory where the owl files are located
out_dir = "./output"  # the directory for the converted datalog triplets

generate_datalog.extract_triplet(owl_path, out_dir)

In ./output, you should see ./output/owl_data, which contains data in the form of

UndergraduateStudent(ID0)
undergraduateDegreeFrom(ID1,ID2)
takesCourse(ID3,ID4)
undergraduateDegreeFrom(ID5,ID6)
UndergraduateStudent(ID7)
name(ID8,ID9)
......

and ./output/statistics.txt, which contains statistics about the converted datalog-like data in the form of

worksFor:540
ResearchGroup:224
....
AssistantProfessor:146
subOrganizationOf:239
headOf:15
FullProfessor:125
The number of unique entities:18092
The number of triplets:8604
3.1.3 Add punctual intervals

Up to now, we have only constructed atemporal data, so the final step is to add temporal information (intervals) to it. In the stream reasoning scenario, we consider punctual intervals, namely intervals whose left endpoint equals the right endpoint (e.g., A@[1,1]). More specifically, assume that we have a datalog-like dataset in datalog/datalog_data.txt. If we want to create a dataset containing 10000 facts, where each fact has at most 2 intervals and each time point is randomly chosen from the range [0, 300], we can run the following command (remember to add --min_val=0, --max_val=300, and --punctual),

python add_intervals.py --datalog_file_path datalog/datalog_data.txt --factnum 10000 --intervalnum 2 --min_val 0 --max_val 300 --punctual 

In datalog/10000.txt, there should be 10000 facts, each of the form P(a,b)@\varrho; a sample of the facts is shown as follows,

undergraduateDegreeFrom(ID1,ID2)@[7,7]
takesCourse(ID34,ID4)@[46,46]
undergraduateDegreeFrom(ID5,ID6)@[21,21]
name(ID18,ID9)@[22,22]
......
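If you need to process such files in your own scripts, a minimal parsing sketch is shown below (this is not part of the repository; the only assumption is the fact format illustrated above),

import re

# Parse facts of the form predicate(arg1,...,argn)@[l,r].
FACT = re.compile(r"^(\w+)\(([^)]*)\)@\[(\d+),(\d+)\]$")

def parse_fact(line):
    match = FACT.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid fact: {line!r}")
    predicate, args, left, right = match.groups()
    return predicate, args.split(","), (int(left), int(right))

print(parse_fact("takesCourse(ID34,ID4)@[46,46]"))
# ('takesCourse', ['ID34', 'ID4'], (46, 46))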
3.2 Hackathon Benchmark

The Hackathon Challenge, organised at the Stream Reasoning Workshop 2021, provides a stream generator together with several reasoning tasks. We considered the scenario in the challenge where input streams contain data from the Eclipse Simulation of Urban MObility (SUMO) describing road vehicles in a traffic jam, and the task is to detect vehicles that make a short stop (less than 5 seconds). A detailed description of the Stream Generator can be found here. The main idea is to use a server to generate a stream of data and deliver it to the client through a WebSocket. The server also acts as a web server, so it can be controlled through the provided REST API.

In particular, to generate the datasets in DatalogMTL style correctly, we modified the original files (client.py and custom_websocket_client.py) in the "example-client" folder of the Hackathon repo. More specifically, after building and running the Docker image on your local machine, you should replace the two files in the example-client folder with our customized files (download from here).
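For reference, the skeleton of such a WebSocket client is very small. The following is only a sketch: the endpoint URL is hypothetical, and the real message handling lives in our customized client.py,

import websocket  # pip install websocket-client

def on_message(ws, message):
    # Each message carries SUMO vehicle data; a real client would convert
    # it into DatalogMTL facts instead of just printing it.
    print(message)

# Hypothetical endpoint; consult the Hackathon documentation for the real one.
ws = websocket.WebSocketApp("ws://localhost:8080/stream", on_message=on_message)
ws.run_forever()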

4. Mocking stream reasoning scenarios

Given the generated static temporal data, we mock stream reasoning scenarios with a script that reads the static temporal data and, at each step, outputs the set of facts sharing the same punctual time point, as sketched below. (NOTE that you need to download the LUBM datasets via the link above and unzip them into the datasets folder first if you want to run the LUBM experiments.)
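The idea behind the mocking can be sketched in a few lines of Python (hypothetical code, not the actual implementation in the scripts below): facts are grouped by their punctual time point and emitted one batch per time point, in temporal order.

from collections import defaultdict

def mock_stream(fact_lines):
    # Group facts such as "Stop(ID1)@[7,7]" by their punctual time point.
    batches = defaultdict(list)
    for line in fact_lines:
        fact, interval = line.strip().rsplit("@", 1)
        t = int(interval.strip("[]").split(",")[0])  # punctual: left == right
        batches[t].append(fact)
    # Emit one batch of facts per time point, in temporal order.
    for t in sorted(batches):
        yield t, batches[t]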

4.1 Using MeTeoR_Str for stream reasoning
# For Hackathon S1
python meteor_str_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop

# For Hackathon S2
python meteor_str_stream_reasoning.py --datapath datasets/S2.txt --rulepath programs/short_stop.txt --target ShortStop

# For LUBM D1 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D2 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D3 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D4 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D1 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D2 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D3 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D4 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

A log file (meteor_str_mock.log) will be generated in the current directory.


4.2 Using MeTeoR for stream reasoning
# For Hackathon S1
python meteor_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop

# For Hackathon S2
python meteor_stream_reasoning.py --datapath datasets/S2.txt --rulepath programs/short_stop.txt --target ShortStop

# For LUBM D1 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D2 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D3 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D4 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate

# For LUBM D1 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D2 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D3 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

# For LUBM D4 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist

A log file (meteor_mock.log) will be generated in the current directory.


In addition, we provide two arguments, --input_detail and --output_detail, which control whether the input stream and the output stream at each time point, respectively, are saved in the log file.
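For example, to additionally log both streams for the Hackathon S1 run (assuming both arguments are plain boolean switches),

python meteor_str_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop --input_detail --output_detail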

Contact

For any questions, please drop an email to Dingmin Wang (wangdimmy@gmail.com).
