This repository contains the code, datasets, and other related resources for our paper "Stream Reasoning with DatalogMTL" (under review).
1. Introduction
In the paper we study stream reasoning in DatalogMTL, an extension of Datalog with metric temporal operators. In particular, we propose a sound and complete stream reasoning algorithm that is applicable to forward-propagating DatalogMTL programs, in which propagation of derived information towards past time points is precluded. Memory consumption in our generic algorithm depends on both the properties of the rule set and the input data stream; in particular, it depends on the distances between timestamps occurring in the data. This may be undesirable in certain practical scenarios, since these distances can be very small, in which case the algorithm may require large amounts of memory. To address this issue, we propose a second algorithm, where the size of the required memory becomes independent of the timestamps in the data, at the expense of disallowing punctual intervals in the rule set. We have implemented our stream reasoning algorithms as an extension of the DatalogMTL reasoner MeTeoR and tested them experimentally. The obtained results support the feasibility of our approach in practice.
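For intuition, in a forward-propagating program a rule can only derive information at the current time point or at later ones, never at earlier ones. A schematic rule in MeTeoR-style syntax (illustrative only, not taken from our programs):

Jam(X) :- Boxminus[0,5]Slow(X)

The body requires Slow(X) to have held throughout the last 5 time units, so the rule derives Jam(X) only at the current time point, and nothing propagates towards the past.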
The programs folder contains the three programs used in our experiments: short_stop.txt, meteor_nonrecursive.txt, and meteor_recursive.txt correspond to \Prog, \Prog_{nr}, and \Prog_{r} in the paper, respectively.
Due to GitHub's storage limits, we are unable to upload to this repo the four datasets D1 (5 million), D2 (10 million), D3 (15 million), and D4 (20 million) generated by LUBM. To allow users to replicate our experiments, we have placed the four datasets on Google Drive; they can be downloaded via this publicly available link. The two Hackathon datasets S1 and S2 used in our experiments are included in the demo folder of this repo.
All of the above datasets are synthetic, so we also describe the automatic generation process in the following two parts, in case users want to generate new synthetic datasets themselves.
You can download the data generator (UBA) from the SWAT Projects - Lehigh University Benchmark (LUBM) website. In particular, we used UBA1.7.
After downloading the UBA1.7 package, you need to add the package's path to your CLASSPATH. For example,
export CLASSPATH="$CLASSPATH:your package path"
==================
USAGES
==================
command:
edu.lehigh.swat.bench.uba.Generator
[-univ <univ_num>]
[-index <starting_index>]
[-seed <seed>]
[-daml]
-onto <ontology_url>
options:
-univ number of universities to generate; 1 by default
-index starting index of the universities; 0 by default
-seed seed used for random data generation; 0 by default
-daml generate DAML+OIL data; OWL data by default
-onto url of the univ-bench ontology
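For example, the following invocation generates the data for a single university (the univ-bench ontology URL shown here is the standard one; adjust it if you use a local copy):

java edu.lehigh.swat.bench.uba.Generator -univ 1 -onto http://swat.cse.lehigh.edu/onto/univ-bench.owl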
We encountered some naming and storage issues when using the above command from the official documentation. To provide a more user-friendly alternative, we wrote a script that can be used directly to generate the required OWL files by passing a few simple arguments. An example is shown as follows,
from meteor_reasoner.datagenerator import generate_owl
univ_num = 1  # the number of universities you want to generate
dir_name = "./data"  # the directory for the generated OWL files
generate_owl.generate(univ_num, dir_name)
In ./data, you should obtain a series of OWL files like the following,
University0_0.owl
University0_12.owl
University0_1.owl
University0_4.owl
.....
Next, we need to convert the OWL files to Datalog-like facts. We also provide a script that can be used directly for the conversion.
from meteor_reasoner.datagenerator import generate_datalog
owl_path = "owl_data"  # the directory containing the OWL files
out_dir = "./output"  # the output directory for the converted Datalog facts
generate_datalog.extract_triplet(owl_path, out_dir)
In ./output, you should see ./output/owl_data containing data of the form
UndergraduateStudent(ID0)
undergraduateDegreeFrom(ID1,ID2)
takesCourse(ID3,ID4)
undergraduateDegreeFrom(ID5,ID6)
UndergraduateStudent(ID7)
name(ID8,ID9)
......
and ./output/statistics.txt containing statistics about the converted Datalog-like data, in the form
worksFor:540
ResearchGroup:224
....
AssistantProfessor:146
subOrganizationOf:239
headOf:15
FullProfessor:125
The number of unique entities:18092
The number of triplets:8604
So far we have only constructed atemporal data, so the final step is to add temporal information (intervals) to it. In the stream reasoning scenario, we consider punctual intervals, namely intervals whose left endpoint equals the right endpoint (e.g., A@[1,1]). More specifically, assuming that we have a Datalog-like dataset in datalog/datalog_data.txt, if we want to create a dataset containing 10000 facts, where each fact has at most 2 intervals and each time point is randomly chosen from the range [0, 300], we can run the following command (remember to add --min_val 0, --max_val 300, and --punctual).
python add_intervals.py --datalog_file_path datalog/datalog_data.txt --factnum 10000 --intervalnum 2 --min_val 0 --max_val 300 --punctual
In datalog/10000.txt, there should then be 10000 facts, each of the form P(a,b)@\varrho; a sample of the facts is shown as follows,
undergraduateDegreeFrom(ID1,ID2)@[7,7]
takesCourse(ID34,ID4)@[46,46]
undergraduateDegreeFrom(ID5,ID6)@[21,21]
name(ID18,ID9)@[22,22]
......
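For intuition, the interval assignment amounts to something like the following sketch (hypothetical and simplified, not the actual add_intervals.py):

import random

def add_punctual_intervals(facts, fact_num, interval_num, min_val, max_val, seed=0):
    # Attach up to interval_num punctual intervals to each atemporal fact,
    # drawing time points uniformly from [min_val, max_val], until fact_num
    # temporal facts have been produced.
    rng = random.Random(seed)
    temporal_facts = []
    for fact in facts:
        if len(temporal_facts) >= fact_num:
            break
        for _ in range(rng.randint(1, interval_num)):
            t = rng.randint(min_val, max_val)
            temporal_facts.append(f"{fact}@[{t},{t}]")  # punctual: left endpoint == right endpoint
    return temporal_facts[:fact_num]

print(add_punctual_intervals(["takesCourse(ID3,ID4)"], 2, 2, 0, 300))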
The Hackathon Challenge, organised at the Stream Reasoning Workshop 2021, provides a stream generator together with several reasoning tasks. We considered the scenario in the challenge where input streams contain data from Eclipse Simulation of Urban MObility (SUMO) describing road vehicles in a traffic jam, and the task is to detect vehicles that make a short stop (less than 5 seconds). A detailed description of the Stream Generator can be found here. The main idea is to use a server to generate a stream of data and stream it to the client through a WebSocket; the server also acts as a web server, so it can be controlled through the provided REST API.
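For orientation, a minimal client receiving the stream might look as follows (the endpoint URL and message handling are assumptions for illustration, not the challenge's actual example-client code):

import websocket  # pip install websocket-client

def on_message(ws, message):
    # Each message carries the SUMO observations streamed by the server.
    print(message)

ws = websocket.WebSocketApp("ws://localhost:8080/stream", on_message=on_message)
ws.run_forever()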
In particular, to generate the datasets in DatalogMTL style, we modified the original files (client.py and custom_websocket_client.py) in the "example-client" folder of the Hackathon repo. More specifically, after building and running the Docker image on your local machine, you should replace the two files in the example-client folder with our customized files (download from here).
Given the generated static temporal data, we mock the stream reasoning scenario with a script that reads the static temporal data and, at each step, outputs the set of facts sharing the same punctual time point. (NOTE that you need to download the LUBM datasets (link) and unzip them into the datasets folder first if you want to run the LUBM experiments.)
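Conceptually, the mocking logic groups the facts by time point and emits one batch per time point, as in the following sketch (hypothetical, not the actual script):

from collections import defaultdict

def mock_stream(data_path):
    # Group facts of the form P(a,b)@[t,t] by their punctual time point t
    # and yield one batch per time point, in increasing order.
    batches = defaultdict(list)
    with open(data_path) as f:
        for line in f:
            fact = line.strip()
            if not fact:
                continue
            t = int(fact.rsplit("@[", 1)[1].split(",")[0])
            batches[t].append(fact)
    for t in sorted(batches):
        yield t, batches[t]

for t, facts in mock_stream("datasets/S1.txt"):
    print(t, len(facts))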
# For Hackathon S1
python meteor_str_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop
# For Hackathon S2
python meteor_str_stream_reasoning.py --datapath datasets/S2.txt --rulepath programs/short_stop.txt --target ShortStop
# For LUBM D1 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D2 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D3 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D4 and non-recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D1 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D2 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D3 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D4 and recursive program
python meteor_str_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
A log file (meteor_str_mock.log) will be generated in the current directory.
# For Hackathon S1
python meteor_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop
# For Hackathon S2
python meteor_stream_reasoning.py --datapath datasets/S2.txt --rulepath programs/short_stop.txt --target ShortStop
# For LUBM D1 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D2 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D3 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D4 and non-recursive program
python meteor_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_nonrecursive.txt --target a1:AssociateProfessorCandidate
# For LUBM D1 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D1.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D2 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D2.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D3 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D3.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
# For LUBM D4 and recursive program
python meteor_stream_reasoning.py --datapath datasets/D4.txt --rulepath programs/meteor_recursive.txt --target a1:Scientist
A log file (meteor_mock.log) will be generated in the current directory.
In addition, we provide two arguments, --input_detail and --output_detail, which control whether the input stream and the output stream at each time point are saved in the log file, respectively.
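For example (assuming both flags are boolean switches):

python meteor_stream_reasoning.py --datapath datasets/S1.txt --rulepath programs/short_stop.txt --target ShortStop --input_detail --output_detail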
For any questions, please drop an email to Dingmin Wang (wangdimmy@gmail.com).