The Role of Event-Time Order in Data Streaming Analysis – Examples
Slides and code from the ACM DEBS 2020 Tutorial titled "The Role of Event-Time Order in Data Streaming Analysis", given by Vincenzo Gulisano (Chalmers University of Technology), Dimitris Palyvos-Giannas (Chalmers University of Technology), Bastian Havers (Chalmers University of Technology & Volvo Cars) and Marina Papatriantafilou (Chalmers University of Technology).
The corresponding paper is available online.
The recording of this tutorial can be found at:
The demonstration dataset used for the disorder examples from
DisorderQuery.java is already provided in the folder
The larger dataset used in FullQuery is available for download here.
Place it in a location of your choice and change the variable
Config.java to the downloaded file. Then, in the same Java file, point
PATH_TO_REPOSITORY to the location to which you downloaded the repository.
Optional: Grafana dashboard
To your Flink config file found at
YOUR_FLINK_HOME/conf/flink-conf.yaml, append the following lines:
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP
Start a Flink cluster (as described here). The cluster will then show up as a metric in the Graphite connector in Grafana, along with all the Flink jobs that you submit to it.
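If nothing appears in Grafana, it can help to check that Graphite accepts data on the configured host and port independently of Flink. The following is a minimal sketch, assuming Graphite's plaintext protocol (one line of the form "path value timestamp" per metric) is enabled on localhost:2003 as in the configuration above. The class name and the metric path debs2020.smoketest are hypothetical and not part of this repository.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the repository.
final class GraphiteSmokeTest {

    // Graphite plaintext protocol: "<metric path> <value> <unix time in seconds>\n"
    static String formatLine(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection and writes a single metric line.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        // Assumed metric name, chosen only for this check.
        String line = formatLine("debs2020.smoketest", 1.0, now);
        try {
            send("localhost", 2003, line);
            System.out.println("Sent: " + line.trim());
        } catch (IOException e) {
            System.out.println("Could not reach Graphite on localhost:2003: " + e.getMessage());
        }
    }
}
```

If the line is sent successfully, the test metric should become selectable in the Graphite data source in Grafana; if the connection fails, the problem lies with the Graphite setup rather than with the Flink reporter configuration.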
Build the jar file with Maven:
cd PATH_TO_REPOSITORY
mvn clean package
There are two main classes provided in this repository, DisorderQuery.java and FullQuery.java.
Both have been tested with Apache Flink version >= 1.9.2.
DisorderQuery.java runs without parameters; output files are created in the folder
FullQuery.java expects two parameters. The first is an integer >= 1 and defines the parallelism of the Join. The second parameter is either true or false. If set to false, the query will execute normally. Setting it to true enables sleep timers in the operators and stalls the watermark progress slightly, to make differences in the watermarks more visible and to simulate a higher load. Output files are created in the folder
./YOUR_FLINK_HOME/bin/start-cluster.sh
./YOUR_FLINK_HOME/bin/flink run -c DEBS2020streamingEventTime.FullQuery PATH_TO_REPOSITORY/target/DEBS2020streamingEventTime-1.0.jar 4 false
The first command starts a local Flink cluster; the second starts the Full Query with Join parallelism 4 and no sleep timers.
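The two parameters of FullQuery described above could be validated along the following lines. This is only a sketch of the documented contract (an integer >= 1 followed by true or false), not the repository's actual parsing code; the class FullQueryArgs and its field names are hypothetical.

```java
// Hypothetical helper, not part of the repository: checks the two
// command-line parameters that FullQuery is documented to expect.
final class FullQueryArgs {
    final int joinParallelism;  // first parameter, integer >= 1
    final boolean sleepTimers;  // second parameter, true or false

    private FullQueryArgs(int joinParallelism, boolean sleepTimers) {
        this.joinParallelism = joinParallelism;
        this.sleepTimers = sleepTimers;
    }

    static FullQueryArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Expected two parameters: <joinParallelism> <true|false>");
        }
        int parallelism;
        try {
            parallelism = Integer.parseInt(args[0]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Join parallelism must be an integer: " + args[0], e);
        }
        if (parallelism < 1) {
            throw new IllegalArgumentException(
                "Join parallelism must be >= 1: " + parallelism);
        }
        // Boolean.parseBoolean maps every string other than "true" to false,
        // so the flag is checked explicitly to catch typos.
        String flag = args[1];
        if (!flag.equals("true") && !flag.equals("false")) {
            throw new IllegalArgumentException(
                "Second parameter must be true or false: " + flag);
        }
        return new FullQueryArgs(parallelism, Boolean.parseBoolean(flag));
    }

    public static void main(String[] args) {
        FullQueryArgs parsed = parse(new String[] {"4", "false"});
        // prints "4 false"
        System.out.println(parsed.joinParallelism + " " + parsed.sleepTimers);
    }
}
```

Rejecting malformed input up front avoids submitting a job to the cluster that would only fail, or silently run without sleep timers, because of a mistyped argument.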