Home

Thunderain is a Real-Time Analytical Processing (RTAP) example built on Spark and Shark. It is characterized by the following four properties:

Data continuously streamed in & processed in near real-time
Real-time data queried and presented in an online fashion
Real-time and historical data combined and mined interactively
Predominantly RAM-based processing

Architecture

The overall architecture of Thunderain is as follows:

Data is collected from web servers and transferred through a Kafka message queue. In each batch duration, the Spark Streaming cluster fetches data from Kafka and processes it. Processed data can be put into an in-memory table in a format that Shark can read. Users can connect to the embedded SharkServer to run queries, or implement their own output class to store processed data in any other way.
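The batch-by-batch flow described above can be illustrated with a minimal plain-Python sketch. It uses neither Kafka, Spark Streaming, nor Shark, and it is not Thunderain code: the function names (`micro_batches`, `process_batch`), the sample events, and the list standing in for the in-memory table are all assumptions made for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, payload) events into fixed-length time batches."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        # Events whose timestamps fall in the same window share a batch key.
        batches[int(timestamp // batch_seconds)].append(payload)
    # Return non-empty batches in time order.
    return [batches[key] for key in sorted(batches)]


def process_batch(payloads):
    """Count page hits in one batch (a stand-in for the processing step)."""
    counts = defaultdict(int)
    for page in payloads:
        counts[page] += 1
    return dict(counts)


# Hypothetical web-server events: (seconds since start, requested page).
events = [(0.5, "/home"), (1.2, "/cart"), (1.9, "/home"),
          (2.1, "/home"), (3.7, "/cart")]

# Stand-in for the in-memory table: one row of counts per batch.
in_memory_table = [process_batch(batch)
                   for batch in micro_batches(events, batch_seconds=2)]
```

With a 2-second batch duration, the first three events land in one batch and the last two in the next, so the table ends up with two rows that could then be queried.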

The UML class diagram is:

Each App is bound to one Kafka topic. If you want to process several topics in one framework, you should implement an App for each topic.
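The one-App-per-topic rule can be pictured as a lookup from topic name to App. The sketch below is plain Python with hypothetical names (`App`, `dispatch`, and the topic names `clicks` and `orders`); it only illustrates the binding and does not reflect how Thunderain wires Apps to Kafka.

```python
class App:
    """Hypothetical stand-in for an App bound to exactly one topic."""

    def __init__(self, topic):
        self.topic = topic
        self.received = []

    def handle(self, message):
        # A real App would parse, process and output; here we just record.
        self.received.append(message)


# One App per topic, keyed by topic name.
apps = {app.topic: app for app in (App("clicks"), App("orders"))}


def dispatch(topic, message):
    """Route a message to the App bound to its topic, if there is one."""
    app = apps.get(topic)
    if app is None:
        return False  # No App was implemented for this topic.
    app.handle(message)
    return True


dispatch("clicks", "user=alice page=/home")
dispatch("orders", "order=17")
dispatch("logins", "user=bob")  # Ignored: no App handles "logins".
```

A message on a topic without its own App is simply not processed, which is why each additional topic needs its own App implementation.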

In each App, the user can:

Inherit from AbstractEventParser to implement a self-defined parser class to parse input streaming data.
Inherit from AbstractEventOutput to implement a self-defined output class to store processed data.
Use the 3 operators I've already implemented (CountOperator, AggregateOperator, DistinctAggregateCountOperator), or make your own operator by implementing AbstractOperator and OperatorConfig.
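The three extension points listed above (parser, operator, output) fit together roughly as in the following plain-Python sketch. The classes here are hypothetical stand-ins that mirror only the roles of AbstractEventParser, AbstractOperator and AbstractEventOutput; their method names and signatures are assumptions and are not taken from Thunderain's actual Scala classes.

```python
from abc import ABC, abstractmethod
from collections import Counter


class EventParser(ABC):
    """Role of a parser: turn one raw input line into an event."""

    @abstractmethod
    def parse(self, line): ...


class Operator(ABC):
    """Role of an operator: reduce a batch of events to a result."""

    @abstractmethod
    def process(self, events): ...


class EventOutput(ABC):
    """Role of an output: store one processed result."""

    @abstractmethod
    def write(self, result): ...


class CsvParser(EventParser):
    def __init__(self, fields):
        self.fields = fields

    def parse(self, line):
        # Pair each configured field name with the matching CSV column.
        return dict(zip(self.fields, line.strip().split(",")))


class CountByKey(Operator):
    def __init__(self, key):
        self.key = key

    def process(self, events):
        # Count how many events share each value of the chosen key.
        return dict(Counter(event[self.key] for event in events))


class ListOutput(EventOutput):
    def __init__(self):
        self.rows = []

    def write(self, result):
        self.rows.append(result)


def run_batch(lines, parser, operator, output):
    """Parse every line, apply the operator, and hand the result to the output."""
    events = [parser.parse(line) for line in lines]
    output.write(operator.process(events))


output = ListOutput()
run_batch(["alice,/home", "bob,/cart", "alice,/cart"],
          CsvParser(["user", "page"]), CountByKey("user"), output)
```

Swapping in a different parser, operator or output changes one stage without touching the others, which is the point of keeping the three roles separate.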

User Documentation

For details on how to run Thunderain, please refer to https://github.com/thunderain-project/thunderain/wiki/Thunderain-Configuration
