Home
Thunderain is a Real-Time Analytical Processing (RTAP) example built on Spark and Shark. It is best characterized by the following four properties:
- Data is continuously streamed in and processed in near real-time
- Real-time data is queried and presented in an online fashion
- Real-time and historical data are combined and mined interactively
- Processing is predominantly RAM-based
The overall architecture of Thunderain is as follows:
Data is collected from web servers and transferred through a Kafka message queue. In each batch duration, the Spark Streaming cluster fetches data from Kafka and processes it. Processed data can be put into an in-memory table in a format that Shark can read. Users can connect to the embedded SharkServer to query it, or implement their own output class to store processed data in any other way.
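The batch-by-batch flow described here can be illustrated without Spark or Kafka. The following is only a minimal sketch under stated assumptions, not Thunderain's code: `Event`, `InMemoryTable` and `MicroBatchSketch` are hypothetical names standing in for the Kafka messages, the Shark in-memory table and the Spark Streaming batch loop.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; none of these names come from Thunderain.
final case class Event(timestampMs: Long, url: String)

// Plays the role of the in-memory table that queries would read from.
final class InMemoryTable {
  private val rows = mutable.ArrayBuffer.empty[(Long, String, Int)]

  def append(batchStartMs: Long, url: String, hits: Int): Unit =
    rows += ((batchStartMs, url, hits))

  // Returns (batch start, hit count) pairs for one URL, in insertion order.
  def select(url: String): Seq[(Long, Int)] =
    rows.collect { case (start, u, hits) if u == url => (start, hits) }.toSeq
}

object MicroBatchSketch {
  // Group events into fixed-length batches, as a streaming engine does once
  // per batch duration, and store per-URL hit counts for each batch.
  def run(events: Seq[Event], batchDurationMs: Long, table: InMemoryTable): Unit = {
    val batches = events.groupBy(e => e.timestampMs / batchDurationMs * batchDurationMs)
    for (batchStart <- batches.keys.toSeq.sorted) {
      val counts = batches(batchStart).groupBy(_.url).map { case (u, es) => (u, es.size) }
      for ((url, hits) <- counts.toSeq.sortBy(_._1)) table.append(batchStart, url, hits)
    }
  }
}
```

In this sketch a query is just a call to `table.select(url)`; in Thunderain that role is played by SQL queries against the embedded SharkServer.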
The UML class diagram is:
Each App is bound to one Kafka topic. If you want to process several kinds of topics in one framework, you should implement an App for each topic.
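The one-App-per-topic rule can be pictured as a lookup from topic name to App. This is a hypothetical sketch only: `TopicApp`, `PageViewApp`, `ClickApp`, `Dispatcher` and the topic names are illustrative and are not Thunderain classes or configuration.

```scala
// Hypothetical sketch: these names are illustrative, not Thunderain classes.
trait TopicApp {
  def topic: String
  def process(message: String): String
}

object PageViewApp extends TopicApp {
  val topic = "page_view"
  def process(message: String): String = s"pv:${message.trim}"
}

object ClickApp extends TopicApp {
  val topic = "click"
  def process(message: String): String = s"click:${message.trim}"
}

object Dispatcher {
  // Exactly one App per topic, keyed by the topic name the App declares.
  private val apps: Map[String, TopicApp] =
    Seq[TopicApp](PageViewApp, ClickApp).map(app => app.topic -> app).toMap

  // Returns None for topics that have no App registered.
  def handle(topic: String, message: String): Option[String] =
    apps.get(topic).map(_.process(message))
}
```

Supporting a new topic in this sketch means writing one more `TopicApp` and adding it to the list, which mirrors the advice to implement an App per topic.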
In each App, user can:
- Inherit from `AbstractEventParser` to implement a self-defined parser class to parse input streaming data.
- Inherit from `AbstractEventOutput` to implement a self-defined output class to store processed data.
- Use the 3 operators (`CountOperator`, `AggregateOperator`, `DistinctAggregateCountOperator`) I've already implemented, or make your own operator by implementing `AbstractOperator` and `OperatorConfig`.
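To make the three extension points concrete, here is a minimal sketch of how a parser, an operator and an output could fit together. It is an assumption-laden illustration: the real signatures of `AbstractEventParser`, `AbstractOperator` and `AbstractEventOutput` are not shown on this page, so the sketch defines its own simplified stand-ins (`EventParserSketch`, `OperatorSketch`, `EventOutputSketch`) and every other name in it is hypothetical.

```scala
import scala.collection.mutable

// Simplified stand-ins for the extension points; the real Thunderain
// signatures are not shown on this page and may differ.
abstract class EventParserSketch {
  def parse(line: String): Map[String, String]
}
abstract class OperatorSketch {
  def process(events: Seq[Map[String, String]]): Map[String, Long]
}
abstract class EventOutputSketch {
  def output(result: Map[String, Long]): Unit
}

// Parses "key=value" pairs separated by '&', e.g. "url=/a&user=u1".
class KeyValueParser extends EventParserSketch {
  def parse(line: String): Map[String, String] =
    line.split('&').iterator
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k -> v }
      .toMap
}

// Counts events per value of one field, in the spirit of a count operator.
class CountByField(field: String) extends OperatorSketch {
  def process(events: Seq[Map[String, String]]): Map[String, Long] =
    events.flatMap(_.get(field)).groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }
}

// Keeps results in memory and adds up counts across batches.
class InMemoryOutput extends EventOutputSketch {
  val totals: mutable.Map[String, Long] =
    mutable.Map.empty[String, Long].withDefaultValue(0L)

  def output(result: Map[String, Long]): Unit =
    result.foreach { case (k, n) => totals(k) += n }
}

object ExtensionSketch {
  // One batch: parse every line, run the operator, hand the result to the output.
  def runBatch(lines: Seq[String], parser: EventParserSketch,
               op: OperatorSketch, out: EventOutputSketch): Unit =
    out.output(op.process(lines.map(parser.parse)))
}
```

Calling `ExtensionSketch.runBatch` once per batch with the same `InMemoryOutput` accumulates counts over time, which is roughly the shape of work a custom output class takes on.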
For details on how to run Thunderain, please refer to https://github.com/thunderain-project/thunderain/wiki/Thunderain-Configuration