Saisai Shao edited this page Jun 21, 2013 · 3 revisions

Thunderain is a Real-Time Analytical Processing (RTAP) example using Spark and Shark, which can be best characterized by the following four salient properties:

  • Data continuously streamed in & processed in near real-time
  • Real-time data queried and presented in an online fashion
  • Real-time and historical data combined and mined interactively
  • Predominantly RAM-based processing

Architecture

The overall architecture of Thunderain is as follows:

cluster architecture

Data is collected from web servers and transferred through a Kafka message queue. In each batch duration, the Spark Streaming cluster fetches data from Kafka and processes it. Processed data can be written to an in-memory table in a format readable by Shark. Users can connect to the embedded SharkServer to query it, or implement their own output class to store processed data in any other way.
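The batch-by-batch flow described above can be sketched in plain Python. This is only an illustration of the idea, not Thunderain code: the log lines stand in for Kafka messages, a fixed batch size stands in for the batch duration, and a `Counter` stands in for the in-memory table. The names `micro_batches`, `process_batch`, and `in_memory_table` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a record stream into batches (a stand-in for a batch duration)."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def process_batch(batch: List[str]) -> Counter:
    """Count hits per URL in one batch; each record is 'timestamp url'."""
    return Counter(line.split()[1] for line in batch)


# Hypothetical web-server log lines standing in for messages read from Kafka.
log_lines = ["1 /home", "2 /cart", "3 /home", "4 /home", "5 /cart"]

# Stand-in for the in-memory table that queries would later read from.
in_memory_table: Counter = Counter()
for batch in micro_batches(log_lines, batch_size=3):
    in_memory_table.update(process_batch(batch))
```

Each batch is processed independently and its result is merged into the shared table, which is why the table can be queried while new data keeps arriving.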

The UML class diagram is:

streaming uml

Each App is bound to exactly one Kafka topic. To process several topics in one framework, implement a separate App for each topic.
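The one-App-per-topic rule can be pictured as a lookup table keyed by topic name. The sketch below is a simplified model under that assumption; the `App` and `Framework` classes and their methods are hypothetical and do not mirror Thunderain's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class App:
    """Hypothetical app bound to a single topic."""
    topic: str
    seen: List[str] = field(default_factory=list)

    def process(self, message: str) -> None:
        self.seen.append(message)


class Framework:
    """Hypothetical container that routes each message to its topic's app."""

    def __init__(self) -> None:
        self._apps: Dict[str, App] = {}

    def register(self, app: App) -> None:
        # Enforce the one-app-per-topic binding.
        if app.topic in self._apps:
            raise ValueError(f"topic {app.topic!r} already has an app")
        self._apps[app.topic] = app

    def dispatch(self, topic: str, message: str) -> bool:
        """Deliver a message; return False if no app handles the topic."""
        app = self._apps.get(topic)
        if app is None:
            return False
        app.process(message)
        return True


framework = Framework()
clicks = App("clicks")
orders = App("orders")
framework.register(clicks)
framework.register(orders)
framework.dispatch("clicks", "user=alice url=/home")
framework.dispatch("orders", "user=bob total=42")
```

A message for a topic with no registered App is simply not handled, which matches the requirement to implement an App for every topic you want processed.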

In each App, the user can:

  • Inherit from AbstractEventParser to implement a self-defined parser class to parse input streaming data.
  • Inherit from AbstractEventOutput to implement a self-defined output class to store processed data.
  • Use one of the three operators already provided (CountOperator, AggregateOperator, DistinctAggregateCountOperator), or write a custom operator by implementing AbstractOperator and OperatorConfig.
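The three extension points listed above (parser, operator, output) can be illustrated with a small Python model. The class names echo the ones in this page, but the method names `parse`, `process`, and `write`, their signatures, and the concrete subclasses `CsvParser`, `CountByKey`, and `ListOutput` are assumptions made for this sketch; consult the Thunderain source for the real interfaces.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List

Event = Dict[str, str]


class AbstractEventParser(ABC):
    @abstractmethod
    def parse(self, raw: str) -> Event:  # hypothetical signature
        """Turn one raw input record into a keyed event."""


class AbstractOperator(ABC):
    @abstractmethod
    def process(self, events: List[Event]) -> Dict[str, int]:  # hypothetical
        """Reduce one batch of events to a result."""


class AbstractEventOutput(ABC):
    @abstractmethod
    def write(self, result: Dict[str, int]) -> None:  # hypothetical
        """Store one batch result."""


class CsvParser(AbstractEventParser):
    def __init__(self, fields: List[str]) -> None:
        self.fields = fields

    def parse(self, raw: str) -> Event:
        return dict(zip(self.fields, raw.split(",")))


class CountByKey(AbstractOperator):
    """Counts events per value of one field, in the spirit of a count operator."""

    def __init__(self, key: str) -> None:
        self.key = key

    def process(self, events: List[Event]) -> Dict[str, int]:
        return dict(Counter(event[self.key] for event in events))


class ListOutput(AbstractEventOutput):
    def __init__(self) -> None:
        self.rows: List[Dict[str, int]] = []

    def write(self, result: Dict[str, int]) -> None:
        self.rows.append(result)


parser = CsvParser(["user", "url"])
operator = CountByKey("url")
output = ListOutput()

batch = ["alice,/home", "bob,/home", "alice,/cart"]  # made-up input records
output.write(operator.process([parser.parse(raw) for raw in batch]))
```

Keeping parsing, computation, and storage behind separate base classes means any one of them can be swapped without touching the other two.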

User Documentation

For details on how to run Thunderain, please refer to https://github.com/thunderain-project/thunderain/wiki/Thunderain-Configuration
