Prerequisites:
- JDK 11
- Internet connection and some disk space
Running:
- Download the pre-built JAR file from GitHub and run it with:
java -jar redditstats-1.0.jar
- OR clone the Git repository and run it with Gradle:
git clone https://github.com/valtsmazurs/redditstats.git
cd redditstats
./gradlew bootRun
The embedded HTTP server binds to localhost on port 8080.
It accepts HTTP GET requests and returns JSON data, if any is available.
URIs:
- /activity - returns Reddit activity: how many new submissions and comments have been posted during the given time range
- /mostActive/top100 - returns the 100 most active subreddits during the given time range; activity is measured as the sum of new submissions and comments
- /mostActive/bySubmissions - returns the subreddit with the greatest number of submissions during the given time range
- /mostActive/byComments - returns the subreddit with the greatest number of comments during the given time range
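The activity metric behind /mostActive/top100 can be illustrated with a small in-memory sketch. In the project itself the aggregation is done by MongoDB; the class and method names below are hypothetical and exist only for this illustration. Each list element stands for one event (a submission or a comment) and holds the subreddit name, so counting elements per subreddit gives the sum of submissions and comments.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the project.
class MostActiveSketch {

    // Counts events per subreddit and returns the `limit` most active ones,
    // most active first. Ties are broken alphabetically by subreddit name.
    static List<Map.Entry<String, Long>> topActive(List<String> eventSubreddits, int limit) {
        Map<String, Long> counts = eventSubreddits.stream()
                .collect(Collectors.groupingBy(name -> name, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> events = List.of("java", "java", "kotlin", "java", "scala", "kotlin");
        // Prints the two most active subreddits with their event counts.
        topActive(events, 2).forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

Calling topActive with a limit of 100 corresponds to the top100 endpoint; a limit of 1 over submissions only or comments only corresponds to bySubmissions and byComments.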
All REST endpoints accept a timeRange parameter with the following values:
- ONE_MINUTE
- FIVE_MINUTES
- ONE_HOUR
- ONE_DAY
- UNLIMITED
Default: ONE_MINUTE
E.g. http://localhost:8080/activity?timeRange=ONE_HOUR
will return the total number of submissions and comments
that have been logged during the last hour.
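A request like the one above can also be sent from Java. The following is a minimal sketch that assumes the service is running locally on port 8080 and uses the HTTP client shipped with JDK 11. The class RedditStatsClient and its methods are hypothetical names chosen for this example; the structure of the returned JSON is not specified here, so the response body is printed as-is.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client class; not part of the project.
class RedditStatsClient {

    // Values accepted by the timeRange parameter.
    enum TimeRange { ONE_MINUTE, FIVE_MINUTES, ONE_HOUR, ONE_DAY, UNLIMITED }

    // Assumes the default local address of the embedded server.
    private static final String BASE_URL = "http://localhost:8080";

    // Builds the request URI for an endpoint path such as "/activity".
    static URI endpoint(String path, TimeRange timeRange) {
        return URI.create(BASE_URL + path + "?timeRange=" + timeRange.name());
    }

    // Sends a GET request and returns the raw response body (expected to be JSON).
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires the service to be running; prints whatever JSON it returns.
        System.out.println(fetch(endpoint("/activity", TimeRange.ONE_HOUR)));
    }
}
```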
The project consists of components that expose only their interfaces.
- redditevent - the innermost component; deals with RedditEvent storage and retrieval; depends on MongoDB as a storage engine and uses MongoDB for data aggregations
- gathering - gathers the data from stream.pushshift.io, uses redditevent to store the received events
- statistics - provides a Java API for Reddit statistics, uses redditevent for aggregated data retrieval
- webservice - implements REST endpoints, uses statistics to get the data
The architecture demonstrates an inward dependency flow: the innermost component has the fewest dependencies and does not depend on any other component of this application.
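The dependency flow can be sketched in a few lines of Java. This is an illustration only: the interface and class names below are hypothetical and do not reproduce the project's actual types, and an in-memory list stands in for the MongoDB-backed storage. The point is that gathering and statistics see redditevent only through its interface, while redditevent references neither of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the project.
class ComponentSketch {

    // redditevent: the innermost component; it references no other component.
    interface RedditEventStore {
        void save(String subreddit);
        long count();
    }

    // statistics: exposes its own interface and depends only on RedditEventStore.
    interface Statistics {
        long totalActivity();
    }

    // In-memory stand-in for the real storage, used only in this sketch.
    static class InMemoryEventStore implements RedditEventStore {
        private final List<String> subreddits = new ArrayList<>();

        @Override
        public void save(String subreddit) {
            subreddits.add(subreddit);
        }

        @Override
        public long count() {
            return subreddits.size();
        }
    }

    // statistics implementation: reads aggregated data through the store interface.
    static class StoreBackedStatistics implements Statistics {
        private final RedditEventStore store;

        StoreBackedStatistics(RedditEventStore store) {
            this.store = store;
        }

        @Override
        public long totalActivity() {
            return store.count();
        }
    }

    // gathering: writes received events through the store interface.
    static class Gatherer {
        private final RedditEventStore store;

        Gatherer(RedditEventStore store) {
            this.store = store;
        }

        void onEvent(String subreddit) {
            store.save(subreddit);
        }
    }

    public static void main(String[] args) {
        RedditEventStore store = new InMemoryEventStore();
        new Gatherer(store).onEvent("java");
        Statistics statistics = new StoreBackedStatistics(store);
        // A webservice layer would call only the Statistics interface.
        System.out.println(statistics.totalActivity());
    }
}
```

Because each outer component receives the store through its constructor, the storage implementation can be swapped (for example, for a test double) without touching gathering or statistics.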
- MongoDB - widely used, stable, and well suited to this case, where there is no need for the relations an RDBMS would provide
- Embedded MongoDB - saves the time needed to set up a local MongoDB service
- Spring Boot - web service creation, dependency injection, convenient APIs for MongoDB
- Immutables - reduces the amount of boilerplate code and provides a convenient way to work with immutable objects
- Apache CXF - handles stream.pushshift.io's Server-Sent Events (SSE)
- For tests:
  - JUnit 5
  - Mockito
  - AssertJ - more readable assertion statements
- Gradle - build tool of choice, supports Spring Boot plugin
Most of the functionality (the parts that do not involve parallelism or external resources) is covered by unit tests.