Redditbeat is an Elastic Beat to index new Reddit Submissions of one or multiple Subreddits.
Getting Started with Redditbeat
Ensure that this folder is at the following location:
git clone https://github.com/voigt/redditbeat.git $GOPATH/src/github.com/voigt/redditbeat
cd $GOPATH/src/github.com/voigt/redditbeat
To get running with Redditbeat and also install the dependencies, run the following command:
To build the binary for Redditbeat run the command below. This will generate a binary in the same directory with the name redditbeat.
To run Redditbeat with debugging output enabled, run:
./redditbeat -c redditbeat.yml -p data/redditmap.json -e -d "*"
Hint: If you want to reindex already indexed Subreddits (resets data/redditmap.json):
You'll want to configure which Subreddits to index. You will do this in
redditbeat:
  # Defines how often an event is sent to the output
  period: 60s # how often to check for new Submissions
  reddit:
    username: "username"
    password: "password"
    useragent: "Redditbeat v0.1"
    subs: ["kitten", "news"] # a list of Subreddits to index
    limit: 10 # current limit is 100
- index new Submissions of one or multiple given Subreddits
- add persistency, so already indexed submissions will not be indexed again
- add dockerfile
- index new Submissions of one or multiple Users
- Redditbeat misses some new Submissions
Redditbeat uses geddit. Unfortunately, geddit stores the timestamp of a submission as a
float32, which means we lose up to 99 seconds of the timestamp. As a result, Redditbeat does not recognise new Submissions whose creation times are less than 99 seconds apart. The issue has already been reported to geddit.
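
The precision loss can be reproduced without geddit. The following is a minimal sketch, not geddit's actual code: `roundTrip` is a hypothetical helper that converts a Unix timestamp to float32 and back. A float32 has a 24-bit significand, so for timestamps around 1.5e9 adjacent representable values are 128 apart, and nearby creation times can collapse to the same value.

```go
package main

import "fmt"

// roundTrip converts a Unix timestamp (in seconds) to float32 and back,
// imitating what happens when the value is stored in a float32 field.
func roundTrip(ts int64) int64 {
	return int64(float32(ts))
}

func main() {
	first := int64(1500000000) // exactly representable as float32
	second := first + 30       // created 30 seconds later

	fmt.Println(roundTrip(first))  // prints 1500000000
	fmt.Println(roundTrip(second)) // prints 1500000000

	// Both Submissions now appear to have the same creation time.
	fmt.Println(roundTrip(first) == roundTrip(second)) // prints true
}
```

Storing the timestamp as a float64 or an integer type would keep full second precision for these values.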