
The ultimate Twitter streaming data collector

At Unnati, we're a bunch of data scientists solving important business problems.

The crux of solving these data-related problems is being able to collect the data itself. In some cases, the data might already be available in the form we want, but in most cases it has to be either procured or transformed to fit our needs.

Social media analytics is becoming a very important aspect of running a business. End users are quick to praise, and even quicker to shame, a brand or product on social media. As a result, businesses are investing rapidly in the ability to monitor and act on the feedback they receive there.

But to begin, we need data.

The Problem

Consider Twitter. Its RESTful APIs impose quite stringent rate limits. What we really want is the streaming API, which is not subject to the same per-request rate limits and lets us process tweets in near real time.

The problem is reinventing the wheel: most of the time, we end up writing the data collection layer again and again with only minor changes to the codebase, even though the core of that layer largely remains the same.

The Solution

To solve this problem, we built Tweezer. With Tweezer, you can start collecting data in under 5 minutes. All you need is an authorized Twitter app created on your end and a MongoDB instance.

A handy configuration file manages how the app works. It holds the authorized app's credentials, the data store credentials, and the keywords/hashtags to track.
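To make the configuration concrete, here is a minimal sketch that writes a skeleton application.conf (the file used in the setup steps below). It assumes the file uses the HOCON syntax of Typesafe Config, the usual format for a file named application.conf in an sbt project. Every key name is a hypothetical placeholder, as is the function name `write_sample_conf`: take the real key names from the sample configuration that ships with Tweezer. The four Twitter values are the standard OAuth credentials of a Twitter app.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a skeleton application.conf.
# ALL key names are placeholders; copy the real ones from Tweezer's sample config.
write_sample_conf() {
  local path="${1:-application.conf}"   # target file, defaults to ./application.conf
  cat > "$path" <<'EOF'
# Twitter app OAuth credentials (hypothetical key names)
twitter {
  consumer-key        = "YOUR_CONSUMER_KEY"
  consumer-secret     = "YOUR_CONSUMER_SECRET"
  access-token        = "YOUR_ACCESS_TOKEN"
  access-token-secret = "YOUR_ACCESS_TOKEN_SECRET"
}

# MongoDB connection (hypothetical key names)
mongo {
  host     = "localhost"
  port     = 27017
  database = "tweets"
}

# Keywords and hashtags to track (hypothetical key name)
track = ["jsfoo", "#jsfoo"]
EOF
}
```

For example, `write_sample_conf ~/tweezer/application.conf` creates the file, which you then edit with your own credentials and keywords.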

Recently, HasGeek held JsFoo, its annual JavaScript conference in India. To test-run Tweezer, we left it running for 3 days, monitoring the hashtags and keywords relevant to the event.

Using this data, we also put together a dashboard visualizing various aspects of JsFoo: here


Using Docker


Build the Docker image from the included Dockerfile.

The image comes with JDK 8 and MongoDB:

$ sudo docker build -t mytwitterstream .

Once the image is built, make sure you add your credentials to application.conf:

  • specify the Twitter API credentials
  • specify the MongoDB credentials


Run the Docker image:

$ sudo docker run -t -i mytwitterstream

This starts MongoDB inside the container and then launches the Twitter streaming app, which writes tweets to the local database.

Using the source


$ sbt compile

Set the required credentials in application.conf, then run the app:


$ sbt run

Using the JAR directly

If you have Java 7+ and MongoDB 3 installed and do not want the Docker setup, you can pick up the jar from Dropbox and run Tweezer directly. Make sure you configure application.conf (here is a sample) and set the environment variable HARATE_CONF to the location of the configuration file:

$ export HARATE_CONF=/path/to/application.conf

Once the path is configured, we are ready to run the jar:

$ java -jar tweezer-0.2.0.jar
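The two prerequisites of the jar route (Java 7 or newer, and HARATE_CONF pointing at a readable file) can be checked before launching. This is a minimal sketch: the function names and the checks themselves are hypothetical, and only HARATE_CONF and the jar name come from this README. The Java version is parsed from the first quoted string in `java -version` output, which uses the legacy `1.x` form up to Java 8.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for running the Tweezer jar.

# Print the major Java version from a version string:
# "1.8.0_292" -> 8, "11.0.2" -> 11
java_major() {
  local v="${1#1.}"      # drop the legacy "1." prefix used up to Java 8
  echo "${v%%[._-]*}"    # keep everything before the first ".", "_" or "-"
}

# Succeed only if HARATE_CONF is set and names a readable file.
check_conf() {
  if [ -z "${HARATE_CONF:-}" ]; then
    echo "HARATE_CONF is not set" >&2
    return 1
  fi
  if [ ! -r "$HARATE_CONF" ]; then
    echo "cannot read config file: $HARATE_CONF" >&2
    return 1
  fi
}

# Check both prerequisites, then start the jar (default name from the README).
run_tweezer() {
  local jar="${1:-tweezer-0.2.0.jar}"
  check_conf || return 1
  local raw
  # java -version prints to stderr, e.g.: openjdk version "11.0.2" 2019-01-15
  raw=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  if [ -z "$raw" ] || [ "$(java_major "$raw")" -lt 7 ]; then
    echo "Java 7 or newer is required (found: ${raw:-none})" >&2
    return 1
  fi
  java -jar "$jar"
}
```

Save the functions in a file (for example tweezer-helpers.sh, a hypothetical name), `source` it, and call `run_tweezer`. Nothing runs on `source`, so each check can also be used on its own.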

