Tweezer

The ultimate Twitter streaming data collector

At Unnati, we're a team of data scientists solving important business problems.

The crux of solving these data-related problems is collecting the data itself. In some cases, the data is already available in the form we want; in most cases, it has to be procured or transformed to fit our needs.

Social media analytics is becoming a very important part of running a business. Users are quick to praise, and even quicker to shame, a brand or product on social media. As a result, businesses are investing rapidly in the ability to monitor and act on the feedback they receive there.

But to begin, we need data.

The Problem

Consider Twitter. Its REST APIs impose strict rate limits. What we really want is the streaming API: it doesn't have those rate limits, and it lets us process tweets in near real time.
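To make "near real time" concrete, here is a minimal Java sketch of how a streaming client might consume messages. It assumes the stream delivers one JSON message per line and sends blank lines as keep-alives. The `StreamReader` class and `MessageHandler` callback are hypothetical and are not Tweezer's code. Because it reads from any `java.io.Reader`, you can try the logic on a simulated stream without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tweezer's code: consume a line-delimited
// stream and hand each message to a callback as soon as it is read.
public class StreamReader {

    // Callback invoked once per non-blank message line.
    public interface MessageHandler {
        void handle(String json);
    }

    // Reads until the stream ends, skipping blank keep-alive lines
    // (an assumption about the wire format). Returns how many messages
    // were passed to the handler.
    public static int consume(Reader source, MessageHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int delivered = 0;
        String line;
        while ((line = reader.readLine()) != null) { // readLine strips \n or \r\n
            if (line.trim().isEmpty()) {
                continue; // keep-alive: carries no payload
            }
            handler.handle(line);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) throws IOException {
        // Simulated stream: two messages with a keep-alive line between them.
        String simulated = "{\"text\":\"hello #jsfoo\"}\r\n\r\n{\"text\":\"bye\"}\r\n";
        final List<String> received = new ArrayList<String>();
        int count = consume(new StringReader(simulated), new MessageHandler() {
            @Override
            public void handle(String json) {
                received.add(json);
            }
        });
        System.out.println(count + " messages: " + received);
    }
}
```

In a real client, the `Reader` would wrap the HTTP response body of a long-lived connection. Each tweet is then handled as soon as its line arrives, with no wait for a polling cycle.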

The problem is reinventing the wheel. Most of the time, we end up rewriting the data collection layer with only minor changes to the codebase, even though its core largely stays the same.

The Solution

To solve this problem, we built Tweezer. With Tweezer, you can start collecting data in under five minutes. All you need is an authorized Twitter app created on your end and a MongoDB instance.

A single configuration file manages how the app works. It holds the authorized app's credentials, the data store credentials, and the keywords/hashtags to track.
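As a rough illustration of what that file carries, the sketch below models the settings in memory and checks them before collection starts. The field names are hypothetical and are not Tweezer's actual configuration keys. The four credential fields assume Twitter's OAuth 1.0a app credentials: consumer key and secret, plus access token and secret. The tracked terms are joined into the comma-separated form that the streaming API's `track` filter parameter uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory view of what application.conf holds.
// Field names are illustrative, not Tweezer's real configuration keys.
public class CollectorSettings {
    final String consumerKey;
    final String consumerSecret;
    final String accessToken;
    final String accessTokenSecret;
    final String mongoUri;
    final List<String> trackTerms;

    public CollectorSettings(String consumerKey, String consumerSecret,
                             String accessToken, String accessTokenSecret,
                             String mongoUri, List<String> trackTerms) {
        this.consumerKey = consumerKey;
        this.consumerSecret = consumerSecret;
        this.accessToken = accessToken;
        this.accessTokenSecret = accessTokenSecret;
        this.mongoUri = mongoUri;
        this.trackTerms = trackTerms;
    }

    // Names of required settings that are missing or blank, so the app
    // can refuse to start with a useful message instead of failing later.
    public List<String> missingSettings() {
        List<String> missing = new ArrayList<String>();
        if (isBlank(consumerKey)) missing.add("consumerKey");
        if (isBlank(consumerSecret)) missing.add("consumerSecret");
        if (isBlank(accessToken)) missing.add("accessToken");
        if (isBlank(accessTokenSecret)) missing.add("accessTokenSecret");
        if (isBlank(mongoUri)) missing.add("mongoUri");
        if (trackTerms == null || trackTerms.isEmpty()) missing.add("trackTerms");
        return missing;
    }

    // Joins the keywords/hashtags into one comma-separated value,
    // trimming whitespace and skipping empty entries.
    public String trackParameter() {
        if (trackTerms == null) return "";
        StringBuilder joined = new StringBuilder();
        for (String term : trackTerms) {
            String t = term.trim();
            if (t.isEmpty()) continue;
            if (joined.length() > 0) joined.append(',');
            joined.append(t);
        }
        return joined.toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        CollectorSettings settings = new CollectorSettings(
            "key", "secret", "token", "",           // access token secret left blank
            "mongodb://localhost:27017/tweets",     // hypothetical database name
            Arrays.asList("#jsfoo", " jsfoo ", "hasgeek"));
        System.out.println("missing: " + settings.missingSettings());
        System.out.println("track=" + settings.trackParameter());
    }
}
```

Run as-is, `main` reports `accessTokenSecret` as missing and prints `track=#jsfoo,jsfoo,hasgeek`. Checking every required value up front means a misconfigured deployment fails immediately rather than partway through a collection run.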

Recently, HasGeek held JsFoo, its annual JavaScript conference in India. To test-run Tweezer, we left it running for 3 days, tracking the hashtags and keywords relevant to the event.
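Tracking hashtags and keywords can be pictured as a per-tweet check. The hypothetical `TrackMatcher` below is a client-side sketch. It does not describe how Twitter's `track` filter or Tweezer actually matches tweets. It lower-cases the text, splits it into tokens, and reports which single-word terms appear. As a design choice of this sketch, a plain keyword such as `hasgeek` also matches the hashtag `#hasgeek`, while a hashtag term matches only the hashtag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side filter: which tracked single-word terms
// (keywords or #hashtags) does a tweet mention?
public class TrackMatcher {
    private final List<String> terms = new ArrayList<String>();

    public TrackMatcher(List<String> trackTerms) {
        // Normalize once: trimmed, lower-case, empty entries dropped.
        for (String t : trackTerms) {
            String norm = t.trim().toLowerCase(Locale.ROOT);
            if (!norm.isEmpty()) terms.add(norm);
        }
    }

    // Returns the matched terms in configuration order.
    public List<String> matches(String tweetText) {
        // Split on anything that is not a letter, digit, underscore, '#' or '@'.
        Set<String> tokens = new HashSet<String>(Arrays.asList(
            tweetText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_#@]+")));
        List<String> found = new ArrayList<String>();
        for (String term : terms) {
            boolean hit = tokens.contains(term)
                // A plain keyword also matches its hashtag form.
                || (!term.startsWith("#") && tokens.contains("#" + term));
            if (hit) found.add(term);
        }
        return found;
    }

    public static void main(String[] args) {
        TrackMatcher m = new TrackMatcher(Arrays.asList("#JSFoo", "hasgeek", "react"));
        System.out.println(m.matches("Loving the talks at #jsfoo, thanks HasGeek!"));
        System.out.println(m.matches("Nothing relevant here."));
    }
}
```

Here the first tweet matches `#jsfoo` and `hasgeek`, and the second matches nothing. Multi-word phrases are out of scope for this sketch.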

Using this data, we put together a dashboard visualizing various aspects of JsFoo: here

Usage

Using Docker

Build

Use the Dockerfile to build the Docker image.

The image comes with JDK 8 and MongoDB.

$ sudo docker build -t mytwitterstream .

Once the image is built, make sure you add the following to application.conf:

- the Twitter API credentials
- the MongoDB credentials

Run

Run the Docker image:

$ sudo docker run -t -i mytwitterstream

Inside the container, this starts MongoDB and then starts the Twitter streamer app, which writes tweets to the local database.
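This README doesn't show how the container sequences these steps. Any such launcher has to get the order right: the streamer should not start writing before MongoDB accepts connections. The hypothetical `WaitForMongo` class below sketches that check in Java. It retries a TCP connection to MongoDB's default port, 27017, before handing over to the streamer. It is an illustration, not a description of the repository's start.sh.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: block until MongoDB accepts TCP connections,
// so the streamer never starts against a database that is still booting.
public class WaitForMongo {

    // Tries to connect up to maxAttempts times, sleeping delayMillis
    // between attempts. Returns true as soon as one connection succeeds.
    public static boolean waitForPort(String host, int port, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1 s connect timeout
                return true;
            } catch (IOException notYet) {
                // Nothing listening yet; fall through and retry.
            } finally {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Closing a failed or probe socket; nothing useful to do.
                }
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // 27017 is MongoDB's default port; 30 attempts one second apart.
        boolean up = waitForPort("localhost", 27017, 30, 1000);
        if (!up) {
            System.err.println("MongoDB did not come up; not starting the streamer");
            System.exit(1);
        }
        System.out.println("MongoDB is up; the streamer can start now");
    }
}
```

A plain TCP check only shows that something is listening on the port. A stricter launcher could issue a database command instead, at the cost of depending on a MongoDB driver.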

Using the source

Build

$ sbt compile

Set the required credentials in application.conf

Run

$ sbt run

Using the JAR directly

If you have Java 7+ and MongoDB 3 installed and do not want the Docker setup, you can pick up the JAR from Dropbox and run Tweezer directly. Make sure you configure application.conf (here is a sample) and set the environment variable HARATE_CONF to the location of the configuration file:

$ export HARATE_CONF=/path/to/application.conf
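For illustration, here is one way an application could resolve its configuration through `HARATE_CONF`. It fails fast when the variable is unset or points at a missing or unreadable file. This is a hypothetical Java sketch, and `ConfLocator` is not a Tweezer class. It takes the environment as a `Map`, so it can be exercised without modifying the real process environment.

```java
import java.io.File;
import java.util.Map;

// Hypothetical sketch: find the config file via HARATE_CONF and
// report a clear error instead of failing later with a vague one.
public class ConfLocator {
    static final String ENV_VAR = "HARATE_CONF";

    // The environment is passed in (e.g. System.getenv()) to keep this testable.
    public static File resolve(Map<String, String> env) {
        String path = env.get(ENV_VAR);
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalStateException(
                ENV_VAR + " is not set; export it to point at application.conf");
        }
        File conf = new File(path.trim());
        if (!conf.isFile() || !conf.canRead()) {
            throw new IllegalStateException(
                ENV_VAR + " points to " + conf + ", which is not a readable file");
        }
        return conf;
    }

    public static void main(String[] args) {
        File conf = resolve(System.getenv());
        System.out.println("Using configuration at " + conf.getAbsolutePath());
    }
}
```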

Once the path is configured, you are ready to run the JAR:

$ java -jar tweezer-0.2.0.jar
