Stream and aggregate tweets containing a set of track terms in memory, and write the aggregates to RocksDB. Aggregates include top hashtags, top mentions and top retweets. Contains a local executable that can run indefinitely, computing aggregates and storing results in a local RocksDB instance. Also has a repl mode for querying results from the DB.

vigneshc/TweetAggregates

Overview
Continuously streams tweets containing a set of track terms, aggregates top hashtags, top mentions and top retweets from the stream, and stores them in a RocksDB instance. The project contains a local executable that can run indefinitely, collecting aggregates and storing the results in a local RocksDB instance. It also has a repl mode for querying the results from the DB.

Project Structure

  1. TweetGateCore contains classes for querying Twitter, aggregating tweets and storing the results in the DB.
  2. TweetGate contains classes for the executable and the commands described in the Usage section.

High level logic

TwitterStream.cs\StartTwitterPump() sends tweets to a pipe using System.IO.Pipelines ==> TwitterStream.cs\ProcessTweetStream() pushes tweets to a reactive subject (TweetSubject).
TweetSubject ==> Query.cs\SimpleAggregate() returns a query that aggregates the data and returns observables for the aggregates ==> RocksDBStore.cs\PersistObservableAsync() stores the aggregates in the DB.
Program.SaveAggregates.cs kicks off the workflow above.
ReactiveX is used for the publish-subscribe mechanism, and Trill is used, minimally, for window aggregations.
RocksDB Sharp is used for storing the aggregate data.
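
The aggregation step in the workflow above can be illustrated with a small, language-neutral sketch. It is written in Python rather than the project's C#, and it does not use ReactiveX, Trill or RocksDB; it only shows the idea of counting hashtags and mentions per fixed (tumbling) time window and keeping the top entries. The function names, the (timestamp, text) tweet shape, the 60-second window and the regular expressions are all assumptions made for the example, not details taken from Query.cs.

```python
import re
from collections import Counter, defaultdict

# Simplified patterns; real tweet entities are richer than this (assumption).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def window_start(timestamp, window_seconds):
    """Align a Unix timestamp to the start of its tumbling window."""
    return timestamp - (timestamp % window_seconds)


def aggregate(tweets, window_seconds=60, top_n=3):
    """Count hashtags and mentions per tumbling window and keep the top N.

    `tweets` is an iterable of (unix_timestamp, text) pairs. This shape is
    hypothetical and is not the project's tweet model.
    """
    hashtags = defaultdict(Counter)
    mentions = defaultdict(Counter)
    for timestamp, text in tweets:
        start = window_start(timestamp, window_seconds)
        hashtags[start].update(tag.lower() for tag in HASHTAG.findall(text))
        mentions[start].update(name.lower() for name in MENTION.findall(text))
    windows = sorted(set(hashtags) | set(mentions))
    return {
        start: {
            "top_hashtags": hashtags[start].most_common(top_n),
            "top_mentions": mentions[start].most_common(top_n),
        }
        for start in windows
    }


if __name__ == "__main__":
    sample = [
        (0, "Loving #rocksdb and #dotnet with @alice"),
        (10, "More #rocksdb from @alice and @bob"),
        (61, "A new window starts with #trill"),
    ]
    for start, result in aggregate(sample).items():
        print(start, result)
```

In the project itself, the per-window results would be emitted as observables and persisted by RocksDBStore.cs\PersistObservableAsync() rather than returned as a dictionary.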

Usage

Either install the .NET SDK and use “dotnet run”, or build a self-contained executable.

  1. Save tweets to a local file.

    saveTweets [twitterConfigJsonFile] [destinationFile] [durationMinutes]
    Saves tweets to destinationFile for durationMinutes minutes.

  2. Compute Aggregates from a local file.

    saveAggregates file [inputDataFile] [rocksDBPath]
    Aggregates the tweets in inputDataFile and stores the aggregates in the DB.
    The use case is to first store tweets in a file using (1) and then compute aggregates over that file. Mainly used for testing.

  3. Compute aggregates for tweets directly from the Twitter streaming API.

    saveAggregates direct [twitterConfigJsonFile] [rocksDBPath]
    Streams tweets from the Twitter API, aggregates them and stores the aggregates in the DB.
    The use case is storing aggregates for specific keywords.

  4. View aggregates in the DB.

    repl [rocksDBPath] [OutputDirectoryPath]
    Provides APIs for reading the content of the DB. Additional details are available in Program.Repl.cs.
    If OutputDirectoryPath is provided, results are stored in files in that directory. If it is not provided, results are printed to the console. The use case is to quickly view the DB content.
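
The output behaviour described for the repl command (files when a directory is given, console otherwise) can be sketched as follows. This is a minimal Python illustration of the routing rule only, not the code in Program.Repl.cs; the function name `emit_results`, the `.txt` extension and the one-file-per-result naming are hypothetical.

```python
from pathlib import Path


def emit_results(name, lines, output_dir=None):
    """Write result lines to a file in output_dir, or print them if it is None.

    Returns the path written to, or None when the lines were printed.
    """
    if output_dir is None:
        for line in lines:
            print(line)
        return None
    directory = Path(output_dir)
    # Create the directory if needed; whether the real repl does so is an assumption.
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{name}.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # No directory given, so the lines go to the console.
    emit_results("top_hashtags", ["rocksdb 2", "dotnet 1"])
```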

Example Twitter config. This page has more details on TrackTerms.

{
    "TrackTerms": "comma,@separated,#hashTags,and,text",
    "OAuthConsumerSecret": "<>",
    "OAuthToken": "<>",
    "OAuthTokenSecret": "<>",
    "OAuthConsumerKey": "<>"
}
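
Before starting a long-running stream, it can help to check that a config file of this shape is complete. The sketch below, in Python, parses such a config, rejects missing or placeholder ("<>") values, and splits TrackTerms on commas. The function name `parse_config` and the validation rules are assumptions for illustration; the project's own config loading may behave differently.

```python
import json

# Keys taken from the example config above.
REQUIRED_KEYS = (
    "TrackTerms",
    "OAuthConsumerKey",
    "OAuthConsumerSecret",
    "OAuthToken",
    "OAuthTokenSecret",
)


def parse_config(raw_json):
    """Parse the config text and return (config_dict, list_of_track_terms).

    Raises ValueError when a key is missing, empty, or still set to "<>".
    """
    config = json.loads(raw_json)
    unfilled = [key for key in REQUIRED_KEYS if config.get(key) in (None, "", "<>")]
    if unfilled:
        raise ValueError("missing or placeholder config keys: " + ", ".join(unfilled))
    # Split the comma-separated terms and drop surrounding whitespace and empties.
    terms = [term.strip() for term in config["TrackTerms"].split(",") if term.strip()]
    if not terms:
        raise ValueError("TrackTerms does not contain any terms")
    return config, terms


if __name__ == "__main__":
    example = json.dumps({
        "TrackTerms": "comma,@separated,#hashTags,and,text",
        "OAuthConsumerSecret": "secret",
        "OAuthToken": "token",
        "OAuthTokenSecret": "token-secret",
        "OAuthConsumerKey": "key",
    })
    _, terms = parse_config(example)
    print(terms)
```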

Example aggregate output for about 30 minutes of streaming is available here: Top Hashtags, Top Mentions, Top Retweets.
