Not intended for production use
- Install requirements using `pip install -r requirements.txt`
- Ensure a Postgres DB called `tweetstream` exists, with user `streamer` and password `streamerp`
- Run DB migrations using `alembic upgrade head` to create the database table and indices
- Get Twitter API credentials: https://dev.twitter.com/apps/new
- Ensure you have a `keys.py` file containing the following string variables:
    - `con_key`: the API consumer key
    - `con_secret`: the API consumer secret
    - `acc_key`: the API access key
    - `acc_secret`: the API access secret
- In `main()`, change the `to_follow` variable to the Twitter user whose followers' Tweets you wish to retrieve
- Run `python getstream.py` from the command line
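As an illustration, a minimal `keys.py` might look like the following. The variable names come from the list above; the values shown are placeholders, not real credentials:

```python
# keys.py -- Twitter API credentials
# Placeholder values shown; substitute the keys from your own Twitter app.
con_key = "your-consumer-key"
con_secret = "your-consumer-secret"
acc_key = "your-access-key"
acc_secret = "your-access-secret"
```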
- The Tweepy library is used to connect to Twitter using OAuth
- The Twitter firehose is then filtered to show only the tweets from accounts following a given account – in this case @brockleycentral.
- The tweets are streamed into a PostgreSQL database using the SQLAlchemy library and a coroutine. This allows offline retrieval and analysis using e.g. the Pandas data analysis library (see below).
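The coroutine-based sink described above can be sketched as follows. This is a minimal illustration of the pattern only: it uses the stdlib `sqlite3` module in place of SQLAlchemy/Postgres, and the table and column names are invented for the example, not taken from the project's schema:

```python
import sqlite3

def coroutine(func):
    # Decorator that primes a generator so it is ready to receive
    # values via send() immediately.
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start

@coroutine
def tweet_sink(conn):
    # Receives (id, text) tuples one at a time and inserts each
    # into the database as it arrives from the stream.
    while True:
        tweet = yield
        conn.execute("INSERT INTO tweets (id, text) VALUES (?, ?)", tweet)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
sink = tweet_sink(conn)

# In the real script, the stream listener would call send() for each
# incoming status; here we simulate two arriving tweets.
sink.send((1, "hello"))
sink.send((2, "world"))
```

The advantage of the coroutine form is that the database connection is set up once, and each incoming tweet is then pushed into the same suspended function, rather than re-establishing state per tweet.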
If you wish to visualise the data, an IPython notebook is provided.
For offline analysis (using dumped CSV data), run this IPython notebook.
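As a hint of what the offline Pandas analysis looks like, here is a minimal sketch over a CSV dump. The column names (`id`, `created_at`, `text`) are illustrative assumptions, not the project's actual schema:

```python
import io
import pandas as pd

# A tiny stand-in for a dumped CSV of tweets (columns are illustrative).
csv_data = io.StringIO(
    "id,created_at,text\n"
    "1,2014-01-01 10:00:00,hello\n"
    "2,2014-01-01 11:30:00,world\n"
)
tweets = pd.read_csv(csv_data, parse_dates=["created_at"])

# e.g. how many tweets were posted in each hour of the day
per_hour = tweets["created_at"].dt.hour.value_counts()
```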
A subset of tweets is available as a zipped database dump: `tweets.db.zip`. If you wish to use this for analysis, ensure your DB exists with the correct credentials, but do not run the `alembic upgrade` command, as the structure will be created by the import: unzip the file, and import it into Postgres.
Copyright Stephan Hügel, 2014
License: MIT