A repository for the Georgetown School of Continuing Studies Data Science Certificate capstone. The project aims to identify instances of cyberbullying.
The project structure contains the following modules:
analysis
: Python code + Jupyter notebooks for analysis
cron
: Python code to set up various cron jobs, e.g. the daily Reddit scraper
data
: Example data sets for local storage (temporary; should be used for exploratory purposes; upload to AWS instead)
examples
: Example scripts for working with the AWS RDS database + S3, Pandas, and writing a class wrapper for psycopg2
papers
: Papers collected that investigate the cyberbullying problem space
This project was set up using the free tier of AWS. The EC2 instance runs an Amazon image, has Python 3.4 installed, and
requires a Python 3.4 virtualenv for the daily cron to run as is. The cron jobs are run as the default ec2-user.
Example of how to set up cron jobs:
ssh -i path/to/key.pem ec2-user@public-ec2-domain
- Logic to activate the Python 3.4 virtualenv
python cron/daily_crons.py
- You can view cron jobs via
crontab -e
(this opens the crontab in an editor; crontab -l lists the entries without editing them)
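As a rough illustration of the cron setup described above, the sketch below builds a single crontab line that runs cron/daily_crons.py with the virtualenv's own interpreter, so that no separate activation step is needed. The repository path /home/ec2-user/repo, the venv/bin/python location, the 06:00 daily schedule, and the helper name build_cron_line are all assumptions made for illustration; none of them come from this project.

```python
import shlex


def build_cron_line(repo_dir, python_path="venv/bin/python",
                    script="cron/daily_crons.py", schedule="0 6 * * *"):
    """Return one crontab line that runs the script from the repository root.

    All default values are hypothetical; adjust them to the actual instance.
    """
    # Calling the virtualenv's interpreter directly picks up the virtualenv's
    # installed packages without sourcing an activate script first.
    command = "cd {} && {} {}".format(
        shlex.quote(repo_dir), shlex.quote(python_path), shlex.quote(script))
    return "{} {}".format(schedule, command)


if __name__ == "__main__":
    # Hypothetical checkout location on the EC2 instance.
    print(build_cron_line("/home/ec2-user/repo"))
```

The printed line can be pasted into the editor opened by crontab -e. Paths containing spaces are shell-quoted by shlex.quote.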
To get started you'll need to:
- Set up a Python 3.4 virtualenv (by convention named venv, as that's what's ignored in .gitignore) using the requirements.txt file, else use conda
  - Note: If using the AWS EC2 Amazon image, you may need to confirm that the proper pip is used to install dependencies. You want the virtualenv pip, /venv/local/bin/pip3.4, not the system pip.
- Create a config.yml using your credentials; use example-config.yml as a template
- Set up AWS credentials via aws configure
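To check which environment dependencies will be installed into, a small script such as the following can be run with the interpreter in question. It is only a sketch: the function name in_virtualenv is made up for this example, and it relies on sys.real_prefix (set by older releases of the virtualenv tool) and sys.base_prefix (available from Python 3.3) to tell a virtualenv apart from the system Python.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module and
    # newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:", sys.prefix)
    print("inside a virtualenv:", in_virtualenv())
```

If the script reports that it is inside the virtualenv, running python -m pip install -r requirements.txt with that same interpreter installs into that environment rather than into the system Python.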
Contact Lorena Mesa via email (me@lorenamesa.com) or http://twitter.com/loooorenanicole.
Thanks folks!