Enigma Fall 2015 Workshop 1: Scraping public data sources with Python
Used in conjunction with yotambentov's PyNight2015 talk at Tufts University
- Command line application (e.g. Terminal on OSX/Linux, Command Prompt/Cygwin on Windows)
- Install Python (OSX, Windows)
- Install pip
- Install git
- Text editor
- Take a look at Yotam's lecture above and get familiar with Python
Open your terminal application and run the following:
git clone https://github.com/tuftsenigma/PublicDataScraping.git
sudo pip install beautifulsoup4
Remove sudo on Windows
pip install yahoo-finance
Sign up for plotly
If you want to make plots along the way, sign up for Plotly and keep your API key handy.
Scraping the Tufts Daily for Links
python link_scrape_tufts_daily.py
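The link-scraping idea can be sketched without any network access. The block below is a minimal illustration, not the contents of link_scrape_tufts_daily.py: it assumes Python 3, uses the standard library's html.parser instead of BeautifulSoup so it runs with no extra installs, and the sample markup, the base URL, and the names LinkCollector and extract_links are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that carry no href attribute
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/news/story-one">Story one</a>
  <a href="https://example.com/other">Elsewhere</a>
  <a name="no-href">Anchor without a link</a>
</body></html>
"""

if __name__ == "__main__":
    # The base URL here is an assumption used only to resolve relative links.
    for link in extract_links(SAMPLE_HTML, "https://tuftsdaily.com"):
        print(link)
```

With BeautifulSoup installed, the same loop is usually written as `BeautifulSoup(html, "html.parser").find_all("a", href=True)`, reading each tag's `["href"]`.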
python simple_html_img_scrape.py [url]
url is an optional argument; it defaults to https://google.com
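A rough sketch of how an optional url argument with a default might be handled, together with pulling image sources out of a page. This is an assumption about the general approach, not the code in simple_html_img_scrape.py; it assumes Python 3, uses only the standard library, and the names pick_url, ImageCollector, and extract_images are hypothetical.

```python
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin

DEFAULT_URL = "https://google.com"  # the default stated in this README


def pick_url(argv):
    """Return the first command-line argument, or the default URL if absent."""
    return argv[1] if len(argv) > 1 else DEFAULT_URL


class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags without a src attribute
                self.images.append(urljoin(self.base_url, src))


def extract_images(html, base_url):
    """Return absolute URLs for all images referenced in an HTML string."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    return parser.images


if __name__ == "__main__":
    url = pick_url(sys.argv)
    # Hypothetical markup; a real run would download the page at `url` first.
    sample = '<img src="/logo.png"><img alt="no source">'
    print(extract_images(sample, url))
```

Resolving each src with urljoin matters because pages often reference images by relative path, which cannot be downloaded without the page's own URL.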
To enable plotting with Plot.ly, follow Plot.ly's initialization instructions.
Run the code:
cd examples
python nba_playoff_plusminus.py <player_id> <year>
Here player_id is the player's identifier on basketball-reference.com, e.g. iversal01 for Allen Iverson, and year is the playoff year.
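To make the two arguments concrete, here is a small sketch of validating them and turning a player_id into a player page address. It is not taken from nba_playoff_plusminus.py: the function names are hypothetical, and the URL layout (/players/<first letter of the id>/<id>.html) is an assumption about how basketball-reference.com organizes player pages, so check it against the site before relying on it.

```python
BASE_URL = "https://www.basketball-reference.com"
USAGE = "usage: python nba_playoff_plusminus.py <player_id> <year>"


def parse_args(argv):
    """Validate [script, player_id, year] and return (player_id, year)."""
    if len(argv) != 3:
        raise SystemExit(USAGE)
    player_id = argv[1]
    try:
        year = int(argv[2])
    except ValueError:
        raise SystemExit(USAGE)
    return player_id, year


def player_page_url(player_id):
    """Build a player page URL under the assumed site layout."""
    # Assumption: pages are grouped by the first letter of the identifier.
    return "{}/players/{}/{}.html".format(BASE_URL, player_id[0], player_id)


if __name__ == "__main__":
    # Explicit argument list used for illustration instead of sys.argv.
    player_id, year = parse_args(["nba_playoff_plusminus.py", "iversal01", "2001"])
    print(player_id, year, player_page_url(player_id))
```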
An example for J.R. Smith's 2015 Playoffs