
Enigma Fall 2015 Workshop 1: Scraping public data sources with python


tuftsenigma/PublicDataScraping

PublicDataScraping

Used in conjunction with yotambentov's PyNight2015 talk at Tufts University


Requirements

  • Command line application (e.g. Terminal on OSX/Linux, Command Prompt/Cygwin on Windows)
  • Install Python (OSX, Windows)
  • Install pip
  • Install git
  • Text editor
  • Take a look at Yotam's lecture above and get familiar with Python

Getting Started

Get the code

Open your terminal application and run the following:

git clone https://github.com/tuftsenigma/PublicDataScraping.git

Install BeautifulSoup

sudo pip install beautifulsoup4

Remove sudo on Windows

Install yahoo-finance

sudo pip install yahoo-finance

Remove sudo on Windows

Optional

Sign up for plotly

If you want to make some nice plots while you're at it, sign up on plotly and have your API key handy.


Running Examples

Scraping the Tufts Daily for Links

python link_scrape_tufts_daily.py
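The README does not show what link_scrape_tufts_daily.py does internally. Below is a minimal sketch of the general technique, collecting every `<a href>` on a page and resolving it to an absolute URL. It deliberately swaps in Python 3's built-in `html.parser` instead of BeautifulSoup so it runs with no extra installs; `LinkCollector`, `extract_links`, and `scrape_links` are hypothetical names, not the workshop script's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with a missing or empty href
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link in an HTML string as an absolute URL, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url):
    """Download a page and return its links (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)
```

For example, `for link in scrape_links("https://tuftsdaily.com/"): print(link)` would print each link on the page; that Tufts Daily URL is an assumption rather than a value taken from the workshop script. Resolving with `urljoin` matters because many sites use relative hrefs such as `/news`.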

Scraping web pages for images

python simple_html_img_scrape.py [url]

url is optional and defaults to https://google.com.
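As a hedged illustration of the same idea applied to images, the sketch below collects the `src` of every `<img>` tag and mirrors the optional `url` argument with a fallback to https://google.com. It uses Python 3's standard library rather than BeautifulSoup, and `ImageCollector`, `extract_images`, `pick_url`, and `scrape_images` are hypothetical helpers, not the contents of simple_html_img_scrape.py.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

DEFAULT_URL = "https://google.com"  # same default the README describes


class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags without a usable src
                self.images.append(urljoin(self.base_url, src))


def extract_images(html, base_url):
    """Return absolute URLs for every image referenced in an HTML string."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


def pick_url(argv):
    """Use the first command-line argument if present, else DEFAULT_URL."""
    return argv[1] if len(argv) > 1 else DEFAULT_URL


def scrape_images(url):
    """Download a page and list its image URLs (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_images(html, url)
```

Calling `scrape_images(pick_url(sys.argv))` from a script gives the command-line behaviour described above: with no argument it falls back to the default URL.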

NBA Playoff Plus/Minus

To enable plotting using Plot.ly, follow Plot.ly's initialization instructions for setting up your account credentials.

Run the code:

cd examples
python nba_playoff_plusminus.py <player_id> <year>

Here player_id is the player's identifier on basketball-reference.com (e.g. iversal01 for Allen Iverson) and year is the playoff year (e.g. 2015).

An example for J.R. Smith's 2015 Playoffs
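The README does not show how nba_playoff_plusminus.py parses its arguments or summarizes the data. A hypothetical Python 3 sketch: check the two arguments, build the player's basketball-reference page URL, and turn per-game +/- values (however they are scraped) into a running total suitable for plotting. The ID pattern, the `/players/<first letter>/<id>.html` URL layout, and every function name here are assumptions, not the script's actual code.

```python
import re
from itertools import accumulate

BASE_URL = "https://www.basketball-reference.com"


def parse_args(argv):
    """Validate <player_id> <year>; exits with a usage message on bad input."""
    if len(argv) != 3:
        raise SystemExit("usage: python nba_playoff_plusminus.py <player_id> <year>")
    player_id, year = argv[1], argv[2]
    # Assumed ID shape: lowercase letters followed by two digits, e.g. iversal01
    if not re.fullmatch(r"[a-z]+\d{2}", player_id):
        raise SystemExit(f"unexpected player_id format: {player_id!r}")
    if not re.fullmatch(r"\d{4}", year):
        raise SystemExit(f"year should be four digits, got {year!r}")
    return player_id, int(year)


def player_page_url(player_id):
    """Assumed layout: pages are grouped by the first letter of the ID."""
    return f"{BASE_URL}/players/{player_id[0]}/{player_id}.html"


def running_plus_minus(per_game):
    """Cumulative +/- after each game, e.g. for a line plot over a playoff run."""
    return list(accumulate(per_game))
```

A running total is only one way to summarize +/- across a series; the actual script may plot per-game values instead.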
