# Web APIs

An API, or application programming interface, is the way programs communicate with one another.

Web APIs are the way programs communicate with one another _over the internet_.

[RESTful](https://en.wikipedia.org/wiki/Representational_state_transfer) APIs follow a set of design principles (for example, identifying resources by URLs and acting on them with standard HTTP methods) that make them simple to use.

The basic tools we will use are `GET` and `POST` requests to URLs we specify, and JSON objects that we receive as responses or send as payloads (in a `POST` request, for example).

In [1]:
import requests

resp = requests.get('http://www.elpais.com/')
resp.content[:500]

'<!DOCTYPE html>\n<html lang="es">\n<head>\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta name="format-detection" content="address=no,email=no,telephone=no">\n<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0" />\n<title>EL PA\xc3\x8dS: el peri\xc3\xb3dico global</title>\n<meta name="lang" \t\t\tcontent="es" />\n<meta name="author" \t\tcontent="Ediciones El Pa\xc3\xads" />\n<meta name="description" \tcontent="Noticias de \xc3\xbaltima hora sobre la actualidad '
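The `\xc3\x8d`-style sequences in this output are raw UTF-8 bytes: `resp.content` holds the body as bytes, while `resp.text` holds it decoded to a string using the encoding `requests` infers from the response. A minimal standard-library illustration, decoding the bytes of the page title shown above:

```python
# Raw bytes copied from the <title> element in the output above.
raw_title = b"EL PA\xc3\x8dS: el peri\xc3\xb3dico global"

# Decoding as UTF-8 turns each two-byte sequence back into one accented character.
title = raw_title.decode("utf-8")
print(title)  # EL PAÍS: el periódico global
```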

This API returns the current position of the International Space Station (ISS):

In [2]:
r = requests.get('http://api.open-notify.org/iss-now.json')
r.status_code

200
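A status code of 200 means the request succeeded. The standard library names the HTTP status codes in `http.HTTPStatus`, which can make such checks easier to read; `describe_status` below is a hypothetical helper, not part of `requests`. (In `requests` itself, `r.raise_for_status()` raises an exception for 4xx and 5xx responses.)

```python
from http import HTTPStatus


def describe_status(code):
    """Return a numeric status code together with its standard reason phrase."""
    return "{} {}".format(code, HTTPStatus(code).phrase)


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
# HTTPStatus members are integers, so they compare directly with r.status_code.
print(HTTPStatus.OK == 200)  # True
```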

In [3]:
r.content

'{"message": "success", "iss_position": {"latitude": "39.3633", "longitude": "-125.7067"}, "timestamp": 1493988848}'

We can convert a JSON-formatted string, such as the one in the response, into a Python object with the `json` library:

In [4]:
import json 

pos = json.loads(r.content)
pos

{u'iss_position': {u'latitude': u'39.3633', u'longitude': u'-125.7067'},
 u'message': u'success',
 u'timestamp': 1493988848}

In [5]:
pos['iss_position']['latitude']


u'39.3633'
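Note that the API returns the coordinates as strings (`u'39.3633'`), not numbers. A short sketch, reusing the response body shown above, that converts them to floats before doing any arithmetic with them:

```python
import json

# The response body shown above, as received from the ISS-position endpoint.
body = '{"message": "success", "iss_position": {"latitude": "39.3633", "longitude": "-125.7067"}, "timestamp": 1493988848}'

pos = json.loads(body)
# Coordinates arrive as JSON strings, so convert them explicitly.
lat = float(pos["iss_position"]["latitude"])
lon = float(pos["iss_position"]["longitude"])
print(lat, lon)  # 39.3633 -125.7067
```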

In [6]:
import pandas as pd

pd.read_json('http://api.open-notify.org/iss-now.json')

| | iss_position | message | timestamp |
|---|---|---|---|
| latitude | 39.3826 | success | 2017-05-05 12:54:08 |
| longitude | -125.6751 | success | 2017-05-05 12:54:08 |


We can also go in the other direction and generate JSON-formatted strings from Python objects:

In [7]:
mi_diccionario = {'Chicago' : "Illinois", "Kansas City" : ["Kansas", "Missouri"]}

In [8]:
mi_diccionario

{'Chicago': 'Illinois', 'Kansas City': ['Kansas', 'Missouri']}

In [9]:
json.dumps(mi_diccionario)

'{"Kansas City": ["Kansas", "Missouri"], "Chicago": "Illinois"}'
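The key order in this output differs from the literal above because Python 2 dictionaries are unordered. When a stable, readable serialization matters (for example, to compare outputs), `json.dumps` accepts `sort_keys` and `indent`, and `json.loads` reverses the conversion. A short sketch:

```python
import json

mi_diccionario = {'Chicago': "Illinois", "Kansas City": ["Kansas", "Missouri"]}

# sort_keys gives a deterministic key order regardless of how the dict was built.
compact = json.dumps(mi_diccionario, sort_keys=True)
print(compact)  # {"Chicago": "Illinois", "Kansas City": ["Kansas", "Missouri"]}

# indent pretty-prints nested structures over several lines.
print(json.dumps(mi_diccionario, sort_keys=True, indent=2))

# Round trip: parsing the string gives back an equal dictionary.
print(json.loads(compact) == mi_diccionario)  # True
```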

#### Exercise:
Write a function that returns the duration of the next 5 overhead passes of the ISS for a given latitude and longitude, using the [ISS Pass Times API](http://open-notify.org/Open-Notify-API/ISS-Pass-Times/). We will need to encode the parameters in the URL as the specification describes.

For example, for Madrid:

http://api.open-notify.org/iss-pass.json?lat=40.4&lon=-3.7

In [10]:
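def get_iss(lat, lon):
    """Return the upcoming ISS passes over (lat, lon) as a list of dicts."""
    # Encode the coordinates in the query string by hand.
    url = "http://api.open-notify.org/iss-pass.json?lat=%f&lon=%f" % (lat, lon)
    response = requests.get(url)
    my_dict = json.loads(response.content)
    # The list of passes lives under the 'response' key.
    result = my_dict['response']

    return result

get_iss(40.0, 3.5)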

[{u'duration': 543, u'risetime': 1494038422},
 {u'duration': 640, u'risetime': 1494044141},
 {u'duration': 569, u'risetime': 1494049996},
 {u'duration': 530, u'risetime': 1494055870},
 {u'duration': 601, u'risetime': 1494061679}]

This works, but encoding larger sets of parameters by hand quickly becomes tedious and error-prone. Thankfully, `requests` can do that work for us through its `params` argument.
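Encoding by hand means formatting every value correctly and escaping characters such as spaces or `&`, which have special meaning in a URL. What `params=` does roughly amounts to URL-encoding a dictionary into a query string, which the standard library's `urllib.parse.urlencode` illustrates (the `q` parameter below is a hypothetical example, not part of the ISS API):

```python
from urllib.parse import urlencode

base = "http://api.open-notify.org/iss-pass.json"

madrid_coords = {'lat': 40.4, 'lon': -3.7}
# Each key/value pair becomes key=value, and pairs are joined with '&'.
print(base + "?" + urlencode(madrid_coords))
# http://api.open-notify.org/iss-pass.json?lat=40.4&lon=-3.7

# Reserved characters are escaped automatically: spaces become '+', '&' becomes '%26'.
print(urlencode({'q': 'New York & NJ'}))  # q=New+York+%26+NJ
```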

In [11]:
madrid_coords = {'lat': 40.4, 'lon': -3.7}

r = requests.get('http://api.open-notify.org/iss-pass.json', params=madrid_coords)
json.loads(r.content)

{u'message': u'success',
 u'request': {u'altitude': 100,
  u'datetime': 1493986734,
  u'latitude': 40.4,
  u'longitude': -3.7,
  u'passes': 5},
 u'response': [{u'duration': 371, u'risetime': 1493989991},
  {u'duration': 417, u'risetime': 1494038436},
  {u'duration': 637, u'risetime': 1494044076},
  {u'duration': 601, u'risetime': 1494049897},
  {u'duration': 537, u'risetime': 1494055774}]}

In [12]:
resp = json.loads(r.content)['response']

pd.DataFrame(resp)

| | duration | risetime |
|---|---|---|
| 0 | 371 | 1493989991 |
| 1 | 417 | 1494038436 |
| 2 | 637 | 1494044076 |
| 3 | 601 | 1494049897 |
| 4 | 537 | 1494055774 |
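The `risetime` column holds UNIX timestamps (seconds since 1 January 1970, UTC) and `duration` is in seconds. A sketch that turns the passes above into readable UTC start and end times with the standard library; `summarize_passes` is a hypothetical helper name. In pandas, `pd.to_datetime(df['risetime'], unit='s')` performs the same timestamp conversion on a whole column.

```python
from datetime import datetime, timedelta, timezone

# The five passes returned for Madrid above.
passes = [
    {'duration': 371, 'risetime': 1493989991},
    {'duration': 417, 'risetime': 1494038436},
    {'duration': 637, 'risetime': 1494044076},
    {'duration': 601, 'risetime': 1494049897},
    {'duration': 537, 'risetime': 1494055774},
]


def summarize_passes(passes):
    """Return (start, end, duration_seconds) tuples with timezone-aware UTC datetimes."""
    summary = []
    for p in passes:
        start = datetime.fromtimestamp(p['risetime'], tz=timezone.utc)
        # The pass ends `duration` seconds after the station rises.
        end = start + timedelta(seconds=p['duration'])
        summary.append((start, end, p['duration']))
    return summary


for start, end, duration in summarize_passes(passes):
    print(start.strftime('%Y-%m-%d %H:%M:%S'), '->', end.strftime('%H:%M:%S'), duration, 's')
```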


Some APIs need even more complex parameters. In that case, API designers often require them as JSON in the body of a `POST` request.

For example, take a look at [Google's QPX API](https://developers.google.com/qpx-express/v1/trips/search). Its documentation defines the body of the request, which we have to provide, and the body of the response, which the API sends back.

In [13]:
help(requests.post)

Help on function post in module requests.api:

post(url, data=None, json=None, **kwargs)
    Sends a POST request.
    
    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
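As the docstring says, `requests.post` can serialize a Python object for us through its `json` argument; `requests` then also sets the `Content-Type: application/json` header. The sketch below builds (but does not send) an equivalent request with the standard library, to show what ends up in the request. The URL and payload fields are hypothetical placeholders, not the QPX schema:

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, for illustration only.
url = "https://example.com/api/search"
payload = {"origin": "MAD", "destination": "ORD", "passengers": 1}

# A JSON POST body is the serialized payload encoded as bytes,
# plus a header telling the server how to interpret it.
req = Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.get_method())                 # POST
print(req.get_header("Content-type"))   # application/json (urllib stores header names capitalized)
print(req.data)
# Sending it would be urllib.request.urlopen(req);
# with requests, the equivalent call is requests.post(url, json=payload).
```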



# Web scraping

![HTML to DOM](http://www.cs.toronto.edu/~shiva/cscb07/img/dom/treeStructure.png)

![DOM TREE](http://www.openbookproject.net/tutorials/getdown/css/images/lesson4/HTMLDOMTree.png)
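The figures show how nested HTML tags form a tree (the DOM). A small sketch that prints that tree structure with only the standard library's `html.parser`; `TagOutline` is a hypothetical name. Void elements such as `<meta>` or `<br>` are treated as leaves because they never have a closing tag. Unlike `html5lib`, which repairs broken markup the way a browser does, this simple outline assumes well-formed input.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagOutline(HTMLParser):
    """Collect each start tag, indented by its nesting depth in the document tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1


html = ("<html><head><meta charset='utf-8'><title>Hi</title></head>"
        "<body><p>Some <b>bold</b> text</p></body></html>")
outline = TagOutline()
outline.feed(html)
print("\n".join(outline.lines))
```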



In [14]:
from IPython.display import IFrame

IFrame('http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts', 800, 600)

In [15]:
from bs4 import BeautifulSoup

r = requests.get('http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts')

page = r.content
page[:1000]

'<!DOCTYPE html>\n<html lang="en" dir="ltr" xmlns:article="http://ogp.me/ns/article#" xmlns:book="http://ogp.me/ns/book#" xmlns:product="http://ogp.me/ns/product#" xmlns:profile="http://ogp.me/ns/profile#" xmlns:video="http://ogp.me/ns/video#" prefix="content: http://purl.org/rss/1.0/modules/content/  dc: http://purl.org/dc/terms/  foaf: http://xmlns.com/foaf/0.1/  og: http://ogp.me/ns#  rdfs: http://www.w3.org/2000/01/rdf-schema#  schema: http://schema.org/  sioc: http://rdfs.org/sioc/ns#  sioct: http://rdfs.org/sioc/types#  skos: http://www.w3.org/2004/02/skos/core#  xsd: http://www.w3.org/2001/XMLSchema# ">\n  <head>\n    <meta charset="utf-8" /><script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(e,n,t){function r(t){if(!n[t]){var o=n[t]={exports:{}};e[t][0].call(o.exports,function(n){var o=e[t][1][n];return r(o||n)},o,o.exports)}return n[t].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<t.length;o++)r(t[o]);return r}({1:[funct

In [16]:
soup = BeautifulSoup(page, 'html5lib')
print(soup.prettify()[:1000])

<!DOCTYPE html>
<html dir="ltr" lang="en" prefix="content: http://purl.org/rss/1.0/modules/content/  dc: http://purl.org/dc/terms/  foaf: http://xmlns.com/foaf/0.1/  og: http://ogp.me/ns#  rdfs: http://www.w3.org/2000/01/rdf-schema#  schema: http://schema.org/  sioc: http://rdfs.org/sioc/ns#  sioct: http://rdfs.org/sioc/types#  skos: http://www.w3.org/2004/02/skos/core#  xsd: http://www.w3.org/2001/XMLSchema# " xmlns:article="http://ogp.me/ns/article#" xmlns:book="http://ogp.me/ns/book#" xmlns:product="http://ogp.me/ns/product#" xmlns:profile="http://ogp.me/ns/profile#" xmlns:video="http://ogp.me/ns/video#">
 <head>
  <meta charset="utf-8"/>
  <script type="text/javascript">
   window.NREUM||(NREUM={}),__nr_require=function(e,n,t){function r(t){if(!n[t]){var o=n[t]={exports:{}};e[t][0].call(o.exports,function(n){var o=e[t][1][n];return r(o||n)},o,o.exports)}return n[t].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<t.length;o++)r(t[o]);return r}({1:[functi

In [17]:
print(soup.prettify()[28700:30500])

  Letter to Congress Opposing A Bill to Post Private Information of Asbestos Victims Online and Make It Harder for Victims to Receive Compensation
                  </span>
                 </h2>
                 <time datetime="2017-03-06T11:48:00-0500">
                  March 6, 2017
                 </time>
                </div>
               </a>
               <div>
               </div>
              </div>
             </div>
             <div class="block block-content col-12 col-lg-4">
              <div class="content-details ">
               <a class="b-inner" href="/about/advocacy/legislative-alerts/letter-congress-opposing-legislation-strip-rights-working-people">
                <div class="b-text">
                 <h5 class="content-type">
                  Legislative Alert
                 </h5>
                 <h2 class="content-title">
                  <span>
                   Letter to Congress Opposing Legislation to Strip Rights from Working People Who Wor

In [18]:
help(soup.find_all)

Help on method find_all in module bs4.element:

find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs) method of bs4.BeautifulSoup instance
    Extracts a list of Tag objects that match the given
    criteria.  You can specify the name of the Tag and any
    attributes you want the Tag to have.
    
    The value of a key-value pair in the 'attrs' map can be a
    string, a list of strings, a regular expression object, or a
    callable that takes a string and returns whether or not the
    string matches for some custom definition of 'matches'. The
    same is true of the tag name.



In [19]:
alerts = soup.find_all('div', class_='content-details')
print(len(alerts))
type(alerts)

18


bs4.element.ResultSet

In [20]:
alerts[0]

<div class="content-details ">\n\t<a class="b-inner" href="/about/advocacy/legislative-alerts/letter-congress-opposing-bill-would-subject-workers-longer-hours">\n\t  <div class="b-text">\n      <h5 class="content-type">Legislative Alert</h5>\n      <h2 class="content-title"><span>Letter to Congress Opposing a Bill That Would Subject Workers to Longer Hours and More Unpredictable Schedules</span>\n</h2>\n            <time datetime="2017-04-25T14:50:35-0400">April 25, 2017</time>\n    </div>\n\t</a>\n  <div></div>\n</div>

In [21]:
first = alerts[0]
print(first.find('time').get_text())
print(first.a.find('span').get_text())
print(first.a['href'])

April 25, 2017
Letter to Congress Opposing a Bill That Would Subject Workers to Longer Hours and More Unpredictable Schedules
/about/advocacy/legislative-alerts/letter-congress-opposing-bill-would-subject-workers-longer-hours


In [22]:
def get_aflcio_alerts():
    result = []
    r = requests.get('http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts')
    soup = BeautifulSoup(r.content, 'html5lib')
    
    for alert in soup.find_all('div', class_='content-details'):
        dictionary = {}
        dictionary['date'] = alert.find('time').get_text()
        dictionary['title'] = alert.a.find('span').get_text()
        dictionary['link'] = 'http://www.aflcio.org' + alert.a['href']
        
        result.append(dictionary)
        
    return result
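`get_aflcio_alerts` downloads and parses in one step, which makes it hard to test offline. A sketch, not the original code, that moves the parsing into `parse_alerts` (a hypothetical name) so it can be checked against saved HTML such as the `alerts[0]` snippet shown earlier. It uses the built-in `html.parser` instead of `html5lib`, `urljoin` to build absolute links, and also keeps the machine-readable `datetime` attribute of `<time>`. Note that `class_='content-details'` still matches `class="content-details "` with its trailing space, because BeautifulSoup treats `class` as a whitespace-separated list of values.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://www.aflcio.org'


def parse_alerts(html, base_url=BASE_URL):
    """Extract date, title and absolute link from each alert block in `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    for alert in soup.find_all('div', class_='content-details'):
        time_tag = alert.find('time')
        result.append({
            'date': time_tag.get_text(strip=True),
            # The ISO timestamp is easier to sort and parse than the display text.
            'datetime': time_tag.get('datetime'),
            'title': alert.a.find('span').get_text(strip=True),
            # urljoin handles both relative ("/about/...") and absolute hrefs.
            'link': urljoin(base_url, alert.a['href']),
        })
    return result


# Saved snippet, adapted from the alerts[0] output above.
sample = '''
<div class="content-details ">
  <a class="b-inner" href="/about/advocacy/legislative-alerts/letter-congress-opposing-bill-would-subject-workers-longer-hours">
    <div class="b-text">
      <h5 class="content-type">Legislative Alert</h5>
      <h2 class="content-title"><span>Letter to Congress Opposing a Bill That Would Subject Workers to Longer Hours and More Unpredictable Schedules</span></h2>
      <time datetime="2017-04-25T14:50:35-0400">April 25, 2017</time>
    </div>
  </a>
  <div></div>
</div>
'''
print(parse_alerts(sample))
```

Fetching then becomes a thin wrapper, for example `parse_alerts(requests.get(url).content)`.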

In [23]:
letters = get_aflcio_alerts()
letters[:2]

[{'date': u'April 25, 2017',
  'link': u'http://www.aflcio.org/about/advocacy/legislative-alerts/letter-congress-opposing-bill-would-subject-workers-longer-hours',
  'title': u'Letter to Congress Opposing a Bill That Would Subject Workers to Longer Hours and More Unpredictable Schedules'},
 {'date': u'March 23, 2017',
  'link': u'http://www.aflcio.org/about/advocacy/legislative-alerts/letter-senators-opposing-supreme-court-nomination-neil-gorsuch',
  'title': u'Letter to Senators Opposing the Supreme Court Nomination of Neil Gorsuch'}]

In [24]:
# And we come full circle! We encode the list we created as
# a JSON string, which we could then serve over the internet
# from our own API.

json.dumps(letters)[:1000]
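The comment above mentions serving this list from our own API. A minimal sketch using only the standard library's `http.server`, assuming every `GET` request should return the whole list as JSON; `make_server` and `make_handler` are hypothetical names, and the `http.server` documentation itself recommends against using that module in production.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(payload):
    """Build a request-handler class that answers every GET with `payload` as JSON."""
    body = json.dumps(payload).encode("utf-8")

    class JSONHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return JSONHandler


def make_server(payload, host="127.0.0.1", port=8000):
    """Create (but do not start) a server for `payload`; port=0 picks a free port."""
    return HTTPServer((host, port), make_handler(payload))
```

`server = make_server(letters)` followed by `server.serve_forever()` would then answer `http://127.0.0.1:8000/` with the JSON list; in a notebook, run `serve_forever` in a background `threading.Thread` so the cell does not block.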

'[{"date": "April 25, 2017", "link": "http://www.aflcio.org/about/advocacy/legislative-alerts/letter-congress-opposing-bill-would-subject-workers-longer-hours", "title": "Letter to Congress Opposing a Bill That Would Subject Workers to Longer Hours and More Unpredictable Schedules"}, {"date": "March 23, 2017", "link": "http://www.aflcio.org/about/advocacy/legislative-alerts/letter-senators-opposing-supreme-court-nomination-neil-gorsuch", "title": "Letter to Senators Opposing the Supreme Court Nomination of Neil Gorsuch"}, {"date": "March 16, 2017", "link": "http://www.aflcio.org/about/advocacy/legislative-alerts/letter-congress-opposing-attacks-union-rights-working-people", "title": "Letter to Congress Opposing Attacks on Union Rights of Working People at the Veterans Affairs"}, {"date": "March 16, 2017", "link": "http://www.aflcio.org/about/advocacy/legislative-alerts/letter-congress-opposing-repeal-osha-rule-requiring-employers", "title": "Letter to Congress Opposing Repeal of OSHA R

# Annex: ultra-easy scraping with pandas!

When the data we want is already formatted as an HTML table, scraping is even easier: `pandas.read_html` parses every `<table>` on a page into a list of DataFrames.

In [25]:
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll', header=0)

In [26]:
tables[4].head()

| | Deaths | Date | Incident |
|---|---|---|---|
| 0 | 20000 | 000000001626-05-30-000030 May 1626 | Wanggongchang Explosion in Beijing, China in t... |
| 1 | 6000 | 000000001948-11-01-00001 November 1948 | Boiler and ammunition explosion aboard an unid... |
| 2 | 3000 | 000000001769-08-18-000018 August 1769 | A lightning bolt caused the Brescia Explosion ... |
| 3 | 1950 | 000000001917-12-06-00006 December 1917 | Halifax Explosion in Nova Scotia, Canada[37] |
| 4 | 1500 | 000000001941-06-08-00008 June 1941 | Ammunition plant with facilities explode at Sm... |
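Selecting `tables[4]` by position breaks as soon as the page gains or loses a table. `pd.read_html` also accepts a `match` argument (a string or regular expression) and returns only the tables whose text matches it. A sketch on a small inline HTML snippet, wrapped in `io.StringIO` as recent pandas versions expect for literal HTML; the rows are abbreviated from the output above, plus one made-up row for contrast:

```python
from io import StringIO

import pandas as pd

# Two small tables; only the second one mentions "Explosion".
html = """
<table>
  <tr><th>Deaths</th><th>Date</th><th>Incident</th></tr>
  <tr><td>100</td><td>1 January 1900</td><td>Hypothetical flood (made-up row)</td></tr>
</table>
<table>
  <tr><th>Deaths</th><th>Date</th><th>Incident</th></tr>
  <tr><td>20000</td><td>30 May 1626</td><td>Wanggongchang Explosion in Beijing, China</td></tr>
  <tr><td>1950</td><td>6 December 1917</td><td>Halifax Explosion in Nova Scotia, Canada</td></tr>
</table>
"""

# match keeps only the tables whose text matches the pattern.
tables = pd.read_html(StringIO(html), match="Explosion")
print(len(tables))  # 1
print(tables[0])
```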


# Annex II: exercises

### Exercise:

Extract the date of the worst aviation disaster from: https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html

### Exercise: 

Assuming the list is exhaustive, calculate how many people died in accidental explosions per decade in the 20th century. Plot the result.

Data: 
https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html, pd.to_datetime, matplotlib or seaborn

### Exercise: 

Create a function that, given the two tables extracted from http://en.wikipedia.org/wiki/List_of_S%26P_500_companies and a date, returns the list of companies in the S&P 500 on that date.