In [5]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi']


# Consuming APIs (and JSON)

Consuming APIs is supposed to be easy (that's the point of having an API).  

Let's look at a simple example of consuming a JSON API.  The example we'll look at is a *geocoder*, that is, a service for converting between addresses and normalized geographic information (e.g. latitude and longitude).  Going from an address to its normalized form is "forward geocoding" and going the other way is "reverse geocoding".

We'll interact with a free (and non-authenticated) geocoder run by OpenStreetMap:

In [1]:
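import urllib2
import simplejson as json

def geocode(address):
    # URL-encode the address and place it in the query string
    url = "http://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)
    # Fetch the raw response body, then parse the JSON text into Python objects
    ret = urllib2.urlopen(url).read()
    return json.loads(ret)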

my_home = geocode("865 page st, san francisco, ca 94117")
my_home

[{'address': {'city': 'SF',
   'country': 'United States of America',
   'country_code': 'us',
   'county': 'SF',
   'house_number': '865',
   'neighbourhood': 'North of Panhandle',
   'postcode': '94117',
   'road': 'Page Street',
   'state': 'California'},
  'boundingbox': ['37.772362897959',
   '37.772462897959',
   '-122.43498053061',
   '-122.43488053061'],
  'class': 'place',
  'display_name': '865, Page Street, North of Panhandle, SF, California, 94117, United States of America',
  'importance': 0.301,
  'lat': '37.7724128979592',
  'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
  'lon': '-122.434930530612',
  'place_id': '477327017',
  'type': 'house'}]

You can also select out elements of JSON blobs in "the natural way":

In [None]:
my_home[0]['boundingbox']

### Things to note:

1.  In this case, the request parameters were encoded in the URL itself.  This is usually the case for simple "`GET`" queries.  Because our string contained characters like spaces, we had to "URL encode" it (this is what `urllib2.quote` does).  It's usually a bad idea to do your own encoding like this: below we'll talk about the `requests` library, which lets us avoid this.

2. The result was returned to us in the form of _JSON_.  JSON is JavaScript Object Notation -- it's a human-readable, text-based format for transmitting key-value pairs (and arrays, strings, numbers, booleans, and null).  The `json` package lets us convert between this and Python's native dictionaries, lists, etc.
 
3. This was a public API, with no authentication.  We'll go through an example of the code for an authenticated API at the end -- the example will be the free Twitter stream.  (The reason we didn't do this up front is that you can't run the code without signing up for an API key, etc.)
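
As a small offline illustration of point 2, the sketch below parses a trimmed-down copy of the response shown earlier and picks values out of it. The `raw` string is hand-abbreviated for illustration, not a live API reply. The sketch uses the standard-library `json` module, which offers the same `loads` call as `simplejson`, and is written so that it should run on Python 3 as well as Python 2.7.

```python
import json

# Hand-abbreviated copy of the geocoder response above (illustrative only).
raw = ('[{"lat": "37.7724128979592", "lon": "-122.434930530612", '
       '"address": {"road": "Page Street", "postcode": "94117"}}]')

results = json.loads(raw)         # JSON array  -> Python list
first = results[0]                # JSON object -> Python dict
road = first["address"]["road"]   # nested lookup, "the natural way"

# The geocoder returns coordinates as strings, so convert before doing math.
lat = float(first["lat"])
lon = float(first["lon"])

print("%s is at (%.4f, %.4f)" % (road, lat, lon))
```

Note that `lat` and `lon` arrive as strings in this API's responses, which is easy to miss when reading the pretty-printed output.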

In [2]:
address = "1600 Pennsylvania Avenue, Washington, DC"
url = "http://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)

print address
print
print url
print
response = urllib2.urlopen(url).read()  # fetch once and reuse the body below
print response
print
json.loads(response)

1600 Pennsylvania Avenue, Washington, DC

http://nominatim.openstreetmap.org/search?q=1600%20Pennsylvania%20Avenue%2C%20Washington%2C%20DC&addressdetails=1&format=json

[{"place_id":"114705257","licence":"Data © OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright","osm_type":"way","osm_id":"238241022","boundingbox":["38.8974898","38.897911","-77.0368539","-77.0362521"],"lat":"38.8976989","lon":"-77.036553192281","display_name":"White House, 1600, Pennsylvania Avenue Northwest, Monumental Core, Logan Circle, Washington, District of Columbia, 20500, United States of America","class":"building","type":"yes","importance":0.91767573872961,"address":{"building":"White House","house_number":"1600","pedestrian":"Pennsylvania Avenue Northwest","neighbourhood":"Monumental Core","suburb":"Logan Circle","city":"Washington","state":"District of Columbia","postcode":"20500","country":"United States of America","country_code":"us"}},{"place_id":"27830651","licence":"Data ©

[{'address': {'building': 'White House',
   'city': 'Washington',
   'country': 'United States of America',
   'country_code': 'us',
   'house_number': '1600',
   'neighbourhood': 'Monumental Core',
   'pedestrian': 'Pennsylvania Avenue Northwest',
   'postcode': '20500',
   'state': 'District of Columbia',
   'suburb': 'Logan Circle'},
  'boundingbox': ['38.8974898', '38.897911', '-77.0368539', '-77.0362521'],
  'class': 'building',
  'display_name': 'White House, 1600, Pennsylvania Avenue Northwest, Monumental Core, Logan Circle, Washington, District of Columbia, 20500, United States of America',
  'importance': 0.91767573872961,
  'lat': '38.8976989',
  'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
  'lon': '-77.036553192281',
  'osm_id': '238241022',
  'osm_type': 'way',
  'place_id': '114705257',
  'type': 'yes'},
 {'address': {'city': 'Washington',
   'country': 'United States of America',
   'country_code': 'us',
   'house_

To make it easier to see what's going on, here is a pretty-printed JSON response of the same kind:

    [
       {"place_id":"9163027846",
        "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
        "osm_type":"way",
        "osm_id":"11557939",
        "boundingbox": ["39.655891418457", "39.6572189331055", 
                        "-77.5709609985352", "-77.5705108642578"],
        "lat":"39.6566765",
        "lon":"-77.5708067",
        "display_name":"Pennsylvania Avenue, Smithsburg, Washington, Maryland, 21783, United States of America",
        "class":"highway",
        "type":"tertiary",
        "importance":0.41,
        "address": {"road":"Pennsylvania Avenue",
                    "town":"Smithsburg", 
                    "county":"Washington", 
                    "state":"Maryland", 
                    "postcode":"21783", 
                    "country":"United States of America", 
                    "country_code":"us"
                   }
       }
    ]

Just like in Python, `[..]` delimits an array (a Python list) and `{..}` delimits a dictionary (called an "object" in JSON).  This is pretty much all there is to JSON.
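
Pretty-printing like this does not have to be done by hand. Here is a minimal sketch, assuming the standard-library `json` module and a hand-shortened copy of the response above (the `compact` string is illustrative, not a full API reply):

```python
import json

# Hand-shortened version of the response above (illustrative only).
compact = ('[{"lat":"39.6566765","lon":"-77.5708067",'
           '"address":{"town":"Smithsburg","state":"Maryland"}}]')

parsed = json.loads(compact)

# indent adds line breaks and nesting; sort_keys makes the key order stable.
pretty = json.dumps(parsed, indent=2, sort_keys=True)
print(pretty)
```

Parsing `pretty` again gives back the same data as `parsed`, so the extra whitespace is purely cosmetic.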


**Exercise:** There's also a [free API](http://openweathermap.org) for weather information.

A sample request might look something like `http://api.openweathermap.org/data/2.5/weather?lat=35&lon=139`

Use the geocoder to write a function

        def weather_at_address(address):
            ....
            
that gets the current weather (temperature, cloudy or not) from a human-entered address.

## Handling URL parameters

The `urllib2` module requires an enormous amount of work to perform the simplest of tasks. The `requests` library provides a higher-level way to make web requests. This is already nice in examples like the one above, where we need to encode parameters into the URL.  It is even more convenient when there are also `POST` parameters (or cookies, or authentication, or...) involved.  (Don't worry if you don't know what that means.)

In [3]:
import requests
def geocode(address):
    params = { 'format'        :'json', 
               'addressdetails': 1, 
               'q'             : address}
    r = requests.get('http://nominatim.openstreetmap.org/search', params=params)
    return r.json()

In [4]:
x = geocode("107 Page St., San Francisco")

In [5]:
x[0]

{u'address': {u'city': u'SF',
  u'country': u'United States of America',
  u'country_code': u'us',
  u'county': u'SF',
  u'house_number': u'107',
  u'neighbourhood': u'Western Addition',
  u'postcode': u'94102',
  u'road': u'Page Street',
  u'state': u'California'},
 u'boundingbox': [u'37.773924413793',
  u'37.774024413793',
  u'-122.42266903448',
  u'-122.42256903448'],
 u'class': u'place',
 u'display_name': u'107, Page Street, Western Addition, SF, California, 94102, United States of America',
 u'importance': 0.201,
 u'lat': u'37.7739744137931',
 u'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
 u'lon': u'-122.422619034483',
 u'place_id': u'477462016',
 u'type': u'house'}
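
To see roughly what passing `params=` saves us from doing by hand, the sketch below builds the same kind of query string with the standard library's `urlencode`. This is an approximation for illustration, not a description of what `requests` does internally. The parameter list is given as pairs only to keep the order predictable.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

base = "http://nominatim.openstreetmap.org/search"
params = [("q", "107 Page St., San Francisco"),
          ("format", "json"),
          ("addressdetails", 1)]

# urlencode escapes each value and joins the pairs with "&".
query = urlencode(params)
url = base + "?" + query
print(url)
```

Here spaces are encoded as `+` rather than the `%20` produced by `urllib2.quote`; both forms are accepted in query strings.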

## Authenticated APIs

Lots of interesting APIs are free (or at least free for moderate use) but still require you to register first.  The `requests` library (together with some supporting ones, e.g. `requests_oauthlib`) makes it easy to consume these too.

**Exercise:** To access the Twitter API, you must first sign up: create an app on http://apps.twitter.com, get an access token, et voilà, you have your shiny new credentials -- consisting of four pieces of data. The file /secrets/twitter_secrets.json.sample in the datacourse repo is a template for the format; rename your copy so that it has a .nogit extension, which prevents it from being tracked in the repository.

In [9]:
import simplejson
from requests_oauthlib import OAuth1

with open("secrets/twitter_secrets.json.nogit") as fh:
    secrets = simplejson.loads(fh.read())

# create an auth object
auth = OAuth1(
    secrets["api_key"],
    secrets["api_secret"],
    secrets["access_token"],
    secrets["access_token_secret"]
)
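
A missing or misspelled key in the secrets file would otherwise only surface as a `KeyError` or an authentication failure, so it can help to check the loaded dictionary first. A minimal sketch: the helper name `missing_credentials` is hypothetical, and the four key names are the ones used in the cell above.

```python
# The four credentials the OAuth1 object above is built from.
REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")

def missing_credentials(secrets):
    """Return the required keys that are absent or empty in `secrets`."""
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]

# Example with an incomplete (made-up) secrets dictionary:
incomplete = {"api_key": "abc", "api_secret": ""}
print(missing_credentials(incomplete))
```

An empty list means all four pieces are present; anything else names the entries still to be filled in.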

In [13]:
# See Michael's followers (the followers/ids endpoint returns follower IDs)
r = requests.get(
    "https://api.twitter.com/1.1/followers/ids.json",
    auth=auth,
    params={'screen_name' : 'trixandtrax'}
)
trixandtrax_friends=r.json()

r2 = requests.post(
    'https://api.twitter.com/1.1/users/lookup.json',
    auth=auth,
    data={'user_id' : trixandtrax_friends['ids'][:50]}
)
friends_info = r2.json()
[(f['screen_name'], f['name']) for f in friends_info]

[(u'dennisjjjdennis', u'dennisjjjreillys'),
 (u'lie155', u'Liedena Lopez'),
 (u'Karitho10', u'Carolina Gonzalez'),
 (u'PictuneApp', u'Pictune'),
 (u'sarahraniero', u'sarah raniero'),
 (u'LeopoldoAbreu', u'Leopoldo De Abreu'),
 (u'DanzoyExisto', u'Claudia Cordido'),
 (u'Valdivinoaugus1', u'Valdivino augusto'),
 (u'sidrerialasierr', u'sidrerialasierra'),
 (u'alfonsoguzmanlu', u'Alfonso Guzm\xe1n L\xfa'),
 (u'atenea8007', u'atenea'),
 (u'tonyelmasterva1', u'tony'),
 (u'Danipatafunk', u'Daniel Roa'),
 (u'CaracasSW', u'Caracas SW'),
 (u'GrupoMAS3', u'GRUPO MAS 3'),
 (u'Angie17111', u'Chac\xf3n Ang\xe9lica'),
 (u'1ERCLIP', u'PrimerClip: ADAELLE'),
 (u'Medicenkios_', u'Charlie Charlie\u2614\ufe0f\u25b2KC'),
 (u'Herglobalimpact', u'HER GLOBAL IMPACT'),
 (u'thiagoantonya', u'Thiago Silva'),
 (u'Andres53825791', u'Andres'),
 (u'themiguelacho', u'Miguel R'),
 (u'exp_chilepesca', u'Exp_ChilePesca'),
 (u'rixio_acosta', u'r'),
 (u'efrainhenrique6', u'efrain henriquez'),
 (u'aula7net', u'Aula7'),
 (u

In [14]:
## Requests also makes it easy to deal with simple streaming APIs.  Let's stream 100 tweets from the Twitter feed.

import json, sys
r_stream = requests.get('https://stream.twitter.com/1.1/statuses/sample.json', auth=auth, stream=True)
counter = 0
for line in r_stream.iter_lines():
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        counter +=1
        print tweet['text']
    sys.stdout.flush()
    if counter >= 100:
        break

😍💙💙😍 https://t.co/EXSrCdUDgf
@hezmick @AmvYo q nos dejen volver a los del 86 nos duran cuatro pipas https://t.co/83jQgHfzLa
RT @roux_isabelle: #Bruxelles https://t.co/pXhHBZwe0B
RT @NiallOfficial: There's way too much anger in the world .
RT @whiequalenrol: ils ont décidés de mles péter ou quoi ?
RT @talkspanishtome: Sending love to Brussles 
Enviando amor a Brussles https://t.co/FsPecHTfpU
RT @11_dipa: https://t.co/umZJc8MfXX
@Peter_Hendrickx [22/03/2016 18:27]  flight (FR164) cancelled
https://t.co/U67N9oe1K2
Фильм кухня посмотрел тут https://t.co/5ZhPses92o
RT @dallascowboys: Moments after signing his new contract, @Trey_Deuces spoke with us for an exclusive interview.
https://t.co/OjtrAr0z2J
RT @anveymel: Buti nalang dumating ka sa ALDUB sobrang gv mga vids mo kiligness overload ang ganda pa, totoo yan!

#EBGodGaveMeYou https://…
@sv98 Go  Go  Go
(وقالوا لو كنا نسمع أو نعقل ما كنا في أصحاب السعير) [الملك:10] https://t.co/ygRfTU3CjA
RT @WSHHFANS: IM CRYING 😭 https://t.co/PWfwhvK18j


In [19]:
## Here's a variant that's more US-centric.
## Question: what does islice do?

import json, sys
from itertools import islice
r_stream = requests.post('https://stream.twitter.com/1.1/statuses/filter.json', auth=auth,
                          stream=True, data={"locations" : "-125,23,-70,50", "track": "Brussels, Belgium"})
for line in islice(r_stream.iter_lines(), 100):
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        print tweet['text']
    sys.stdout.flush()

RT @th3j35t3r: URGENT ALERT: Cops in #Brussels seeking info in this guy, believe he's responsible for attacks &amp; still at large &gt;&gt; https://t…
.@joshtpm @TPM Because all those Brussels armed raids since Paris prevented those attacks.  Can GOP not be smarter even if they're bigots?
RT @JaredTSwift: #Brussels https://t.co/bosEPlTUBH
Плешивый не хочет добавки санкций из-за Савченко,гэбня шурует по полной : https://t.co/Hgqv0WtQh7
Bomb blasts in Brussels reverberate in presidential race: This is not the first time a horrific terrorist attack… https://t.co/i65rnSd3IY
RT @PoliceChiefs: Following #Brussels, we are urging people to be alert but not alarmed. Report anything suspicious to the Anti-Terrorist H…
Batman v Superman: Dawn of Justice London Premiere Still On Despite Brussels Attacks, Red… https://t.co/2fikflxu50 https://t.co/P4ZiMl8UBx
RT @WDFx2EU: They explained their plains for Brussels Domination months ago. The refugee's are Sperm Donors 4 Islam. #StopIslam
 https://t.…
RT
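
One way to explore the `islice` question without a Twitter connection is to run both looping styles over a made-up stream. In this sketch `fake_stream` is a hypothetical stand-in for `r_stream.iter_lines()`, and the `delete` message is just an example of a line that has no `text` field. Everything else mirrors the filtering logic of the two cells above.

```python
import json
from itertools import islice

def fake_stream():
    """Endless made-up stream: a tweet, a non-tweet message, a blank line, ..."""
    n = 0
    while True:
        n += 1
        if n % 3 == 0:
            yield b""  # keep-alive blank line
        elif n % 3 == 1:
            yield json.dumps({"text": "tweet %d" % n}).encode("utf-8")
        else:
            yield json.dumps({"delete": {"status": {"id": n}}}).encode("utf-8")

def texts(lines):
    """Yield the text of each line that parses to a message with a 'text' field."""
    for line in lines:
        if not line:
            continue
        tweet = json.loads(line.decode("utf-8"))
        if "text" in tweet:
            yield tweet["text"]

# islice around the raw lines: reads 9 lines, however few of them are tweets.
from_nine_lines = list(texts(islice(fake_stream(), 9)))

# islice around the filtered tweets: keeps reading until 9 tweets were seen.
nine_tweets = list(islice(texts(fake_stream()), 9))

print(len(from_nine_lines), len(nine_tweets))
```

Where `islice` is applied therefore decides whether the limit counts lines received or tweets printed.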

### Exercises

1. Write a Python script that takes as input an address and outputs 50 tweets from within about 10 miles of it.
   Now modify it to return the top 10 hashtags within that 10-mile range (based on, say, a 1000-tweet sample).
2. You can plot maps using this [Python package](http://peak5390.wordpress.com/2012/12/08/matplotlib-basemap-tutorial-plotting-points-on-a-simple-map/).  Get geo-located tweets from the streaming API and plot them on the map.

### Further reading for this lecture

To learn more about JSON (there isn't much more to know!):
 - http://www.secretgeek.net/json_3mins.asp
 - http://en.wikipedia.org/wiki/JSON (esp. "Data types, syntax, and examples")
 - http://tools.ietf.org/html/rfc7159

A useful tool for playing with JSON on the command line is [jq](http://stedolan.github.io/jq/).

To learn more about the prevailing design pattern ("REST") for web-based APIs:
 - http://en.wikipedia.org/wiki/Representational_state_transfer
 
One wildcard is the wide variety of authentication strategies employed ("basic auth", cookies, bearer token, OAuth, OAuth 2, etc.).  For several of these, the documentation at http://docs.python-requests.org/en/latest/user/authentication/ is helpful.

### Exit Tickets
1. Explain the difference between requests.get() and requests.post().
2. Which Python data structures does parsed JSON map to?
3. Describe what the remote site is doing when it receives an API request from you.

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*