<h3> Getting data from JSON files </h3>

<h4> Use the <i>requests</i> library to get the contents of a web page </h4>
First, use the website <b> brit.co </b> as an example and search it for <font color='green'> "succulents" </font> <i> (the plant, not anything edible!) </i>

In [12]:
import requests

In [13]:
url = "http://www.brit.co/?s=succulents"
response = requests.get(url)
type(response)

requests.models.Response

Check if the request was successful by checking the response status code.

In [14]:
if response.status_code == 200:
    print("Success!")
response.status_code

Success!


200
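A status code of 200 means "OK", but it is only one of many codes. As a rough guide, 2xx means success, 3xx a redirect, 4xx a client error and 5xx a server error. requests also offers `response.ok` (true for any code below 400) and `response.raise_for_status()`, both of which appear in the attribute list below. The Python 3, stdlib-only sketch below turns a code into a readable description; `describe_status` is a hypothetical helper, not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a readable description of an HTTP status code (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase  # e.g. "OK" for 200; raises ValueError for unknown codes
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return "%d %s (%s)" % (code, phrase, kind)

print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```

With a live response you would call `describe_status(response.status_code)`.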

To explore the response, list its attributes...

In [10]:
# list of attributes
dir(response)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__getstate__',
 '__hash__',
 '__init__',
 '__iter__',
 '__module__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']
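Most of those names are Python internals: the dunder methods (`__init__`, `__repr__`, ...) and private attributes that start with a single underscore (`_content`). A small helper, hypothetical and not part of requests, keeps only the public names; for a response that is the run from `apparent_encoding` to `url` above.

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

class Demo:
    """Toy stand-in for a response object."""
    status_code = 200
    _content = b"private"

    def json(self):
        return {}

print(public_attributes(Demo()))  # ['json', 'status_code']
```

On the real object, `public_attributes(response)` gives the shorter list.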

...or use <i> response.headers </i> to get general information about the response and connection.

In [15]:
dict(response.headers)

{'accept-ranges': 'bytes',
 'age': '908',
 'cache-control': 'no-cache="set-cookie"',
 'connection': 'keep-alive',
 'content-encoding': 'gzip',
 'content-length': '11077',
 'content-type': 'text/html; charset=UTF-8',
 'date': 'Thu, 15 Sep 2016 22:11:47 GMT',
 'link': '<http://www.brit.co/api/v4/>; rel="https://api.w.org/", <http://www.brit.co/api/v3>; rel="https://github.com/WP-API/WP-API", <http://www.brit.co/page/2/?s=succulents>; rel=next',
 'server': 'nginx/1.10.1',
 'set-cookie': 'AWSELB=CD4D3B131ED2265459EAAC501F754497B9D33613565B6388B48C387A9BB1C1C643C780EE32893D9D6480126402F2BC72D992C690966957671151C4CE4F8AEA42104D173F34;PATH=/;MAX-AGE=30',
 'vary': 'Accept-Encoding',
 'via': '1.1 varnish-v4',
 'x-cache': 'HIT',
 'x-varnish': '50638124 60821435'}
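The `link` header deserves a closer look: it advertises related URLs, including the next page of search results (`rel=next`). requests already parses this header into `response.links` (it is in the attribute list above). The simplified parser below, a hypothetical helper, shows what that mapping looks like; it assumes no URL contains a comma and each link has a single `rel` value.

```python
import re

# Value of the 'link' header from the output above
LINK_HEADER = ('<http://www.brit.co/api/v4/>; rel="https://api.w.org/", '
               '<http://www.brit.co/api/v3>; rel="https://github.com/WP-API/WP-API", '
               '<http://www.brit.co/page/2/?s=succulents>; rel=next')

def parse_link_header(value):
    """Map each rel value in an HTTP Link header to its URL (simplified)."""
    links = {}
    for part in value.split(","):
        # Each part looks like: <URL>; rel="name"   (the quotes are optional)
        match = re.match(r'\s*<([^>]*)>\s*;\s*rel="?([^";]+)"?', part)
        if match:
            url, rel = match.groups()
            links[rel] = url
    return links

print(parse_link_header(LINK_HEADER)["next"])  # http://www.brit.co/page/2/?s=succulents
```

Following the `next` URL is one way to page through more search results.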

Getting the raw text returned by the server is not particularly helpful...

In [16]:
# raw text
response.text

u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<!--[if lt IE 7]><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" class="ie ie6 lte9 lte8 lte7 no-js will-load-js-app"><![endif]-->\n<!--[if IE 7]><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" class="ie ie7 lte9 lte8 lte7 no-js will-load-js-app"><![endif]-->\n<!--[if IE 8]><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" class="ie ie8 lte9 lte8 no-js will-load-js-app"><![endif]-->\n<!--[if IE 9]><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" class="ie ie9 lte9 no-js will-load-js-app"><![endif]-->\n<!--[if (gt IE 9)|!(IE)]><!--><html data-ng-app="bc.components" class="no-js will-load-js-app" xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xml:lang="en" lang="en"><!--<![endif]-->\n<head profile="http://gmpg.org/xfn/11">\
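To get something useful out of raw HTML you have to parse it. Third-party libraries such as BeautifulSoup are the usual choice; the sketch below sticks to Python 3's built-in `html.parser` and pulls out only the page title (the text of the `title` element). The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first title element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

print(extract_title("<html><head><title> Search results </title></head></html>"))  # Search results
```

On the page fetched above, this would be `extract_title(response.text)`.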

Depending on the server, the data comes back as HTML (as here), JSON, or XML; the <i> content-type </i> header above (<i> text/html </i>) says which.
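Checking the `content-type` header is a simple way to decide how to handle a body before parsing it. A minimal sketch with a hypothetical helper name; it recognises only a few common media types.

```python
def body_format(content_type):
    """Classify a Content-Type header value as 'json', 'xml', 'html' or 'other'."""
    # Drop parameters such as '; charset=UTF-8' and normalise case
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    if media_type == "text/html":
        return "html"
    return "other"

print(body_format("text/html; charset=UTF-8"))         # html
print(body_format("application/json; charset=UTF-8"))  # json
```

With requests, header lookup is case-insensitive, so `body_format(response.headers["Content-Type"])` works as well as the lowercase key.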

<h4> JSON </h4>

The <i> requests </i> library has a built-in JSON decoder, <i> response.json() </i> (nice).
<n> Try it out on a different example: <b> Google Maps </b> data, via its Geocoding API. </n>
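Under the hood, `response.json()` decodes the body as JSON, much like calling the standard library's `json.loads` on `response.text` (requests additionally handles the character encoding, and raises an error if the body is not valid JSON). The sketch below runs `json.loads` on a hand-trimmed excerpt of the Geocoding response that appears in the next cells.

```python
import json

# Trimmed excerpt of the Geocoding API response text shown below
raw_text = """
{
   "results" : [
      {
         "address_components" : [
            { "long_name" : "350", "short_name" : "350", "types" : [ "street_number" ] },
            { "long_name" : "5th Avenue", "short_name" : "5th Ave", "types" : [ "route" ] }
         ]
      }
   ],
   "status" : "OK"
}
"""

data = json.loads(raw_text)  # JSON text -> nested Python dicts and lists
print(data["status"])                                            # OK
print(data["results"][0]["address_components"][1]["long_name"])  # 5th Avenue
```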

In [19]:
# Simply get the geo data belonging to a certain address
import requests
address = "350 Fifth Ave"
url = "https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
response = requests.get(url)
response.text

u'{\n   "results" : [\n      {\n         "address_components" : [\n            {\n               "long_name" : "350",\n               "short_name" : "350",\n               "types" : [ "street_number" ]\n            },\n            {\n               "long_name" : "5th Avenue",\n               "short_name" : "5th Ave",\n               "types" : [ "route" ]\n            },\n            {\n               "long_name" : "Park Slope",\n               "short_name" : "Park Slope",\n               "types" : [ "neighborhood", "political" ]\n            },\n            {\n               "long_name" : "Brooklyn",\n               "short_name" : "Brooklyn",\n               "types" : [ "political", "sublocality", "sublocality_level_1" ]\n            },\n            {\n               "long_name" : "Kings County",\n               "short_name" : "Kings County",\n               "types" : [ "administrative_area_level_2", "political" ]\n            },\n            {\n               "long_name" : "New York",

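Decode the response with <i> response.json() </i> to get the data as Python objects (nested dicts and lists). The "u" prefix on the strings marks them as Unicode strings; this notebook runs Python 2, and Python 3 no longer shows the prefix.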

In [20]:
json_data = response.json()
json_data

{u'results': [{u'address_components': [{u'long_name': u'350',
     u'short_name': u'350',
     u'types': [u'street_number']},
    {u'long_name': u'5th Avenue',
     u'short_name': u'5th Ave',
     u'types': [u'route']},
    {u'long_name': u'Park Slope',
     u'short_name': u'Park Slope',
     u'types': [u'neighborhood', u'political']},
    {u'long_name': u'Brooklyn',
     u'short_name': u'Brooklyn',
     u'types': [u'political', u'sublocality', u'sublocality_level_1']},
    {u'long_name': u'Kings County',
     u'short_name': u'Kings County',
     u'types': [u'administrative_area_level_2', u'political']},
    {u'long_name': u'New York',
     u'short_name': u'NY',
     u'types': [u'administrative_area_level_1', u'political']},
    {u'long_name': u'United States',
     u'short_name': u'US',
     u'types': [u'country', u'political']},
    {u'long_name': u'11215',
     u'short_name': u'11215',
     u'types': [u'postal_code']},
    {u'long_name': u'2813',
     u'short_name': u'2813',
     u'
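Each entry in `address_components` pairs a name with a list of `types`, so a specific part of the address (the ZIP code, say) is best found by type rather than by position. A sketch with a hypothetical helper, run on an excerpt of the result above:

```python
def find_component(result, type_name, short=False):
    """Return the long (or short) name of the first address component with type_name."""
    key = "short_name" if short else "long_name"
    for component in result.get("address_components", []):
        if type_name in component.get("types", []):
            return component[key]
    return None

# Excerpt of json_data["results"][0] from the output above
result = {"address_components": [
    {"long_name": "350", "short_name": "350", "types": ["street_number"]},
    {"long_name": "New York", "short_name": "NY",
     "types": ["administrative_area_level_1", "political"]},
    {"long_name": "11215", "short_name": "11215", "types": ["postal_code"]},
]}

print(find_component(result, "postal_code"))                             # 11215
print(find_component(result, "administrative_area_level_1", short=True))  # NY
```

On the live data this would be `find_component(json_data["results"][0], "postal_code")`.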

Now look at some specific parts of the JSON object, starting with its top-level keys...

In [21]:
for key in json_data:
    print(key)

status
results


In [22]:
json_data["status"]

u'OK'
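Printing `json_data["results"]` in full gets long quickly. A more compact view of how nested JSON is organised is an outline of its keys and value types. A sketch with a hypothetical helper; it looks only at the first element of each list and stops at a fixed depth.

```python
def outline(obj, indent=0, max_depth=4):
    """Return lines describing the nesting of dicts and lists in decoded JSON."""
    lines = []
    if indent >= max_depth:
        return lines
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append("%s%s: %s" % (pad, key, type(value).__name__))
            lines.extend(outline(value, indent + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Describe only the first element, assuming the list is homogeneous
        lines.append("%s[0]: %s" % (pad, type(obj[0]).__name__))
        lines.extend(outline(obj[0], indent + 1, max_depth))
    return lines

sample = {"status": "OK", "results": [{"geometry": {"location": {"lat": 1.0}}}]}
print("\n".join(outline(sample)))
```

For the real response, `print("\n".join(outline(json_data)))` shows where `geometry` and `location` sit inside `results`.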

In [23]:
json_data["results"]

[{u'address_components': [{u'long_name': u'350',
    u'short_name': u'350',
    u'types': [u'street_number']},
   {u'long_name': u'5th Avenue',
    u'short_name': u'5th Ave',
    u'types': [u'route']},
   {u'long_name': u'Park Slope',
    u'short_name': u'Park Slope',
    u'types': [u'neighborhood', u'political']},
   {u'long_name': u'Brooklyn',
    u'short_name': u'Brooklyn',
    u'types': [u'political', u'sublocality', u'sublocality_level_1']},
   {u'long_name': u'Kings County',
    u'short_name': u'Kings County',
    u'types': [u'administrative_area_level_2', u'political']},
   {u'long_name': u'New York',
    u'short_name': u'NY',
    u'types': [u'administrative_area_level_1', u'political']},
   {u'long_name': u'United States',
    u'short_name': u'US',
    u'types': [u'country', u'political']},
   {u'long_name': u'11215',
    u'short_name': u'11215',
    u'types': [u'postal_code']},
   {u'long_name': u'2813',
    u'short_name': u'2813',
    u'types': [u'postal_code_suffix']}],
  u'

Ha, almost everything is inside "results".
<n> Next, get more specific information out of the data. How about the latitude and longitude of a given address? </n>

In [24]:
# First, see how the address string splits into words
address = "Empire State Building, New York, NY"
address.split(" ")

['Empire', 'State', 'Building,', 'New', 'York,', 'NY']

In [26]:
# Then join the words with underscores, so the address contains no spaces when it goes into the URL
x = address.split(" ")
"_".join(x)

'Empire_State_Building,_New_York,_NY'
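Replacing spaces with underscores is a workaround that happened to work here. The standard way to put arbitrary text into a URL is percent-encoding, which Python 3's `urllib.parse.urlencode` does (and which requests applies for you if you pass `params={"address": address}` to `requests.get`). Note also that Google now requires an API key for this service, passed as the `key` query parameter; the helper name and `"YOUR_API_KEY"` below are placeholders.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key=None):
    """Build a Geocoding API URL with a percent-encoded query string (hypothetical helper)."""
    params = {"address": address}
    if api_key is not None:
        params["key"] = api_key  # placeholder value; use your own key
    return GEOCODE_URL + "?" + urlencode(params)

print(build_geocode_url("350 Fifth Ave, New York, NY"))
# https://maps.googleapis.com/maps/api/geocode/json?address=350+Fifth+Ave%2C+New+York%2C+NY
```

Spaces become `+` and commas become `%2C`, so the address needs no manual rewriting.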

In [27]:
# Define a function to get the geocode (latitude, longitude) of an address,
# putting all the previous steps together
import requests

def get_geocode(address):
    address = "_".join(address.split(" "))
    url = "https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
    response = requests.get(url)
    if response.status_code != 200:
        return None
    json_data = response.json()
    # The API can answer 200 with a status like "ZERO_RESULTS" and an empty results list
    if json_data.get("status") != "OK" or not json_data.get("results"):
        return None
    location = json_data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
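Because `get_geocode` fetches and parses in one step, it can only be tried with network access. Moving the parsing into its own function (the name `parse_geocode` is hypothetical) lets you check it against a hand-written response; the coordinates in the sample are the ones returned in the next cell.

```python
def parse_geocode(json_data):
    """Extract (lat, lng) from a decoded Geocoding API response, or None if no result."""
    if json_data.get("status") != "OK" or not json_data.get("results"):
        return None
    location = json_data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Minimal response with only the structure parse_geocode relies on
sample = {"status": "OK",
          "results": [{"geometry": {"location": {"lat": 40.7487097, "lng": -73.9856556}}}]}

print(parse_geocode(sample))                                      # (40.7487097, -73.9856556)
print(parse_geocode({"status": "ZERO_RESULTS", "results": []}))  # None
```

`get_geocode` could then end with `return parse_geocode(response.json())`.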

In [28]:
get_geocode("350 Fifth Ave, New York, NY")

(40.7487097, -73.9856556)

Entering these coordinates back into Google Maps shows that they land right on the Empire State Building, which stands at 350 Fifth Ave.
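One way to do that check is to build a Google Maps link for the coordinates. The sketch below assumes the Maps URLs format `https://www.google.com/maps/search/?api=1&query=LAT,LNG`; the helper name is hypothetical.

```python
def maps_link(lat, lng):
    """Build a Google Maps search link for a latitude/longitude pair."""
    return "https://www.google.com/maps/search/?api=1&query=%s,%s" % (lat, lng)

print(maps_link(40.7487097, -73.9856556))
# https://www.google.com/maps/search/?api=1&query=40.7487097,-73.9856556
```

Combined with the function above: `maps_link(*get_geocode("350 Fifth Ave, New York, NY"))`.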