# API Exercises

Here are two exercises that involve working with APIs and dictionaries.

One is using the Open Brewery API found at https://www.openbrewerydb.org/, and the other is using the API for UK Police Data, found at https://data.police.uk/docs/.

You can complete them in either order!

Remember that you can create new cells with esc + a or b

## Breweries

### Q1: Load the first page of results with 50 results per page

In [5]:
import requests 
first_50 = requests.get('https://api.openbrewerydb.org/breweries?page=1&per_page=50')
first_50.json()
# url = 'https://api.openbrewerydb.org/breweries?page=1&per_page=50'

[{'id': 2,
  'name': 'Avondale Brewing Co',
  'brewery_type': 'micro',
  'street': '201 41st St S',
  'city': 'Birmingham',
  'state': 'Alabama',
  'postal_code': '35222-1932',
  'country': 'United States',
  'longitude': '-86.774322',
  'latitude': '33.524521',
  'phone': '2057775456',
  'website_url': 'http://www.avondalebrewing.com',
  'updated_at': '2018-08-23T23:19:57.825Z'},
 {'id': 4,
  'name': 'Band of Brothers Brewing Company',
  'brewery_type': 'micro',
  'street': '1605 23rd Ave',
  'city': 'Tuscaloosa',
  'state': 'Alabama',
  'postal_code': '35401-4653',
  'country': 'United States',
  'longitude': '-87.5621551272424',
  'latitude': '33.1984907123707',
  'phone': '2052665137',
  'website_url': 'http://www.bandofbrosbrewing.com',
  'updated_at': '2018-08-23T23:19:59.462Z'},
 {'id': 44,
  'name': 'Trim Tab Brewing',
  'brewery_type': 'micro',
  'street': '2721 5th Ave S',
  'city': 'Birmingham',
  'state': 'Alabama',
  'postal_code': '35233-3401',
  'country': 'United States

### Q2: This is only the first 50 results.  Get the next 50 and put them together.

In [83]:
second_50 = requests.get('https://api.openbrewerydb.org/breweries?page=2&per_page=50')
top_100 = first_50.json()+second_50.json()
top_100_test_lol = [first_50.json(),second_50.json()]
top_100_test_flat = [val for sublist in top_100_test_lol for val in sublist]
len(top_100_test_lol)
len(top_100_test_flat)


# is equivalent to:

# flattened = []
# for sublist in list_of_lists:
#     for val in sublist:
#         flattened.append(val)


100

### Q3: How many of these 100 breweries in are in Alaska?

In [38]:
full_list_of_states = list(map(lambda country: country['state'],top_100))
setlist_of_states_from_top_100 = list(set(full_list_of_states))
setlist_of_states_from_top_100
def breweries_in_state(unique_list_of_states, full_list):
    CF=[]
    for state in unique_list_of_states:
        CF_dict = {}
        CF_dict['state'] = state
        CF_dict['count'] = full_list.count(state)
        CF.append(CF_dict)
    #return CF_dict
    return sorted(CF, key=lambda k:k['count'],reverse=True)
breweries_in_state(setlist_of_states_from_top_100, full_list_of_states)

[{'state': 'California', 'count': 69},
 {'state': 'Arizona', 'count': 12},
 {'state': 'Arkansas', 'count': 5},
 {'state': 'Colorado', 'count': 5},
 {'state': 'Alabama', 'count': 4},
 {'state': 'Alaska', 'count': 3},
 {'state': None, 'count': 2}]

### Q4: Of these 100 breweries, what are the different unique brewery types?

In [27]:
full_list_of_brew_types = list(map(lambda typ: typ['brewery_type'],top_100))
setlist_of_brew_types_from_top_100 = list(set(full_list_of_brew_types))
setlist_of_brew_types_from_top_100

['brewpub', 'planning', 'proprietor', 'contract', 'regional', 'micro']

### Q5: What is the cloest brewery to "Devil's Potion Brewing Company LLC" ?
* Hint 1: Use Euclidian distance w/ longitude and latitude (assume longitude and latitude are a Carteisan coordinate system)
* Hint 2: You'll have to ignore the entries with "none" for latitude or longitude

In [59]:
test = list(map(lambda x: float(x['latitude']),top_100[0:2]))
test

[33.524521, 33.1984907123707]

In [61]:
def remove_invalid_distance(brewery_list):
    #remove long/lat values of "None"
    return list(filter(lambda x: x['latitude']!=None,brewery_list))

def convert_long_lat_to_float(brewery_list):
    #coerce latitude into float
    #coerce longitude into float
    new_brew_list = []
    for brewery in brewery_list:
        new_brew = brewery.copy()
        new_brew['longitude'] = float(brewery['longitude'])
        new_brew['latitude'] = float(brewery['latitude'])
        new_brew_list.append(new_brew)
    return new_brew_list

top_100_cleaned = convert_long_lat_to_float(remove_invalid_distance(top_100))
devil_potion = list(filter(lambda x: x['name']=="Devil's Potion Brewing Company LLC",top_100_cleaned))[0]
devil_potion

{'id': 525,
 'name': "Devil's Potion Brewing Company LLC",
 'brewery_type': 'planning',
 'street': '',
 'city': 'Escondido',
 'state': 'California',
 'postal_code': '92026-3187',
 'country': 'United States',
 'longitude': -117.0814849,
 'latitude': 33.1216751,
 'phone': '7605329091',
 'website_url': 'http://www.devilspotion.com',
 'updated_at': '2018-08-23T23:27:21.742Z'}

In [64]:
import math
def brewery_distance(first_brewery, second_brewery):
    a = first_brewery['longitude']-second_brewery['longitude']
    b = first_brewery['latitude']-second_brewery['latitude']
    c_squared = a**2+b**2
    return math.sqrt(c_squared)
brewery_distance(devil_potion,top_100_cleaned[0])

30.309840116145182

In [79]:
def closest_brew_buddy(first_brewery, brewery_list):
    remove_first_brewery = list(filter(lambda list_brewery: first_brewery!=list_brewery,brewery_list))
    all_brew_distances_sorted_close_far = sorted(list(map(lambda other_brewery: brewery_distance(first_brewery, other_brewery),remove_first_brewery)))
    closest_brewery = all_brew_distances_sorted_close_far[0]
    return list(filter(lambda other_brewery: closest_brewery==brewery_distance(first_brewery, other_brewery),brewery_list))[0]

In [80]:
closest_brew_buddy(devil_potion,top_100_cleaned)

{'id': 924,
 'name': 'Port Brewing Co / The Lost Abbey',
 'brewery_type': 'micro',
 'street': '155 Mata Way Ste 104',
 'city': 'San Marcos',
 'state': 'California',
 'postal_code': '92069-2983',
 'country': 'United States',
 'longitude': -117.149141,
 'latitude': 33.141537,
 'phone': '8009186816',
 'website_url': 'http://www.portbrewing.com',
 'updated_at': '2018-08-24T00:01:07.397Z'}

### Q6: Write a function to find the closest brewery to any other given brewery

In [None]:
#see closest_brew_buddy()


### Q7: How would you get the first 10 pages from this API and put them all together using a for loop?

In [100]:
def pull_api_brewery_info(number_of_pages):
    brew_lol = []
    for page in list(range(1, number_of_pages+1)):
        brew_entry = requests.get('https://api.openbrewerydb.org/breweries?page='+str(page)+'&per_page=50')
        brew_lol.append(brew_entry.json())
    return [val for sublist in brew_lol for val in sublist]
len(pull_api_brewery_info(10))

500

# Crime in the UK

### We will be analyzing different crimes reported in the UK as provided by https://data.police.uk/docs/

# Exploratory analysis
##### 1. How many total crimes were there at latitude : 52.63902 and -1.131321 on November of 2017.
Use the street level crimes data, the documentation for the API can be found at https://data.police.uk/docs/method/crime-street/

In [137]:
import requests
def total_crimes_at_location(lat,lng,YYYY_MM=None): 
    TCAL = requests.get('https://data.police.uk/api/crimes-street/all-crime?lat='+str(lat)+'&lng='+str(lng)+'&date='+str(YYYY_MM)+'')
    return TCAL.json()

total_crimes_at_location(52.63902,-1.131321,'2017-11')[0]

{'category': 'anti-social-behaviour',
 'location_type': 'Force',
 'location': {'latitude': '52.635184',
  'street': {'id': 883410, 'name': 'On or near Shopping Area'},
  'longitude': '-1.135455'},
 'context': '',
 'outcome_status': None,
 'persistent_id': '',
 'id': 61222401,
 'location_subtype': '',
 'month': '2017-11'}

##### 2. We've queried the API once, but it could get annoying to retype the url over and over again, create a function `make_api_request` that enables you to query the API.


 The parameters for the function should be:
* lat (float) : latitude
* lng (float) : longitude
* date (string): Date in the format YYYY-MM
    * default value = `None`
    
And it should return a json object of 

for more information on default values check out http://blog.thedigitalcatonline.com/blog/2015/02/11/default-arguments-in-python/

In [None]:
#See total_crimes_at_location()


##### 3. Write a function `categories_of_crime` that will determine the count of each type of crime for a given latitude and longitude. This is labelled as 'category' in the records. Your function should call the `make_api_request` function you created.

The parameters for the function should be:

* lat (float) : latitude
* lng (float) : longitude
* date (str) default = None

The function should return:
* a dictionary with the count of each type of crime



Once you've created the function, try it with these locations
* lat, lng of 51.5017861,-0.1432319   (Buckingham Palace)
* lat, lng of 53.480161, -2.245163     (Manchester)

In [19]:
def make_category_dictionary(unique_list, full_list):
    CD=[]
    for category in unique_list:
        CD_dict = {}
        CD_dict['category'] = category
        CD_dict['frequence_count'] = full_list.count(category)
        CD.append(CD_dict)
    return sorted(CD, key=lambda k:k['category'])

In [39]:
def categories_of_crime(lat,lng,YYYY_MM=None): 
    data = total_crimes_at_location(lat,lng,YYYY_MM=None)
    all_category_instances = list(map(lambda crime: crime['category'],data))
    setlist_of_categories = list(set(all_category_instances))
    return make_category_dictionary(setlist_of_categories, all_category_instances)
buckinham_palace = {'lat':51.5017861,'lng':-0.1432319,'name':'buckingham'}
manchester = {'lat':53.480161,'lng': -2.245163,'name':'manchester'}
categories_of_crime(buckinham_palace['lat'],buckinham_palace['lng'])

[{'category': 'anti-social-behaviour', 'frequence_count': 608},
 {'category': 'bicycle-theft', 'frequence_count': 65},
 {'category': 'burglary', 'frequence_count': 138},
 {'category': 'criminal-damage-arson', 'frequence_count': 82},
 {'category': 'drugs', 'frequence_count': 74},
 {'category': 'other-crime', 'frequence_count': 22},
 {'category': 'other-theft', 'frequence_count': 810},
 {'category': 'possession-of-weapons', 'frequence_count': 26},
 {'category': 'public-order', 'frequence_count': 174},
 {'category': 'robbery', 'frequence_count': 141},
 {'category': 'shoplifting', 'frequence_count': 329},
 {'category': 'theft-from-the-person', 'frequence_count': 586},
 {'category': 'vehicle-crime', 'frequence_count': 102},
 {'category': 'violent-crime', 'frequence_count': 553}]

**Bonus**: 
* Write a function that determines the difference between Buckingham Palace and Manchester in terms of the number of crimes in each category.
    * In which category is there the largest absolute difference between the category of crime?
* Create a histogram depiction of the categories of crime

In [34]:
def category_compare(full_list1, full_list2):
    CC=[]
    for category in full_list1:
        CC_dict = {}
        CC_dict['category'] = category
        CC_dict['count'] = full_list2.count(category)
        CC.append(CC_dict)
    return sorted(CC, key=lambda k:k['count'],reverse=True)

In [44]:
def categories_of_crime_compare(place_1_information,place_2_information,YYYY_MM=None): 
    place_1_stats = categories_of_crime(place_1_information['lat'],place_1_information['lng'],YYYY_MM=None)
    place_2_stats = categories_of_crime(place_2_information['lat'],place_2_information['lng'],YYYY_MM=None)
    compare_list = []
    for x in list(range(0,len(place_1_stats))):
        compare_dict = {}
        compare_dict['category'] = place_1_stats[x]['category']
        compare_dict['frequency_count_'+str(place_1_information['name'])] = place_1_stats[x]['frequence_count']
        compare_dict['frequency_count_'+str(place_2_information['name'])] = place_2_stats[x]['frequence_count']
        compare_dict['absolute difference'] = abs(place_1_stats[x]['frequence_count']-place_2_stats[x]['frequence_count'])
        compare_list.append(compare_dict)
    return compare_list

In [47]:
buckinham_manchester_compare = categories_of_crime_compare(buckinham_palace,manchester,'2011-11')

In [48]:
!pip install plotly
!pip install xlrd

import plotly

plotly.offline.init_notebook_mode(connected=True)

[31mtwisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.[0m
[33mYou are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m
[31mtwisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.[0m
[33mYou are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


In [59]:
def plot_category_histogram(place_1_information,place_2_information,YYYY_MM=None):
    CC = categories_of_crime_compare(place_1_information,place_2_information,YYYY_MM=None)
    plot_dict_1 = {}
    plot_dict_1['type'] = 'bar'
    plot_dict_1['x'] = list(map(lambda crime: crime['category'],CC))
    plot_dict_1['y'] = list(map(lambda frequency: frequency['frequency_count_'+str(place_1_information['name'])],CC))
    plot_dict_1['name'] = str(place_1_information['name'].title())
    plot_dict_2 = {}
    plot_dict_2['type'] = 'bar'
    plot_dict_2['x'] = list(map(lambda crime: crime['category'],CC))
    plot_dict_2['y'] = list(map(lambda frequency: frequency['frequency_count_'+str(place_2_information['name'])],CC))
    plot_dict_2['name'] = str(place_2_information['name'].title())
    return plotly.offline.iplot([plot_dict_1,plot_dict_2]) 


In [60]:
plot_category_histogram(buckinham_palace,manchester,'2011-11')

##### 4. Create a function `find_outcome_statuses` that will determine outcome statuses for a given latitude and longitude and date (optional)?
Investigate the data to determine where the outcome statuses are located.

**NOTE**: You'll notice that some of these crimes do not have crime outcomes. Make these into the category of "Not Resolved."

**NOTE 2**: These might take a long time to execute if you do not specify a month

**Bonus**: What is the ratio of crimes investigated to those not investigated? Is it higher near London or Manchester?

In [111]:
def crime_status(unique_list, full_list):
    CS=[]
    for status in unique_list:
        CS_dict = {}
        CS_dict['outcome_status'] = status
        CS_dict['frequency_count'] = full_list.count(status)
        CS.append(CS_dict)
    return sorted(CS, key=lambda k:k['outcome_status'],reverse=True)

In [124]:
def find_outcome_statuses(lat,lng,YYYY_MM=None):
    crime_data = total_crimes_at_location(lat,lng,YYYY_MM=None)
    crime_status_list = []
    for crime in crime_data:
        if crime['outcome_status']==None:
            crime_status_list.append("Not Resolved")
        else:
            crime_status_list.append(crime['outcome_status'])
    real_status_list = list(filter(lambda x:x!='Not Resolved',crime_status_list))
    list_of_status_categories = list(map(lambda status: status['category'],real_status_list))
    unique_list_of_status_categories = set(list_of_status_categories)
    real_status_frequency_list = crime_status(unique_list_of_status_categories, list_of_status_categories)
    add_not_resolved = {'outcome_status' : 'Not Resolved','frequency_count' : len(list(filter(lambda x:x=='Not Resolved',crime_status_list)))}
    real_status_frequency_list.append(add_not_resolved)
    return real_status_frequency_list

#how to determine investigated vs. not investigated
def outcome_ratio(lat,lng,YYYY_MM=None):
    outcome_statuses = find_outcome_statuses(lat,lng,YYYY_MM=None)
    total_incidents = sum(list(map(lambda frequency:frequency['frequency_count'],outcome_statuses)))
    total_not_resolved = 
    return 

In [126]:
#total_crimes_at_location(52.63902,-1.131321,'2017-11')[4]
find_outcome_statuses(52.63902,-1.131321,'2017-11')
#outcome_ratio(52.63902,-1.131321,'2017-11')

[{'outcome_status': 'Under investigation', 'frequency_count': 404},
 {'outcome_status': 'Unable to prosecute suspect', 'frequency_count': 204},
 {'outcome_status': 'Offender given a caution', 'frequency_count': 5},
 {'outcome_status': 'Local resolution', 'frequency_count': 14},
 {'outcome_status': 'Investigation complete; no suspect identified',
  'frequency_count': 547},
 {'outcome_status': 'Further investigation is not in the public interest',
  'frequency_count': 23},
 {'outcome_status': 'Formal action is not in the public interest',
  'frequency_count': 36},
 {'outcome_status': 'Awaiting court outcome', 'frequency_count': 76},
 {'outcome_status': 'Action to be taken by another organisation',
  'frequency_count': 4},
 {'outcome_status': 'Not Resolved', 'frequency_count': 257}]

##### 5. Write a function `month_highest_crimes` that will return the month that had the highest number of crimes for a latitude, longitude and a year.

Inputs
* lat (float) : latitude
* lng (float) : longitude
* year (str) : in the format YYYY

Output
* month with highest crime (int)

**Bonus** Make a graph of how the number of crimes changed over time for a year. This will likely require a new function. Is seasonality a factor? Do the type of crimes change over time?

In [None]:
def month_information(unique_list, full_list):
    MI=[]
    for month in unique_list:
        MI_dict = {}
        MI_dict['outcome_status'] = month
        MI_dict['frequency_count'] = full_list.count(month)
        MI.append(MI_dict)
    return sorted(MI, key=lambda k:k['outcome_status'],reverse=True)

In [134]:
def separate_date_field_YYYYMM(entry):
    cleaned_entry = entry.copy()
    cleaned_month = entry['month'].split("-")
    cleaned_entry['year'] = int(cleaned_month[0])
    cleaned_entry['month_actual'] = int(cleaned_month[1])
    return cleaned_entry
separate_date_field_YYYYMM(total_crimes_at_location(52.63902,-1.131321,'2017-11')[0])

{'category': 'anti-social-behaviour',
 'location_type': 'Force',
 'location': {'latitude': '52.635184',
  'street': {'id': 883410, 'name': 'On or near Shopping Area'},
  'longitude': '-1.135455'},
 'context': '',
 'outcome_status': None,
 'persistent_id': '',
 'id': 61222401,
 'location_subtype': '',
 'month': '2017-11',
 'year': 2017,
 'month_actual': 11}

In [135]:
def month_highest_crimes(lat,lng,YYYY):
    crime_data = total_crimes_at_location(lat,lng,YYYY)
    clean_crime_data = list(map(lambda entry:separate_date_field_YYYYMM(entry),crime_data)
    all_months = list(map(lambda x: x['month_actual'],clean_crime_data))
    setlist_months = set(all_months)
    return month_information(setlist_months, all_months)

In [139]:
month_highest_crimes(52.63902,-1.131321,'2017') #I think the function is looking for '2017-MM', solve for this?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

### Bonus Open Ended Questions

1. Take a look at the https://data.police.uk/docs/method/stops-street/ API. Is there a correlation between gender and being stopped and searched? How about race and being stopped and searched?