# Showing Citizens Hospitals Distribution along Toronto and its Boroughs

### February 9, 2021

## Introduction

The project "Showing Citizens Hospitals Distribution along Toronto and its Boroughs" aims to be a friendly way to show people where hospitals are located, and how they are rated, across Canada's most populous city according to the 2016 national census.


This document is the final project for the IBM Data Science Certificate taught through the Coursera platform. The project is required to use Foursquare, "an independent location data platform for understanding how people move through the real world", according to the firm.

## Business Problem

The purpose of the project is to inform and empower people who live in the City of Toronto, in the Canadian province of Ontario, or who are planning to move there. It shows in a simple, graphical way how hospitals are located and rated across the city's boroughs.

It is important to note that proximity to a hospital does not necessarily imply better access to health care. Even so, it can influence the decision on where to live, especially for vulnerable populations, and this project may help users make an informed choice of borough or neighborhood.

## Data, Packages and Tools

The data used are the names of the hospitals, their locations, ratings and capacity, measured by number of beds. The data sources and tools are:

1. Canada 2016 Census, for selecting the most populous city
2. Canadian Institute for Health Information (CIHI), for the hospitals located in the Health Region of "Toronto Central LHIN" and the number of beds per hospital
3. Foursquare, for hospital locations
4. Google, for hospital locations and ratings
5. Several Python packages for managing and displaying data, such as pandas, numpy, requests, json, folium and geopy
6. IBM Cloud Pak, the tool used for running the code

## Methodology

A database is generated with every hospital that Foursquare returns for the selected area, showing its coordinates and borough. Tables and maps are then generated to analyze the data and support a decision that takes into account the number of health facilities and their capacity.

#### Importing the libraries that will be used in the project

In [1]:
# @hidden_cel
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0+)

#conda install folium
!pip install folium
import folium # map rendering library
from folium import plugins

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

print('Libraries imported.') 

Libraries imported.


#### Creating the data frame of the cities

In [2]:
# @hidden_cel
address1 = 'Toronto, Ontario'
geolocator1 = Nominatim(user_agent="Toronto_explorer")
location1 = geolocator1.geocode(address1)
latitude1 = location1.latitude
longitude1 = location1.longitude
#print('The geographical coordinates of Toronto are {}, {}.'.format(latitude1, longitude1))

address2 = 'Montreal, Quebec'
geolocator2 = Nominatim(user_agent="Montreal_explorer")
location2 = geolocator2.geocode(address2)
latitude2 = location2.latitude
longitude2 = location2.longitude
#print('The geographical coordinates of Montreal are {}, {}.'.format(latitude2, longitude2))

address3 = 'Calgary, Alberta'
geolocator3 = Nominatim(user_agent="Calgary_explorer")
location3 = geolocator3.geocode(address3)
latitude3 = location3.latitude
longitude3 = location3.longitude
#print('The geographical coordinates of Calgary are {}, {}.'.format(latitude3, longitude3))

In [3]:
dfcanada = pd.DataFrame(np.array([["Toronto", "Ontario",2731571,latitude1,longitude1], ['Montreal','Quebec',1704694,latitude2,longitude2], ['Calgary','Alberta',1239220,latitude3,longitude3]]),
                   columns=['City', 'Province',"Population",'Latitude', 'Longitude'])
dfcanada[["Population",'Latitude', 'Longitude']] = dfcanada[["Population",'Latitude', 'Longitude']].apply(pd.to_numeric) 
dfcanada

| | City | Province | Population | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Toronto | Ontario | 2731571 | 43.653482 | -79.383935 |
| 1 | Montreal | Quebec | 1704694 | 45.497216 | -73.610364 |
| 2 | Calgary | Alberta | 1239220 | 51.053423 | -114.062589 |


We can see that Toronto is the most populous of these cities. In a second phase of the project, more cities could be added.
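Passing mixed strings and numbers through `np.array` turns every value into a string, which is why the cell above has to call `pd.to_numeric` afterwards. A minimal sketch of an alternative, assuming the same population figures and the coordinates shown in the table (the names `cities`, `dfcanada_alt` and `most_populated` are illustrative only), builds the frame from a list of records so the numeric columns keep their types:

```python
import pandas as pd

# Population figures and coordinates copied from the table above.
cities = [
    {"City": "Toronto", "Province": "Ontario", "Population": 2731571,
     "Latitude": 43.653482, "Longitude": -79.383935},
    {"City": "Montreal", "Province": "Quebec", "Population": 1704694,
     "Latitude": 45.497216, "Longitude": -73.610364},
    {"City": "Calgary", "Province": "Alberta", "Population": 1239220,
     "Latitude": 51.053423, "Longitude": -114.062589},
]

# Each dict becomes one row; ints stay ints and floats stay floats.
dfcanada_alt = pd.DataFrame.from_records(cities)

# Pick the city with the largest population.
most_populated = dfcanada_alt.loc[dfcanada_alt["Population"].idxmax(), "City"]

print(dfcanada_alt.dtypes)
print(most_populated)
```

With this layout no conversion step is needed before sorting or comparing the population column.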

#### Generating Canada's map and marking the cities

In [4]:
# Map of Canada using latitude and longitude values
map_canada = folium.Map(location=[60.130, -110.35], zoom_start=4)

# add markers to map
for city, province, lat, lng, in zip(dfcanada['City'], dfcanada['Province'], dfcanada['Latitude'], dfcanada['Longitude']):
    label = '{}, {}'.format(city, province)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='turquoise',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_canada)  
    
map_canada

If you are not able to see the map, you can find a picture of it at the following link: https://raw.githubusercontent.com/verohy/Coursera_Capstone/master/Canada%20Map.png

#### Generating a table and a map using Foursquare API

In [5]:
# @hidden_cel
#credentials for the Foursquare API
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
ACCESS_TOKEN = 'YOUR_FOURSQUARE_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 200
print('Your CLIENT_ID is: ' + CLIENT_ID)

Your CLIENT_ID is: YOUR_FOURSQUARE_CLIENT_ID


In [6]:
# @hidden_cel
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret

In [7]:
# @hidden_cel
#### Exercise using Toronto Data

In [8]:
# @hidden_cel
search_query = 'Hospital'
radius = 500000
#print(search_query + ' .... OK!')
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude1, longitude1,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
#url
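Building the request URL with `str.format` works for a one-word query such as `Hospital`, but a query with spaces or special characters would need escaping. The sketch below is one possible alternative, not the notebook's method: it uses the standard library's `urlencode` with the same endpoint and parameter names as the cell above. The helper `build_search_url`, the placeholder credentials and the sample query are assumptions for illustration, and the optional `oauth_token` parameter is left out for brevity.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(client_id, client_secret, lat, lng, query,
                     radius, limit, version="20180604"):
    """Build a venue-search URL with properly escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes spaces, commas and other reserved characters.
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials and a hypothetical two-word query.
url_example = build_search_url("YOUR_ID", "YOUR_SECRET",
                               43.653482, -79.383935,
                               "Animal Hospital", 5000, 50)

# Decode the query string again to confirm the values survive the round trip.
decoded = parse_qs(urlsplit(url_example).query)
print(decoded["query"], decoded["ll"])
```

Keeping the parameters in a dictionary also makes it easier to change the radius or limit without editing a long format string.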

In [9]:
# @hidden_cel
results = requests.get(url).json()

In [10]:
# @hidden_cel
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
#dataframe.head()
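To make the flattening step concrete, here is a small sketch of what `json_normalize` does to nested records. The two venues are hypothetical and heavily trimmed; their shape is only assumed from the columns the notebook uses later (`name`, `categories`, `id` and the `location.*` fields).

```python
import pandas as pd

# Hypothetical venues, shaped like the fields the notebook relies on.
sample_venues = [
    {"id": "a1", "name": "Example Hospital",
     "categories": [{"name": "Hospital"}],
     "location": {"lat": 43.65, "lng": -79.38, "city": "Toronto"}},
    {"id": "b2", "name": "Example Clinic",
     "categories": [],
     "location": {"lat": 43.66, "lng": -79.39}},  # no city reported
]

# Nested dicts become dotted column names such as "location.lat";
# lists (here "categories") are kept as-is in a single column.
flat = pd.json_normalize(sample_venues)

print(sorted(flat.columns))
print(flat[["name", "location.city"]])
```

A venue without a `city` field gets a missing value in `location.city`, which is why the cleaning step fills blanks before filtering on that column.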



In [11]:
# @hidden_cel
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

#dataframe_filtered.head(50)

In [12]:
# @hidden_cel
# Data cleaning to get rid of unwanted categories and cities
toronto_data = dataframe_filtered[["name","categories","lat","lng","city","id"]] 
toronto_data = toronto_data.fillna('')
# keep only venues whose category mentions "Hospital"
toronto_data = toronto_data[toronto_data["categories"].str.contains("Hospital")].reset_index(drop=True)
# crude city filter: drop rows whose city contains the letter 'a'; 'Toronto' and blank cities remain
toronto_data = toronto_data[~toronto_data.city.str.contains('a')]

In [13]:
toronto_data

| | name | categories | lat | lng | city | id |
|---|---|---|---|---|---|---|
| 0 | The Hospital for Sick Children (SickKids) | Hospital | 43.657499 | -79.386512 | Toronto | 4ad4c064f964a5206ef820e3 |
| 1 | St. Michael's Hospital | Hospital | 43.653784 | -79.377809 | Toronto | 4ad4c064f964a5206ff820e3 |
| 2 | Toronto General Hospital | Hospital | 43.658762 | -79.388292 | Toronto | 4ad4c064f964a52070f820e3 |
| 3 | Women's College Hospital | Hospital | 43.661491 | -79.387602 | Toronto | 4af0615cf964a5208cdb21e3 |
| 4 | Mount Sinai Hospital Women's and Infants' Depa... | Hospital | 43.659612 | -79.390761 | Toronto | 4b1fbe8af964a5209e2824e3 |
| 5 | Toronto Western Hospital | Hospital | 43.653434 | -79.406074 | Toronto | 4af2fb96f964a52086e921e3 |
| 6 | Mount Sinai Hospital, Joseph and Wolf Lebovic ... | Hospital | 43.658247 | -79.391473 | Toronto | 4ae9bd8af964a520feb521e3 |
| 7 | Dundas Euclid Animal Hospital | Hospital | 43.651518 | -79.409984 | Toronto | 4bce159acc8cd13ac7b6c3cf |
| 8 | St Michael's Hospital - Gastroenterology | Hospital | 43.653653 | -79.378081 | Toronto | 59525287c4df1d49b8a1e677 |
| 9 | Inpatient Lounge - St. Michael's Hospital | Hospital | 43.653428 | -79.379383 | | 4fca52d5e4b098b3ef28a713 |


### Results

With the data ready, we can start to analyze the results. Several names are very similar to each other, suggesting duplicated hospitals.

In [14]:
venues_map = folium.Map(location=[latitude1, longitude1], zoom_start=12) # generate map centred around Toronto
# add hospitals as red circle markers
for name, lat, lng, label in zip(toronto_data.name, toronto_data.lat, toronto_data.lng, toronto_data.categories):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

If you are not able to see the map, you can find a picture of it at the following link: https://raw.githubusercontent.com/verohy/Coursera_Capstone/master/Toronto_hospitals_Foursquare.png

## Discussion

* Many of the places are concentrated in a small area, which suggests duplicated hospitals. In addition, not all of the hospitals listed in the CIHI database appear in the Foursquare data, so a new database will be generated.
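The suspicion of duplicates can be checked numerically rather than by eye. The following sketch is not part of the original analysis: it computes great-circle (haversine) distances between a few of the Foursquare venues listed above and reports pairs that lie closer than an assumed cut-off of 100 metres. The names `haversine_m`, `venues_sample` and `THRESHOLD_M` are hypothetical helpers.

```python
import math
from itertools import combinations

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Names and coordinates copied from the Foursquare table above.
venues_sample = [
    ("St. Michael's Hospital", 43.653784, -79.377809),
    ("St Michael's Hospital - Gastroenterology", 43.653653, -79.378081),
    ("Toronto General Hospital", 43.658762, -79.388292),
    ("Toronto Western Hospital", 43.653434, -79.406074),
]

THRESHOLD_M = 100  # assumed cut-off; tune it to the size of a hospital campus

# Compare every pair of venues and keep those closer than the cut-off.
close_pairs = []
for a, b in combinations(venues_sample, 2):
    dist = haversine_m(a[1], a[2], b[1], b[2])
    if dist < THRESHOLD_M:
        close_pairs.append((a[0], b[0], round(dist)))

for first, second, dist in close_pairs:
    print("{} <-> {}: about {} m".format(first, second, dist))
```

Pairs flagged this way are only candidates: two entries a few dozen metres apart may be departments of the same hospital, which is the kind of duplication the map suggests.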

### Generating a new data base

A new database is generated because of the inconsistencies with the CIHI information. Ratings and locations from Google and the number of beds per hospital from the CIHI are added.

In [15]:
# @hidden_cel
data = {
    "name": [
        "Baycrest", "Casey House", "Centre for Addiction and Mental Health",
        "Holland Bloorview Kids Rehabilitation Hospital", "Hospital for Sick Children",
        "Michael Garron Hospital", "Runnymede Healthcare Centre", "Sinai Health System",
        "Sunnybrook Health Sciences Centre", "The Salvation Army Toronto Grace Health Centre",
        "St. Joseph's Health Centre", "St. Michael's Hospital", "Toronto General Hospital",
        "Toronto Western hospital", "The Princess Margaret Cancer Centre",
        "Toronto Rehabilitation Institute"],
    "Type of hospital": [
        "Extended Care/Chronic, Rehabilitation, Psychiatric", "Other", "Psychiatric",
        "Rehabilitation,Extended Care/Chronic", "Pediatric", "General",
        "Extended Care/Chronic", "General", "General", "Extended Care/Chronic,General",
        "General", "General", "General", "General", "General", "General"],
    "lat": [
        "43.7300746225112", "43.6693867285034", "43.6428897749485", "43.7213767933948",
        "43.657499", "43.6906452166705", "43.6675445803748", "43.657701",
        "43.7253742521628", "43.6711004049032", "43.6483360510713", "43.6581550382478",
        "43.6591703311581", "43.6534865436486", "43.6585674749449", "43.658967598333"],
    "long": [
        "-79.4338938789395", "-79.3769362272105", "-79.4180492775939", "-79.3534271285244",
        "-79.386512", "-79.3185044194517", "-79.4736385042386", "-79.389711",
        "-79.3650557948233", "-79.3804227182411", "-79.4694012313738", "-79.3254670824765",
        "-79.3875968543474", "-79.4041595665315", "-79.3897135355979", "-79.386235635609"],
    "borough": [
        "North York", "Downtown Toronto", "West Toronto", "East York",
        "Downtown Toronto", "East York", "West Toronto", "Downtown Toronto",
        "Central Toronto", "Downtown Toronto", "West Toronto", "Downtown Toronto",
        "Downtown Toronto", "Downtown Toronto", "Downtown Toronto", "Downtown Toronto"],
    "rating": [
        "3.8", "4.8", "3.6", "4.7", "4.2", "2.9", "3.6", "4.6",
        "3.4", "3.2", "3", "3.5", "4.1", "3.5", "4.2", "4.2"],
    "Other": [
        "Research and teaching for the elderly", "HIV/AIDS", "Mental health teaching ",
        "Children Rehabilitation", "Children", "",
        "Rehabilitation and medically complex care", "", "", "",
        "Part of Unity Health Toronto", "Part of Unity Health Toronto",
        "Part of University Health Network", "Part of University Health Network",
        "Part of University Health Network", "Part of University Health Network"],
    "beds": [
        "262", "13", "495", "63", "263", "367", "206", "732",
        "1098", "119", "426", "463", "417", "256", "220", ""],
}
new_toronto=pd.DataFrame(data)
new_toronto[['beds',"lat","long","rating"]] = new_toronto[['beds',"lat","long","rating"]].apply(pd.to_numeric) 
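One hospital in the dictionary has a blank bed count, so it helps to see what the numeric conversion does with it. The sketch below uses three sample values (two real bed counts from the data and one blank) and passes `errors="coerce"` explicitly; that argument is an assumption added here to make the handling of unparseable values visible, and is not in the original call.

```python
import pandas as pd

# Two bed counts from the data plus one blank entry, all stored as strings.
beds_raw = pd.Series(["262", "13", ""], name="beds")

# With errors="coerce", anything that cannot be parsed becomes NaN.
beds = pd.to_numeric(beds_raw, errors="coerce")

total = beds.sum()                 # NaN values are skipped by default
missing = int(beds.isna().sum())   # number of hospitals without a bed count

print(beds.dtype, total, missing)
```

Because of the missing value the column is stored as floating point, which is why bed counts appear as `262.0` rather than `262` in the tables that follow, and the hospital without a count simply contributes nothing to borough totals.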

In [16]:
new_toronto

| | name | Type of hospital | lat | long | borough | rating | Other | beds |
|---|---|---|---|---|---|---|---|---|
| 0 | Baycrest | Extended Care/Chronic, Rehabilitation, Psychia... | 43.730075 | -79.433894 | North York | 3.8 | Research and teaching for the elderly | 262.0 |
| 1 | Casey House | Other | 43.669387 | -79.376936 | Downtown Toronto | 4.8 | HIV/AIDS | 13.0 |
| 2 | Centre for Addiction and Mental Health | Psychiatric | 43.64289 | -79.418049 | West Toronto | 3.6 | Mental health teaching | 495.0 |
| 3 | Holland Bloorview Kids Rehabilitation Hospital | Rehabilitation,Extended Care/Chronic | 43.721377 | -79.353427 | East York | 4.7 | Children Rehabilitation | 63.0 |
| 4 | Hospital for Sick Children | Pediatric | 43.657499 | -79.386512 | Downtown Toronto | 4.2 | Children | 263.0 |
| 5 | Michael Garron Hospital | General | 43.690645 | -79.318504 | East York | 2.9 | | 367.0 |
| 6 | Runnymede Healthcare Centre | Extended Care/Chronic | 43.667545 | -79.473639 | West Toronto | 3.6 | Rehabilitation and medically complex care | 206.0 |
| 7 | Sinai Health System | General | 43.657701 | -79.389711 | Downtown Toronto | 4.6 | | 732.0 |
| 8 | Sunnybrook Health Sciences Centre | General | 43.725374 | -79.365056 | Central Toronto | 3.4 | | 1098.0 |
| 9 | The Salvation Army Toronto Grace Health Centre | Extended Care/Chronic,General | 43.6711 | -79.380423 | Downtown Toronto | 3.2 | | 119.0 |


In [17]:
hospitals_map = folium.Map(location=[latitude1, longitude1], zoom_start=12) # generate map centred around Toronto
# add hospitals as red circle markers
for name, lat, lng, bor, rat in zip(new_toronto.name, new_toronto.lat, new_toronto.long, new_toronto.borough, new_toronto.rating):
    label = 'Name: {}, Borough: {}, Rating: {}'.format(name,bor,rat)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=label,
        fill = True,
        fill_color='white',
        fill_opacity=0.6
    ).add_to(hospitals_map)

# display map
hospitals_map

If you are not able to see the map, you can find a picture of it at the following link: https://raw.githubusercontent.com/verohy/Coursera_Capstone/master/Toronto_hospitals_CIHI.png

In the map, the points are less concentrated and more numerous, which suggests better data quality. To compare the boroughs, the markers are clustered to show how the facilities are distributed:

In [18]:
hospitals_map = folium.Map(location = [latitude1, longitude1], zoom_start = 12)

# instantiate a marker cluster object for the hospitals in the dataframe
hospital_cluster = plugins.MarkerCluster().add_to(hospitals_map)

# loop through the dataframe and add each hospital to the marker cluster
for lat, lng, label in zip(new_toronto.lat, new_toronto.long, new_toronto.name):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(hospital_cluster)

hospitals_map

If you are not able to see the map, you can find pictures of it at the following links:

1. Zoom = 9: https://raw.githubusercontent.com/verohy/Coursera_Capstone/master/Toronto_cluster_zoom9.png
2. Zoom = 12: https://raw.githubusercontent.com/verohy/Coursera_Capstone/master/Toronto_cluster_zoom12.png

In [19]:
# @hidden_cel
# Get the dataset metadata by passing package_id to the package_search endpoint
# For example, to retrieve the metadata for this dataset:

# url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
# params = { "id": "4def3f65-2a65-4a4f-83c4-b2a4aed72d46"}
# package = requests.get(url, params = params).json()
# print(package["result"])

# # Get the data by passing the resource_id to the datastore_search endpoint
# # See https://docs.ckan.org/en/latest/maintaining/datastore.html for detailed parameters options
# # For example, to retrieve the data content for the first resource in the datastore:

# for idx, resource in enumerate(package["result"]["resources"]):
#     if resource["datastore_active"]:
#         url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search"
#         p = { "id": resource["id"] }
#         data = requests.get(url, params = p).json()
#         df = pd.DataFrame(data["result"]["records"])
#         break
# df

To support the conclusion, the new database is summarized in the following charts:

## Results

The results are organized in two charts for a better understanding of the data. The first one shows the capacity of the hospitals and their average rating.

### Chart 1. Boroughs ordered by number of hospital beds

In [20]:
# @hidden_cel
summerize = new_toronto.groupby("borough").agg({"beds":"sum","name":"count","rating":"mean"})
summerize = summerize.sort_values(by=["beds"],ascending=False)
summerize = summerize.rename(columns={"beds":"Number of beds","name":"Number of hospitals","rating":"Rating Average" })

In [21]:
summerize

| borough | Number of beds | Number of hospitals | Rating Average |
|---|---|---|---|
| Downtown Toronto | 2483.0 | 9 | 4.033333 |
| West Toronto | 1127.0 | 3 | 3.4 |
| Central Toronto | 1098.0 | 1 | 3.4 |
| East York | 430.0 | 2 | 3.8 |
| North York | 262.0 | 1 | 3.8 |
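The per-borough summary can be reproduced from three columns of the dataset. As a sketch, the code below uses pandas named aggregation, which assigns the output column names in the same step instead of renaming afterwards. This is a stylistic alternative to the original cell rather than a correction, and the names `rows`, `df` and `summary` are illustrative. The values are copied from the hospital data above, with `None` for the hospital that has no bed count.

```python
import pandas as pd

# (borough, beds, rating) for the 16 hospitals, in the order of the dataset.
rows = [
    ("North York", 262, 3.8), ("Downtown Toronto", 13, 4.8),
    ("West Toronto", 495, 3.6), ("East York", 63, 4.7),
    ("Downtown Toronto", 263, 4.2), ("East York", 367, 2.9),
    ("West Toronto", 206, 3.6), ("Downtown Toronto", 732, 4.6),
    ("Central Toronto", 1098, 3.4), ("Downtown Toronto", 119, 3.2),
    ("West Toronto", 426, 3.0), ("Downtown Toronto", 463, 3.5),
    ("Downtown Toronto", 417, 4.1), ("Downtown Toronto", 256, 3.5),
    ("Downtown Toronto", 220, 4.2), ("Downtown Toronto", None, 4.2),
]
df = pd.DataFrame(rows, columns=["borough", "beds", "rating"])

# Named aggregation: output_name=(input_column, function).
summary = (
    df.groupby("borough")
      .agg(number_of_beds=("beds", "sum"),
           number_of_hospitals=("rating", "count"),
           rating_average=("rating", "mean"))
      .sort_values("number_of_beds", ascending=False)
)

print(summary)
```

Counting on the `rating` column works here because every hospital has a rating; counting on `beds` would undercount Downtown Toronto by one, since the missing bed value is not counted.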


The second chart shows the best hospital per borough and its rating.

### Chart 2. Top Hospital per borough and its rating

In [22]:
# @hidden_cel
rating=new_toronto[["borough","name","rating"]].sort_values(by=["rating"],ascending=False)
rating=rating.groupby("borough").agg({"name":"first","rating":"max"})
rating=rating.sort_values(by=["rating"],ascending=False)
rating=rating.rename(columns={"rating":"Top 1 Hospital Rating","name":"Top 1 Hospital"})

In [71]:
rating

| borough | Top 1 Hospital | Top 1 Hospital Rating |
|---|---|---|
| Downtown Toronto | Casey House | 4.8 |
| East York | Holland Bloorview Kids Rehabilitation Hospital | 4.7 |
| North York | Baycrest | 3.8 |
| West Toronto | Centre for Addiction and Mental Health | 3.6 |
| Central Toronto | Sunnybrook Health Sciences Centre | 3.4 |
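The cell above takes the `first` name and the `max` rating as two separate aggregations, which agree only because the frame was sorted by rating beforehand. A sketch of an alternative, under the assumption that one row per borough is wanted, selects the whole top-rated row with `idxmax`, so the name and rating always come from the same hospital. It uses a subset of the hospitals listed earlier; `ratings` and `top` are illustrative names.

```python
import pandas as pd

# A subset of the hospitals and ratings from the dataset above.
ratings = pd.DataFrame({
    "borough": ["Downtown Toronto", "Downtown Toronto", "East York", "East York",
                "North York", "West Toronto", "West Toronto", "Central Toronto"],
    "name": ["Casey House", "Sinai Health System",
             "Holland Bloorview Kids Rehabilitation Hospital", "Michael Garron Hospital",
             "Baycrest", "Centre for Addiction and Mental Health",
             "Runnymede Healthcare Centre", "Sunnybrook Health Sciences Centre"],
    "rating": [4.8, 4.6, 4.7, 2.9, 3.8, 3.6, 3.6, 3.4],
})

# idxmax returns, per borough, the row label of the highest rating
# (the first such row when several hospitals tie).
top = ratings.loc[ratings.groupby("borough")["rating"].idxmax()]
top = top.sort_values("rating", ascending=False).set_index("borough")

print(top)
```

In West Toronto two hospitals share a rating of 3.6, and this approach keeps whichever appears first in the data, matching the entry shown in Chart 2.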


## Conclusion

From the map and the number of beds per borough, we can conclude that Downtown Toronto has by far the highest concentration of hospitals and hospital beds, followed by West Toronto, Central Toronto and the two York boroughs (East York and North York). As for the ratings, Downtown Toronto has the best-rated hospital, Casey House, and the best average as well. East York follows closely with the Holland Bloorview Kids Rehabilitation Hospital and has the same average rating of 3.8 as North York.