# This is the notebook for the Capstone Final

## Introduction

In recent decades, lower wages have become increasingly correlated with a higher likelihood of weight problems such as obesity (Dyson 2018). Because urban areas are typically home to many lower-wage residents, we would also expect to see more weight-related health issues there.

This is indeed the case in Toronto, but the cause of the correlation may not be readily apparent. According to the Toronto Food Policy Council, many Toronto neighborhoods qualify as “food deserts” (Martin Prosperity Institute 2010). Food deserts are localities with poor access to quality food: there may be small corner or convenience stores, but proper grocery stores are lacking, and with them access to good nutrition.

My study will use Foursquare data to identify which Toronto neighborhoods lack access to proper food. I expect the stakeholders to be the citizens themselves, government at the local, provincial, and national levels, and any advocacy groups involved in nutrition in the region. The study matters because it could help target funds aimed at mitigating food deserts more effectively.

## Data

For my data, I will use several Foursquare venue categories to define a proper source of food. Foursquare has categories for “Grocery Store” and “Supermarket”; I will also use “Health Food Store”, “Farmers Market”, “Fruit & Vegetable Store” (Foursquare's spelling), and “Organic Grocery”. I will refer to all of these collectively as "Food Distributors". I have decided not to include “Bakery” and “Butcher”, as there is no guarantee that either meets a standard of proper nutrition: a bakery could specialize only in cakes, or a butcher could sell low-quality meats. It is difficult to be certain that these are the right categories, but they should give a reasonable overview of the situation.
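As a minimal sketch of this definition, assuming venue categories arrive as plain strings spelled exactly as in the Foursquare responses used below, the six categories can be held in a set and matched exactly. `FOOD_DISTRIBUTOR_CATEGORIES` and `count_food_distributors` are hypothetical names for illustration; the notebook itself does the counting with one-hot columns.

```python
# Hypothetical constant: the six Foursquare categories counted as "Food Distributors"
FOOD_DISTRIBUTOR_CATEGORIES = frozenset({
    "Grocery Store",
    "Supermarket",
    "Health Food Store",
    "Farmers Market",
    "Fruit & Vegetable Store",
    "Organic Grocery",
})


def count_food_distributors(venue_categories):
    """Count venues whose category is one of the six Food Distributor categories.

    "Bakery" and "Butcher" are deliberately absent from the set, so they are not counted.
    """
    return sum(1 for category in venue_categories
               if category in FOOD_DISTRIBUTOR_CATEGORIES)


# Example: a supermarket, a bakery, a butcher and a coffee shop
example = ["Supermarket", "Bakery", "Butcher", "Coffee Shop"]
print(count_food_distributors(example))  # 1
```

Exact string matching is used because Foursquare category names are fixed labels; a spelling such as "Fruit and Vegetable Store" would simply not match.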

This data can be aggregated at the neighborhood level, which lets me reuse the Toronto borough data from our previous workshops. I will study only the boroughs with “Toronto” in their name, since one of my principal stakeholders should be the Toronto Food Policy Council. This should also target the most urban areas, where food deserts are most common. Beyond the neighborhood data already used in the workshops, I should not need any data source besides Foursquare, which keeps the study uniform.

The study could be extended with Foursquare “Tips” data, which would allow the quality of the food retailers above to be assessed and the data to be stratified further, to the benefit of the study's stakeholders. However, this requires an upgraded Foursquare account, so I will leave that data out, at least for now.


## Methodology

In the code below, I use a combination of data from Wikipedia and the Foursquare API to count Food Distributors. First, I scrape Wikipedia for neighborhood and borough data using the Beautiful Soup package. I clean the resulting dataframe and join it with another dataframe that contains latitude and longitude by postal code. To focus on the most urban areas, all boroughs without "Toronto" in their name are excluded. All neighborhoods in the remaining boroughs are then mapped with Folium to give a picture of the study area.
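The cleaning, joining, and filtering steps can be illustrated offline on a few toy rows. This is a plain-Python sketch rather than the notebook's pandas code: the three input rows are assumed for illustration (shaped like the Wikipedia table), the coordinates are copied from the `neighborhoodll` output further down, and `clean_and_join` is a hypothetical helper.

```python
# Toy rows shaped like the scraped Wikipedia table: (postal code, borough, neighborhood)
raw_rows = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M3A", "North York", "Parkwoods"),
    ("M5A", "Downtown Toronto", "Regent Park / Harbourfront"),
]

# Toy stand-in for the geospatial CSV, keyed by postal code: (latitude, longitude)
coordinates = {
    "M3A": (43.753259, -79.329656),
    "M5A": (43.654260, -79.360636),
}


def clean_and_join(rows, coords):
    """Drop unassigned boroughs, swap '/' for ',', attach coordinates, keep 'Toronto' boroughs."""
    result = []
    for postal_code, borough, neighborhood in rows:
        if borough == "Not assigned":
            continue  # same filter as In [2]
        neighborhood = neighborhood.replace("/", ",")  # slash-to-comma substitution, as in In [4]
        lat, lng = coords[postal_code]  # join on postal code, as in In [7]
        if "Toronto" in borough:  # same borough filter as In [9]
            result.append((postal_code, borough, neighborhood, lat, lng))
    return result


for row in clean_and_join(raw_rows, coordinates):
    print(row)
# ('M5A', 'Downtown Toronto', 'Regent Park , Harbourfront', 43.65426, -79.360636)
```

The substitution explains the unusual " , " separators seen in the neighborhood names throughout the notebook.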

Using Foursquare, I then retrieve the venues near each neighborhood. This data is reduced to show only the count of Food Distributors, as defined above, per neighborhood. To map the severity of food deserts, I convert this into a dataframe containing the neighborhood names and dummy variables indicating whether a neighborhood has no Food Distributors, exactly one, or more than one. The final map replaces the blue markers of the first map with red, yellow, and green markers for neighborhoods with no distributors, one distributor, or more than one, respectively.
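The three-way split can be sketched as a pair of small functions. This is a plain-Python restatement of the `if`/`elif` logic used later in the notebook, with hypothetical names (`bucket`, `dummies`, `BUCKET_COLORS`) and the marker colors copied from the mapping cell. Because exactly one indicator is 1 for any count, the three dummies always sum to 1, which is why their means in `describe()` add up to 100%.

```python
# Marker colors for each bucket: (outline color, fill color)
BUCKET_COLORS = {
    "Zero": ("red", "#eb3434"),
    "One": ("yellow", "#ebdf34"),
    "MoreThanOne": ("green", "#4ceb34"),
}


def bucket(store_count):
    """Map a Food Distributor count to the bucket name used for the dummy variables."""
    if store_count == 0:
        return "Zero"
    if store_count == 1:
        return "One"
    return "MoreThanOne"


def dummies(store_count):
    """Return the (Zero, One, MoreThanOne) indicator triple for one neighborhood."""
    name = bucket(store_count)
    return tuple(int(name == b) for b in ("Zero", "One", "MoreThanOne"))


print(dummies(0), dummies(1), dummies(2))  # (1, 0, 0) (0, 1, 0) (0, 0, 1)
print(BUCKET_COLORS[bucket(2)])            # ('green', '#4ceb34')
```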

This final map gives a geographic picture of where food deserts exist. Each marker's popup also shows the neighborhood names it represents, helping stakeholders identify problem areas and target assistance.
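One detail matters when building this map: `groupby` returns neighborhoods in alphabetical order, while the coordinate table is in postal-code order, so pairing the two row by row (for example with `zip`) can attach a count to the wrong marker. Below is a minimal sketch of joining by neighborhood name instead. The coordinates come from the `TorontoNeighbor` output, but the counts are hypothetical, made up only to show the effect; in pandas the equivalent step would be a `merge` on the `Neighborhood` column.

```python
# Marker coordinates in postal-code order, copied from the TorontoNeighbor table
markers = [
    ("Regent Park , Harbourfront", 43.654260, -79.360636),
    ("Queen's Park , Ontario Provincial Government", 43.662301, -79.389494),
    ("Garden District, Ryerson", 43.657162, -79.378937),
]

# HYPOTHETICAL Food Distributor counts (not results), in alphabetical order as groupby returns them
counts = {
    "Garden District, Ryerson": 2,
    "Queen's Park , Ontario Provincial Government": 0,
    "Regent Park , Harbourfront": 1,
}


def color_for(count):
    """Red for no distributors, yellow for one, green for more than one."""
    return {0: "red", 1: "yellow"}.get(count, "green")


def colors_by_name(markers, counts):
    """Look each marker's count up by neighborhood name; missing names count as zero."""
    return [(name, color_for(counts.get(name, 0))) for name, _lat, _lng in markers]


def colors_by_position(markers, counts):
    """The pitfall: pair markers and counts purely by row position."""
    return [(name, color_for(n)) for (name, _lat, _lng), n in zip(markers, counts.values())]


print(colors_by_name(markers, counts))
print(colors_by_position(markers, counts))
```

With these toy counts, pairing by position swaps the colors of Regent Park and Garden District, while the name-based lookup keeps each count with its own neighborhood.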

In [1]:
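```python
# Scrape the Wikipedia list of "M" postal codes into a pandas DataFrame
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

req = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(req.content, 'lxml')

# The postal-code table is the first <table> on the page
table = soup.find_all('table')[0]

# read_html returns a list of DataFrames; wrapping the HTML in StringIO
# avoids the deprecation warning for literal HTML strings in newer pandas
df = pd.read_html(StringIO(str(table)))
neighborhood = pd.DataFrame(df[0])
```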

In [2]:
```python
# Remove observations where Borough is "Not assigned"
neighborhood = neighborhood[neighborhood['Borough'] != 'Not assigned']
```

In [3]:
```python
# Reset index after dropped observations
neighborhood = neighborhood.reset_index(drop=True)
```

In [4]:
```python
# Replace slashes with commas in every column (e.g. "Regent Park / Harbourfront")
neighborhood = neighborhood.stack().str.replace('/', ',').unstack()
```

In [5]:
```python
neighborhood.shape
```

```
(103, 3)
```

Part 2

In [6]:
```python
# Read the postal-code coordinates CSV from the course URL
url = "https://cocl.us/Geospatial_data"
c = pd.read_csv(url)
```

In [7]:
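```python
# Join the coordinates onto the neighborhood table by postal code
# (the Wikipedia column is 'Postal code'; the CSV column is 'Postal Code')
neighborhoodll = neighborhood.join(c.set_index('Postal Code'), on='Postal code')
```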

In [8]:
```python
neighborhoodll
```

| | Postal code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park , Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor , Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park , Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern , Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill , Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |


In [9]:
```python
# Reduce to boroughs with "Toronto" in their name
TorontoNeighbor = neighborhoodll[neighborhoodll['Borough'].str.contains("Toronto")]
```

In [10]:
```python
TorontoNeighbor
```

| | Postal code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | M5A | Downtown Toronto | Regent Park , Harbourfront | 43.65426 | -79.360636 |
| 4 | M7A | Downtown Toronto | Queen's Park , Ontario Provincial Government | 43.662301 | -79.389494 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 15 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 19 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 20 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |
| 24 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 |
| 25 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 |
| 30 | M5H | Downtown Toronto | Richmond , Adelaide , King | 43.650571 | -79.384568 |
| 31 | M6H | West Toronto | Dufferin , Dovercourt Village | 43.669005 | -79.442259 |


In [11]:
```python
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium  # map rendering library
```

In [12]:
```python
address = 'Toronto, ON'

# Geocode the city center with Nominatim (a user_agent string is required)
geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))
```

```
The geographical coordinates of Toronto are 43.6534817, -79.3839347.
```


In [13]:
```python
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add one blue marker per neighborhood, labelled "Neighborhood, Borough"
for lat, lng, borough, Neighborhood in zip(TorontoNeighbor['Latitude'], TorontoNeighbor['Longitude'], TorontoNeighbor['Borough'], TorontoNeighbor['Neighborhood']):
    label = '{}, {}'.format(Neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```

In [14]:
```python
# @hidden_cell
# Foursquare API credentials: substitute your own
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20200402'  # API version date
```

In [15]:
```python
# search radius in metres and maximum number of venues returned per neighborhood
radius = 500
LIMIT = 100
```

In [16]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighborhood's coordinates
    and return one row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```

In [17]:
```python
toronto_venues = getNearbyVenues(names=TorontoNeighbor['Neighborhood'],
                                 latitudes=TorontoNeighbor['Latitude'],
                                 longitudes=TorontoNeighbor['Longitude'])
```

```
Regent Park , Harbourfront
Queen's Park , Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond , Adelaide , King
Dufferin , Dovercourt Village
Harbourfront East , Union Station , Toronto Islands
Little Portugal , Trinity
The Danforth West , Riverdale
Toronto Dominion Centre , Design Exchange
Brockton , Parkdale Village , Exhibition Place
India Bazaar , The Beaches West
Commerce Court , Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park , The Junction South
North Toronto West
The Annex , North Midtown , Yorkville
Parkdale , Roncesvalles
Davisville
University of Toronto , Harbord
Runnymede , Swansea
Moore Park , Summerhill East
Kensington Market , Chinatown , Grange Park
Summerhill West , Rathnelly , South Hill , Forest Hill SE , Deer Park
CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst  Quay , South Niagara , Island airport
Rosed
```

In [18]:
```python
# one-hot encode the venue categories for analysis
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# move neighborhood column to the first column
cols = list(toronto_onehot.columns.values)
cols.pop(cols.index('Neighborhood'))
toronto_onehot = toronto_onehot[['Neighborhood'] + cols]
```

In [19]:
```python
toronto_grouped = toronto_onehot.groupby('Neighborhood')
toronto_grouped
```

```
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fbab493f978>
```

In [20]:
```python
toronto_grouped.head()
```

| | Neighborhood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Thrift / Vintage Store | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 45 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 46 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 47 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 48 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 49 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [21]:
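```python
# Keep only the Food Distributor categories; selecting several columns from a
# GroupBy object requires a list (a bare tuple of keys is rejected by pandas 2)
toronto_grouped = toronto_grouped[['Neighborhood', 'Grocery Store', 'Supermarket', 'Health Food Store', 'Farmers Market', 'Fruit & Vegetable Store', 'Organic Grocery']]
```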

In [22]:
```python
toronto_grouped.head()
```

| | Neighborhood | Grocery Store | Supermarket | Health Food Store | Farmers Market | Fruit & Vegetable Store | Organic Grocery |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Regent Park , Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 |
| 45 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 |
| 46 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 |
| 47 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 |
| 48 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 |
| 49 | Queen's Park , Ontario Provincial Government | 0 | 0 | 0 | 0 | 0 | 0 |


In [45]:

```python
# Total each Food Distributor category per neighborhood. numeric_only=True drops
# the text 'Neighborhood' column, which np.sum would otherwise concatenate
tg = toronto_grouped.sum(numeric_only=True)
```

In [46]:

```python
tg.head()
```

| Neighborhood | Grocery Store | Supermarket | Health Food Store | Farmers Market | Fruit & Vegetable Store | Organic Grocery |
|---|---|---|---|---|---|---|
| Berczy Park | 0 | 0 | 0 | 2 | 0 | 0 |
| Brockton , Parkdale Village , Exhibition Place | 1 | 0 | 0 | 0 | 0 | 0 |
| Business reply mail Processing CentrE | 0 | 0 | 0 | 1 | 0 | 0 |
| CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport | 0 | 0 | 0 | 0 | 0 | 0 |
| Central Bay Street | 0 | 0 | 0 | 0 | 0 | 0 |

In [47]:

```python
# Total Food Distributors of any kind per neighborhood
tg['sum'] = tg.sum(axis=1)
tg.head()
```

| Neighborhood | Grocery Store | Supermarket | Health Food Store | Farmers Market | Fruit & Vegetable Store | Organic Grocery | sum |
|---|---|---|---|---|---|---|---|
| Berczy Park | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
| Brockton , Parkdale Village , Exhibition Place | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Business reply mail Processing CentrE | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Central Bay Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [48]:
```python
# Build one row per neighborhood with a dummy variable for each bucket:
# no Food Distributors, exactly one, or more than one
d = {
    'Neighborhood': [],
    'Zero': [],
    'One': [],
    'MoreThanOne': []
}
for index, row in tg.iterrows():
    d['Neighborhood'].append(index)
    store_sum = row['sum']
    if store_sum == 0:
        d['Zero'].append(1)
        d['One'].append(0)
        d['MoreThanOne'].append(0)
    elif store_sum == 1:
        d['Zero'].append(0)
        d['One'].append(1)
        d['MoreThanOne'].append(0)
    else:
        d['Zero'].append(0)
        d['One'].append(0)
        d['MoreThanOne'].append(1)

new_df = pd.DataFrame(d)
new_df.head()
```

| | Neighborhood | Zero | One | MoreThanOne |
|---|---|---|---|---|
| 0 | Berczy Park | 0 | 0 | 1 |
| 1 | Brockton , Parkdale Village , Exhibition Place | 0 | 1 | 0 |
| 2 | Business reply mail Processing CentrE | 0 | 1 | 0 |
| 3 | CN Tower , King and Spadina , Railway Lands , ... | 1 | 0 | 0 |
| 4 | Central Bay Street | 1 | 0 | 0 |


In [53]:
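```python
new_df.describe()
```

| | Zero | One | MoreThanOne |
|---|---|---|---|
| count | 39.0 | 39.0 | 39.0 |
| mean | 0.564103 | 0.205128 | 0.230769 |
| std | 0.502356 | 0.409074 | 0.426833 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 0.0 | 0.0 |
| 75% | 1.0 | 0.0 | 0.0 |
| max | 1.0 | 1.0 | 1.0 |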


In [52]:
```python
map_grocery = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# Attach the counts to the coordinates by neighborhood name. new_df comes from a
# groupby and is sorted alphabetically, whereas TorontoNeighbor is in postal-code
# order, so pairing the two frames by position would mislabel markers.
map_data = TorontoNeighbor.merge(new_df, on='Neighborhood', how='left')

# add markers to the new map: red = no distributors, yellow = one, green = more than one
for _, row in map_data.iterrows():
    label = folium.Popup('{}, {}'.format(row['Neighborhood'], row['Borough']), parse_html=True)
    if row['MoreThanOne'] == 1:
        color, color_fill = 'green', '#4ceb34'
    elif row['One'] == 1:
        color, color_fill = 'yellow', '#ebdf34'
    else:
        # no distributors (or no Foursquare venues at all, which leaves NaN after the merge)
        color, color_fill = 'red', '#eb3434'
    folium.CircleMarker(
        [row['Latitude'], row['Longitude']],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=color_fill,
        fill_opacity=0.7).add_to(map_grocery)

map_grocery
```

## Results and Discussion

The results appear in the output cells above. A majority of neighborhoods (22 of 39, about 56%) have no Food Distributor among the (up to 100) Foursquare venues returned within 500 m of their coordinates. Only about 20.5% (8) have exactly one, while about 23% (9) have more than one. This points to a clear disparity in access to food. For poorer communities with limited access to transportation in particular, the lack of nearby options for proper nutrition can harm residents' health.
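These percentages follow directly from the `describe()` output: each dummy variable's mean is the share of the 39 neighborhoods in that bucket, so multiplying by 39 and rounding recovers whole counts. A small arithmetic check, using only the means and count printed above:

```python
TOTAL = 39  # the count row of new_df.describe()
means = {"Zero": 0.564103, "One": 0.205128, "MoreThanOne": 0.230769}

# Each dummy's mean equals (neighborhoods in that bucket) / TOTAL,
# so multiplying back and rounding recovers whole counts.
counts = {bucket: round(mean * TOTAL) for bucket, mean in means.items()}

for bucket, n in counts.items():
    print(f"{bucket}: {n}/{TOTAL} = {100 * n / TOTAL:.1f}%")
# Zero: 22/39 = 56.4%
# One: 8/39 = 20.5%
# MoreThanOne: 9/39 = 23.1%
```

The three counts sum to 39, as they must, since every neighborhood falls in exactly one bucket.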

Visually, the problem becomes more prominent farther from the city center. This matters for two reasons. First, city centers are typically the more expensive places to live, so the areas well served by food distributors tend to be those where wealthier residents already live. Second, the transportation problem is exacerbated because neighborhoods become less dense as they move away from the center. Efforts to mitigate food deserts should therefore focus on Toronto's peripheral neighborhoods.
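The visual impression that food deserts become more common away from the center could be checked numerically by computing each neighborhood's distance from the geocoded center and comparing distances across the three buckets. Below is a minimal sketch of the distance step only, assuming a spherical Earth (the haversine formula with a mean radius of 6,371 km); `haversine_km` is a hypothetical helper, not part of the notebook, and the coordinates are copied from the outputs above.

```python
import math

# Toronto's geocoded center, as printed by the Nominatim cell
CENTER = (43.6534817, -79.3839347)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Coordinates copied from the TorontoNeighbor table
for name, point in [("Berczy Park", (43.644771, -79.373306)),
                    ("The Beaches", (43.676357, -79.293031))]:
    print(f"{name}: {haversine_km(CENTER, point):.1f} km from the center")
```

Applied to every row of the coordinate table, these distances could be averaged per bucket (Zero, One, MoreThanOne) to see whether the red markers really lie farther out on average.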

Knowing where to target improvements, we can make a specific recommendation: improve public transport so that residents can reach existing distributors. According to Varghese (2018), Toronto's public transport system lacks both capacity and coverage; that is, the network does not reach the peripheral neighborhoods, and even if it did, it could not handle the increased load. This recommendation was chosen because it should be the most politically popular option: public transport benefits everyone and has proven economic benefits, so improving it addresses two problems at once.



## Conclusion

Though food deserts disproportionately affect the poor, solutions to them can be politically feasible because they have knock-on benefits for all citizens. In a country like Canada, with a publicly funded healthcare system, improved nutrition should reduce the load the healthcare system has to bear. This report has shown where the problem exists and proposed a way to address it. The existing code can also be easily modified by stakeholders to expand the area of study. This is only a preliminary report, yet we can already see opportunities to prevent or mitigate food deserts.

## References

Dyson, T. (2018, December 11). Relationship between low income and obesity is relatively new. Retrieved April 2, 2020, from https://news.utk.edu/2018/12/11/relationship-between-low-income-and-obesity-is-relatively-new/

Martin Prosperity Institute. (2010, June 15). Food deserts and priority neighbourhoods in Toronto. Martin Prosperity Institute Insights. Toronto, ON: Rotman School of Management, University of Toronto.

Varghese, S. (2018, January 12). The cold hard truth about Toronto's transport network. Retrieved April 4, 2020, from https://www.citymetric.com/transport/cold-hard-truth-about-torontos-transport-network-3597
