# Toronto's Neighborhoods Recommender System
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.wallpaperup.com%2Fuploads%2Fwallpapers%2F2013%2F12%2F19%2F199807%2F4d86b2357c55ff2bc433fc0af0705b97.jpg&f=1&nofb=1/toronto.jpeg%E2%80%9D" alt="toronto" align="left" width="600" />

## Table of Contents
1. **[Introduction](#introduction)**
2. **[Data](#data)**  
     A. **[Factors to consider while deciding where to settle](#factors)**  
     B. **[Description of data and data source](#source)**  
     C. **[Import data and data wrangling](#clean)**  
     D. **[Complete dataframes for Toronto's neighborhoods recommender system](#complete)**
3. **[Methodology](#methodology)**
4. **[Results](#results)**
5. **[Discussion](#discussion)**
6. **[Conclusion](#conclusion)**

## 1. Introduction <a name="introduction"></a>
According to __[CIC News](https://www.cicnews.com/2020/02/which-cities-in-canada-attract-the-most-immigrants-0213741.html#)__, Canada welcomed more than 341,000 immigrants in 2019 and Toronto has successfully attracted nearly 118,000 immigrants which contribute to almost 35% of the total number of immigrants. **The statistics indicate that most of the immigrants prefer to settle in Toronto over other cities.** Why? __[VisaPlace](https://www.visaplace.com/blog-immigration-law/why-immigrants-settle-in-toronto-heres-10-reasons/)__ has listed out 10 reasons for this question. For me, the most convincing reason is Toronto is Canada’s business and financial capital, that's why immigrants prefer it.

Toronto is Canada’s largest city, it has 6 boroughs which are Etobicoke, North York, East York, Central Toronto, York and Scarborough. These 6 boroughs can be further divided into 140 neighborhoods. According to __[City of Toronto](https://www.toronto.ca/community-people/moving-to-toronto/about-toronto/)__, Toronto is one of the most multicultural cities in the world due to its large population of immigrants all over the world, each Toronto's neighborhood might be quite different from one another. **Therefore, out of 140 neighborhoods in Toronto, how can immigrants decide which neighborhood suits them best?** This is exactly what I want to resolve in this project.

**In this project, I will try to build a Toronto's neighborhoods recommender system based on 4 factors including job opportunities, cost of living, safety and culture.** So, who would be interested in this recommender system? I can say that at least 118,000 people would and I believe that this number will be growing in the future. And of course, I can't wait to find out which neighborhood suit me best too because I wish to migrate to Canada and settle in Toronto in the future. How about you?

## 2. Data<a name="data"></a> 
Previously, I mentioned that the Toronto's neighborhoods recommender system is built on job opportunities, cost of living, safety and culture. In this section, I will explain why these factors are important, describe the data that will be used and their source, import and clean the data, and finally show the complete dataframes that will be used to create the Toronto's neighborhoods recommender system.

### A. Factors to consider while deciding where to settle<a name="factors"></a>
* **Job opportunities**: We have to make a living to support ourselves or our family. And I bet we wish to get our dream job right? So, we need to know what are the common jobs for each neighborhood.
* **Cost of living**: We would like to buy our dream house but how much does it cost? Curious of how much should we earn to afford to live in a specific neighborhood? To answer these questions, we need to know the average house price and household income for each neighborhood.
* **Safety**: We wish to live in a safe and peaceful area but how can we determine if the area is safe? To answer these questions, we need to know the crime rate for each neighborhood.
* **Culture**: We will talk and eat everyday. If possible, we would like to communicate in our favorite language and eat our favorite food right? And it's even better if our favorite things are just around us. So, it's important to know what are the language spoken most often at home and what are the popular food in each neighborhood.

### B. Description of data and data source<a name="source"></a>
|No.| Data           | Data Description  |   Data Source   | 
|:-------------| :------------- | :---------- | :----------- |
|I. | Common jobs| These data show the common jobs for each neighborhood. The data categorize jobs according to North American Industry Classification System (NAICS) 2012. For example: 54-Professional, scientific and technical services, 23-Construction, etc. | I extracted the data from the __[2016 Toronto Neighborhood Profiles](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv)__. City of Toronto uses the 2016 Canadian Census to provide a portrait of the demographic, social and economic characteristics of the people and households in each Toronto's neighbourhood. |
|II. | Average house price and household income | These data show the average house price and household income for each neighborhood in Canadian Dollar (CAD). The composite Housing Affordability Index (HAI) for each neighborhood is also calculated.| I scraped the data current as of October 2020 from __[Realosophy](https://www.realosophy.com/toronto/neighbourhood-map)__. Realosophy is a real estate brokerage company that helps their customers make better decision based on data. |
|III. |Crime rate| These data show the crime rate for each neighborhood. The types of crimes included in the data are assault, auto theft, homicide, theft over, break and enter, and robbery. | I get the data from the __[Toronto Neighborhood Crime Rates Boundary File](https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-?geometry=-79.598%2C43.673%2C-79.158%2C43.760&orderBy=OBJECTID&page=6)__ by calling a REST API from Toronto Police Service. The file contains the 2014-2019 crime data by neighbourhood. |
|IV. |Language spoken most often at home|  These data show the language spoken most often at home in each neighborhood. For example: English, Spanish, Italian, French, etc.  | I extracted the data from the __[2016 Toronto Neighborhood Profiles](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv)__. |
|V. |Boundaries of neighborhoods| These data contain the boundary, latitude and longitude coordinate of each neighborhood in geojson file. We will use these data to create the boundary of each neighborhood on a map. The latitude and longitude coordinates of each neighborhood are needed to get the popular food data. | I get the data from __[Boundaries of Toronto's Neighbourhoods](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=geojson&projection=4326)__. City of Toronto made the data available on its open data portal. |
|VI. |Popular food| These data show the popular food categories around each neighborhood according to Foursquare API. For example: Italian restaurant, Korean restaurant, Japanese restaurant, etc. | I get the data through __[Foursqure API](https://developer.foursquare.com/docs/)__. Foursquare is a location technology platform dedicated to improve how people move through the real world. |

### C. Import data and data wrangling<a name="clean"></a>

#### I. Common jobs data
Now, let's import and clean the common jobs data first.

In [1]:
# import necessary library
import pandas as pd
pd.set_option('display.max_rows',None)
pd.set_option('display.max_columns',None)

# import the 2016 toronto neighborhood profiles into toronto_df and clean the dataframe
toronto_df=pd.read_csv('https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv')
toronto_df.drop(['_id','Category','Data Source','City of Toronto'],axis=1,inplace=True)

# extract common jobs data from toronto_df into jobs_df and clean the dataframe
topic=['Industry - North American Industry Classification System (NAICS) 2012']
index=['Neighbourhood Number']
jobs_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
jobs_df=jobs_df.drop('Topic',axis=1).set_index('Characteristic').T
jobs_df.columns=jobs_df.columns.str.strip()
jobs_df=jobs_df.drop(jobs_df.columns[1:4],axis=1).replace(',','',regex=True).astype(int)
jobs_df=jobs_df.sort_values(index).rename_axis(None,axis=1).reset_index()
jobs_df.rename(columns={'index':'Neighborhood','Neighbourhood Number':'ID'},inplace=True)
print('This dataframe consists of {} jobs!'.format(jobs_df.shape[1]-2))
jobs_df.head()

This dataframe consists of 20 jobs!


Unnamed: 0,Neighborhood,ID,"11 Agriculture, forestry, fishing and hunting","21 Mining, quarrying, and oil and gas extraction",22 Utilities,23 Construction,31-33 Manufacturing,41 Wholesale trade,44-45 Retail trade,48-49 Transportation and warehousing,51 Information and cultural industries,52 Finance and insurance,53 Real estate and rental and leasing,"54 Professional, scientific and technical services",55 Management of companies and enterprises,"56 Administrative and support, waste management and remediation services",61 Educational services,62 Health care and social assistance,"71 Arts, entertainment and recreation",72 Accommodation and food services,81 Other services (except public administration),91 Public administration
0,West Humber-Clairville,1,20,20,45,1025,2835,675,2020,1695,400,755,295,965,35,1285,875,1665,335,1195,710,460
1,Mount Olive-Silverstone-Jamestown,2,40,10,15,940,2700,610,1580,1240,310,505,185,560,30,1105,545,1360,215,1155,585,295
2,Thistletown-Beaumond Heights,3,0,10,20,455,690,215,640,465,120,200,75,275,0,300,245,410,70,325,180,145
3,Rexdale-Kipling,4,25,0,20,430,610,230,655,450,95,220,105,300,0,470,335,535,90,290,210,210
4,Elms-Old Rexdale,5,0,0,0,300,510,205,540,420,110,200,90,240,20,340,260,500,50,280,190,145


Looks great! Let's get the top 5 common jobs for each neighborhood.

In [2]:
# define a function to return a dataframe of top 5 elements
def get_top5_elements(dataframe,column_name):
    first_element=[]
    second_element=[]
    third_element=[]
    fourth_element=[]
    fifth_element=[]
    first_column=dataframe.iloc[:,0].values
    second_column=dataframe.iloc[:,1].values
    for i in range(140):
        sorted_elements=dataframe.iloc[i,2:].sort_values(ascending=False).index
        first_element.append(sorted_elements[0])
        second_element.append(sorted_elements[1])
        third_element.append(sorted_elements[2])
        fourth_element.append(sorted_elements[3])
        fifth_element.append(sorted_elements[4])
    return pd.DataFrame({'Neighborhood':first_column,'ID':second_column,
                         '1st Most Common {}'.format(column_name):first_element,
                         '2nd Most Common {}'.format(column_name):second_element,
                         '3rd Most Common {}'.format(column_name):third_element,
                         '4th Most Common {}'.format(column_name):fourth_element,
                         '5th Most Common {}'.format(column_name):fifth_element})

# get top 5 common jobs and save the data into top5_jobs_df
top5_jobs_df=get_top5_elements(jobs_df,'Job')
top5_jobs_df.head()

Unnamed: 0,Neighborhood,ID,1st Most Common Job,2nd Most Common Job,3rd Most Common Job,4th Most Common Job,5th Most Common Job
0,West Humber-Clairville,1,31-33 Manufacturing,44-45 Retail trade,48-49 Transportation and warehousing,62 Health care and social assistance,"56 Administrative and support, waste managemen..."
1,Mount Olive-Silverstone-Jamestown,2,31-33 Manufacturing,44-45 Retail trade,62 Health care and social assistance,48-49 Transportation and warehousing,72 Accommodation and food services
2,Thistletown-Beaumond Heights,3,31-33 Manufacturing,44-45 Retail trade,48-49 Transportation and warehousing,23 Construction,62 Health care and social assistance
3,Rexdale-Kipling,4,44-45 Retail trade,31-33 Manufacturing,62 Health care and social assistance,"56 Administrative and support, waste managemen...",48-49 Transportation and warehousing
4,Elms-Old Rexdale,5,44-45 Retail trade,31-33 Manufacturing,62 Health care and social assistance,48-49 Transportation and warehousing,"56 Administrative and support, waste managemen..."


Cool! Let's sum up all the jobs for each neighborhood and normalize the common jobs data.

In [3]:
# sum up all the jobs for each neighborhood
jobs_df['Any Job']=jobs_df.iloc[:,2:].sum(axis=1)

# def a function to normalize data
def data_normalization(dataframe):
    temp_df=dataframe
    columns_to_normalize=dataframe.columns[2:]
    for column in columns_to_normalize:
        temp_df[column]=(temp_df[column]/temp_df[column].max()).round(3)
    return temp_df

# normalize the common jobs data
jobs_df=data_normalization(jobs_df)
jobs_df.head()

Unnamed: 0,Neighborhood,ID,"11 Agriculture, forestry, fishing and hunting","21 Mining, quarrying, and oil and gas extraction",22 Utilities,23 Construction,31-33 Manufacturing,41 Wholesale trade,44-45 Retail trade,48-49 Transportation and warehousing,51 Information and cultural industries,52 Finance and insurance,53 Real estate and rental and leasing,"54 Professional, scientific and technical services",55 Management of companies and enterprises,"56 Administrative and support, waste management and remediation services",61 Educational services,62 Health care and social assistance,"71 Arts, entertainment and recreation",72 Accommodation and food services,81 Other services (except public administration),91 Public administration,Any Job
0,West Humber-Clairville,1,0.333,0.108,0.18,0.492,1.0,0.486,0.68,1.0,0.125,0.078,0.182,0.079,0.14,0.574,0.325,0.492,0.228,0.387,0.559,0.249,0.343
1,Mount Olive-Silverstone-Jamestown,2,0.667,0.054,0.06,0.451,0.952,0.439,0.532,0.732,0.097,0.052,0.114,0.046,0.12,0.493,0.203,0.402,0.146,0.374,0.461,0.159,0.277
2,Thistletown-Beaumond Heights,3,0.0,0.054,0.08,0.218,0.243,0.155,0.215,0.274,0.038,0.021,0.046,0.022,0.0,0.134,0.091,0.121,0.048,0.105,0.142,0.078,0.096
3,Rexdale-Kipling,4,0.417,0.0,0.08,0.206,0.215,0.165,0.221,0.265,0.03,0.023,0.065,0.024,0.0,0.21,0.125,0.158,0.061,0.094,0.165,0.114,0.105
4,Elms-Old Rexdale,5,0.0,0.0,0.0,0.144,0.18,0.147,0.182,0.248,0.034,0.021,0.055,0.02,0.08,0.152,0.097,0.148,0.034,0.091,0.15,0.078,0.087


Awesome!

#### II. Average house price and household income data
Before scraping the average house price and household income for each neighborhood, we need to get the website link for each neighborhood first. So, let's scrape these information from Realosophy now.

In [4]:
# import necessary libraries
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
from bs4 import BeautifulSoup
firefox_options=Options()
firefox_options.add_argument('-headless')

# define a function to scrape each neighborhood's name and website by borough
def get_neighborhood_websites(borough):
    driver=webdriver.Firefox(options=firefox_options)
    driver.get('https://www.realosophy.com/{}-former-toronto/neighbourhood-map'.format(borough))
    time.sleep(5)
    html=driver.page_source
    soup=BeautifulSoup(html,'lxml')
    all_data=soup.find('div',{'class':'row mt-4'})
    neighborhood_data=all_data.find_all('a')
    neighborhood=[]
    website=[]
    for data in neighborhood_data:
        neighborhood.append(data.text)
        website_temp=data['href'].replace('/','https://www.realosophy.com/',1)
        website.append(website_temp)
    driver.quit()
    print('...',end='')
    return pd.DataFrame({'Neighborhood':neighborhood,'Website':website})

# scrape etobicoke's neighborhoods and websites into etobicoke_df then insert neighborhoods id
print('Almost...',end='')
etobicoke_df=get_neighborhood_websites('etobicoke')
etobicoke_df.drop(25,inplace=True)
etobicoke_df['ID']=[20,11,1,14,13,17,8,9,14,6,19,12,17,18,10,14,4,7,2,16,5,15,16,3,11]

# scrape north york's neighborhoods and websites into northyork_df then insert neighborhoods id
northyork_df=get_neighborhood_websites('north-york')
northyork_df.loc[len(northyork_df.index)]=northyork_df.loc[31,:]
northyork_df.loc[len(northyork_df.index)]=northyork_df.loc[39,:]
northyork_df['ID']=[38,42,34,52,49,43,24,41,30,39,33,39,42,47,45,26,44,31,25,45,53,48,41,21,23,22,38,32,41,39,29,
                    36,45,23,46,28,43,41,35,37,40,27,31,50,51]

# scrape east york's neighborhoods and websites into eastyork_df then insert neighborhoods id
eastyork_df=get_neighborhood_websites('east-york')
eastyork_df['ID']=[56,57,59,61,56,58,54,55,54,54,60]

# scrape central toronto's neighborhoods and websites into centraltoronto_df then insert neighborhoods id
centraltoronto_df=get_neighborhood_websites('central-toronto')
centraltoronto_df=centraltoronto_df.drop([16,47,69,74]).reset_index(drop=True)
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[39,:]
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[35,:]
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[20,:]
centraltoronto_df['ID']=[78,103,76,84,105,80,89,83,71,91,96,100,78,93,75,77,73,66,93,99,97,77,93,83,92,77,77,76,
                         101,102,82,77,77,90,87,94,74,78,70,82,81,103,98,73,103,67,96,92,72,68,70,86,98,95,79,77,
                         96,85,73,74,77,97,87,105,95,63,67,90,69,81,82,62,93,77,64,75,95,65,88,104]

# scrape york's neighborhoods and websites into york_df then insert neighborhoods id
york_df=get_neighborhood_websites('york')
york_df=york_df.drop(11).reset_index(drop=True)
york_df['ID']=[114,112,108,109,106,110,114,115,107,114,111,113]

# scrape scarborough's neighborhoods and websites into scarborough_df then insert neighborhoods id
scarborough_df=get_neighborhood_websites('scarborough')
scarborough_df.loc[len(scarborough_df.index)]=scarborough_df.loc[0,:]
scarborough_df['ID']=[128,127,122,120,120,123,122,126,138,140,134,125,124,117,132,119,130,135,135,121,133,131,139,
                      116,118,136,131,119,137,129]

# concatenate these six boroughs dataframe into website_df
website_df=pd.concat([etobicoke_df,northyork_df,eastyork_df,centraltoronto_df,york_df,scarborough_df])
website_df=website_df.sort_values('ID').reset_index(drop=True)
print('...Done!',end='')
website_df.head()

Almost........................Done!

Unnamed: 0,Neighborhood,Website,ID
0,Clairville,https://www.realosophy.com/clairville-toronto/...,1
1,Smithfield,https://www.realosophy.com/smithfield-toronto/...,2
2,Thistletown,https://www.realosophy.com/thistletown-toronto...,3
3,Rexdale,https://www.realosophy.com/rexdale-toronto/nei...,4
4,The Elms,https://www.realosophy.com/the-elms-toronto/ne...,5


Excellent! Now, let's scrape the average house price and household income data from Realosophy and clean the data.

In [5]:
# import necessary library
import numpy as np

# define a function to get the avg house price or household income in numerical result
def get_number_only(data):
    result=data.text.strip().replace('$','').replace(',','')
    if 'M' in result:
        result=int((float(result.replace('M','')))*1000000)
        return result
    else:
        result=int((float(result.replace('K','')))*1000)
        return result

# scrape average house price and household income for each neighborhood into website_df
print('Progress:')
driver=webdriver.Firefox(options=firefox_options)
website_df['Avg House Price']=0
website_df['Avg Household Income']=0
for i in range(len(website_df['ID'])):
    driver.get(website_df['Website'][i])
    time.sleep(5)
    html=driver.page_source
    soup=BeautifulSoup(html,'lxml')
    houseprice_data=soup.find('div',{'class':'key-stats__avg-sale-price ng-binding ng-scope'})
    income_class='h3 font-sans-caption-bold mb-0 text-center text-sm-left ng-binding ng-scope'
    income_data=soup.find('p',{'class':income_class})
    while houseprice_data==None or income_data==None:
        driver.get(website_df['Website'][i])
        time.sleep(5)
        html=driver.page_source
        soup=BeautifulSoup(html,'lxml')
        houseprice_data=soup.find('div',{'class':'key-stats__avg-sale-price ng-binding ng-scope'})
        income_data=soup.find('p',{'class':income_class})
    website_df.iloc[i,3]=get_number_only(houseprice_data)
    website_df.iloc[i,4]=get_number_only(income_data)
    print('.',end='')
driver.quit()

# group website_df by neighborhood id and save the data into houseprice_df
website_df.drop('Neighborhood',axis=1,inplace=True)
website_df=website_df.groupby('ID').mean().reset_index()
website_df['Avg House Price']=website_df['Avg House Price'].astype(int)
website_df['Avg Household Income']=website_df['Avg Household Income'].astype(int)
houseprice_df=jobs_df.iloc[:,0:2]
houseprice_df['Avg House Price']=website_df['Avg House Price']
houseprice_df['Avg Household Income']=website_df['Avg Household Income']
print('...Done!')
houseprice_df.head()

Progress:
..............................................................................................................................................................................................................Done!


Unnamed: 0,Neighborhood,ID,Avg House Price,Avg Household Income
0,West Humber-Clairville,1,587000,94000
1,Mount Olive-Silverstone-Jamestown,2,578000,79000
2,Thistletown-Beaumond Heights,3,898000,94000
3,Rexdale-Kipling,4,744000,91000
4,Elms-Old Rexdale,5,600000,82000


Superb! Now, let's calculate the composite __[Housing Affordability Index (HAI)](https://www.frbsf.org/education/publications/doctor-econ/2003/december/housing-affordability-index/)__ for each neighborhood by using this formula:
___
$ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Housing\ Affordability\ Index\ (HAI)\ =\ \frac{median\ household\ income}{qualifying\ income}\ x\ 100$  

Assumption:  
a. The average house price and household income are equal to the median house price and household income.  
b. The __[effective mortgage rates](https://www.ratehub.ca/best-mortgage-rates/1-year/fixed?scenario=purchase&home_price=1000000&down_payment_percent=0.2&downPayment=200000&approximateMortgageAmount=800000&amount=800000&amortization=25&live_in_property=true&pre_approval=false)__ is 2%.  
c. Home buyers make a 20% __[down payment](https://www.canada.ca/en/financial-consumer-agency/services/mortgages/down-payment.html)__.  
d. The maximum monthly mortgage payment is 25% of gross monthly income.
___
If a neighborhood's Housing Affordability Index (HAI) is higher than 100, it means that most of the residents are able to afford a house in the neighborhood, the greater the HAI, the higher the housing affordability. However, if a neighborhood's HAI is lower than 100, it means that most of the residents are unable to afford a house in the neighborhood, the lower the HAI, the lower the housing affordability.

In [6]:
# calculate the housing affordability index for each neighborhood
mortgage_payment=houseprice_df['Avg House Price']*0.8*(0.02/12)/(1-(1/(1+0.02/12)**360))
qualifying_income=mortgage_payment*4*12
houseprice_df['HAI']=((houseprice_df['Avg Household Income']/qualifying_income)*100).round(3)
houseprice_df.head()

Unnamed: 0,Neighborhood,ID,Avg House Price,Avg Household Income,HAI
0,West Humber-Clairville,1,587000,94000,112.825
1,Mount Olive-Silverstone-Jamestown,2,578000,79000,96.297
2,Thistletown-Beaumond Heights,3,898000,94000,73.751
3,Rexdale-Kipling,4,744000,91000,86.175
4,Elms-Old Rexdale,5,600000,82000,96.289


Great! Now, categorize these data into three bins namely 'High', 'Medium', and 'Low'.

In [7]:
# define a function to categorize data into three bins namely high, medium, low
def data_binning(dataframe):
    for column in dataframe.columns[2:]:
        Q1=dataframe[column].quantile(0.25)
        Q3=dataframe[column].quantile(0.75)
        IQR=Q3-Q1
        dataframe_min=Q1-1.5*IQR
        dataframe_max=Q3+1.5*IQR
        temp_bins=dataframe[column][~((dataframe[column]<dataframe_min)|(dataframe[column]>dataframe_max))]
        bins=np.linspace(min(temp_bins),max(temp_bins),4)
        bins=bins.astype(int)
        dataframe['{} Categories'.format(column)]='Dunno'
        for i in range(140):
            if dataframe[column][i]<bins[1]:
                dataframe.iloc[i,-1]='Low (<{:,d})'.format(bins[1])
            elif dataframe[column][i]>bins[2]:
                dataframe.iloc[i,-1]='High (>{:,d})'.format(bins[2])
            else:
                dataframe.iloc[i,-1]='Medium ({:,d}-{:,d})'.format(bins[1],bins[2])
    return dataframe

# categorize the house price, household income and housing affordability index
houseprice_df=data_binning(houseprice_df)
houseprice_df.head()

Unnamed: 0,Neighborhood,ID,Avg House Price,Avg Household Income,HAI,Avg House Price Categories,Avg Household Income Categories,HAI Categories
0,West Humber-Clairville,1,587000,94000,112.825,"Low (<964,000)","Low (<106,666)",High (>100)
1,Mount Olive-Silverstone-Jamestown,2,578000,79000,96.297,"Low (<964,000)","Low (<106,666)",Medium (76-100)
2,Thistletown-Beaumond Heights,3,898000,94000,73.751,"Low (<964,000)","Low (<106,666)",Low (<76)
3,Rexdale-Kipling,4,744000,91000,86.175,"Low (<964,000)","Low (<106,666)",Medium (76-100)
4,Elms-Old Rexdale,5,600000,82000,96.289,"Low (<964,000)","Low (<106,666)",Medium (76-100)


Awesome!

#### III. Crime rate data
Now, let's get the crimes data by calling a REST API from Toronto Police Service and clean the data.

In [8]:
# import necessary libraries
import requests
from pandas.io.json import json_normalize

# request the crimes data by using toronto police service's api and clean the data
url='https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Neighbourhood_MCI/FeatureServer/0/query?where=1%3D1&outFields=Neighbourhood,Hood_ID,Population,Assault_AVG,AutoTheft_AVG,Homicide_AVG,TheftOver_AVG,BreakandEnter_AVG,Robbery_AVG&outSR=4326&f=json'
results=requests.get(url).json()
crime_data=results['features']
crime_df=json_normalize(crime_data)
crime_df.drop('geometry.rings',axis=1,inplace=True)
crime_df.columns=(pd.Series(crime_df.columns)).replace('attributes.','',regex=True)
crime_df.rename(columns={'Neighbourhood':'Neighborhood','Hood_ID':'ID'},inplace=True)
crime_df['ID']=crime_df['ID'].astype(int)
crime_df=crime_df.sort_values('ID').reset_index(drop=True)
crime_df['Neighborhood']=jobs_df['Neighborhood']
crime_df.head()

Unnamed: 0,Neighborhood,ID,Population,Assault_AVG,AutoTheft_AVG,Homicide_AVG,TheftOver_AVG,BreakandEnter_AVG,Robbery_AVG
0,West Humber-Clairville,1,33312,301.8,366.7,1.5,52.2,137.8,91.8
1,Mount Olive-Silverstone-Jamestown,2,32954,255.8,62.3,2.2,4.5,32.2,77.0
2,Thistletown-Beaumond Heights,3,10360,53.7,25.3,0.3,2.3,19.0,15.0
3,Rexdale-Kipling,4,10529,68.7,28.7,0.5,1.7,15.8,20.2
4,Elms-Old Rexdale,5,9456,54.5,18.8,0.3,1.5,10.5,13.5


Great! Now, let's calculate the __[crime rate](https://www.azcalculator.com/formula/crime-rate-calculator.php)__ for each neighborhood by using this formula:
___
$ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ crime\ rate\ =\ \frac{number\ of\ crimes}{population}\ x\ 100,000\ people $
___

In [9]:
# calculate the crime rate for each neighborhood
for column in crime_df.columns[3:]:
    crime_df[column]=((crime_df[column]/crime_df['Population'])*100000).astype(int)
crime_df['All Crimes']=crime_df.iloc[:,3:].sum(axis=1).astype(int)
crime_df.columns=crime_df.columns.str.replace('_AVG','',regex=True)
crime_df.drop(crime_df.columns[2],axis=1,inplace=True)
crime_df.head()

Unnamed: 0,Neighborhood,ID,Assault,AutoTheft,Homicide,TheftOver,BreakandEnter,Robbery,All Crimes
0,West Humber-Clairville,1,905,1100,4,156,413,275,2853
1,Mount Olive-Silverstone-Jamestown,2,776,189,6,13,97,233,1314
2,Thistletown-Beaumond Heights,3,518,244,2,22,183,144,1113
3,Rexdale-Kipling,4,652,272,4,16,150,191,1285
4,Elms-Old Rexdale,5,576,198,3,15,111,142,1045


Cool! Now, let's categorize the crime rate data for each  neighborhood.

In [10]:
# categorize crime rate data
crime_df=data_binning(crime_df)
crime_df.head()

Unnamed: 0,Neighborhood,ID,Assault,AutoTheft,Homicide,TheftOver,BreakandEnter,Robbery,All Crimes,Assault Categories,AutoTheft Categories,Homicide Categories,TheftOver Categories,BreakandEnter Categories,Robbery Categories,All Crimes Categories
0,West Humber-Clairville,1,905,1100,4,156,413,275,2853,"Medium (588-1,027)",High (>184),Medium (2-4),High (>44),High (>353),High (>192),"High (>1,773)"
1,Mount Olive-Silverstone-Jamestown,2,776,189,6,13,97,233,1314,"Medium (588-1,027)",High (>184),High (>4),Low (<26),Low (<224),High (>192),"Medium (1,138-1,773)"
2,Thistletown-Beaumond Heights,3,518,244,2,22,183,144,1113,Low (<588),High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Low (<1,138)"
3,Rexdale-Kipling,4,652,272,4,16,150,191,1285,"Medium (588-1,027)",High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Medium (1,138-1,773)"
4,Elms-Old Rexdale,5,576,198,3,15,111,142,1045,Low (<588),High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Low (<1,138)"


Nice!

#### IV. Language spoken most often at home data
Now, let's import and clean the language spoken most often at home data first.

In [11]:
# extract language spoken most often at home data from toronto_df into language_df and clean the dataframe
topic=['Language spoken most often at home']
index=['Neighbourhood Number']
language_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
language_df=language_df.drop('Topic',axis=1).set_index('Characteristic').T.replace(',','',regex=True).astype(int)
language_df.sort_values(index,inplace=True)
language_df.columns=language_df.columns.str.strip()
condition=(language_df.sum()==0)
for i in range(condition.shape[0]):
    if(condition[i])|('response'in condition.index[i].lower())|('language'in condition.index[i].lower())\
    |('n.o.s'in condition.index[i].lower())|('number'in condition.index[i].lower())==True:
        language_df.drop(condition.index[i],axis=1,inplace=True)
language_df.drop('English and French',axis=1,inplace=True)
language_df.insert(loc=0,column='ID',value=jobs_df['ID'].values) 
language_df=language_df.rename_axis(None,axis=1).reset_index().rename(columns={'index':'Neighborhood'})
print('This dataframe consists of {} languages!'.format(language_df.shape[1]-2))
language_df.head()

This dataframe consists of 127 languages!


Unnamed: 0,Neighborhood,ID,English,French,Montagnais (Innu),Swampy Cree,Ojibway,Oji-Cree,Ottawa (Odawa),Dene,Sarsi (Sarcee),Mohawk,Dakota,Kabyle,Bilen,Oromo,Somali,Waray-Waray,Amharic,Arabic,Assyrian Neo-Aramaic,Chaldean Neo-Aramaic,Harari,Hebrew,Maltese,Tigrigna,Khmer (Cambodian),Vietnamese,Bikol,Cebuano,Fijian,Hiligaynon,Ilocano,Malagasy,Malay,"Pampangan (Kapampangan, Pampango)",Pangasinan,"Tagalog (Pilipino, Filipino)",Haitian Creole,Kannada,Malayalam,Tamil,Telugu,Albanian,Armenian,Latvian,Lithuanian,Belarusan,Bosnian,Bulgarian,Croatian,Czech,Macedonian,Polish,Russian,Serbian,Serbo-Croatian,Slovak,Slovene (Slovenian),Ukrainian,Scottish Gaelic,Welsh,Afrikaans,Danish,Dutch,German,Icelandic,Norwegian,Swedish,Vlaams (Flemish),Yiddish,Greek,Bengali,Gujarati,Hindi,Kashmiri,Konkani,Marathi,Nepali,Oriya (Odia),Punjabi (Panjabi),Sindhi,Sinhala (Sinhalese),Urdu,Kurdish,Pashto,Persian (Farsi),Catalan,Japanese,Italian,Portuguese,Romanian,Spanish,Georgian,Korean,Mongolian,Akan (Twi),Bamanankan,Edo,Ewe,"Fulah (Pular, Pulaar, Fulfulde)",Ga,Ganda,Igbo,Lingala,Rundi (Kirundi),Kinyarwanda (Rwanda),Shona,Swahili,Wolof,Yoruba,Dinka,Cantonese,Hakka,Mandarin,Min Dong,"Min Nan (Chaochow, Teochow, Fukien, Taiwanese)",Wu (Shanghainese),Burmese,Tibetan,Lao,Thai,Azerbaijani,Turkish,Uyghur,Uzbek,Estonian,Finnish,Hungarian
0,West Humber-Clairville,1,18345,95,0,0,0,0,0,0,0,0,0,0,0,0,155,10,20,105,105,5,0,0,15,30,15,170,0,5,0,0,20,0,0,10,5,450,0,5,95,420,55,60,5,0,5,0,0,0,40,5,5,75,35,20,0,5,15,15,0,0,0,0,0,15,0,0,0,0,0,60,105,1260,535,0,0,15,10,0,3435,10,60,415,5,15,120,0,15,320,70,5,675,0,55,5,155,0,20,0,0,20,5,25,0,5,0,0,20,0,35,0,170,5,165,0,15,0,15,5,5,5,0,65,0,0,0,0,55
1,Mount Olive-Silverstone-Jamestown,2,15340,105,0,0,0,0,0,0,0,0,0,0,0,0,415,0,45,1015,1755,255,0,0,0,20,10,190,0,5,5,5,35,0,10,0,0,215,15,5,60,935,20,15,30,0,0,0,0,5,15,0,5,55,30,5,5,0,5,15,0,0,5,0,0,10,0,0,0,0,0,5,170,1780,345,0,0,15,5,0,1965,5,15,625,10,80,220,0,5,350,95,15,540,0,45,0,380,0,40,5,0,5,0,45,10,0,0,5,40,5,55,0,115,10,40,0,0,0,10,5,25,15,0,45,0,5,0,0,25
2,Thistletown-Beaumond Heights,3,5615,50,0,0,0,0,0,0,0,0,0,0,0,0,60,0,5,110,230,40,0,0,0,5,5,75,0,0,0,0,0,0,0,0,0,90,0,0,15,105,15,5,5,0,0,0,0,0,25,0,0,50,10,0,0,0,5,10,0,0,0,0,0,10,0,0,5,0,0,30,50,295,100,0,0,5,5,0,535,0,5,175,0,10,70,0,0,275,50,15,375,0,15,0,65,0,5,0,0,5,0,5,0,0,0,0,5,0,10,0,40,0,25,0,5,0,0,5,0,0,0,15,0,0,0,0,25
3,Rexdale-Kipling,4,7050,65,0,0,0,0,0,0,0,0,0,0,0,25,60,0,10,55,70,0,0,0,5,10,0,95,0,10,0,5,0,0,5,10,0,105,0,0,5,60,0,20,0,0,0,0,5,5,130,0,0,90,25,5,0,5,15,15,0,0,0,0,0,20,0,0,0,0,0,40,55,70,55,0,0,0,10,0,75,0,15,140,0,15,55,0,0,145,50,25,510,5,15,0,30,0,0,0,0,0,0,5,0,0,0,0,10,0,5,0,35,0,25,0,0,0,0,5,5,0,0,20,0,0,0,0,30
4,Elms-Old Rexdale,5,6160,30,0,0,0,0,0,0,0,0,0,0,0,0,265,0,5,25,40,0,0,0,0,10,10,150,0,5,0,0,15,0,0,5,0,145,0,0,5,30,5,5,0,0,5,0,0,0,25,5,10,115,30,10,0,0,5,15,0,0,0,0,0,5,0,0,0,0,0,5,50,20,35,0,0,0,0,0,40,0,0,95,5,0,145,0,5,190,30,15,470,0,35,0,50,0,5,0,0,0,5,0,0,0,0,0,10,0,10,0,35,0,20,0,0,0,10,5,5,0,0,5,0,0,0,0,25


Cool! Let's get the top 5 languages for each neighborhood.

In [12]:
# get the top 5 languages for each neighborhood and save the data into top5_language_df
top5_language_df=get_top5_elements(language_df,'Language')
top5_language_df.head()

Unnamed: 0,Neighborhood,ID,1st Most Common Language,2nd Most Common Language,3rd Most Common Language,4th Most Common Language,5th Most Common Language
0,West Humber-Clairville,1,English,Punjabi (Panjabi),Gujarati,Spanish,Hindi
1,Mount Olive-Silverstone-Jamestown,2,English,Punjabi (Panjabi),Gujarati,Assyrian Neo-Aramaic,Arabic
2,Thistletown-Beaumond Heights,3,English,Punjabi (Panjabi),Spanish,Gujarati,Italian
3,Rexdale-Kipling,4,English,Spanish,Italian,Urdu,Croatian
4,Elms-Old Rexdale,5,English,Spanish,Somali,Italian,Vietnamese


Awesome! Now, let's normalize the language spoken most often at home data.

In [13]:
# normalize the language spoken most often at home data
language_df=data_normalization(language_df)
language_df.head()

Unnamed: 0,Neighborhood,ID,English,French,Montagnais (Innu),Swampy Cree,Ojibway,Oji-Cree,Ottawa (Odawa),Dene,Sarsi (Sarcee),Mohawk,Dakota,Kabyle,Bilen,Oromo,Somali,Waray-Waray,Amharic,Arabic,Assyrian Neo-Aramaic,Chaldean Neo-Aramaic,Harari,Hebrew,Maltese,Tigrigna,Khmer (Cambodian),Vietnamese,Bikol,Cebuano,Fijian,Hiligaynon,Ilocano,Malagasy,Malay,"Pampangan (Kapampangan, Pampango)",Pangasinan,"Tagalog (Pilipino, Filipino)",Haitian Creole,Kannada,Malayalam,Tamil,Telugu,Albanian,Armenian,Latvian,Lithuanian,Belarusan,Bosnian,Bulgarian,Croatian,Czech,Macedonian,Polish,Russian,Serbian,Serbo-Croatian,Slovak,Slovene (Slovenian),Ukrainian,Scottish Gaelic,Welsh,Afrikaans,Danish,Dutch,German,Icelandic,Norwegian,Swedish,Vlaams (Flemish),Yiddish,Greek,Bengali,Gujarati,Hindi,Kashmiri,Konkani,Marathi,Nepali,Oriya (Odia),Punjabi (Panjabi),Sindhi,Sinhala (Sinhalese),Urdu,Kurdish,Pashto,Persian (Farsi),Catalan,Japanese,Italian,Portuguese,Romanian,Spanish,Georgian,Korean,Mongolian,Akan (Twi),Bamanankan,Edo,Ewe,"Fulah (Pular, Pulaar, Fulfulde)",Ga,Ganda,Igbo,Lingala,Rundi (Kirundi),Kinyarwanda (Rwanda),Shona,Swahili,Wolof,Yoruba,Dinka,Cantonese,Hakka,Mandarin,Min Dong,"Min Nan (Chaochow, Teochow, Fukien, Taiwanese)",Wu (Shanghainese),Burmese,Tibetan,Lao,Thai,Azerbaijani,Turkish,Uyghur,Uzbek,Estonian,Finnish,Hungarian
0,West Humber-Clairville,1,0.345,0.123,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.134,0.5,0.111,0.103,0.06,0.02,0.0,0.0,0.273,0.286,0.067,0.077,0.0,0.077,0.0,0.0,0.087,0.0,0.0,1.0,1.0,0.211,0.0,0.25,0.333,0.069,0.262,0.25,0.004,0.0,0.05,0.0,0.0,0.0,0.276,0.1,0.034,0.103,0.008,0.028,0.0,0.008,0.167,0.015,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.077,0.05,0.345,1.0,0.0,0.0,0.273,0.025,0.0,1.0,0.069,0.545,0.115,0.059,0.018,0.025,0.0,0.045,0.139,0.02,0.011,0.31,0.0,0.016,0.25,0.408,0.0,0.5,0.0,0.0,1.0,0.333,0.556,0.0,0.5,0.0,0.0,0.235,0.0,0.636,0.0,0.02,0.024,0.025,0.0,0.079,0.0,0.5,0.004,0.077,0.25,0.0,0.138,0.0,0.0,0.0,0.0,0.22
1,Mount Olive-Silverstone-Jamestown,2,0.288,0.136,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359,0.0,0.25,1.0,1.0,1.0,0.0,0.0,0.0,0.19,0.044,0.086,0.0,0.077,1.0,0.167,0.152,0.0,0.286,0.0,0.0,0.101,1.0,0.25,0.211,0.155,0.095,0.062,0.022,0.0,0.0,0.0,0.0,0.038,0.103,0.0,0.034,0.076,0.007,0.007,0.111,0.0,0.056,0.015,0.0,0.0,0.5,0.0,0.0,0.133,0.0,0.0,0.0,0.0,0.0,0.006,0.08,0.488,0.645,0.0,0.0,0.273,0.013,0.0,0.572,0.034,0.136,0.173,0.118,0.098,0.046,0.0,0.015,0.153,0.027,0.032,0.248,0.0,0.013,0.0,1.0,0.0,1.0,1.0,0.0,0.25,0.0,1.0,1.0,0.0,0.0,0.333,0.471,0.5,1.0,0.0,0.013,0.049,0.006,0.0,0.0,0.0,0.333,0.004,0.385,0.75,0.0,0.096,0.0,0.091,0.0,0.0,0.1
2,Thistletown-Beaumond Heights,3,0.106,0.065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052,0.0,0.028,0.108,0.131,0.157,0.0,0.0,0.0,0.048,0.022,0.034,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042,0.0,0.0,0.053,0.017,0.071,0.021,0.004,0.0,0.0,0.0,0.0,0.0,0.172,0.0,0.0,0.069,0.002,0.0,0.0,0.0,0.056,0.01,0.0,0.0,0.0,0.0,0.0,0.133,0.0,0.0,0.25,0.0,0.0,0.039,0.024,0.081,0.187,0.0,0.0,0.091,0.013,0.0,0.156,0.0,0.045,0.048,0.0,0.012,0.015,0.0,0.0,0.12,0.014,0.032,0.172,0.0,0.004,0.0,0.171,0.0,0.125,0.0,0.0,0.25,0.0,0.111,0.0,0.0,0.0,0.0,0.059,0.0,0.182,0.0,0.005,0.0,0.004,0.0,0.026,0.0,0.0,0.004,0.0,0.0,0.0,0.032,0.0,0.0,0.0,0.0,0.1
3,Rexdale-Kipling,4,0.133,0.084,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.714,0.052,0.0,0.056,0.054,0.04,0.0,0.0,0.0,0.091,0.095,0.0,0.043,0.0,0.154,0.0,0.167,0.0,0.0,0.143,1.0,0.0,0.049,0.0,0.0,0.018,0.01,0.0,0.083,0.0,0.0,0.0,0.0,0.077,0.038,0.897,0.0,0.0,0.124,0.005,0.007,0.0,0.008,0.167,0.015,0.0,0.0,0.0,0.0,0.0,0.267,0.0,0.0,0.0,0.0,0.0,0.052,0.026,0.019,0.103,0.0,0.0,0.0,0.025,0.0,0.022,0.0,0.136,0.039,0.0,0.018,0.012,0.0,0.0,0.063,0.014,0.053,0.234,0.019,0.004,0.0,0.079,0.0,0.0,0.0,0.0,0.0,0.0,0.111,0.0,0.0,0.0,0.0,0.118,0.0,0.091,0.0,0.004,0.0,0.004,0.0,0.0,0.0,0.0,0.004,0.077,0.0,0.0,0.043,0.0,0.0,0.0,0.0,0.12
4,Elms-Old Rexdale,5,0.116,0.039,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.229,0.0,0.028,0.025,0.023,0.0,0.0,0.0,0.0,0.095,0.044,0.068,0.0,0.077,0.0,0.0,0.065,0.0,0.0,0.5,0.0,0.068,0.0,0.0,0.018,0.005,0.024,0.021,0.0,0.0,0.05,0.0,0.0,0.0,0.172,0.1,0.069,0.159,0.007,0.014,0.0,0.0,0.056,0.015,0.0,0.0,0.0,0.0,0.0,0.067,0.0,0.0,0.0,0.0,0.0,0.006,0.024,0.005,0.065,0.0,0.0,0.0,0.0,0.0,0.012,0.0,0.0,0.026,0.059,0.0,0.03,0.0,0.015,0.083,0.009,0.032,0.216,0.0,0.01,0.0,0.132,0.0,0.125,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.118,0.0,0.182,0.0,0.004,0.0,0.003,0.0,0.0,0.0,0.333,0.004,0.077,0.0,0.0,0.011,0.0,0.0,0.0,0.0,0.1


Great!

#### V. Boundaries of neighborhoods data
Now let's import and clean the boundary data first.

In [14]:
# import boundary data into boundary_df and clean the dataframe
toronto_geojson='https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=geojson&projection=4326'
results=requests.get(toronto_geojson).json()
boundary_data=results['features']
boundary_df=json_normalize(boundary_data)
boundary_df=pd.DataFrame({'Neighborhood':boundary_df['properties.AREA_NAME'],
                          'ID':boundary_df['properties.AREA_SHORT_CODE'],
                          'Latitude':boundary_df['properties.LATITUDE'],
                          'Longitude':boundary_df['properties.LONGITUDE']})
boundary_df=boundary_df.sort_values('ID').reset_index(drop=True)
boundary_df.iloc[:,0]=crime_df.iloc[:,0]
boundary_df.head()

Unnamed: 0,Neighborhood,ID,Latitude,Longitude
0,West Humber-Clairville,1,43.71618,-79.596356
1,Mount Olive-Silverstone-Jamestown,2,43.746868,-79.587259
2,Thistletown-Beaumond Heights,3,43.737988,-79.563491
3,Rexdale-Kipling,4,43.723725,-79.566228
4,Elms-Old Rexdale,5,43.721519,-79.548983


Nice! Now, let's visualize the boundary, latitude and longitude coordinates of each neighborhood on a map.

In [15]:
# import necessary library
import geocoder
import folium

# visualize the boundary data on toronto_map
g=geocoder.osm('Toronto,Ontario')
toronto_map=folium.Map(location=g.latlng,zoom_start=12,tiles='cartodbpositron')
folium.GeoJson(toronto_geojson).add_to(toronto_map)
for lat,lng,neighborhood,ID in zip(boundary_df['Latitude'],boundary_df['Longitude'],boundary_df['Neighborhood'],
                                   boundary_df['ID']):
    label=folium.Popup('{}. {}'.format(ID,neighborhood),parse_html=True)
    folium.CircleMarker([lat,lng],radius=5,popup=label,color='red',fill=True,fill_color='red',fill_opacity=0.7,
                        parse_html=False).add_to(toronto_map)  
toronto_map

Superb!

#### VI. Popular food data
Now, let's get the popular food data for each neighborhood by using Foursquare API.

In [24]:
# get the popular food data using foursquare api and save the data into food_df
print('Progress:')
food_df=pd.DataFrame(columns=['ID','Popular Food'])
for i in range(len(boundary_df['ID'])):
    url='https://api.foursquare.com/v2/venues/explore'
    params=dict(client_id='VCPPKYR4FRVFQDA22KSFIEHL0501YTNVH0KIEXLQT4VI4HHM',
                client_secret='XXHQWCHMWZFJ0SKUAGBNJIKK50X0TO2A0CAASVHA305H4RWA',
                v='20201117',
                ll='{},{}'.format(boundary_df['Latitude'][i],boundary_df['Longitude'][i]),
                categoryId='4d4b7105d754a06374d81259',
                radius=5000,
                time='any',
                day='any',
                limit=50)
    results=requests.get(url=url,params=params).json()['response']['groups'][0]['items']
    food_data=[]
    for j in range(len(results)):
        food_data.append(results[j]['venue']['categories'][0]['name'])
    temp_df=pd.DataFrame({'ID':boundary_df.iloc[i,1],'Popular Food':food_data})
    food_df=pd.concat([food_df,temp_df],ignore_index=True)
    print('.',end='')
print('...done!')
temp_df=pd.get_dummies(data=food_df['Popular Food'],prefix_sep="")
temp_df.insert(0,'ID',food_df['ID'])
food_df=temp_df.groupby('ID',as_index=False).sum()
food_df.insert(0,'Neighborhood',boundary_df['Neighborhood'])
food_df.drop('Food',axis=1,inplace=True)
print('This dataframe consists of {} food categories!'.format(food_df.shape[1]-2))
food_df.head()

Progress:
...............................................................................................................................................done!
This dataframe consists of 90 food categories!


Unnamed: 0,Neighborhood,ID,Afghan Restaurant,African Restaurant,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,Greek Restaurant,Hakka Restaurant,Hotpot Restaurant,Hungarian Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Noodle House,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant
0,West Humber-Clairville,1,1,0,2,2,0,0,1,0,0,1,3,0,0,0,0,3,0,4,0,0,0,0,1,0,2,0,0,0,0,0,0,0,2,0,1,0,0,0,2,0,1,0,0,0,0,6,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,0,0,4,0,5,0,0,0,0,0,0,0,3,1,0,0,0,1,0,0,0,0,0,0,0
1,Mount Olive-Silverstone-Jamestown,2,0,1,0,3,1,0,2,0,0,0,2,0,1,0,0,1,0,3,0,0,0,0,0,0,2,0,0,0,0,0,0,0,2,0,1,1,0,0,1,0,1,0,0,0,0,5,0,1,4,0,0,0,0,0,0,1,0,0,0,0,0,0,4,0,0,0,0,2,0,6,0,0,0,0,0,0,0,2,2,0,0,0,0,0,0,0,0,1,0,0
2,Thistletown-Beaumond Heights,3,0,1,0,3,0,0,1,0,0,1,2,0,1,0,0,2,0,3,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,1,1,0,0,3,0,0,0,0,0,0,5,0,0,3,1,0,0,0,0,0,1,0,0,0,0,0,0,5,0,0,0,0,2,0,6,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,4,0,0
3,Rexdale-Kipling,4,0,1,1,3,0,0,1,0,0,1,2,0,1,0,0,1,0,4,0,0,0,0,0,0,2,0,0,0,0,0,0,0,3,0,1,1,0,0,2,0,0,0,0,0,0,4,0,0,1,2,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,4,0,4,0,0,0,0,0,0,0,3,1,0,0,0,1,0,0,0,0,3,0,0
4,Elms-Old Rexdale,5,0,1,1,3,0,0,3,0,0,1,2,0,1,0,0,1,0,4,0,0,0,0,0,0,2,0,0,0,0,0,0,0,3,0,1,1,0,0,0,0,0,0,0,0,0,3,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,3,0,8,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,3,0,0


Great! Now, let's get the top 5 food categories for each neighborhood.

In [25]:
# get the top 5 food categories and save the data into top5_food_df
top5_food_df=get_top5_elements(food_df,'Food')
top5_food_df.head()

Unnamed: 0,Neighborhood,ID,1st Most Common Food,2nd Most Common Food,3rd Most Common Food,4th Most Common Food,5th Most Common Food
0,West Humber-Clairville,1,Indian Restaurant,Sandwich Place,Restaurant,Chinese Restaurant,Caribbean Restaurant
1,Mount Olive-Silverstone-Jamestown,2,Sandwich Place,Indian Restaurant,Italian Restaurant,Pizza Place,Asian Restaurant
2,Thistletown-Beaumond Heights,3,Sandwich Place,Pizza Place,Indian Restaurant,Vietnamese Restaurant,Fried Chicken Joint
3,Rexdale-Kipling,4,Indian Restaurant,Chinese Restaurant,Restaurant,Sandwich Place,Vietnamese Restaurant
4,Elms-Old Rexdale,5,Sandwich Place,Chinese Restaurant,Indian Restaurant,Vietnamese Restaurant,Pizza Place


Cool! Now, let's normalize the popular food data.

In [26]:
# normalize the popular food data
food_df=data_normalization(food_df)
food_df.head()

Unnamed: 0,Neighborhood,ID,Afghan Restaurant,African Restaurant,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,Greek Restaurant,Hakka Restaurant,Hotpot Restaurant,Hungarian Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Noodle House,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant
0,West Humber-Clairville,1,1.0,0.0,0.667,0.667,0.0,0.0,0.125,0.0,0.0,0.2,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.4,0.0,0.0,0.0,0.0,0.25,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.167,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222,0.0,0.0,0.0,0.0,0.571,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Mount Olive-Silverstone-Jamestown,2,0.0,0.5,0.0,1.0,0.25,0.0,0.25,0.0,0.0,0.0,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.5,0.0,0.0,0.25,0.0,0.167,0.0,0.0,0.0,0.0,0.625,0.0,1.0,0.364,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.444,0.0,0.0,0.0,0.0,0.286,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.143,0.0,0.0
2,Thistletown-Beaumond Heights,3,0.0,0.5,0.0,1.0,0.0,0.0,0.125,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.333,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.5,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.625,0.0,0.0,0.273,0.2,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.556,0.0,0.0,0.0,0.0,0.286,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.571,0.0,0.0
3,Rexdale-Kipling,4,0.0,0.5,0.333,1.0,0.0,0.0,0.125,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.5,0.5,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.091,0.4,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.222,0.0,0.0,0.0,0.0,0.571,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.429,0.0,0.0
4,Elms-Old Rexdale,5,0.0,0.5,0.333,1.0,0.0,0.0,0.375,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.091,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.429,0.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.429,0.0,0.0


Nice!

### D. Complete dataframes for Toronto's neighborhoods recommender system<a name="complete"></a>

Let's merge all the dataframes consist of numerical values only first.

In [27]:
# import necessary library
from functools import reduce

# merge all the dataframes consist of numerical values into complete_df
complete_list=[boundary_df,jobs_df,houseprice_df,crime_df,language_df,food_df]
complete_df=reduce(lambda left,right:pd.merge(left,right),complete_list)
print('This dataframe consists of {} rows and {} columns!'.format(complete_df.shape[0],complete_df.shape[1]))
complete_df.head()

This dataframe consists of 140 rows and 262 columns!


Unnamed: 0,Neighborhood,ID,Latitude,Longitude,"11 Agriculture, forestry, fishing and hunting","21 Mining, quarrying, and oil and gas extraction",22 Utilities,23 Construction,31-33 Manufacturing,41 Wholesale trade,44-45 Retail trade,48-49 Transportation and warehousing,51 Information and cultural industries,52 Finance and insurance,53 Real estate and rental and leasing,"54 Professional, scientific and technical services",55 Management of companies and enterprises,"56 Administrative and support, waste management and remediation services",61 Educational services,62 Health care and social assistance,"71 Arts, entertainment and recreation",72 Accommodation and food services,81 Other services (except public administration),91 Public administration,Any Job,Avg House Price,Avg Household Income,HAI,Avg House Price Categories,Avg Household Income Categories,HAI Categories,Assault,AutoTheft,Homicide,TheftOver,BreakandEnter,Robbery,All Crimes,Assault Categories,AutoTheft Categories,Homicide Categories,TheftOver Categories,BreakandEnter Categories,Robbery Categories,All Crimes Categories,English,French,Montagnais (Innu),Swampy Cree,Ojibway,Oji-Cree,Ottawa (Odawa),Dene,Sarsi (Sarcee),Mohawk,Dakota,Kabyle,Bilen,Oromo,Somali,Waray-Waray,Amharic,Arabic,Assyrian Neo-Aramaic,Chaldean Neo-Aramaic,Harari,Hebrew,Maltese,Tigrigna,Khmer (Cambodian),Vietnamese,Bikol,Cebuano,Fijian,Hiligaynon,Ilocano,Malagasy,Malay,"Pampangan (Kapampangan, Pampango)",Pangasinan,"Tagalog (Pilipino, Filipino)",Haitian Creole,Kannada,Malayalam,Tamil,Telugu,Albanian,Armenian,Latvian,Lithuanian,Belarusan,Bosnian,Bulgarian,Croatian,Czech,Macedonian,Polish,Russian,Serbian,Serbo-Croatian,Slovak,Slovene (Slovenian),Ukrainian,Scottish Gaelic,Welsh,Afrikaans,Danish,Dutch,German,Icelandic,Norwegian,Swedish,Vlaams (Flemish),Yiddish,Greek,Bengali,Gujarati,Hindi,Kashmiri,Konkani,Marathi,Nepali,Oriya (Odia),Punjabi (Panjabi),Sindhi,Sinhala (Sinhalese),Urdu,Kurdish,Pashto,Persian (Farsi),Catalan,Japanese,Italian,Portuguese,Romanian,Spanish,Georgian,Korean,Mongolian,Akan (Twi),Bamanankan,Edo,Ewe,"Fulah (Pular, Pulaar, Fulfulde)",Ga,Ganda,Igbo,Lingala,Rundi (Kirundi),Kinyarwanda (Rwanda),Shona,Swahili,Wolof,Yoruba,Dinka,Cantonese,Hakka,Mandarin,Min Dong,"Min Nan (Chaochow, Teochow, Fukien, Taiwanese)",Wu (Shanghainese),Burmese,Tibetan,Lao,Thai,Azerbaijani,Turkish,Uyghur,Uzbek,Estonian,Finnish,Hungarian,Afghan Restaurant,African Restaurant,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,Greek Restaurant,Hakka Restaurant,Hotpot Restaurant,Hungarian Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Noodle House,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant
0,West Humber-Clairville,1,43.71618,-79.596356,0.333,0.108,0.18,0.492,1.0,0.486,0.68,1.0,0.125,0.078,0.182,0.079,0.14,0.574,0.325,0.492,0.228,0.387,0.559,0.249,0.343,587000,94000,112.825,"Low (<964,000)","Low (<106,666)",High (>100),905,1100,4,156,413,275,2853,"Medium (588-1,027)",High (>184),Medium (2-4),High (>44),High (>353),High (>192),"High (>1,773)",0.345,0.123,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.134,0.5,0.111,0.103,0.06,0.02,0.0,0.0,0.273,0.286,0.067,0.077,0.0,0.077,0.0,0.0,0.087,0.0,0.0,1.0,1.0,0.211,0.0,0.25,0.333,0.069,0.262,0.25,0.004,0.0,0.05,0.0,0.0,0.0,0.276,0.1,0.034,0.103,0.008,0.028,0.0,0.008,0.167,0.015,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.077,0.05,0.345,1.0,0.0,0.0,0.273,0.025,0.0,1.0,0.069,0.545,0.115,0.059,0.018,0.025,0.0,0.045,0.139,0.02,0.011,0.31,0.0,0.016,0.25,0.408,0.0,0.5,0.0,0.0,1.0,0.333,0.556,0.0,0.5,0.0,0.0,0.235,0.0,0.636,0.0,0.02,0.024,0.025,0.0,0.079,0.0,0.5,0.004,0.077,0.25,0.0,0.138,0.0,0.0,0.0,0.0,0.22,1.0,0.0,0.667,0.667,0.0,0.0,0.125,0.0,0.0,0.2,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.4,0.0,0.0,0.0,0.0,0.25,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.167,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222,0.0,0.0,0.0,0.0,0.571,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Mount Olive-Silverstone-Jamestown,2,43.746868,-79.587259,0.667,0.054,0.06,0.451,0.952,0.439,0.532,0.732,0.097,0.052,0.114,0.046,0.12,0.493,0.203,0.402,0.146,0.374,0.461,0.159,0.277,578000,79000,96.297,"Low (<964,000)","Low (<106,666)",Medium (76-100),776,189,6,13,97,233,1314,"Medium (588-1,027)",High (>184),High (>4),Low (<26),Low (<224),High (>192),"Medium (1,138-1,773)",0.288,0.136,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359,0.0,0.25,1.0,1.0,1.0,0.0,0.0,0.0,0.19,0.044,0.086,0.0,0.077,1.0,0.167,0.152,0.0,0.286,0.0,0.0,0.101,1.0,0.25,0.211,0.155,0.095,0.062,0.022,0.0,0.0,0.0,0.0,0.038,0.103,0.0,0.034,0.076,0.007,0.007,0.111,0.0,0.056,0.015,0.0,0.0,0.5,0.0,0.0,0.133,0.0,0.0,0.0,0.0,0.0,0.006,0.08,0.488,0.645,0.0,0.0,0.273,0.013,0.0,0.572,0.034,0.136,0.173,0.118,0.098,0.046,0.0,0.015,0.153,0.027,0.032,0.248,0.0,0.013,0.0,1.0,0.0,1.0,1.0,0.0,0.25,0.0,1.0,1.0,0.0,0.0,0.333,0.471,0.5,1.0,0.0,0.013,0.049,0.006,0.0,0.0,0.0,0.333,0.004,0.385,0.75,0.0,0.096,0.0,0.091,0.0,0.0,0.1,0.0,0.5,0.0,1.0,0.25,0.0,0.25,0.0,0.0,0.0,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.5,0.0,0.0,0.25,0.0,0.167,0.0,0.0,0.0,0.0,0.625,0.0,1.0,0.364,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.444,0.0,0.0,0.0,0.0,0.286,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.143,0.0,0.0
2,Thistletown-Beaumond Heights,3,43.737988,-79.563491,0.0,0.054,0.08,0.218,0.243,0.155,0.215,0.274,0.038,0.021,0.046,0.022,0.0,0.134,0.091,0.121,0.048,0.105,0.142,0.078,0.096,898000,94000,73.751,"Low (<964,000)","Low (<106,666)",Low (<76),518,244,2,22,183,144,1113,Low (<588),High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Low (<1,138)",0.106,0.065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052,0.0,0.028,0.108,0.131,0.157,0.0,0.0,0.0,0.048,0.022,0.034,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042,0.0,0.0,0.053,0.017,0.071,0.021,0.004,0.0,0.0,0.0,0.0,0.0,0.172,0.0,0.0,0.069,0.002,0.0,0.0,0.0,0.056,0.01,0.0,0.0,0.0,0.0,0.0,0.133,0.0,0.0,0.25,0.0,0.0,0.039,0.024,0.081,0.187,0.0,0.0,0.091,0.013,0.0,0.156,0.0,0.045,0.048,0.0,0.012,0.015,0.0,0.0,0.12,0.014,0.032,0.172,0.0,0.004,0.0,0.171,0.0,0.125,0.0,0.0,0.25,0.0,0.111,0.0,0.0,0.0,0.0,0.059,0.0,0.182,0.0,0.005,0.0,0.004,0.0,0.026,0.0,0.0,0.004,0.0,0.0,0.0,0.032,0.0,0.0,0.0,0.0,0.1,0.0,0.5,0.0,1.0,0.0,0.0,0.125,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.333,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.5,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.625,0.0,0.0,0.273,0.2,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.556,0.0,0.0,0.0,0.0,0.286,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.571,0.0,0.0
3,Rexdale-Kipling,4,43.723725,-79.566228,0.417,0.0,0.08,0.206,0.215,0.165,0.221,0.265,0.03,0.023,0.065,0.024,0.0,0.21,0.125,0.158,0.061,0.094,0.165,0.114,0.105,744000,91000,86.175,"Low (<964,000)","Low (<106,666)",Medium (76-100),652,272,4,16,150,191,1285,"Medium (588-1,027)",High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Medium (1,138-1,773)",0.133,0.084,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.714,0.052,0.0,0.056,0.054,0.04,0.0,0.0,0.0,0.091,0.095,0.0,0.043,0.0,0.154,0.0,0.167,0.0,0.0,0.143,1.0,0.0,0.049,0.0,0.0,0.018,0.01,0.0,0.083,0.0,0.0,0.0,0.0,0.077,0.038,0.897,0.0,0.0,0.124,0.005,0.007,0.0,0.008,0.167,0.015,0.0,0.0,0.0,0.0,0.0,0.267,0.0,0.0,0.0,0.0,0.0,0.052,0.026,0.019,0.103,0.0,0.0,0.0,0.025,0.0,0.022,0.0,0.136,0.039,0.0,0.018,0.012,0.0,0.0,0.063,0.014,0.053,0.234,0.019,0.004,0.0,0.079,0.0,0.0,0.0,0.0,0.0,0.0,0.111,0.0,0.0,0.0,0.0,0.118,0.0,0.091,0.0,0.004,0.0,0.004,0.0,0.0,0.0,0.0,0.004,0.077,0.0,0.0,0.043,0.0,0.0,0.0,0.0,0.12,0.0,0.5,0.333,1.0,0.0,0.0,0.125,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.5,0.5,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.091,0.4,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.0,0.222,0.0,0.0,0.0,0.0,0.571,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.429,0.0,0.0
4,Elms-Old Rexdale,5,43.721519,-79.548983,0.0,0.0,0.0,0.144,0.18,0.147,0.182,0.248,0.034,0.021,0.055,0.02,0.08,0.152,0.097,0.148,0.034,0.091,0.15,0.078,0.087,600000,82000,96.289,"Low (<964,000)","Low (<106,666)",Medium (76-100),576,198,3,15,111,142,1045,Low (<588),High (>184),Medium (2-4),Low (<26),Low (<224),Medium (109-192),"Low (<1,138)",0.116,0.039,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.229,0.0,0.028,0.025,0.023,0.0,0.0,0.0,0.0,0.095,0.044,0.068,0.0,0.077,0.0,0.0,0.065,0.0,0.0,0.5,0.0,0.068,0.0,0.0,0.018,0.005,0.024,0.021,0.0,0.0,0.05,0.0,0.0,0.0,0.172,0.1,0.069,0.159,0.007,0.014,0.0,0.0,0.056,0.015,0.0,0.0,0.0,0.0,0.0,0.067,0.0,0.0,0.0,0.0,0.0,0.006,0.024,0.005,0.065,0.0,0.0,0.0,0.0,0.0,0.012,0.0,0.0,0.026,0.059,0.0,0.03,0.0,0.015,0.083,0.009,0.032,0.216,0.0,0.01,0.0,0.132,0.0,0.125,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.0,0.118,0.0,0.182,0.0,0.004,0.0,0.003,0.0,0.0,0.0,0.333,0.004,0.077,0.0,0.0,0.011,0.0,0.0,0.0,0.0,0.1,0.0,0.5,0.333,1.0,0.0,0.0,0.375,0.0,0.0,0.2,0.333,0.0,0.067,0.0,0.0,0.167,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.091,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333,0.0,0.0,0.0,0.0,0.429,0.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.429,0.0,0.0


Awesome! Now, let's merge all the dataframes consist of top 5 common elements.

In [28]:
# merge all the dataframes consist of top 5 common elements into complete_top5_df
complete_top5_list=[boundary_df,top5_jobs_df,top5_language_df,top5_food_df]
complete_top5_df=reduce(lambda left,right:pd.merge(left,right),complete_top5_list)
print('This dataframe consists of {} rows and {} columns!'.format(complete_top5_df.shape[0],
                                                                  complete_top5_df.shape[1]))
complete_top5_df.head()

This dataframe consists of 140 rows and 19 columns!


Unnamed: 0,Neighborhood,ID,Latitude,Longitude,1st Most Common Job,2nd Most Common Job,3rd Most Common Job,4th Most Common Job,5th Most Common Job,1st Most Common Language,2nd Most Common Language,3rd Most Common Language,4th Most Common Language,5th Most Common Language,1st Most Common Food,2nd Most Common Food,3rd Most Common Food,4th Most Common Food,5th Most Common Food
0,West Humber-Clairville,1,43.71618,-79.596356,31-33 Manufacturing,44-45 Retail trade,48-49 Transportation and warehousing,62 Health care and social assistance,"56 Administrative and support, waste managemen...",English,Punjabi (Panjabi),Gujarati,Spanish,Hindi,Indian Restaurant,Sandwich Place,Restaurant,Chinese Restaurant,Caribbean Restaurant
1,Mount Olive-Silverstone-Jamestown,2,43.746868,-79.587259,31-33 Manufacturing,44-45 Retail trade,62 Health care and social assistance,48-49 Transportation and warehousing,72 Accommodation and food services,English,Punjabi (Panjabi),Gujarati,Assyrian Neo-Aramaic,Arabic,Sandwich Place,Indian Restaurant,Italian Restaurant,Pizza Place,Asian Restaurant
2,Thistletown-Beaumond Heights,3,43.737988,-79.563491,31-33 Manufacturing,44-45 Retail trade,48-49 Transportation and warehousing,23 Construction,62 Health care and social assistance,English,Punjabi (Panjabi),Spanish,Gujarati,Italian,Sandwich Place,Pizza Place,Indian Restaurant,Vietnamese Restaurant,Fried Chicken Joint
3,Rexdale-Kipling,4,43.723725,-79.566228,44-45 Retail trade,31-33 Manufacturing,62 Health care and social assistance,"56 Administrative and support, waste managemen...",48-49 Transportation and warehousing,English,Spanish,Italian,Urdu,Croatian,Indian Restaurant,Chinese Restaurant,Restaurant,Sandwich Place,Vietnamese Restaurant
4,Elms-Old Rexdale,5,43.721519,-79.548983,44-45 Retail trade,31-33 Manufacturing,62 Health care and social assistance,48-49 Transportation and warehousing,"56 Administrative and support, waste managemen...",English,Spanish,Somali,Italian,Vietnamese,Sandwich Place,Chinese Restaurant,Indian Restaurant,Vietnamese Restaurant,Pizza Place


Superb! Now we are ready to build a Toronto's neighborhoods recommender system!

## 3. Methodology<a name="methodology"></a>

## 4. Results<a name="results"></a>

## 5. Discussion<a name="discussion"></a>

## 6. Conclusion<a name="conclusion"></a>

### Thank you for reading this notebook! Feel free to read the __[report]()__ and the __[blogpost]()__ too! 

## Author  
__[Titus Chin Jun Hong]()__  
**18 November 2020**