# Toronto's Neighborhoods Recommender System
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.wallpaperup.com%2Fuploads%2Fwallpapers%2F2013%2F12%2F19%2F199807%2F4d86b2357c55ff2bc433fc0af0705b97.jpg&f=1&nofb=1/toronto.jpeg%E2%80%9D" alt="toronto" align="left" width="600" />

## Table of Contents
1. **[Introduction](#introduction)**
2. **[Data](#data)**  
3. **[Methodology](#methodology)**
4. **[Results](#results)**
5. **[Discussion](#discussion)**
6. **[Conclusion](#conclusion)**

## 1. Introduction <a name="introduction"></a>
According to __[CIC News](https://www.cicnews.com/2020/02/which-cities-in-canada-attract-the-most-immigrants-0213741.html#)__, Canada welcomed more than 341,000 immigrants in 2019, and Toronto attracted nearly 118,000 of them, almost 35% of the total. **The statistics indicate that roughly one in three immigrants chose to settle in Toronto.** Why? __[VisaPlace](https://www.visaplace.com/blog-immigration-law/why-immigrants-settle-in-toronto-heres-10-reasons/)__ lists 10 reasons. For me, the most convincing one is that Toronto is Canada’s business and financial capital.

Toronto is Canada’s largest city. It has 6 boroughs: Etobicoke, North York, East York, Central Toronto, York and Scarborough. These 6 boroughs can be further divided into 140 neighborhoods. According to the __[City of Toronto](https://www.toronto.ca/community-people/moving-to-toronto/about-toronto/)__, Toronto is one of the most multicultural cities in the world thanks to its large population of immigrants from all over the world, so each neighborhood might be quite different from the others. **Therefore, out of 140 neighborhoods in Toronto, how can immigrants decide which neighborhood suits them best?** This is exactly the question I want to answer in this project.

**In this project, I will try to build a recommender system for Toronto's neighborhoods based on 4 factors: job opportunities, cost of living, safety and culture.** So, who would be interested in this recommender system? Going by the 2019 figures, close to 118,000 newcomers a year could be, and I believe this number will grow in the future. And of course, I can't wait to find out which neighborhood suits me best too, because I wish to migrate to Canada and settle in Toronto in the future. How about you?

## 2. Data<a name="data"></a> 
As mentioned in the introduction, the recommender system for Toronto's neighborhoods is built on job opportunities, cost of living, safety and culture. In this section, I will explain why these factors are important, describe the data and their sources, and finally import and clean the data.

### A. Factors to consider while deciding where to settle
* **Job opportunities**: We have to make a living to support ourselves and our families, and most of us hope to land our dream job. So, we need to know which jobs are most common in each neighborhood.
* **Cost of living**: We would like to buy our dream house, but how much does it cost? And how much should we earn to afford to live in a specific neighborhood? To answer these questions, we need to know the average house price and household income for each neighborhood.
* **Safety**: We wish to live in a safe and peaceful area, but how can we determine if an area is safe? To answer this question, we need to know the crime rate for each neighborhood.
* **Culture**: We talk and eat every day. If possible, we would like to communicate in our favorite language and eat our favorite food, and it's even better if our favorite things are close by. So, it's important to know which language is spoken most often at home and which foods are popular in each neighborhood.

### B. Description of data and data source
|No.| Data           | Data Description  |   Data Source   | 
|:-------------| :------------- | :---------- | :----------- |
|I. | Common jobs| These data show the common jobs for each neighborhood. The data categorize jobs according to the North American Industry Classification System (NAICS) 2012. For example: 54-Professional, scientific and technical services, 23-Construction, etc. | I extracted the data from the __[2016 Toronto Neighborhood Profiles](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv)__. The City of Toronto uses the 2016 Canadian Census to provide a portrait of the demographic, social and economic characteristics of the people and households in each Toronto neighbourhood. |
|II. | Average house price and household income | These data show the average house price and household income for each neighborhood in Canadian dollars (CAD). The home affordability for each neighborhood is also calculated.| I scraped the data, current as of October 2020, from __[Realosophy](https://www.realosophy.com/toronto/neighbourhood-map)__. Realosophy is a real estate brokerage that helps its customers make better decisions based on data. |
|III. |Crime rate| These data show the crime rate per 100,000 people for each neighborhood. | I got the data from the __[Toronto Neighborhood Crime Rates Boundary File](https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-?geometry=-79.598%2C43.673%2C-79.158%2C43.760&orderBy=OBJECTID&page=6)__ by calling a REST API from the Toronto Police Service. The file contains the 2014-2019 crime data by neighbourhood. |
|IV. |Language spoken most often at home| These data show the language spoken most often at home in each neighborhood. For example: English, Spanish, Italian, French, etc. | I extracted the data from the __[2016 Toronto Neighborhood Profiles](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv)__. |
|V. |Popular food| These data show the popular food categories around each neighborhood according to the Foursquare API. For example: Italian food, Korean food, Japanese food, etc. | I got the data through the __[Foursquare API](https://developer.foursquare.com/docs/)__. Foursquare is a location technology platform dedicated to improving how people move through the real world. |
|VI. |Boundaries of neighborhoods| These data contain the boundary of each neighborhood as a GeoJSON file. They are used to draw the boundary of each neighborhood on a map. | I got the data from __[Boundaries of Toronto's Neighbourhoods](https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=geojson&projection=4326)__. The City of Toronto made the data available on its open data portal. |

### C. Import data and data wrangling

#### I. Common jobs data
Now, let's import and clean the common jobs data first.

In [1]:
# import necessary library
import pandas as pd
pd.set_option('display.max_rows',None)
pd.set_option('display.max_columns',None)

# import the 2016 toronto neighborhood profiles into toronto_df and clean the dataframe
toronto_df=pd.read_csv('https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv')
toronto_df.drop(['_id','Category','Data Source','City of Toronto'],axis=1,inplace=True)

# extract common jobs data from toronto_df into jobs_df and clean the dataframe
topic=['Industry - North American Industry Classification System (NAICS) 2012']
index=['Neighbourhood Number']
jobs_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
jobs_df=jobs_df.drop('Topic',axis=1).set_index('Characteristic').T
jobs_df.columns=jobs_df.columns.str.strip()
jobs_df=jobs_df.drop(jobs_df.columns[1:4],axis=1).replace(',','',regex=True).astype(int)
jobs_df=jobs_df.sort_values(index).rename_axis(None,axis=1).reset_index()
jobs_df.rename(columns={'index':'Neighborhood','Neighbourhood Number':'ID'},inplace=True)
jobs_df.head()

| | Neighborhood | ID | 11 Agriculture, forestry, fishing and hunting | 21 Mining, quarrying, and oil and gas extraction | 22 Utilities | 23 Construction | 31-33 Manufacturing | 41 Wholesale trade | 44-45 Retail trade | 48-49 Transportation and warehousing | 51 Information and cultural industries | 52 Finance and insurance | 53 Real estate and rental and leasing | 54 Professional, scientific and technical services | 55 Management of companies and enterprises | 56 Administrative and support, waste management and remediation services | 61 Educational services | 62 Health care and social assistance | 71 Arts, entertainment and recreation | 72 Accommodation and food services | 81 Other services (except public administration) | 91 Public administration |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | West Humber-Clairville | 1 | 20 | 20 | 45 | 1025 | 2835 | 675 | 2020 | 1695 | 400 | 755 | 295 | 965 | 35 | 1285 | 875 | 1665 | 335 | 1195 | 710 | 460 |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 40 | 10 | 15 | 940 | 2700 | 610 | 1580 | 1240 | 310 | 505 | 185 | 560 | 30 | 1105 | 545 | 1360 | 215 | 1155 | 585 | 295 |
| 2 | Thistletown-Beaumond Heights | 3 | 0 | 10 | 20 | 455 | 690 | 215 | 640 | 465 | 120 | 200 | 75 | 275 | 0 | 300 | 245 | 410 | 70 | 325 | 180 | 145 |
| 3 | Rexdale-Kipling | 4 | 25 | 0 | 20 | 430 | 610 | 230 | 655 | 450 | 95 | 220 | 105 | 300 | 0 | 470 | 335 | 535 | 90 | 290 | 210 | 210 |
| 4 | Elms-Old Rexdale | 5 | 0 | 0 | 0 | 300 | 510 | 205 | 540 | 420 | 110 | 200 | 90 | 240 | 20 | 340 | 260 | 500 | 50 | 280 | 190 | 145 |
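
To make the reshaping steps easier to follow, here is a minimal sketch of the same filter, transpose and clean pipeline on a tiny made-up table. The frame `toy_profiles`, its topic label `Industry` and the `Example A` / `Example B` neighborhoods are assumptions for illustration only; the real file has many more rows and columns.

```python
import pandas as pd

# made-up stand-in for the neighborhood profiles: one row per characteristic,
# one column per neighborhood, numbers stored as strings with thousands separators
toy_profiles = pd.DataFrame({
    'Topic': ['Neighbourhood Information', 'Industry', 'Industry'],
    'Characteristic': ['Neighbourhood Number', '  23 Construction', '  31-33 Manufacturing'],
    'Example A': ['2', '1,025', '2,835'],
    'Example B': ['1', '940', '2,700'],
})

# transpose so that each neighborhood becomes a row and each characteristic a column
wide = toy_profiles.drop(columns='Topic').set_index('Characteristic').T
wide.columns = wide.columns.str.strip()

# remove the thousands separators before converting to integers
wide = wide.replace(',', '', regex=True).astype(int)

# sort by neighborhood id and turn the neighborhood names into a regular column
wide = wide.sort_values('Neighbourhood Number').rename_axis(None, axis=1).reset_index()
wide = wide.rename(columns={'index': 'Neighborhood', 'Neighbourhood Number': 'ID'})
print(wide)
```

The transpose is the key step: the source file stores neighborhoods as columns, while the rest of the notebook expects one row per neighborhood.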


Looks great! Let's get the top 5 common jobs for each neighborhood.

In [2]:
# define a function to return a dataframe of top 5 elements
def get_top5_elements(dataframe,column_name):
    first_element=[]
    second_element=[]
    third_element=[]
    fourth_element=[]
    fifth_element=[]
    first_column=dataframe.iloc[:,0].values
    second_column=dataframe.iloc[:,1].values
    # loop over every row instead of assuming exactly 140 neighborhoods
    for i in range(len(dataframe)):
        sorted_elements=dataframe.iloc[i,2:].sort_values(ascending=False).index
        first_element.append(sorted_elements[0])
        second_element.append(sorted_elements[1])
        third_element.append(sorted_elements[2])
        fourth_element.append(sorted_elements[3])
        fifth_element.append(sorted_elements[4])
    return pd.DataFrame({'Neighborhood':first_column,'ID':second_column,
                         '1st Most Common {}'.format(column_name):first_element,
                         '2nd Most Common {}'.format(column_name):second_element,
                         '3rd Most Common {}'.format(column_name):third_element,
                         '4th Most Common {}'.format(column_name):fourth_element,
                         '5th Most Common {}'.format(column_name):fifth_element})

# get top 5 common jobs and save the data into top5_jobs_df
top5_jobs_df=get_top5_elements(jobs_df,'Job')
top5_jobs_df.head()

| | Neighborhood | ID | 1st Most Common Job | 2nd Most Common Job | 3rd Most Common Job | 4th Most Common Job | 5th Most Common Job |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | West Humber-Clairville | 1 | 31-33 Manufacturing | 44-45 Retail trade | 48-49 Transportation and warehousing | 62 Health care and social assistance | 56 Administrative and support, waste managemen... |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 31-33 Manufacturing | 44-45 Retail trade | 62 Health care and social assistance | 48-49 Transportation and warehousing | 72 Accommodation and food services |
| 2 | Thistletown-Beaumond Heights | 3 | 31-33 Manufacturing | 44-45 Retail trade | 48-49 Transportation and warehousing | 23 Construction | 62 Health care and social assistance |
| 3 | Rexdale-Kipling | 4 | 44-45 Retail trade | 31-33 Manufacturing | 62 Health care and social assistance | 56 Administrative and support, waste managemen... | 48-49 Transportation and warehousing |
| 4 | Elms-Old Rexdale | 5 | 44-45 Retail trade | 31-33 Manufacturing | 62 Health care and social assistance | 48-49 Transportation and warehousing | 56 Administrative and support, waste managemen... |
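
The same ranking can also be done without a Python loop. The sketch below is one possible vectorized variant, not the function used above: the name `get_top_n_elements`, the `Top 1 Job` style column labels and the toy frame are assumptions for illustration. It assumes, as in `jobs_df`, that the first two columns are identifiers and the rest are counts.

```python
import numpy as np
import pandas as pd

def get_top_n_elements(dataframe, column_name, n=5):
    """Return the labels of the n largest value columns for every row."""
    values = dataframe.iloc[:, 2:]
    # sorting the negated values gives column positions in descending order;
    # a stable sort keeps the original column order when two counts tie
    order = np.argsort(-values.to_numpy(dtype=float), axis=1, kind='stable')[:, :n]
    labels = values.columns.to_numpy()[order]
    result = dataframe.iloc[:, :2].reset_index(drop=True)
    for rank in range(n):
        result['Top {} {}'.format(rank + 1, column_name)] = labels[:, rank]
    return result

# toy data with two made-up neighborhoods
toy_df = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'ID': [1, 2],
    '23 Construction': [10, 50],
    '31-33 Manufacturing': [30, 20],
    '44-45 Retail trade': [20, 40],
})
top2_df = get_top_n_elements(toy_df, 'Job', n=2)
print(top2_df)
```

Ties may be ordered differently from the `sort_values` version, so the two approaches can disagree when two industries have exactly the same count.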


Cool! Let's calculate the job opportunity of each neighborhood by summing up all of its jobs. After that, let's normalize the common jobs data by dividing each column by its maximum value.

In [3]:
# calculate the job opportunity for each neighborhood
jobs_df['Job Opportunity']=jobs_df.iloc[:,2:].sum(axis=1)

# define a function to normalize data by dividing each column by its maximum
def data_normalization(dataframe):
    # work on a copy so the caller's dataframe is not modified in place
    temp_df=dataframe.copy()
    columns_to_normalize=temp_df.columns[2:]
    for column in columns_to_normalize:
        temp_df[column]=(temp_df[column]/temp_df[column].max()).round(3)
    return temp_df

# normalize the common jobs data
jobs_df=data_normalization(jobs_df)
jobs_df.head()

| | Neighborhood | ID | 11 Agriculture, forestry, fishing and hunting | 21 Mining, quarrying, and oil and gas extraction | 22 Utilities | 23 Construction | 31-33 Manufacturing | 41 Wholesale trade | 44-45 Retail trade | 48-49 Transportation and warehousing | 51 Information and cultural industries | 52 Finance and insurance | 53 Real estate and rental and leasing | 54 Professional, scientific and technical services | 55 Management of companies and enterprises | 56 Administrative and support, waste management and remediation services | 61 Educational services | 62 Health care and social assistance | 71 Arts, entertainment and recreation | 72 Accommodation and food services | 81 Other services (except public administration) | 91 Public administration | Job Opportunity |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | West Humber-Clairville | 1 | 0.333 | 0.108 | 0.18 | 0.492 | 1.0 | 0.486 | 0.68 | 1.0 | 0.125 | 0.078 | 0.182 | 0.079 | 0.14 | 0.574 | 0.325 | 0.492 | 0.228 | 0.387 | 0.559 | 0.249 | 0.343 |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 0.667 | 0.054 | 0.06 | 0.451 | 0.952 | 0.439 | 0.532 | 0.732 | 0.097 | 0.052 | 0.114 | 0.046 | 0.12 | 0.493 | 0.203 | 0.402 | 0.146 | 0.374 | 0.461 | 0.159 | 0.277 |
| 2 | Thistletown-Beaumond Heights | 3 | 0.0 | 0.054 | 0.08 | 0.218 | 0.243 | 0.155 | 0.215 | 0.274 | 0.038 | 0.021 | 0.046 | 0.022 | 0.0 | 0.134 | 0.091 | 0.121 | 0.048 | 0.105 | 0.142 | 0.078 | 0.096 |
| 3 | Rexdale-Kipling | 4 | 0.417 | 0.0 | 0.08 | 0.206 | 0.215 | 0.165 | 0.221 | 0.265 | 0.03 | 0.023 | 0.065 | 0.024 | 0.0 | 0.21 | 0.125 | 0.158 | 0.061 | 0.094 | 0.165 | 0.114 | 0.105 |
| 4 | Elms-Old Rexdale | 5 | 0.0 | 0.0 | 0.0 | 0.144 | 0.18 | 0.147 | 0.182 | 0.248 | 0.034 | 0.021 | 0.055 | 0.02 | 0.08 | 0.152 | 0.097 | 0.148 | 0.034 | 0.091 | 0.15 | 0.078 | 0.087 |
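
To see what the normalization does, here is a small worked sketch on three rows. The helper name `normalize_by_max` and the three-row frame are assumptions for illustration; the counts are taken from the first rows of the jobs table, but each column maximum is computed over these three rows only.

```python
import pandas as pd

def normalize_by_max(dataframe, first_value_column=2):
    """Return a copy whose value columns are divided by their own column maximum."""
    normalized = dataframe.copy()
    value_columns = normalized.columns[first_value_column:]
    # dividing a frame by a Series of column maxima aligns on the column labels
    normalized[value_columns] = (normalized[value_columns] / normalized[value_columns].max()).round(3)
    return normalized

toy_df = pd.DataFrame({
    'Neighborhood': ['West Humber-Clairville', 'Mount Olive-Silverstone-Jamestown',
                     'Thistletown-Beaumond Heights'],
    'ID': [1, 2, 3],
    '23 Construction': [1025, 940, 455],
    '31-33 Manufacturing': [2835, 2700, 690],
})
normalized_df = normalize_by_max(toy_df)
print(normalized_df)
```

Every value ends up between 0 and 1, with 1.0 marking the neighborhood that has the most jobs in that industry. A column whose maximum is 0 would produce missing values, so such columns would need separate handling.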


Awesome!

#### II. Average house price and household income data
Before scraping the average house price and household income for each neighborhood, we need to get the website link for each neighborhood first. So, let's scrape this information from Realosophy now.

In [None]:
# import necessary libraries
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
from bs4 import BeautifulSoup
firefox_options=Options()
firefox_options.add_argument('-headless')

# define a function to scrape each neighborhood's name and website by borough
def get_neighborhood_websites(borough):
    driver=webdriver.Firefox(options=firefox_options)
    driver.get('https://www.realosophy.com/{}-former-toronto/neighbourhood-map'.format(borough))
    time.sleep(5)
    html=driver.page_source
    soup=BeautifulSoup(html,'lxml')
    all_data=soup.find('div',{'class':'row mt-4'})
    neighborhood_data=all_data.find_all('a')
    neighborhood=[]
    website=[]
    for data in neighborhood_data:
        neighborhood.append(data.text)
        website_temp=data['href'].replace('/','https://www.realosophy.com/',1)
        website.append(website_temp)
    driver.quit()
    print('...',end='')
    return pd.DataFrame({'Neighborhood':neighborhood,'Website':website})

# scrape etobicoke's neighborhoods and websites into etobicoke_df then insert neighborhoods id
print('Almost...',end='')
etobicoke_df=get_neighborhood_websites('etobicoke')
etobicoke_df.drop(25,inplace=True)
etobicoke_df['ID']=[20,11,1,14,13,17,8,9,14,6,19,12,17,18,10,14,4,7,2,16,5,15,16,3,11]

# scrape north york's neighborhoods and websites into northyork_df then insert neighborhoods id
northyork_df=get_neighborhood_websites('north-york')
northyork_df.loc[len(northyork_df.index)]=northyork_df.loc[31,:]
northyork_df.loc[len(northyork_df.index)]=northyork_df.loc[39,:]
northyork_df['ID']=[38,42,34,52,49,43,24,41,30,39,33,39,42,47,45,26,44,31,25,45,53,48,41,21,23,22,38,32,41,39,29,
                    36,45,23,46,28,43,41,35,37,40,27,31,50,51]

# scrape east york's neighborhoods and websites into eastyork_df then insert neighborhoods id
eastyork_df=get_neighborhood_websites('east-york')
eastyork_df['ID']=[56,57,59,61,56,58,54,55,54,54,60]

# scrape central toronto's neighborhoods and websites into centraltoronto_df then insert neighborhoods id
centraltoronto_df=get_neighborhood_websites('central-toronto')
centraltoronto_df=centraltoronto_df.drop([16,47,69,74]).reset_index(drop=True)
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[39,:]
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[35,:]
centraltoronto_df.loc[len(centraltoronto_df.index)]=centraltoronto_df.loc[20,:]
centraltoronto_df['ID']=[78,103,76,84,105,80,89,83,71,91,96,100,78,93,75,77,73,66,93,99,97,77,93,83,92,77,77,76,
                         101,102,82,77,77,90,87,94,74,78,70,82,81,103,98,73,103,67,96,92,72,68,70,86,98,95,79,77,
                         96,85,73,74,77,97,87,105,95,63,67,90,69,81,82,62,93,77,64,75,95,65,88,104]

# scrape york's neighborhoods and websites into york_df then insert neighborhoods id
york_df=get_neighborhood_websites('york')
york_df=york_df.drop(11).reset_index(drop=True)
york_df['ID']=[114,112,108,109,106,110,114,115,107,114,111,113]

# scrape scarborough's neighborhoods and websites into scarborough_df then insert neighborhoods id
scarborough_df=get_neighborhood_websites('scarborough')
scarborough_df.loc[len(scarborough_df.index)]=scarborough_df.loc[0,:]
scarborough_df['ID']=[128,127,122,120,120,123,122,126,138,140,134,125,124,117,132,119,130,135,135,121,133,131,139,
                      116,118,136,131,119,137,129]

# concatenate these six boroughs dataframe into website_df
website_df=pd.concat([etobicoke_df,northyork_df,eastyork_df,centraltoronto_df,york_df,scarborough_df])
website_df=website_df.sort_values('ID').reset_index(drop=True)
print('...Done!',end='')
website_df.head()
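
The scraper turns each relative link into a full URL by replacing the first slash. A slightly more defensive option, sketched below, is `urljoin` from the standard library, which also leaves links that are already absolute untouched. The two example paths are made up for illustration and are not real Realosophy pages.

```python
from urllib.parse import urljoin

base_url = 'https://www.realosophy.com/'

# hypothetical href values: one relative link and one that is already absolute
hrefs = [
    '/example-neighbourhood/real-estate',
    'https://www.realosophy.com/already-absolute',
]

# urljoin resolves relative paths against the base and keeps absolute URLs as they are
absolute_links = [urljoin(base_url, href) for href in hrefs]
for link in absolute_links:
    print(link)
```

With the first-slash replacement, an already absolute link would be rewritten incorrectly, so `urljoin` is the safer choice if the page ever mixes both kinds of links.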

Excellent! Now, let's scrape the average house price and household income data from Realosophy and clean the data.

In [None]:
# import necessary library
import numpy as np

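# define a function to get the avg house price or household income in numerical result
def get_number_only(data):
    result=data.text.strip().replace('$','').replace(',','')
    # 'M' marks millions and 'K' marks thousands; anything else is read as a plain number
    # round() avoids losing a dollar to floating point truncation
    if 'M' in result:
        return round(float(result.replace('M',''))*1000000)
    elif 'K' in result:
        return round(float(result.replace('K',''))*1000)
    else:
        return round(float(result))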

# scrape average house price and household income for each neighborhood into website_df
print('Progress:')
driver=webdriver.Firefox(options=firefox_options)
website_df['Avg House Price']=0
website_df['Avg Household Income']=0
for i in range(len(website_df['ID'])):
    driver.get(website_df['Website'][i])
    time.sleep(5)
    html=driver.page_source
    soup=BeautifulSoup(html,'lxml')
    houseprice_data=soup.find('div',{'class':'key-stats__avg-sale-price ng-binding ng-scope'})
    income_class='h3 font-sans-caption-bold mb-0 text-center text-sm-left ng-binding ng-scope'
    income_data=soup.find('p',{'class':income_class})
    # reload the page until both values have rendered (this retries without a limit)
    while houseprice_data is None or income_data is None:
        driver.get(website_df['Website'][i])
        time.sleep(5)
        html=driver.page_source
        soup=BeautifulSoup(html,'lxml')
        houseprice_data=soup.find('div',{'class':'key-stats__avg-sale-price ng-binding ng-scope'})
        income_data=soup.find('p',{'class':income_class})
    website_df.iloc[i,3]=get_number_only(houseprice_data)
    website_df.iloc[i,4]=get_number_only(income_data)
    print('.',end='')
driver.quit()

# group website_df by neighborhood id and save the data into houseprice_df
# drop the text columns first so that the mean is taken over numeric columns only
website_df.drop(['Neighborhood','Website'],axis=1,inplace=True)
website_df=website_df.groupby('ID').mean().reset_index()
website_df['Avg House Price']=website_df['Avg House Price'].astype(int)
website_df['Avg Household Income']=website_df['Avg Household Income'].astype(int)
# copy the slice so that adding columns does not raise a SettingWithCopyWarning
houseprice_df=jobs_df.iloc[:,0:2].copy()
houseprice_df['Avg House Price']=website_df['Avg House Price']
houseprice_df['Avg Household Income']=website_df['Avg Household Income']
print('...Done!')
houseprice_df.head()
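
The scraping loop keeps reloading a page until both values appear, which never ends if a page changes its layout. One way to bound this is a small retry helper, sketched below. The name `fetch_with_retries` and its parameters are hypothetical, and the lambdas stand in for a real page load so that the example runs without a browser.

```python
import time

def fetch_with_retries(fetch, max_attempts=3, delay_seconds=0.0):
    """Call fetch() until it returns something other than None, at most max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        # wait before the next attempt, but not after the last one
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None

# stand-in for a page whose value only renders on the third load
responses = iter([None, None, '$1.2M'])
price_text = fetch_with_retries(lambda: next(responses), max_attempts=5)
print(price_text)

# a page that never renders gives None instead of looping forever
missing = fetch_with_retries(lambda: None, max_attempts=3)
print(missing)
```

A neighborhood that still returns `None` after the last attempt can then be logged and filled in by hand rather than blocking the whole run.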

Superb! Now, let's divide average household income by average house price and scale the result so that the most affordable neighborhood scores 100. This gives the home affordability for each neighborhood. After that, let's categorize these data into three bins, namely 'High', 'Medium', and 'Low'.

In [None]:
# calculate the home affordability for each neighborhood
temp_data=houseprice_df['Avg Household Income']/houseprice_df['Avg House Price']
houseprice_df['Home Affordability']=((temp_data/temp_data.max())*100).round(3)

# define a function to categorize data into three bins namely high, medium, low
def data_binning(dataframe):
    for column in dataframe.columns[2:]:
        Q1=dataframe[column].quantile(0.25)
        Q3=dataframe[column].quantile(0.75)
        IQR=Q3-Q1
        dataframe_min=Q1-1.5*IQR
        dataframe_max=Q3+1.5*IQR
        temp_bins=dataframe[column][~((dataframe[column]<dataframe_min)|(dataframe[column]>dataframe_max))]
        bins=np.linspace(min(temp_bins),max(temp_bins),4)
        bins=bins.astype(int)
        dataframe['{} Categories'.format(column)]='Dunno'
        # loop over every row instead of assuming exactly 140 neighborhoods
        for i in range(len(dataframe)):
            if dataframe[column][i]<bins[1]:
                dataframe.iloc[i,-1]='Low (<{:,d})'.format(bins[1])
            elif dataframe[column][i]>bins[2]:
                dataframe.iloc[i,-1]='High (>{:,d})'.format(bins[2])
            else:
                dataframe.iloc[i,-1]='Medium ({:,d}-{:,d})'.format(bins[1],bins[2])
    return dataframe

# categorize the house price, household income and home affordability
houseprice_df=data_binning(houseprice_df)
houseprice_df.head()
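
The binning rule is easier to check on a handful of numbers. The sketch below applies the same idea to one column: drop outliers with the 1.5 x IQR rule, split the remaining range into three equal-width bins, then label every value, outliers included. The function name `bin_low_medium_high` and the prices are made up for illustration.

```python
import numpy as np
import pandas as pd

def bin_low_medium_high(values):
    """Label each value Low, Medium or High using equal-width bins over the non-outlier range."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    inliers = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
    # four equally spaced edges give three bins; only the two inner edges are needed
    edges = np.linspace(inliers.min(), inliers.max(), 4)
    low_edge, high_edge = edges[1], edges[2]
    labels = pd.Series('Medium', index=values.index)
    labels[values < low_edge] = 'Low'
    labels[values > high_edge] = 'High'
    return labels, low_edge, high_edge

# made-up house prices; the last one is an outlier
prices = pd.Series([400_000, 600_000, 700_000, 800_000, 1_000_000, 5_000_000])
categories, low_edge, high_edge = bin_low_medium_high(prices)
print(low_edge, high_edge)
print(categories.tolist())
```

Excluding the outlier before computing the edges keeps one very expensive neighborhood from stretching the bins so far that almost everything else lands in 'Low'.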

Awesome!

#### III. Crime rate data
Now, let's get the crime rate data by calling a REST API from Toronto Police Service and clean the data.

In [None]:
# import necessary library
import requests

# request the crime rate data by using toronto police service's api and clean the data
url='https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Neighbourhood_MCI/FeatureServer/0/query?where=1%3D1&outFields=Neighbourhood,Hood_ID,Population,Assault_AVG,AutoTheft_AVG,Homicide_AVG,TheftOver_AVG,BreakandEnter_AVG,Robbery_AVG&outSR=4326&f=json'
results=requests.get(url).json()
crime_data=results['features']
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
crime_df=pd.json_normalize(crime_data)
crime_df.drop('geometry.rings',axis=1,inplace=True)
# strip the literal 'attributes.' prefix (regex=False so the dot is not treated as a wildcard)
crime_df.columns=crime_df.columns.str.replace('attributes.','',regex=False)
crime_df.rename(columns={'Neighbourhood':'Neighborhood','Hood_ID':'ID'},inplace=True)
crime_df['ID']=crime_df['ID'].astype(int)
crime_df=crime_df.sort_values('ID').reset_index(drop=True)
crime_df.head()
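
The flattening step may be easier to picture on a small sample. The records below are made up and only imitate the nested shape of a feature response, with an `attributes` dictionary and a `geometry` dictionary per feature; they are not real crime data.

```python
import pandas as pd

# made-up records shaped like the 'features' list of the response
sample_features = [
    {'attributes': {'Neighbourhood': 'Example A', 'Hood_ID': '2', 'Population': 1000},
     'geometry': {'rings': [[[0, 0], [0, 1], [1, 1]]]}},
    {'attributes': {'Neighbourhood': 'Example B', 'Hood_ID': '1', 'Population': 2000},
     'geometry': {'rings': [[[1, 1], [1, 2], [2, 2]]]}},
]

# nested keys are joined with a dot, e.g. 'attributes.Population' and 'geometry.rings'
sample_df = pd.json_normalize(sample_features)
flattened_columns = sample_df.columns.tolist()

# drop the polygon coordinates and remove the 'attributes.' prefix from the column names
sample_df = sample_df.drop(columns='geometry.rings')
sample_df.columns = sample_df.columns.str.replace('attributes.', '', regex=False)
print(sample_df)
```

After this step each feature is one flat row, which is why the later cells can treat the result like any other dataframe.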

Great! Now, let's get the crime rate per 100,000 people for each neighborhood by using this formula:
___
$$ \text{crime rate per 100,000 people} = \frac{\text{total number of crimes}}{\text{population}} \times 100{,}000 $$
___

In [None]:
# calculate the crime rate per 100,000 people
crime_df['Crime Rate']=(((crime_df.iloc[:,3:].sum(axis=1))/crime_df['Population'])*100000).astype(int)
crime_df.drop(crime_df.columns[2:-1],axis=1,inplace=True)
crime_df.head()
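
As a quick check of the formula, here is a worked sketch with made-up numbers. The two neighborhoods and their counts are assumptions for illustration. Unlike the cell above, the crime columns are selected by name rather than by position, and the result is rounded rather than truncated.

```python
import pandas as pd

# made-up populations and average yearly counts for two crime types
toy_crime_df = pd.DataFrame({
    'Neighborhood': ['Example A', 'Example B'],
    'ID': [1, 2],
    'Population': [20000, 50000],
    'Assault_AVG': [100.0, 150.0],
    'Robbery_AVG': [20.0, 50.0],
})

# selecting the crime columns by name does not depend on the column order
crime_columns = ['Assault_AVG', 'Robbery_AVG']
total_crimes = toy_crime_df[crime_columns].sum(axis=1)

# Example A: 120 / 20,000 x 100,000 = 600; Example B: 200 / 50,000 x 100,000 = 400
toy_crime_df['Crime Rate'] = (total_crimes / toy_crime_df['Population'] * 100000).round().astype(int)
print(toy_crime_df[['Neighborhood', 'Crime Rate']])
```

Scaling to a common population of 100,000 is what makes a small neighborhood comparable with a large one: Example B has more crimes in total but the lower rate.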

Cool!

#### IV. Language spoken most often at home data
Now, let's import and clean the language spoken most often at home data first.

In [100]:
# check how many characteristics fall under the 'Mother tongue' topic
topic=['Mother tongue']
index=['Neighbourhood Number']
language_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
language_df=language_df.drop('Topic',axis=1).set_index('Characteristic').T.replace(',','',regex=True).astype(int)
language_df.columns=language_df.columns.str.strip()
language_df.shape

(140, 254)

In [102]:
# do the same for the 'Language spoken most often at home' topic
topic=['Language spoken most often at home']
index=['Neighbourhood Number']
language_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
language_df=language_df.drop('Topic',axis=1).set_index('Characteristic').T.replace(',','',regex=True).astype(int)
language_df.columns=language_df.columns.str.strip()
language_df.shape

(140, 270)

In [103]:
# sum each characteristic over all neighborhoods and list the largest totals
language_totals=pd.DataFrame(language_df.sum())
language_totals=language_totals.sort_values(0,ascending=False)
language_totals.head(50)

| Characteristic | 0 |
|:---|:---|
| Language spoken most often at home for the total population excluding institutional residents | 2704400 |
| Single responses | 2458460 |
| Official languages | 1756630 |
| English | 1739530 |
| Non-official languages | 701750 |
| Non-Aboriginal languages | 701645 |
| Indo-European languages | 317015 |
| Multiple responses | 245990 |
| English and non-official language | 233310 |
| Sino-Tibetan languages | 185425 |

In [None]:
# extract language spoken most often at home data from toronto_df into language_df and clean the dataframe
topic=['Language spoken most often at home']
index=['Neighbourhood Number']
language_df=toronto_df[(toronto_df['Topic'].isin(topic))|toronto_df['Characteristic'].isin(index)]
language_df=language_df.drop('Topic',axis=1).set_index('Characteristic').T.replace(',','',regex=True).astype(int)
language_df.columns=language_df.columns.str.strip()


# as with the common jobs data, sort by neighborhood id and rename the identifier columns
language_df=language_df.sort_values(index).rename_axis(None,axis=1).reset_index()
language_df.rename(columns={'index':'Neighborhood','Neighbourhood Number':'ID'},inplace=True)
language_df.head()

## 3. Methodology<a name="methodology"></a>

## 4. Results<a name="results"></a>

## 5. Discussion<a name="discussion"></a>

## 6. Conclusion<a name="conclusion"></a>

### Thank you for reading this notebook! Feel free to read the __[full report]()__ and the __[blogpost]()__ too! 

## Author  
__[Titus Chin Jun Hong]()__  
**15 November 2020**