# Toronto's Neighborhoods Recommender System
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.wallpaperup.com%2Fuploads%2Fwallpapers%2F2013%2F12%2F19%2F199807%2F4d86b2357c55ff2bc433fc0af0705b97.jpg&f=1&nofb=1/toronto.jpeg%E2%80%9D" alt="toronto" align="left" width="600" />

## Table of Contents
1. [Introduction](#introduction)
2. [Data](#data)  
3. [Methodology](#methodology)
4. [Results](#results)
5. [Discussion](#discussion)
6. [Conclusion](#conclusion)

## 1. Introduction<a name="introduction"></a>
According to __[CIC News](https://www.cicnews.com/2020/02/which-cities-in-canada-attract-the-most-immigrants-0213741.html#)__, Canada welcomed more than 341,000 immigrants in 2019, and Toronto attracted nearly 118,000 of them, almost 35% of the total. **The statistics indicate that Toronto is the most popular destination for immigrants among Canadian cities.** Why? __[VisaPlace](https://www.visaplace.com/blog-immigration-law/why-immigrants-settle-in-toronto-heres-10-reasons/)__ lists 10 reasons. For me, the most convincing one is that Toronto is Canada’s business and financial capital.

Toronto is Canada’s largest city. It has 6 boroughs: Etobicoke, North York, East York, Central Toronto, York and Scarborough. These 6 boroughs are further divided into 140 neighborhoods. According to the __[City of Toronto](https://www.toronto.ca/community-people/moving-to-toronto/about-toronto/)__, Toronto is one of the most multicultural cities in the world thanks to its large population of immigrants from all over the world, so its neighborhoods can differ considerably from one another. **Therefore, out of 140 neighborhoods in Toronto, how can immigrants decide which neighborhood suits them best?** This is exactly the question I want to answer in this project.

**In this project, I will try to build a recommender system for Toronto's neighborhoods based on 5 factors: job opportunities, cost of living, ease of transportation, safety and culture.** So, who would be interested in this recommender system? Potentially the roughly 118,000 people who settle in Toronto each year, and I believe that this number will grow in the future. And of course, I can't wait to find out which neighborhood suits me best too, because I wish to migrate to Canada and settle in Toronto in the future. How about you?

## 2. Data<a name="data"></a> 
As mentioned above, the recommender system is built on job opportunities, cost of living, ease of transportation, safety and culture. In this section, I explain why these factors are important, describe the data that will be used and their sources, and finally import and clean the data.

### A. Factors to consider while deciding where to settle
* **Job opportunities**: We have to make a living to support ourselves and our families, and most of us hope to land our dream job. So, we need to know which jobs are most common in each neighborhood.
* **Cost of living**: We would like to buy our dream house, but how much does it cost, and how much do we need to earn to afford living in a specific neighborhood? To answer these questions, we need the average house price and household income for each neighborhood.
* **Ease of transportation**: We need to travel from one point to another for different purposes, but which modes of transportation are available in each neighborhood? To find out, we need to know how people in each neighborhood travel.
* **Safety**: We wish to live in a safe and peaceful area, but how can we determine whether an area is safe? To answer this question, we need the crime rate and the number of coronavirus cases for each neighborhood.
* **Culture**: Everyone likes to have fun, right? So, it's important to know which places are popular around each neighborhood. For some people, English is not their first language, so they may prefer neighborhoods where they can still communicate in their mother tongue. Hence, we need to know which non-English language is spoken most often at home in each neighborhood.

### B. Description of data and data source
| Data           | Data Source  | Data         |
| :------------- | :----------: | -----------: |
|  Cell Contents | More Stuff   | And Again    |
| You Can Also   | Put Pipes In | Like this \| |

### C. Import data and data wrangling

In [2]:
import requests
import pandas as pd
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
import numpy as np

In [8]:
# Download the City of Toronto neighbourhood profile
neighborhood_profile_url = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv'
neighborhood_profile_df = pd.read_csv(neighborhood_profile_url)

# Hard-coded row ranges for the jobs, transport and language data.
# .loc slices by label and includes both end points.
jobs_df = neighborhood_profile_df.loc[1932:1954]
transport_df = neighborhood_profile_df.loc[1965:1971]
language_df = neighborhood_profile_df.loc[393:670]
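
The row ranges above depend on the exact row order of the downloaded file, so they break silently if the file changes. A minimal sketch of a less position-dependent alternative is to filter on a descriptive column instead. The `Topic` column name, the helper `rows_for_topic` and all values below are assumptions for illustration only; check `neighborhood_profile_df.columns` to see what the real file offers.

```python
import pandas as pd

# Toy stand-in for the neighbourhood profile; column names and values are assumed
profile = pd.DataFrame({
    'Topic': ['Occupation', 'Occupation', 'Commuting', 'Language'],
    'Characteristic': ['Management', 'Sales', 'Public transit', 'Mandarin'],
    'Example Neighbourhood': [100, 200, 300, 400],
})

def rows_for_topic(df, topic):
    """Return the rows whose 'Topic' value equals the given label."""
    return df[df['Topic'] == topic].reset_index(drop=True)

jobs_example = rows_for_topic(profile, 'Occupation')
print(jobs_example)
```

Filtering by label keeps working even if rows are added or reordered, as long as the labels themselves stay the same.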

In [None]:
url = 'https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Neighbourhood_MCI/FeatureServer/0/query?where=1%3D1&outFields=Neighbourhood,Hood_ID,Population,Assault_AVG,AutoTheft_AVG,Homicide_AVG,TheftOver_AVG,BreakandEnter_AVG,Robbery_AVG&outSR=4326&f=json'
results = requests.get(url).json()
crime_data = results['features']
dataframe = json_normalize(crime_data)
# Keep only the neighbourhood ID and name, ordered by ID
df_temp = pd.DataFrame({
    'ID': dataframe['attributes.Hood_ID'],
    'Neighborhood': dataframe['attributes.Neighbourhood'],
})
df_temp = df_temp.sort_values(by='ID').reset_index(drop=True)

# Assign boroughs by position in the sorted table.
# .loc avoids chained assignment, and its slices include the end label.
df_temp['Borough'] = ''
df_temp.loc[0:19, 'Borough'] = 'Etobicoke'
df_temp.loc[20:52, 'Borough'] = 'North York'
df_temp.loc[53:60, 'Borough'] = 'East York'
df_temp.loc[61:104, 'Borough'] = 'Central Toronto'
df_temp.loc[105:114, 'Borough'] = 'York'
df_temp.loc[115:139, 'Borough'] = 'Scarborough'
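
The column names `attributes.Hood_ID` and `attributes.Neighbourhood` come from the way `json_normalize` flattens nested dictionaries. Here is a small sketch with made-up records shaped like the `features` list; the IDs and names are illustrative, not real query results.

```python
import pandas as pd

# Made-up records in the same shape as results['features']
features = [
    {'attributes': {'Hood_ID': 2, 'Neighbourhood': 'Example B'}},
    {'attributes': {'Hood_ID': 1, 'Neighbourhood': 'Example A'}},
]

flat = pd.json_normalize(features)
# Nested keys are joined with a dot, giving one flat column per leaf value
print(list(flat.columns))
```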

In [None]:

# Renumber the neighbourhoods 1..140 in sorted order
df_temp['ID'] = np.arange(1, 141)
df_temp
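
Once the IDs run from 1 to 140, the borough of each neighbourhood could also be derived from the ID itself rather than from row positions. This is only a sketch of that idea using `pd.cut`; the bin edges are taken from the slice boundaries used above, and the `ids` frame is a stand-in for `df_temp`.

```python
import numpy as np
import pandas as pd

# Upper ID bound of each borough, taken from the slices used above
bins = [0, 20, 53, 61, 105, 115, 140]
labels = ['Etobicoke', 'North York', 'East York', 'Central Toronto', 'York', 'Scarborough']

ids = pd.DataFrame({'ID': np.arange(1, 141)})
# pd.cut uses right-closed intervals by default: (0, 20], (20, 53], ...
ids['Borough'] = pd.cut(ids['ID'], bins=bins, labels=labels).astype(str)
print(ids['Borough'].value_counts())
```

Keeping the boundaries in one list makes it easier to check them against an official borough map in a single place.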

In [None]:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
import time

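def get_neigh_data(borough):
    """Scrape neighbourhood names and links for one borough and save them to '<borough>.csv'."""
    url = 'https://www.realosophy.com/{}-former-toronto/neighbourhood-map'.format(borough)
    driver.get(url)  # uses the module-level Selenium driver created below
    time.sleep(5)    # give the dynamically rendered page time to load
    html = driver.page_source

    soup = BeautifulSoup(html, 'lxml')
    all_divs = soup.find('div', {'class': 'row mt-4'})
    if all_divs is None:
        raise RuntimeError('Neighbourhood list not found for {}'.format(borough))
    neighborhoods_data = all_divs.find_all('a')

    names = [neighborhood.text for neighborhood in neighborhoods_data]
    websites = [neighborhood['href'] for neighborhood in neighborhoods_data]

    pd.DataFrame({'Names': names, 'Websites': websites}).to_csv('{}.csv'.format(borough))
    print('{} done!'.format(borough))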

firefoxOptions = Options()
firefoxOptions.add_argument('-headless')
driver = webdriver.Firefox(options=firefoxOptions)

boroughs = ['etobicoke', 'north-york', 'east-york', 'central-toronto', 'york', 'scarborough']

# Always close the browser, even if a page fails to scrape
try:
    for borough in boroughs:
        get_neigh_data(borough)
finally:
    driver.quit()
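
To see what the parsing step in `get_neigh_data` extracts without launching a browser, the same BeautifulSoup calls can be tried on a static snippet. The HTML below is made up to match the structure the function looks for (a `div` with class `row mt-4` containing links); the link texts and paths are hypothetical and the real page may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML in the shape get_neigh_data expects; the real page may differ
html = """
<div class="row mt-4">
  <a href="/example-a-toronto/neighbourhood">Example A</a>
  <a href="/example-b-toronto/neighbourhood">Example B</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')  # built-in parser, so lxml is not required here
container = soup.find('div', {'class': 'row mt-4'})
links = [(a.text, a['href']) for a in container.find_all('a')]
print(links)
```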

In [None]:
eto_df=pd.read_csv('etobicoke.csv')
north_df=pd.read_csv('north-york.csv')
east_df=pd.read_csv('east-york.csv')
central_df=pd.read_csv('central-toronto.csv')
york_df=pd.read_csv('york.csv')
scar_df=pd.read_csv('scarborough.csv')

In [None]:
eto_df.head()

In [None]:
neigh_id_website_df = pd.concat([eto_df,north_df,east_df,central_df,york_df,scar_df])

In [None]:
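# NOTE: get_neigh_data does not write an 'ID' column, so this step assumes
# that an 'ID' column was added to each borough CSV beforehand.
neigh_id_website_df.drop(['Unnamed: 0','Names'],axis=1,inplace=True)
neigh_id_website_df.sort_values('ID',inplace=True)
neigh_id_website_df.reset_index(drop=True,inplace=True)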

In [None]:
neigh_id_website_df.to_csv('neigh_id_website.csv')

In [None]:
neigh_id_website_df.head()

In [None]:
neigh_id_website_df['Avg House Price']=0
neigh_id_website_df['Avg Income']=0
neigh_id_website_df.head()

In [None]:
def get_number_only(variable):
    """Convert the text of a scraped tag such as '$1.2M' or '$850K' to an integer dollar amount."""
    text = variable.text.strip().replace('$', '').replace(',', '')
    if 'M' in text:
        return int(round(float(text.replace('M', '')) * 1000000))
    # float() also accepts values with a decimal point such as '85.5K',
    # which int() would reject
    return int(round(float(text.replace('K', '')) * 1000))

In [None]:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
import time

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

firefoxOptions = Options()
firefoxOptions.add_argument('-headless')
driver = webdriver.Firefox(options=firefoxOptions)

PRICE_CLASS = 'key-stats__avg-sale-price ng-binding ng-scope'
INCOME_CLASS = 'h3 font-sans-caption-bold mb-0 text-center text-sm-left ng-binding ng-scope'
MAX_ATTEMPTS = 5  # arbitrary cap so that a missing element cannot loop forever

try:
    for i in range(len(neigh_id_website_df)):
        url = 'https://www.realosophy.com{}'.format(neigh_id_website_df.loc[i, 'Websites'])

        # Reload the page until both values have been rendered
        for attempt in range(MAX_ATTEMPTS):
            driver.get(url)
            time.sleep(5)
            soup = BeautifulSoup(driver.page_source, 'lxml')
            avg_house_price_data = soup.find('div', {'class': PRICE_CLASS})
            avg_income_data = soup.find('p', {'class': INCOME_CLASS})
            if avg_house_price_data is not None and avg_income_data is not None:
                break
        else:
            raise RuntimeError('Price or income not found at {}'.format(url))

        # .loc avoids chained assignment, which may silently write to a copy
        neigh_id_website_df.loc[i, 'Avg House Price'] = get_number_only(avg_house_price_data)
        neigh_id_website_df.loc[i, 'Avg Income'] = get_number_only(avg_income_data)

        print(neigh_id_website_df.loc[i, 'ID'])  # progress indicator
finally:
    driver.quit()

In [None]:
neigh_id_website_df.to_csv('neigh_houseprice_raw.csv')

In [None]:
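# Average only the numeric columns; the text column 'Websites' cannot be averaged.
# Rows that share an ID are combined into a single row.
neigh_houseprice_raw = neigh_id_website_df.groupby('ID')[['Avg House Price', 'Avg Income']].mean()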

In [None]:
neigh_houseprice_raw['Avg House Price']=neigh_houseprice_raw['Avg House Price'].astype(int)

In [None]:
neigh_houseprice_raw['Avg Income']=neigh_houseprice_raw['Avg Income'].astype(int)

In [None]:
neigh_houseprice_raw.to_csv('neigh_houseprice_grouped.csv')
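
The grouping step collapses all scraped rows that share an `ID` into one row of averages. Here is a small sketch with made-up numbers to show the effect; the prices, incomes and paths are invented for illustration.

```python
import pandas as pd

# Made-up rows: two scraped pages share ID 1
scraped = pd.DataFrame({
    'ID': [1, 1, 2],
    'Websites': ['/a', '/b', '/c'],
    'Avg House Price': [800000, 1000000, 650000],
    'Avg Income': [90000, 110000, 70000],
})

# Select the numeric columns first, then average per ID and cast back to int
grouped = (
    scraped.groupby('ID')[['Avg House Price', 'Avg Income']]
    .mean()
    .astype(int)
)
print(grouped)
```

Note that a plain mean weights every scraped page equally, regardless of how many homes each page covers.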

In [None]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
neigh_houseprice_grouped = pd.read_csv('neigh_houseprice_grouped.csv')
neigh_houseprice_grouped

In [None]:
toronto_neigh_houseprice_df = df_temp.set_index('ID').join(neigh_houseprice_grouped.set_index('ID'))

In [None]:
toronto_neigh_houseprice_df.to_csv('toronto_neigh_houseprice.csv')
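
Because `join` performs a left join by default, every neighbourhood is kept and any ID without scraped prices ends up with `NaN`. A quick sanity check along these lines could flag such gaps; the tiny frames below are made-up stand-ins for `df_temp` and `neigh_houseprice_grouped`.

```python
import pandas as pd

# Made-up stand-ins for the neighbourhood table and the grouped price table
neighborhoods = pd.DataFrame({'ID': [1, 2, 3], 'Neighborhood': ['A', 'B', 'C']})
prices = pd.DataFrame({'ID': [1, 2], 'Avg House Price': [900000, 650000]})

# A left join keeps every neighbourhood; unmatched ones get NaN
joined = neighborhoods.set_index('ID').join(prices.set_index('ID'))
missing = joined[joined['Avg House Price'].isna()]
print(missing)
```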

## 3. Methodology<a name="methodology"></a>

## 4. Results<a name="results"></a>

## 5. Discussion<a name="discussion"></a>

## 6. Conclusion<a name="conclusion"></a>

### Thank you for reading this notebook! Feel free to read the __[full report]()__ and the __[blogpost]()__ too! 

## Author  
__[Titus Chin Jun Hong](https://www.linkedin.com/in/joseph-s-50398b136/)__  
**10 November 2020**