# Recommendation: Where to Open a Restaurant in Toronto

### 1. Introduction / Business Problem

#### One of the most important factors in the long-term success of a restaurant is location. Along with good service and food, a top priority of any restaurant is to attract new customers. For this purpose, it helps greatly to open the restaurant in a bustling neighborhood or in one where the type of restaurant you intend to open is already common. This is where data science can help, by building models that identify suitable locations for restaurateurs. Such a model can illustrate which neighborhoods are the right locations to open a specific type of restaurant and which ones are not. We will use the Foursquare API to get geospatial information about different neighborhoods in Toronto, group the neighborhoods into clusters, and combine the results to reach our conclusion. The intended end use of this project is to identify the locations that are best suited to opening a South Asian restaurant and to provide this recommendation to a restaurateur.

### 2. Data

#### For the Toronto neighborhood data, the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M has all the information we need to explore and cluster the neighborhoods in Toronto. The data will be scraped from the Wikipedia page, wrangled, cleaned, and then read into a pandas dataframe so that it is in a structured format for exploratory data analysis. Once the data is in a structured format, we will analyse the dataset to explore and cluster the neighborhoods in the city of Toronto. After scraping the data off the Wikipedia page, the dataset has 289 rows and 3 columns, namely Postcode, Borough and Neighborhood. The Borough and Neighborhood columns hold the names of boroughs and neighborhoods as well as "Not Assigned" values. To know which neighborhood is the right location for the restaurant we would like to open, we will use the Foursquare location data. To use the Foursquare location data, we need the latitude and longitude coordinates of each neighborhood in Toronto. The following link has the geographical coordinates of each postal code in Toronto: https://cocl.us/Geospatial_data

### 2.1 Data Wrangling

#### The columns in the dataframe had default column names (i.e. 0, 1, 2), so they were renamed to Postcode, Borough and Neighborhood.

![image.png](attachment:image.png)
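
#### As a minimal sketch of this renaming step, assuming the scraped table arrives as a pandas dataframe with default integer column labels, the columns can be renamed with `rename`. The two rows below are illustrative stand-ins, not the full scraped table, and the variable names `raw` and `df` are hypothetical.

```python
import pandas as pd

# Illustrative stand-in for the scraped table: default integer column labels.
raw = pd.DataFrame([
    ["M1A", "Not Assigned", "Not Assigned"],
    ["M5A", "Downtown Toronto", "Harbourfront"],
])

# Map the default labels to descriptive column names.
df = raw.rename(columns={0: "Postcode", 1: "Borough", 2: "Neighborhood"})
print(df.columns.tolist())
```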

#### Moreover, only rows with an assigned borough will be processed in the analysis; rows whose borough is not assigned will be ignored.

![image.png](attachment:image.png)
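
#### The filtering step above could look roughly like the sketch below. The placeholder spelling in `NOT_ASSIGNED` is an assumption and should match whatever the scraped table actually contains; the rows and the names `df` and `df1` are illustrative.

```python
import pandas as pd

NOT_ASSIGNED = "Not Assigned"  # assumed placeholder spelling; adjust to the scraped data

df = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M7A"],
    "Borough": [NOT_ASSIGNED, "Downtown Toronto", "Queen's Park"],
    "Neighborhood": [NOT_ASSIGNED, "Harbourfront", NOT_ASSIGNED],
})

# Keep only the rows whose borough is assigned, then renumber the index.
df1 = df[df["Borough"] != NOT_ASSIGNED].reset_index(drop=True)
print(df1)
```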

#### Additionally, if a row has a borough but its neighborhood is not assigned, the neighborhood is set to the same value as the borough.

![image.png](attachment:image.png)
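
#### One possible way to apply this rule is a boolean mask with `.loc`, sketched below on illustrative rows. As before, the `NOT_ASSIGNED` spelling and the variable names are assumptions.

```python
import pandas as pd

NOT_ASSIGNED = "Not Assigned"  # assumed placeholder spelling; adjust to the scraped data

df = pd.DataFrame({
    "Postcode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Harbourfront", NOT_ASSIGNED],
})

# Where the neighborhood is missing, copy the borough name into it.
mask = df["Neighborhood"] == NOT_ASSIGNED
df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]
print(df)
```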

#### Furthermore, neighborhoods that share one postal code area will be combined into one row, with the neighborhoods separated by commas.

![image.png](attachment:image.png)
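
#### A sketch of this combining step, assuming one input row per neighborhood: group by postal code and borough, then join the neighborhood names with a comma. The rows and the name `grouped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# One row per postal code; neighborhood names are joined in their original order.
grouped = (
    df.groupby(["Postcode", "Borough"])["Neighborhood"]
      .agg(", ".join)
      .reset_index()
)
print(grouped)
```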

#### The geospatial data with the coordinates of all the postcodes in Toronto will be merged with the previous dataframe 'df3' to form a new dataframe 'New_df_for_maps'.

![image.png](attachment:image.png)
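
#### The merge could be sketched as follows. The dataframe names `df3` and `New_df_for_maps` come from the text; the geospatial column names (`Postal Code`, `Latitude`, `Longitude`), the name `geo`, and the rounded coordinates are assumptions for illustration only.

```python
import pandas as pd

df3 = pd.DataFrame({
    "Postcode": ["M5A", "M5B"],
    "Borough": ["Downtown Toronto", "Downtown Toronto"],
    "Neighborhood": ["Harbourfront, Regent Park", "Ryerson, Garden District"],
})

# Assumed layout of the geospatial file; values are rounded and illustrative.
geo = pd.DataFrame({
    "Postal Code": ["M5B", "M5A"],
    "Latitude": [43.657, 43.654],
    "Longitude": [-79.379, -79.361],
})

# Left join keeps every row of df3 and attaches coordinates by postal code.
New_df_for_maps = (
    df3.merge(geo, left_on="Postcode", right_on="Postal Code", how="left")
       .drop(columns="Postal Code")
)
print(New_df_for_maps)
```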

#### Lastly, we would like to open a restaurant in Downtown Toronto. Hence, a new dataframe 'New_df_for_maps2' will be created to explore Downtown Toronto only.

![image.png](attachment:image.png)
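
#### Restricting the data to Downtown Toronto is a simple row filter on the Borough column. The sketch below uses the dataframe names from the text on illustrative rows with rounded coordinates.

```python
import pandas as pd

New_df_for_maps = pd.DataFrame({
    "Postcode": ["M5A", "M6A"],
    "Borough": ["Downtown Toronto", "North York"],
    "Neighborhood": ["Harbourfront, Regent Park", "Lawrence Heights, Lawrence Manor"],
    "Latitude": [43.654, 43.718],    # illustrative, rounded
    "Longitude": [-79.361, -79.465],  # illustrative, rounded
})

# Keep only the Downtown Toronto rows.
is_downtown = New_df_for_maps["Borough"] == "Downtown Toronto"
New_df_for_maps2 = New_df_for_maps[is_downtown].reset_index(drop=True)
print(New_df_for_maps2)
```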

### 3. Exploratory Data Analysis

#### Map of Toronto with neighborhoods superimposed on top.

![image.png](attachment:image.png)

###### The blue dots in the above map represent the different neighborhoods in Toronto.

#### Let us simplify the above map and visualize only the neighborhoods in Downtown Toronto.

![image.png](attachment:image.png)

###### The blue dots in the above map represent the different neighborhoods in Downtown Toronto.

### 4. Results

#### The table below shows the number of venues in each neighborhood in Downtown Toronto.

![image.png](attachment:image.png)

#### For curiosity's sake, there are 204 unique venues in Downtown Toronto.

![image.png](attachment:image.png)

### 4.1 Analyzing Each Neighborhood in Downtown Toronto

#### The labels for each neighborhood are strings and must be converted to a numeric type so that we can use them in our clustering algorithm. One hot encoding does exactly that. It parses the labels, creates a new column for each distinct label, and uses 1 or 0 to indicate whether a given row of the table has that feature or not. The code for one hot encoding can be found in the Notebook. The resulting table looks something like this:

![image.png](attachment:image.png)
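
#### For readers without the Notebook at hand, a minimal sketch of one hot encoding with `pandas.get_dummies` follows. The column names `Neighborhood` and `Venue Category`, the variable names, and the rows are assumptions for illustration, not the Notebook's actual code.

```python
import pandas as pd

# Illustrative venue list: one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["Chinatown", "Chinatown", "Harbourfront"],
    "Venue Category": ["Coffee Shop", "Park", "Coffee Shop"],
})

# One column per distinct label, holding 1 if the row has that label, else 0.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Put the neighborhood name back as the first column.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```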

#### To gain better insight into the nature of each neighborhood, the rows will be grouped by neighborhood and each neighborhood will be listed with its top ten most common venues. We can then attempt to label each neighborhood. For instance, a neighborhood with many grocery stores would be more suitable for opening a grocery store, while a neighborhood with more restaurants would be a better option for opening a restaurant.

![image.png](attachment:image.png)
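
#### The grouping and ranking described above can be sketched as follows: average the one hot columns per neighborhood to get venue frequencies, then sort each row in descending order. The neighborhood names, column names and the helper `most_common_venues` are hypothetical, and the sketch keeps the top two venues instead of ten because the toy data only has three labels.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood A", "Hood A", "Hood B", "Hood B", "Hood B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park",
                       "Park", "Park", "Grocery Store"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns = share of each venue label within a neighborhood.
grouped = onehot.groupby("Neighborhood").mean()

def most_common_venues(row, n):
    """Return the n labels with the highest frequency in this row."""
    return row.sort_values(ascending=False).index[:n].tolist()

top_n = 2  # the report uses ten
top_venues = {hood: most_common_venues(row, top_n)
              for hood, row in grouped.iterrows()}
print(top_venues)
```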

### 4.2 Cluster Neighborhoods

#### The neighborhoods will be partitioned into clusters to find which area is the best location to open a South Asian restaurant. To do this, an unsupervised learning method will be used. K-Means clustering was used because we have unlabeled data, that is, data without defined categories or groups. Moreover, K-Means clustering was used to confirm what type of group a certain area belongs to as well as to identify unknown groups in the dataset. I set the number of clusters, kcluster, to 5. The table below shows the result.

![image.png](attachment:image.png)
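
#### A minimal sketch of this clustering step with scikit-learn's `KMeans` is shown below. The feature matrix is a made-up stand-in for the per-neighborhood venue frequencies, and it uses three clusters instead of five so the toy data stays small; `n_init` and `random_state` are choices made here for reproducibility, not values taken from the Notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = neighborhoods, columns = frequency of three venue labels (made up).
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

kclusters = 3  # the report uses 5
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(X)

# One cluster label per neighborhood, in the same order as the rows of X.
labels = kmeans.labels_
print(labels)
```

#### The label numbers themselves are arbitrary; only which neighborhoods share a label is meaningful.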

#### Visualize the resulting clusters

![image.png](attachment:image.png)

###### The map above represents the five different clusters in Downtown Toronto. The dots are coloured as follows: red is cluster 0, purple is cluster 1, light blue is cluster 2, light green is cluster 3 and orange is cluster 4.

### Examine each cluster

#### Cluster 0

![image.png](attachment:image.png)

#### Cluster 0 has the most neighborhoods, and they mostly contain coffee shops and cafes.

#### Cluster 1

![image.png](attachment:image.png)

#### Cluster 1 has only one neighborhood, which mostly contains parks, playgrounds, trails, etc.

#### Cluster 2

![image.png](attachment:image.png)

#### Cluster 2 has the second-largest number of neighborhoods. It mostly has airport-related venues in Downtown Toronto.

#### Cluster 3

![image.png](attachment:image.png)

#### Cluster 3 has three neighborhoods. It mostly has restaurants, and the majority of them are South Asian restaurants.

#### Cluster 4

![image.png](attachment:image.png)

#### Cluster 4 has only one neighborhood, which contains grocery and convenience stores.

### 5. Discussion

#### There are thirty-seven neighborhoods and 204 unique venues in Downtown Toronto. K-Means clustering partitioned the Downtown Toronto dataset into five clusters. Upon further analysis, the five clusters are well separated into coffee shops, parks, airport services, restaurants and grocery stores. Of the five clusters, cluster 3 offers the best location to open a South Asian restaurant. Any of the three neighborhoods in cluster 3 (Chinatown, Grange Park and Kensington Market) would be a suitable location for our restaurant. Further factors such as real estate price, rent and restaurant demand should be taken into consideration before making a final decision on the location.

### 6. Conclusion

#### The neighborhood data from Wikipedia and the geographical coordinates of each postal code in Toronto were used to recommend a location to open a South Asian restaurant in Downtown Toronto. To identify the best location, Downtown Toronto was partitioned into five clusters using K-Means clustering. The clusters separated the downtown area into coffee shops, parks, airport services, restaurants and grocery stores. The restaurant cluster has three neighborhoods: Chinatown, Grange Park and Kensington Market. All three locations have the same top ten most common venues, which mostly consist of South Asian restaurants. Our recommendation is to open the South Asian restaurant in one of the three locations, depending on real estate price or rent and on the demand for the niche South Asian restaurant you would like to open.

### 6.1 Future Work

#### Other unsupervised learning methods, such as other partitioning methods, hierarchical clustering or density-based clustering, should be tried and their performance compared with that of K-Means clustering. Additionally, different numbers of clusters should be tested to determine the optimal value, in order to reduce error and arrive at a more accurate recommendation.
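
#### One common way to compare different numbers of clusters is the silhouette score, which is higher when points sit close to their own cluster and far from the others. The sketch below is only an illustration of that idea on synthetic, well-separated points; the data, the range of k tested, and the choice of silhouette as the criterion are all assumptions and were not part of this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 20 points each (not the Toronto data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Fit K-Means for several candidate k and record the silhouette score of each.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is the preferred number of clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```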