## Comparing Neighborhoods from Coast to Coast

#### Yiwei Wang

Please use the [notebook viewer](https://nbviewer.jupyter.org/github/wangyw80/My-Projects/blob/master/Comparing%20Neighborhoods%20From%20Coast%20to%20Coast_Report.ipynb) for a better view.

### Summary  
In this practice project, I use a machine learning technique to compare neighborhoods across three US cities: New York, Chicago, and Seattle. The work relies on publicly available information.

### Table of Contents  
<a href="#item1">Introduction</a>  
<a href="#item2">Data</a>  
<a href="#item3">Methodology</a>  
<a href="#item4">Results</a>  
<a href="#item5">Discussion</a>  
<a href="#item6">Conclusion</a>  

### Introduction

Cities are large human settlements. Each city has its own characteristics, which are usually complex and hard to measure. Within each city, different neighborhoods also offer similar or different access to resources, entertainment, facilities, and so on. Studying the subtle differences and similarities among neighborhoods can provide valuable insights for both individuals and businesses. The following examples show how this type of study can help solve problems that look complicated at first glance.   

__Help in individual relocation decisions__  
When a person decides to move to another city, or to a different part of the same city, but wants to keep a certain lifestyle, it helps to know whether the destination is similar to where they live now. For instance, if they are a fan of Chinese food and the new place also has many Chinese restaurants, it might suit their tastes. However, we usually weigh more than one factor when we consider a living environment, and our preferences are not always lexicographic. Does the new place also have a park within walking distance? Is there a bank nearby? Is the subway station close enough? And if none of the candidates looks perfect, are some better than the rest? Machine learning can help with such problems by finding neighborhoods that are relatively similar.   

__Solve business problems__  
Aside from personal interests, these analyses can ease or solve some business problems as well. Small businesses, such as restaurants or coffee shops, can benefit when they want to open their first shop or find a new location to expand. A rigorous assessment of a new location is usually costly for small businesses without much data or many resources, so the location distribution of existing venues could help in the initial screening. Take a pizza shop as an example: a good place to open a new shop could be somewhere people go for food but that does not have many other pizza shops. An efficient initial screening would first pick out similar neighborhoods that have many food providers, which suggests high meal demand in the area, and then keep those with relatively little direct competition, in this case other pizza shops. Of course, if the pizza shop is worried that other types of food services are too hard to beat, it can use the same method to locate neighborhoods without enough food suppliers.   

__Scope of the work__  
In this project, I use a machine learning method to compare neighborhoods in three famous US cities: New York, Chicago, and Seattle. New York is located on the east coast of the US. It is the largest city in the country by population and is considered one of the economic centers of the world. According to 2018 Census Bureau estimates, New York has 8.4 million residents living in five boroughs. Chicago is the most populous city in the Midwest. Located on the west shore of Lake Michigan, the city has a population of 2.7 million. Seattle is a seaport city on the west coast of the US. A fast-growing city, Seattle has 0.75 million residents and is home to many fast-growing firms.

There are many aspects to look at when comparing these cities. To simplify things, I focus only on access to venues. K-means clustering is employed to group similar neighborhoods across cities. I then explore the most popular venues in each group in an attempt to capture what makes the group unique. While I use a general approach in this practice project, the methods can be further refined to answer specific questions, such as finding candidate locations for a new pizza shop.

### Data

To conduct the analysis described in the previous section, I need information on the popular venues in each neighborhood. To achieve the desired data structure, I divide the data gathering process into three steps and pull data from three different sources.  

__Gathering the data__  
The first part of the data set is the name of each neighborhood in the cities of interest. I use Wikipedia as the source and gather the lists of neighborhood names from the corresponding web pages listed below.  

New York, https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City;  
Chicago, https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago;  
Seattle, https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Seattle. 

Tables containing neighborhood names are scraped into data frames using Python libraries. The second part of the data is the geographic location of these neighborhoods. I use the Geocoding application programming interface (API) service provided by Google to look up the latitude and longitude of the presumed center of each neighborhood from its name, city, and state. In the final step, I use Foursquare.com's API to gather venue information for each neighborhood. Popular venues within a 500-meter radius of the neighborhood center are collected. The number of venues varies widely: some neighborhoods have only a handful of venues returned by Foursquare.com, while others have hundreds. I limit the number of venues gathered per neighborhood to 100. The name, category, and geographic location of each venue are stored in the data frame.
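
As a rough illustration of the final step, the sketch below builds a venue request URL for one neighborhood center. The text does not name the endpoint, so the legacy Foursquare v2 `venues/explore` route, the version date, the credentials, and the helper name `build_explore_url` are all assumptions made for illustration. Only the 500-meter radius and the cap of 100 venues come from the description above.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" route (not named in the text).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=500, limit=100, version="20190601"):
    """Return a request URL for venues around one neighborhood center."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD (assumed value)
        "ll": f"{lat},{lng}",            # presumed neighborhood center
        "radius": radius_m,              # search radius in meters
        "limit": limit,                  # cap on venues returned per neighborhood
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Hypothetical coordinates and credentials, for illustration only.
url = build_explore_url(47.6062, -122.3321, "MY_ID", "MY_SECRET")
print(url)
```

The sketch stops at building the URL. Sending the request and parsing the response depend on the API version and account in use.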

__Data description__  
There are a total of 127 neighborhoods in Seattle, 246 neighborhoods in Chicago, and 327 neighborhoods in New York. Foursquare.com's API fails to return any venue information for ten of the neighborhoods. As a result, these neighborhoods are dropped from the data set. A total of 690 neighborhoods enter the clustering and a total of 496 unique venue categories are recorded among these neighborhoods.

### Methodology

This project is conducted in a Jupyter notebook using Python 3. To cluster neighborhoods with similar access to certain venue categories, I use the k-means clustering method. The scikit-learn library for Python is used to create the clusters. 

__How does the algorithm work?__  
A preset number of centers is selected, and each center is assigned a random initial value. In each iteration, the distance from each data point to each center is calculated, and each data point is assigned to its closest center. All data points assigned to the same center form one cluster. The mean of each cluster then becomes the new center for the next iteration. This process continues until the result converges.  
<img src="https://github.com/wangyw80/Test-Projects/blob/master/formula.png?raw=true" width="160" >  
In the formula above, the variance in cluster m is calculated as the sum of the variances of all n members. For each member, the variance is the sum of the squared differences between attribute j of the observation and attribute j of center m. The goal is to find a set of centers that minimizes the within-cluster variance. However, the locations of the initial centers matter, and this approach does not guarantee a globally optimal solution. Therefore, in practice, we need to try different starting points to check whether the result is stable.
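
The iteration described above can be sketched in a few lines of plain Python. This is a minimal, dependency-free illustration, not the project's code, which relies on scikit-learn. It assumes the initial centers are drawn from the data points themselves and that several random restarts are compared by their within-cluster sum of squares. The function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Sum of squared attribute differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of the k-means iteration from random initial centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        new_labels = [
            min(range(k), key=lambda m: squared_distance(p, centers[m]))
            for p in points
        ]
        if new_labels == labels:      # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for m in range(k):
            members = [p for p, lab in zip(points, labels) if lab == m]
            if members:               # keep the old center if a cluster is empty
                centers[m] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares, the quantity k-means tries to minimize.
    wcss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss


def kmeans(points, k, n_restarts=10, seed=0):
    """Run several random restarts and keep the lowest-variance result."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_restarts)),
               key=lambda result: result[2])


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers, wcss = kmeans(points, k=2)
print(labels, round(wcss, 4))
```

Keeping the best of several restarts is one common way to reduce the dependence on starting points. It lowers the risk of a poor local solution but still does not guarantee a global one.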

__Calculating the distance__  
The venue categories in the data are name strings, so I need to transform the data before k-means clustering can be applied. Each venue category is turned into a dummy variable that equals one if a venue belongs to that category. The number of venues in each category is then summed for each neighborhood and normalized to get the weight of each category. For example, if five venues are found in a neighborhood, of which three are coffee shops, one is a restaurant, and one is a book store, then the counts are 3 for coffee shop, 1 for restaurant, and 1 for book store, and the weights are 60% for coffee shop, 20% for restaurant, and 20% for book store. These category weights are used in computing the distance to the cluster centers. To simplify the calculation, I only use the weights of the top 15 venue categories of each neighborhood in clustering.  
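
The worked example above can be reproduced with a short sketch. It assumes that the top 15 categories keep their original weights (no renormalization after truncation), which the text does not specify. The helper names `category_weights` and `to_vector` are hypothetical.

```python
from collections import Counter


def category_weights(venue_categories, top_n=15):
    """Turn one neighborhood's venue categories into normalized weights."""
    counts = Counter(venue_categories)
    total = sum(counts.values())
    # Keep only the top_n most frequent categories (weights are not renormalized).
    return {cat: n / total for cat, n in counts.most_common(top_n)}


def to_vector(weights, all_categories):
    """Align the weights to a fixed category order; missing categories get 0."""
    return [weights.get(cat, 0.0) for cat in all_categories]


venues = ["Coffee Shop", "Coffee Shop", "Coffee Shop", "Restaurant", "Book Store"]
weights = category_weights(venues)
vector = to_vector(weights, ["Bank", "Book Store", "Coffee Shop", "Restaurant"])
print(weights)
print(vector)
```

Each neighborhood then becomes one row of equal length, so that distances to the cluster centers are computed attribute by attribute.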

__Finding the proper number of clusters (k)__  
In k-means clustering, one of the key parameters is the number of centers. Given the nature of this study, it is hard to tell how many clusters to set. Unlike analyses with a limited set of outcome categories, the definition of "similar neighborhoods" is vague and cannot be quantified objectively. Therefore, I compare the share of each cluster in each city for different numbers of starting centers (3 to 8). The goal is to have enough clusters to separate relatively similar neighborhoods, without creating too many clusters that contain only outliers or subgroups that are not meaningful.  

One of the shortfalls of the k-means algorithm is that the starting locations matter and a global solution is not guaranteed. So for each number of clusters, I run my code multiple times to find the most common outcome.    
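
The per-city cluster shares used for this comparison can be computed along the following lines. This is only a sketch on made-up labels. The function name `cluster_shares` and the toy data are hypothetical, and in practice the labels would come from a clustering run for each candidate k.

```python
from collections import Counter, defaultdict


def cluster_shares(cities, labels, k):
    """Share of each city's neighborhoods that falls in each cluster."""
    counts = defaultdict(Counter)
    for city, label in zip(cities, labels):
        counts[city][label] += 1
    return {
        city: [c[m] / sum(c.values()) for m in range(k)]
        for city, c in counts.items()
    }


# Toy input: one city name and one cluster label per neighborhood.
cities = ["Seattle", "Seattle", "Seattle", "Seattle", "Chicago", "Chicago"]
labels = [0, 0, 0, 1, 1, 2]
shares = cluster_shares(cities, labels, k=3)
for city, row in shares.items():
    print(city, [f"{share:.2%}" for share in row])
```

A city whose row is dominated by a single cluster suggests that k is too small to separate its neighborhoods.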

#### Table 1. Share of neighborhoods in each cluster, by city, for k = 3 to 8.
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_d368de58_946c_11e9_a0ba_a7251abcc582" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_d368de58_946c_11e9_a0ba_a7251abcc582level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row0_col0" class="data row0 col0" >86.07%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row0_col1" class="data row0 col1" >0.00%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row0_col2" class="data row0 col2" >13.93%</td>
            </tr>
            <tr>
                        <th id="T_d368de58_946c_11e9_a0ba_a7251abcc582level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row1_col0" class="data row1 col0" >95.31%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row1_col1" class="data row1 col1" >1.25%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row1_col2" class="data row1 col2" >3.44%</td>
            </tr>
            <tr>
                        <th id="T_d368de58_946c_11e9_a0ba_a7251abcc582level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row2_col0" class="data row2 col0" >84.13%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row2_col1" class="data row2 col1" >2.38%</td>
                        <td id="T_d368de58_946c_11e9_a0ba_a7251abcc582row2_col2" class="data row2 col2" >13.49%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >15.16%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >81.97%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >0.00%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >2.87%</td>
            </tr>
            <tr>
                        <th id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >4.69%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >93.75%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >1.25%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >0.31%</td>
            </tr>
            <tr>
                        <th id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >11.90%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >81.75%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >1.59%</td>
                        <td id="T_2e4c46f0_9479_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >4.76%</td>
            </tr>
    </tbody></table>
</div>

<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_465f1d62_9479_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>        <th class="col_heading level0 col4" >Type V</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_465f1d62_9479_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >0.00%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >63.93%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >13.52%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >20.90%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >1.64%</td>
            </tr>
            <tr>
                        <th id="T_465f1d62_9479_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >1.25%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >53.44%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >3.12%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >41.88%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row1_col4" class="data row1 col4" >0.31%</td>
            </tr>
            <tr>
                        <th id="T_465f1d62_9479_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >1.59%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >76.19%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >12.70%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >7.94%</td>
                        <td id="T_465f1d62_9479_11e9_81f7_17060a3f0767row2_col4" class="data row2 col4" >1.59%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_831498a4_9479_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>        <th class="col_heading level0 col4" >Type V</th>        <th class="col_heading level0 col5" >Type VI</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >22.54%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >2.87%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >11.07%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >13.93%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >0.00%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >49.59%</td>
            </tr>
            <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >51.88%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >0.31%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >1.56%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >4.06%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col4" class="data row1 col4" >0.31%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col5" class="data row1 col5" >41.88%</td>
            </tr>
            <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >5.56%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >4.76%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >1.59%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >11.90%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col4" class="data row2 col4" >0.79%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col5" class="data row2 col5" >75.40%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>        <th class="col_heading level0 col4" >Type V</th>        <th class="col_heading level0 col5" >Type VI</th>        <th class="col_heading level0 col6" >Type VII</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >2.87%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >13.93%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >48.36%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >21.31%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >0.00%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >13.52%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >0.00%</td>
            </tr>
            <tr>
                        <th id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >0.31%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >3.75%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >45.94%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >38.44%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col4" class="data row1 col4" >1.25%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col5" class="data row1 col5" >9.06%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row1_col6" class="data row1 col6" >1.25%</td>
            </tr>
            <tr>
                        <th id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >4.76%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >11.90%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >70.63%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >10.32%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col4" class="data row2 col4" >1.59%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col5" class="data row2 col5" >0.79%</td>
                        <td id="T_8c1dfdba_9478_11e9_81f7_17060a3f0767row2_col6" class="data row2 col6" >0.00%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_bd407828_9478_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>        <th class="col_heading level0 col4" >Type V</th>        <th class="col_heading level0 col5" >Type VI</th>        <th class="col_heading level0 col6" >Type VII</th>        <th class="col_heading level0 col7" >Type VIII</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_bd407828_9478_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >1.23%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >8.20%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >12.70%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >14.34%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >59.84%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >2.87%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >0.82%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >0.00%</td>
            </tr>
            <tr>
                        <th id="T_bd407828_9478_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >2.19%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >0.31%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >40.00%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >3.75%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col4" class="data row1 col4" >50.31%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col5" class="data row1 col5" >0.31%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col6" class="data row1 col6" >2.81%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row1_col7" class="data row1 col7" >0.31%</td>
            </tr>
            <tr>
                        <th id="T_bd407828_9478_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >2.38%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >1.59%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >4.76%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >11.90%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col4" class="data row2 col4" >72.22%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col5" class="data row2 col5" >4.76%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col6" class="data row2 col6" >2.38%</td>
                        <td id="T_bd407828_9478_11e9_81f7_17060a3f0767row2_col7" class="data row2 col7" >0.00%</td>
            </tr>
    </tbody></table>
</div>
  
As shown in Table 1, when k is less than five, over 80% of the neighborhoods in each of the three cities are grouped into a single cluster, which is not very helpful. When k is greater than 6, the Chicago and New York groups hardly break down any further. I set the number of clusters to 6 to balance effectiveness and efficiency. 

### Results

The direct output of the clustering is the cluster number assigned to each neighborhood. The most effective way to show the results is to plot each neighborhood on a map. Below are the maps of all three cities. Each marker represents a neighborhood, and different clusters are marked with different colors.  
__Figure 1. Neighborhood clustering in New York__
<img src="https://github.com/wangyw80/Test-Projects/blob/master/NYC.jpg?raw=true">  
<img src="https://github.com/wangyw80/Test-Projects/blob/master/legend.png?raw=true">   

__Figure 2. Neighborhood clustering in Chicago__
<img src="https://github.com/wangyw80/Test-Projects/blob/master/Chicago.jpg?raw=true">  
<img src="https://github.com/wangyw80/Test-Projects/blob/master/legend.png?raw=true">  

__Figure 3. Neighborhood clustering in Seattle__
<img src="https://github.com/wangyw80/Test-Projects/blob/master/Seattle.jpg?raw=true">  
<img src="https://github.com/wangyw80/Test-Projects/blob/master/legend.png?raw=true">  

Not surprisingly, the core downtown areas fall into the same cluster across all three cities, suggesting high similarity among these crowded, highly developed activity centers. The maps show how the different neighborhood types are spread around the city centers. However, while we now know which neighborhoods are similar to each other, we still cannot tell what distinguishes these neighborhood groups. I explore the "nature" of these clusters in the next section.

### Discussion

With the clustering done, similar neighborhoods are plotted on the maps. Type VI neighborhoods appear to be dominant in the downtown areas of all three cities. Type I neighborhoods are quite common outside of Manhattan and outside of downtown Chicago. Chicago has quite a few Type III neighborhoods, which are uncommon in the other two cities. Type IV neighborhoods are seen in both Chicago and Seattle but are rare in New York. We now know which neighborhoods are similar, but we still want to know how they are similar. To explore this question and gain more insight, I list the 10 most common venue types in each cluster.  
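
One way to produce such a ranking is sketched below. It assumes that "most common" means the highest mean category weight across the neighborhoods in a cluster, which is only one possible reading. The function name `top_categories` and the toy data are hypothetical.

```python
from collections import defaultdict


def top_categories(weight_rows, labels, top_n=10):
    """Average category weights within each cluster and rank them."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for weights, label in zip(weight_rows, labels):
        sizes[label] += 1
        for cat, w in weights.items():
            sums[label][cat] += w
    return {
        label: sorted(((cat, s / sizes[label]) for cat, s in cats.items()),
                      key=lambda item: item[1], reverse=True)[:top_n]
        for label, cats in sums.items()
    }


# Toy input: one weight dictionary and one cluster label per neighborhood.
rows = [{"Pizza Place": 0.5, "Bank": 0.5}, {"Pizza Place": 1.0}, {"Park": 1.0}]
labels = [0, 0, 1]
top = top_categories(rows, labels)
for label, ranked in top.items():
    print(label, ranked)
```

Categories missing from a neighborhood count as zero weight in the average, because the sum is divided by the full cluster size.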

#### Table 2. Top 10 venue types in clusters.  
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_77fe52b0_9489_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Pizza Place</th>        <th class="col_heading level0 col1" >Fast Food Restaurant</th>        <th class="col_heading level0 col2" >Pharmacy</th>        <th class="col_heading level0 col3" >Sandwich Place</th>        <th class="col_heading level0 col4" >Chinese Restaurant</th>        <th class="col_heading level0 col5" >Donut Shop</th>        <th class="col_heading level0 col6" >Deli / Bodega</th>        <th class="col_heading level0 col7" >Factory</th>        <th class="col_heading level0 col8" >Fabric Shop</th>        <th class="col_heading level0 col9" >Bank</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_77fe52b0_9489_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type I</th>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >4.71%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >3.04%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >2.75%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >2.63%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >2.43%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >2.40%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >2.19%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >2.05%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >2.02%</td>
                        <td id="T_77fe52b0_9489_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >2.02%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_e7b4a356_948a_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Farm</th>        <th class="col_heading level0 col1" >Fabric Shop</th>        <th class="col_heading level0 col2" >Park</th>        <th class="col_heading level0 col3" >Falafel Restaurant</th>        <th class="col_heading level0 col4" >Fair</th>        <th class="col_heading level0 col5" >Factory</th>        <th class="col_heading level0 col6" >Event Service</th>        <th class="col_heading level0 col7" >Event Space</th>        <th class="col_heading level0 col8" >Exhibit</th>        <th class="col_heading level0 col9" >Eye Doctor</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_e7b4a356_948a_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type II</th>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >6.67%</td>
                        <td id="T_e7b4a356_948a_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >6.67%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_149eba46_948b_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Mexican Restaurant</th>        <th class="col_heading level0 col1" >Pizza Place</th>        <th class="col_heading level0 col2" >Fast Food Restaurant</th>        <th class="col_heading level0 col3" >Fabric Shop</th>        <th class="col_heading level0 col4" >Fair</th>        <th class="col_heading level0 col5" >Grocery Store</th>        <th class="col_heading level0 col6" >Eye Doctor</th>        <th class="col_heading level0 col7" >Factory</th>        <th class="col_heading level0 col8" >Park</th>        <th class="col_heading level0 col9" >Exhibit</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_149eba46_948b_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type III</th>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >5.88%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >4.51%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >3.33%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >3.33%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >2.75%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >2.75%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >2.55%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >2.55%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >2.35%</td>
                        <td id="T_149eba46_948b_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >2.35%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_354a8aae_948b_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Park</th>        <th class="col_heading level0 col1" >Eye Doctor</th>        <th class="col_heading level0 col2" >Exhibit</th>        <th class="col_heading level0 col3" >Factory</th>        <th class="col_heading level0 col4" >Fair</th>        <th class="col_heading level0 col5" >Fabric Shop</th>        <th class="col_heading level0 col6" >Event Space</th>        <th class="col_heading level0 col7" >Farm</th>        <th class="col_heading level0 col8" >Falafel Restaurant</th>        <th class="col_heading level0 col9" >Fast Food Restaurant</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_354a8aae_948b_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type IV</th>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >6.67%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >6.13%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >6.02%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >6.02%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >5.91%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >5.70%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >5.59%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >5.05%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >4.95%</td>
                        <td id="T_354a8aae_948b_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >3.44%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_4f09acf4_948b_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Fair</th>        <th class="col_heading level0 col1" >Factory</th>        <th class="col_heading level0 col2" >Scenic Lookout</th>        <th class="col_heading level0 col3" >Farmers Market</th>        <th class="col_heading level0 col4" >Farm</th>        <th class="col_heading level0 col5" >Falafel Restaurant</th>        <th class="col_heading level0 col6" >Ethiopian Restaurant</th>        <th class="col_heading level0 col7" >Zoo Exhibit</th>        <th class="col_heading level0 col8" >Fabric Shop</th>        <th class="col_heading level0 col9" >Eye Doctor</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_4f09acf4_948b_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type V</th>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >6.67%</td>
                        <td id="T_4f09acf4_948b_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >6.67%</td>
            </tr>
    </tbody></table>
</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_73555540_948b_11e9_81f7_17060a3f0767" align="left" style="table-layout: fixed; width: 100%"><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Coffee Shop</th>        <th class="col_heading level0 col1" >Bar</th>        <th class="col_heading level0 col2" >Pizza Place</th>        <th class="col_heading level0 col3" >Bakery</th>        <th class="col_heading level0 col4" >Italian Restaurant</th>        <th class="col_heading level0 col5" >Mexican Restaurant</th>        <th class="col_heading level0 col6" >Sandwich Place</th>        <th class="col_heading level0 col7" >American Restaurant</th>        <th class="col_heading level0 col8" >Park</th>        <th class="col_heading level0 col9" >Gym</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_73555540_948b_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Type VI</th>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >4.10%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >2.78%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >2.69%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >2.04%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >2.04%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >1.92%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col6" class="data row0 col6" >1.85%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col7" class="data row0 col7" >1.77%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col8" class="data row0 col8" >1.77%</td>
                        <td id="T_73555540_948b_11e9_81f7_17060a3f0767row0_col9" class="data row0 col9" >1.56%</td>
            </tr>
    </tbody></table>
</div>

------------------------------------------------------------------------------------------------------------------------------------------

As shown in Table 2, the dominant venue types in a typical downtown neighborhood are food services. In Type VI neighborhoods, seven of the top ten venue types are food providers; the other three are bars, gyms, and parks. This is not surprising for a downtown area, where many people work and socialize, which creates demand for these services. Type I, the dominant type in New York outside Manhattan, is also populated with food providers. However, in Type I neighborhoods the food providers skew toward quick-service options such as pizza places, fast food restaurants, and sandwich places, and the top ten also include pharmacies, factories, and banks. Judging by these venue categories alone, living in a Type I neighborhood is likely to be less costly than living in a Type VI one. Type III neighborhoods, which are common only in Chicago, have a mix of Mexican restaurants, pizza places, fast food restaurants, fabric shops, grocery stores, and so on. Type IV, uncommon in New York but fairly common in the other two cities, is concentrated in parks, eye doctors, exhibits, factories, and even farms. These dominant venue types give further context on what a typical neighborhood in each group looks like, and a better idea of what to expect in an unfamiliar neighborhood.  

__Comparing cities__  
The idea of calculating the distance between the weights of different categories can also be applied to comparing cities. Suppose we want to estimate which of the three cities are most similar. We can calculate the squared Euclidean distance between the cities' shares of each cluster type. Take the following cluster type statistics as an example:

<div class="output_html rendered_html output_subarea output_execute_result">
<style  type="text/css" >
</style><table id="T_831498a4_9479_11e9_81f7_17060a3f0767" ><thead>    <tr>        <th class="blank level0" ></th>        <th class="col_heading level0 col0" >Type I</th>        <th class="col_heading level0 col1" >Type II</th>        <th class="col_heading level0 col2" >Type III</th>        <th class="col_heading level0 col3" >Type IV</th>        <th class="col_heading level0 col4" >Type V</th>        <th class="col_heading level0 col5" >Type VI</th>    </tr></thead><tbody>
                <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row0" class="row_heading level0 row0" >Chicago</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col0" class="data row0 col0" >22.54%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col1" class="data row0 col1" >2.87%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col2" class="data row0 col2" >11.07%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col3" class="data row0 col3" >13.93%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col4" class="data row0 col4" >0.00%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row0_col5" class="data row0 col5" >49.59%</td>
            </tr>
            <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row1" class="row_heading level0 row1" >New York</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col0" class="data row1 col0" >51.88%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col1" class="data row1 col1" >0.31%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col2" class="data row1 col2" >1.56%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col3" class="data row1 col3" >4.06%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col4" class="data row1 col4" >0.31%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row1_col5" class="data row1 col5" >41.88%</td>
            </tr>
            <tr>
                        <th id="T_831498a4_9479_11e9_81f7_17060a3f0767level0_row2" class="row_heading level0 row2" >Seattle</th>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col0" class="data row2 col0" >5.56%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col1" class="data row2 col1" >4.76%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col2" class="data row2 col2" >1.59%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col3" class="data row2 col3" >11.90%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col4" class="data row2 col4" >0.79%</td>
                        <td id="T_831498a4_9479_11e9_81f7_17060a3f0767row2_col5" class="data row2 col5" >75.40%</td>
            </tr>
    </tbody></table>
</div>  

With six clusters in total, the squared distance between New York and Chicago is:  
(51.88-22.54)^2+(0.31-2.87)^2+(1.56-11.07)^2+(4.06-13.93)^2+(0.31-0)^2+(41.88-49.59)^2 = 1114.7864  
Likewise, the squared distances between New York and Seattle and between Chicago and Seattle are 3350.6322 and 1052.6640, respectively. Among the three cities, Chicago and Seattle are therefore the most similar pair, although New York and Chicago are nearly as close.
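
The arithmetic above can be checked with a short script. This is a sketch using only the Python standard library; the percentages are copied from the cluster type table above, while the names `shares`, `squared_distance`, `distances`, and `closest` are illustrative assumptions rather than names from the original notebook.

```python
from itertools import combinations

# Share of neighborhoods (in percent) per cluster type, Type I through Type VI,
# copied from the cluster type statistics table.
shares = {
    "Chicago":  [22.54, 2.87, 11.07, 13.93, 0.00, 49.59],
    "New York": [51.88, 0.31, 1.56, 4.06, 0.31, 41.88],
    "Seattle":  [5.56, 4.76, 1.59, 11.90, 0.79, 75.40],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Squared distance for every unordered pair of cities.
distances = {
    (city_a, city_b): squared_distance(shares[city_a], shares[city_b])
    for city_a, city_b in combinations(shares, 2)
}

# Print pairs from most similar (smallest distance) to least similar.
for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(distance, 4))

# The pair with the smallest squared distance.
closest = min(distances, key=distances.get)
```

Because every pair is compared with the same six components, ranking by squared distance gives the same order as ranking by Euclidean distance, so taking the square root is unnecessary here.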

### Conclusion

In this project, I use k-means clustering to group a total of 690 neighborhoods in New York, Chicago, and Seattle. Neighborhoods are clustered into 6 groups based on their access to different venue categories. Chicago and Seattle appear to have more in common in terms of their composition of various neighborhood types. The methods applied in this project can be further refined and extended to solve business problems as well as help individuals make relocation decisions.
  
    
      

  
  
  
### References

Population - Census Bureau, https://www.census.gov/topics/population.html  
Neighborhoods in New York City - Wikipedia, https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City  
List of neighborhoods in Chicago - Wikipedia, https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago  
List of neighborhoods in Seattle - Wikipedia, https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Seattle