# Homework Assignment 1 - (Question 4) - Solutions

## <u>Case Study</u>: Global City Well-Being Analysis


### Motivation:

Understanding the dynamics of urban living is useful for policymakers, businesses, and individuals considering relocation. By analyzing key factors that contribute to the quality of life in cities, we can gain insights into the strengths and weaknesses of different urban environments. 

Clustering cities based on numerical `"well-being metrics"` such as:
* `Purchase Power`
* `Health Care`
* `Pollution`
* `Quality of Life`
* `Crime Rating`

allows us to identify patterns and group cities with similar characteristics. 

Analyzing any clusters that we find can help people make informed decisions about where to live, where to invest, and how to improve urban conditions. The results of such clustering can provide a foundation for further analysis and strategic planning in urban development.




### Dataset Background


The `movehub_data_cleaned.csv` file in the zip file contains data on each of these `5 numerical attributes` for 185 global cities. This data was extracted from movehub.com.
The full dataset and more information about it can be found here:
https://www.kaggle.com/datasets/blitzr/movehub-city-rankings?select=movehubqualityoflife.csv



### Research Questions

In this analysis we would like to answer the following questions about the dataset.
1. Is this dataset clusterable?
2. If so, how many clusters are there in the dataset?
3. What `well-being metrics` characterize each of the clusters that we find? *For instance, is there a cluster that has lower `purchase power` than others (downside), but also happens to have lower `pollution` than other clusters (upside)? This might be useful for people wanting to move that don't mind having a lower purchase power, if it means living in a city with less pollution.*

<hr>



### Imports
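
The cell contents for this heading are not shown, so the following is only a plausible set of imports for the steps below. It assumes the analysis uses pandas, matplotlib, and scikit-learn, which the `KMeans()` and `random_state` references later in the assignment suggest.

```python
# Assumed imports for this question (not confirmed by the original notebook).
import numpy as np                   # numeric helpers, e.g. averaging inertias
import pandas as pd                  # reading the csv and handling dataframes
import matplotlib.pyplot as plt      # histograms, scatterplots, elbow plot, boxplots
from sklearn.cluster import KMeans   # k-means clustering
```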

### 4.1. Reading and Examining the csv
1. Read `movehub_data_cleaned.csv` into a dataframe called `df`.
2. Show the first 5 rows of this dataframe.
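
A minimal sketch of these two steps follows. So that it runs without the real file, it reads a tiny in-memory stand-in whose values are made-up placeholders; the column names are an assumption based on the metric names listed in the motivation and the `City` and `Country` columns mentioned in 4.2.

```python
import io
import pandas as pd

# Made-up stand-in for movehub_data_cleaned.csv (placeholder values, not real data).
# The column names are assumed, not taken from the actual file.
sample_csv = io.StringIO(
    "City,Country,Purchase Power,Health Care,Pollution,Quality of Life,Crime Rating\n"
    "City A,Country X,50.0,70.0,30.0,80.0,20.0\n"
    "City B,Country X,55.0,65.0,35.0,75.0,25.0\n"
    "City C,Country Y,20.0,40.0,80.0,30.0,60.0\n"
    "City D,Country Y,25.0,45.0,85.0,35.0,65.0\n"
    "City E,Country Z,70.0,80.0,20.0,90.0,10.0\n"
    "City F,Country Z,75.0,85.0,15.0,85.0,15.0\n"
)

# With the real file this would be: df = pd.read_csv("movehub_data_cleaned.csv")
df = pd.read_csv(sample_csv)

# head() returns the first 5 rows by default.
print(df.head())
```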

### 4.2. Dataset for Clustering

Let's create a dataframe of just the 5 numerical `well-being metrics` which we intend to cluster, dropping the `City` and `Country` variables.

1. First, create a copy of the `df` dataframe; you might call it `X`.
2. Then drop the categorical variables.

Hint: `X=X.drop(['City','Country'], axis=1)`
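
The copy-then-drop pattern from the hint can be sketched as below. The small dataframe is a made-up placeholder with only two of the five metrics, used purely to show that `df` keeps its columns while `X` loses `City` and `Country`.

```python
import pandas as pd

# Placeholder dataframe standing in for df (values are invented for illustration).
df = pd.DataFrame({
    "City": ["City A", "City B", "City C"],
    "Country": ["Country X", "Country Y", "Country Z"],
    "Purchase Power": [50.0, 20.0, 70.0],
    "Pollution": [30.0, 80.0, 20.0],
})

# Copy first so that df keeps City and Country for later interpretation.
X = df.copy()

# Drop the categorical columns; axis=1 means "drop columns, not rows".
X = X.drop(["City", "Country"], axis=1)

print(X.columns.tolist())
```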

### 4.3. Describing each numerical variable in the dataframe. 

Create a histogram for each of the well-being metrics in the dataset.
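
One possible way to do this is with pandas' built-in `DataFrame.hist`, which draws one histogram per numeric column. The sketch below uses random numbers as a stand-in for `X`; the column names and the choice of 15 bins are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in for X (185 rows, 5 metrics); these are not the real values.
rng = np.random.default_rng(0)
metrics = ["Purchase Power", "Health Care", "Pollution",
           "Quality of Life", "Crime Rating"]
X = pd.DataFrame(rng.uniform(0, 100, size=(185, 5)), columns=metrics)

# One histogram per column; bins=15 is an arbitrary choice.
axes = X.hist(bins=15, figsize=(10, 8))
plt.tight_layout()
```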

### 4.4. Describing the relationship between each pair of numerical variables in the dataframe. 

For each pair of well-being metrics in this dataframe, plot a scatterplot.
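
A compact way to get every pairwise scatterplot at once is `pandas.plotting.scatter_matrix`; plotting each pair in a loop would work just as well. The sketch below again uses random numbers as a stand-in for `X`, with assumed column names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in for X (not the real values).
rng = np.random.default_rng(0)
metrics = ["Purchase Power", "Health Care", "Pollution",
           "Quality of Life", "Crime Rating"]
X = pd.DataFrame(rng.uniform(0, 100, size=(185, 5)), columns=metrics)

# Off-diagonal panels are scatterplots of each pair of metrics;
# the diagonal panels show a histogram of each single metric.
axes = pd.plotting.scatter_matrix(X, figsize=(10, 10), diagonal="hist")
plt.tight_layout()
```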

### 4.5. Create an elbow plot for this dataframe X. 

Create a k-means elbow plot for this dataframe X.
* Your elbow plot should consider clusterings with k=1, k=2, ..., k=`18` clusters
* For each k, your elbow plot should find the average inertia of `4` trial clusterings
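
The two requirements above can be sketched as follows. The helper name `average_inertias` is hypothetical, the data is a synthetic stand-in generated with `make_blobs`, and treating each "trial" as a single random initialization (`n_init=1`) is an assumption about what the assignment intends.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X: 185 observations, 5 features (not the real data).
X_demo, _ = make_blobs(n_samples=185, n_features=5, centers=3, random_state=0)

def average_inertias(X, k_values, n_trials=4):
    """Return, for each k, the mean inertia over n_trials k-means fits."""
    averages = []
    for k in k_values:
        # n_init=1 so that each trial is one randomly initialized clustering.
        trial_inertias = [
            KMeans(n_clusters=k, n_init=1).fit(X).inertia_
            for _ in range(n_trials)
        ]
        averages.append(np.mean(trial_inertias))
    return averages

k_values = list(range(1, 19))          # k = 1, 2, ..., 18
avg_inertia = average_inertias(X_demo, k_values, n_trials=4)

# Elbow plot: average inertia against the number of clusters.
fig, ax = plt.subplots()
ax.plot(k_values, avg_inertia, marker="o")
ax.set_xlabel("Number of clusters k")
ax.set_ylabel("Average inertia (4 trials)")
ax.set_title("k-means elbow plot")
```

Averaging over several trials smooths out the run-to-run variation that comes from random initialization, which makes the bend in the curve easier to read.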

### 4.6. Does this elbow plot suggest that the dataset is clusterable by k-means? Explain.


### 4.7. How many clusters does this elbow plot suggest we should use in k-means? Explain.
*Somewhat subjective: as long as you explain yourself and your logic is correct, you will get full credit.*

### 4.8. Cluster the data

Using the number of clusters that you selected in the question above, cluster the dataframe using k-means.

**<u>Note</u>: In sklearn, you can set a random seed for non-deterministic functions by using the random_state parameter. Within your KMeans() function in this problem, you should set an additional parameter with `random_state=101`.**

Save the cluster labels of your final clustering in a new column of the **df** dataframe called 'predicted_cluster'.
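
A minimal sketch of this step is shown below on synthetic stand-in data. The value `k = 3` is only a placeholder for whatever number of clusters the elbow plot suggested, and `n_init=10` is an assumed setting that the assignment does not specify.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for df with two metrics (not the real data).
values, _ = make_blobs(n_samples=60, n_features=2, centers=3, random_state=0)
df = pd.DataFrame(values, columns=["Purchase Power", "Pollution"])
df.insert(0, "City", [f"City {i}" for i in range(len(df))])

# Numeric columns only, built before the label column is added to df.
X = df.drop(["City"], axis=1)

k = 3  # placeholder: use the k chosen from the elbow plot
kmeans = KMeans(n_clusters=k, random_state=101, n_init=10)

# fit_predict fits the model and returns one cluster label per row of X.
df["predicted_cluster"] = kmeans.fit_predict(X)

print(df["predicted_cluster"].value_counts())
```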



### 4.9. Create side-by-side boxplot visualizations. 

For each of the `well-being metrics`, create a side-by-side boxplot visualization. Within each visualization, there should be one boxplot for each `predicted_cluster` label.
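
One way to draw these, sketched below, is pandas' `DataFrame.boxplot` with the `by` argument, which produces one box per group. The data and the cluster labels here are random placeholders, and only two metrics are shown to keep the example short.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in for df with placeholder cluster labels (not real results).
rng = np.random.default_rng(0)
metrics = ["Purchase Power", "Pollution"]
df = pd.DataFrame(rng.uniform(0, 100, size=(60, 2)), columns=metrics)
df["predicted_cluster"] = rng.integers(0, 3, size=60)

# One panel per metric; within each panel, one box per cluster label.
fig, axes = plt.subplots(1, len(metrics), figsize=(5 * len(metrics), 4))
for ax, metric in zip(axes, metrics):
    df.boxplot(column=metric, by="predicted_cluster", ax=ax)
    ax.set_title(metric)
    ax.set_xlabel("predicted_cluster")

fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
fig.tight_layout()
```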

### 4.10. Describing the clusters 

Finally, let's describe what distinguishes these clusters. For each cluster answer the following questions.
* Did this cluster have the `highest` scores for a particular `well-being metric(s)` (compared to the other clusters)? If so, which metric(s)?
* Did this cluster have the `lowest` scores for a particular `well-being metric(s)` (compared to the other clusters)? If so, which metric(s)?
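
The boxplots answer these questions visually; a per-cluster summary table can back them up. The sketch below compares cluster medians on a tiny made-up dataframe. Using the median as the summary is an assumption; the mean would be an equally reasonable choice.

```python
import pandas as pd

# Tiny made-up example with two metrics and three clusters (not real results).
df = pd.DataFrame({
    "Purchase Power": [80, 75, 40, 45, 60, 62],
    "Pollution": [30, 35, 70, 75, 20, 25],
    "predicted_cluster": [0, 0, 1, 1, 2, 2],
})
metrics = ["Purchase Power", "Pollution"]

# Median of each metric within each cluster (rows = clusters, columns = metrics).
cluster_medians = df.groupby("predicted_cluster")[metrics].median()

# For each metric, the label of the cluster with the highest / lowest median.
highest = cluster_medians.idxmax()
lowest = cluster_medians.idxmin()

print(cluster_medians)
print(highest)
print(lowest)
```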

**Note: Unfortunately, even if you fix a random state in the KMeans() function, the LABELS assigned to each of your $k$ CLUSTERS may change, while your actual CLUSTERS themselves will not change.**

For instance, the two tables below show two separate ways to *represent* the same clustering of a set of observations (1,2,3,4,5).
* Because observations 1 and 2 were given the same label, they are together in one cluster.
* Because observations 4 and 5 were given the same label, they are together in another cluster.
* Because observation 3 is the only one with its distinct label, it is by itself in its own cluster.


Because the cluster labels (but not the clusters themselves) can change in the KMeans() function, it's best not to use the cluster label numbers "0,1,2,..." in your description. Just say "one cluster of cities has this...", "while another group of cities has this...".

<p>&nbsp;</p>
<table style="border: none;border-collapse: collapse;width:267pt;">
    <tbody>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:.5pt solid windowtext;height:28.5pt;width:58pt;">Observation</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-left:none;width:45pt;">Cluster Labels</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;width:51pt;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:.5pt solid windowtext;width:62pt;">Observation</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-left:none;width:51pt;">Cluster Labels</td>
        </tr>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;height:14.25pt;border-top:none;">1</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">0</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;">1</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">Nick</td>
        </tr>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;height:14.25pt;border-top:none;">2</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">0</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;">2</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">Nick</td>
        </tr>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;height:14.25pt;border-top:none;">3</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">1</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;">3</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">Joe</td>
        </tr>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;height:14.25pt;border-top:none;">4</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">2</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;">4</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">Kevin</td>
        </tr>
        <tr>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;height:14.25pt;border-top:none;">5</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">2</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:general;vertical-align:bottom;border:none;"><br></td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;">5</td>
            <td style="color:black;font-size:15px;font-weight:400;font-style:normal;text-decoration:none;font-family:Calibri, sans-serif;text-align:right;vertical-align:bottom;border:.5pt solid windowtext;border-top:none;border-left:none;">Kevin</td>
        </tr>
    </tbody>
</table>
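
To check that the two labelings in the tables above really describe the same clustering, one can ignore the label values and compare only which observations end up grouped together. The helper name `as_partition` below is hypothetical; it is a small illustration in plain Python, not part of sklearn.

```python
def as_partition(labels):
    """Convert {observation: label} into a set of groups, ignoring label names."""
    groups = {}
    for observation, label in labels.items():
        groups.setdefault(label, set()).add(observation)
    return {frozenset(members) for members in groups.values()}

# The two labelings from the tables above.
numeric_labels = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}
name_labels = {1: "Nick", 2: "Nick", 3: "Joe", 4: "Kevin", 5: "Kevin"}

# Both labelings give the same groups: {1, 2}, {3}, and {4, 5}.
same_clustering = as_partition(numeric_labels) == as_partition(name_labels)
print(same_clustering)  # True
```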