# Analysis | Quality of Life per US State

#### The goal of this analysis is to compare quality of life across the 50 US states, using each state's total score and its rank in each category
Dataset Resource: https://www.kaggle.com/datasets/msjahid/statewise-quality-of-life-index-2024

In [60]:
# Include this line to make plots interactive
%matplotlib notebook

In [61]:
# Import dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [62]:
# Read in your csv:
df = pd.read_csv("Resources/qol_states_2024.csv")  

# Show the first 5 rows of the dataframe
df.head()


,state,QualityOfLifeTotalScore,QualityOfLifeQualityOfLife,QualityOfLifeAffordability,QualityOfLifeEconomy,QualityOfLifeEducationAndHealth,QualityOfLifeSafety
0,Alabama,45.61,40,1,40,48,32
1,Alaska,40.93,50,42,22,30,45
2,Arizona,48.31,21,25,14,39,40
3,Arkansas,42.42,46,4,34,45,47
4,California,52.03,2,50,15,24,27


In [63]:
# Show the column names, non-null counts, and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   state                            50 non-null     object 
 1   QualityOfLifeTotalScore          50 non-null     float64
 2   QualityOfLifeQualityOfLife       50 non-null     int64  
 3   QualityOfLifeAffordability       50 non-null     int64  
 4   QualityOfLifeEconomy             50 non-null     int64  
 5   QualityOfLifeEducationAndHealth  50 non-null     int64  
 6   QualityOfLifeSafety              50 non-null     int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 2.9+ KB


In [64]:
# Show name of columns as a list
df.columns.to_list()

['state',
 'QualityOfLifeTotalScore',
 'QualityOfLifeQualityOfLife',
 'QualityOfLifeAffordability',
 'QualityOfLifeEconomy',
 'QualityOfLifeEducationAndHealth',
 'QualityOfLifeSafety']

In [65]:
# Show statistical information
df.describe().T

,count,mean,std,min,25%,50%,75%,max
QualityOfLifeTotalScore,50.0,51.8302,5.728049,39.77,48.43,52.16,56.735,62.65
QualityOfLifeQualityOfLife,50.0,25.5,14.57738,1.0,13.25,25.5,37.75,50.0
QualityOfLifeAffordability,50.0,25.5,14.57738,1.0,13.25,25.5,37.75,50.0
QualityOfLifeEconomy,50.0,25.5,14.57738,1.0,13.25,25.5,37.75,50.0
QualityOfLifeEducationAndHealth,50.0,25.5,14.57738,1.0,13.25,25.5,37.75,50.0
QualityOfLifeSafety,50.0,25.5,14.57738,1.0,13.25,25.5,37.75,50.0
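The five rank rows above are identical, which is what one would expect if each of those columns is a permutation of the integers 1 to 50. A minimal sketch reproduces the same mean and standard deviation; the `ranks` series is a stand-in built for illustration, not a column of the real dataset:

```python
import numpy as np
import pandas as pd

# Stand-in for a rank column: the integers 1..50 in any order
# (illustrative only, not read from the CSV).
ranks = pd.Series(np.arange(1, 51))

# describe() uses the sample standard deviation (ddof=1), as in the table above.
summary = ranks.describe()
print(summary[["mean", "std", "min", "max"]])

# Closed form for the sample standard deviation of 1..n: sqrt(n * (n + 1) / 12)
n = 50
closed_form_std = np.sqrt(n * (n + 1) / 12)
print(round(closed_form_std, 5))
```

Because these statistics depend only on the set of values, not their order, they cannot distinguish one rank column from another; they only suggest that each column covers 1 to 50.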


In [66]:
# Find null values in each column
df.isna().sum()

state                              0
QualityOfLifeTotalScore            0
QualityOfLifeQualityOfLife         0
QualityOfLifeAffordability         0
QualityOfLifeEconomy               0
QualityOfLifeEducationAndHealth    0
QualityOfLifeSafety                0
dtype: int64

In [67]:
# Check the shape of the dataset as (number of rows, number of columns)
df.shape


(50, 7)

In [68]:
# Count the unique values in the 'state' column
df['state'].nunique()


50

In [69]:
# Get unique values of a particular column
df['state'].unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia',
       'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype=object)

In [70]:
# Get unique values of a particular column
df['QualityOfLifeTotalScore'].unique()

array([45.61, 40.93, 48.31, 42.42, 52.03, 53.37, 52.21, 52.33, 58.07,
       50.76, 47.46, 58.73, 55.47, 49.4 , 55.37, 52.47, 46.39, 41.74,
       57.55, 54.4 , 62.65, 50.87, 57.99, 39.77, 48.79, 53.75, 53.08,
       47.58, 58.25, 62.01, 42.51, 60.64, 51.51, 54.17, 49.32, 43.82,
       48.97, 56.42, 51.65, 43.8 , 53.13, 48.85, 49.  , 56.84, 57.52,
       52.11, 46.84, 57.92, 58.  ])

#### The following columns hold ranks from 1 to 50, one rank per state in this dataset:
- QualityOfLifeQualityOfLife
- QualityOfLifeAffordability
- QualityOfLifeEconomy
- QualityOfLifeEducationAndHealth
- QualityOfLifeSafety

This was confirmed by calling df['column_name'].unique() on each of these columns to verify that the ranking contains no duplicate values.
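The duplicate check described above can also be automated. The sketch below is one possible way to do it, not the notebook's own method: the helper `is_full_ranking` is a hypothetical name, and the `toy` frame contains made-up values so the example runs without the CSV.

```python
import pandas as pd

def is_full_ranking(series: pd.Series) -> bool:
    """Return True if the values are exactly 1..len(series), each used once."""
    return sorted(series.tolist()) == list(range(1, len(series) + 1))

# Toy stand-in for df (made-up values, not the real dataset).
toy = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "QualityOfLifeSafety": [3, 1, 4, 2],   # a valid ranking of 1..4
    "QualityOfLifeEconomy": [1, 2, 2, 4],  # contains a tie, so not a full ranking
})

rank_cols = ["QualityOfLifeSafety", "QualityOfLifeEconomy"]
check = {col: is_full_ranking(toy[col]) for col in rank_cols}
print(check)
```

On the real dataframe, the same comprehension could be run over the five rank columns listed above.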

### Top 10 States with the best and worst Quality of Life

In [1]:
# TODO: write a function that does the same as the code below, to follow DRY (avoid repeating this code)
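One possible shape for such a function is sketched here. The name `top_states` and its parameters are assumptions made for illustration, and the `toy` frame holds made-up values; the column names `state` and `QualityOfLifeTotalScore` are taken from the dataset.

```python
import pandas as pd

def top_states(df: pd.DataFrame, rank_col: str, n: int = 10, worst: bool = False) -> pd.DataFrame:
    """Return the n best (or worst) states by a rank column, indexed 1..n."""
    # Rank 1 is the best, so best = ascending sort and worst = descending sort.
    ordered = df.sort_values(by=rank_col, ascending=not worst).head(n)
    result = ordered.rename(
        columns={"state": "State", "QualityOfLifeTotalScore": "Total Score"}
    )[["State", "Total Score"]]
    # Replace the original row labels with a 1..n position called 'Rank'.
    result.index = pd.RangeIndex(1, len(result) + 1, name="Rank")
    return result

# Toy stand-in for df (made-up values, not the real dataset).
toy = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "QualityOfLifeTotalScore": [50.0, 60.0, 40.0, 55.0],
    "QualityOfLifeAffordability": [2, 4, 1, 3],
})

best = top_states(toy, "QualityOfLifeAffordability", n=2)
worst = top_states(toy, "QualityOfLifeAffordability", n=2, worst=True)
print(best)
print(worst)
```

With a helper like this, each best/worst pair in the sections that follow would reduce to two calls that differ only in `rank_col` and `worst`.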

In [95]:
# Top 10 Best States for overall 'Quality of life' (column QualityOfLifeQualityOfLife)
top_10_qol = df.sort_values(by='QualityOfLifeQualityOfLife').head(10)

# Add a column with ranking values 1 to 10
top_10_qol['Rank'] = range(1,11)

# Rename the 'state' and 'QualityOfLifeTotalScore' columns for display
# (the rows are already in rank order, so no further sorting is needed)
top_10_qol = top_10_qol.rename(columns={'state': 'State', 'QualityOfLifeTotalScore': 'Total Score'})

#Set the ranking as the index 
top_10_qol = top_10_qol.set_index('Rank')

# Display the Dataframe
print("Top 10 states for Best Quality of Life: ")
top_10_qol[['State', 'Total Score']]

Top 10 states for Best Quality of Life: 


Rank,State,Total Score
1,New York,60.64
2,California,52.03
3,Pennsylvania,56.42
4,Illinois,55.47
5,Florida,58.07
6,Massachusetts,62.65
7,New Jersey,62.01
8,Minnesota,57.99
9,Washington,52.11
10,Wisconsin,57.92


In [2]:
# TODO: write a function that does the same as the code below, to follow DRY (avoid repeating this code)

In [96]:
# Top 10 Worst States for overall 'Quality of life'
worst_10_qol2 = df.sort_values(by='QualityOfLifeQualityOfLife', ascending=False).head(10)

# Add a column with ranking values 1 to 10
worst_10_qol2['Rank'] = range(1,11)

# Rename the 'state' and 'QualityOfLifeTotalScore' columns for display
# (the rows are already ordered from worst to best, so no further sorting is needed)
worst_10_qol2 = worst_10_qol2.rename(columns={'state': 'State', 'QualityOfLifeTotalScore': 'Total Score'})

#Set the ranking as the index 
worst_10_qol2 = worst_10_qol2.set_index('Rank')

# Display the Dataframe
print("Bottom 10 states for Quality of Life (Rank 1 is the worst, i.e. 50th of 50):")
worst_10_qol2[['State', 'Total Score']]

Bottom 10 states for Quality of Life (Rank 1 is the worst, i.e. 50th of 50):


Rank,State,Total Score
1,Alaska,40.93
2,Mississippi,39.77
3,Delaware,52.33
4,Kentucky,46.39
5,Arkansas,42.42
6,West Virginia,46.84
7,Rhode Island,51.65
8,Hawaii,47.46
9,Vermont,57.52
10,South Dakota,53.13


### Top 10 States with the best and worst Affordability

In [None]:
# Top 10 states with the best 'Affordability' (column QualityOfLifeAffordability)


In [None]:
# Top 10 states with the worst 'Affordability'


### Top 10 States with the best and worst Economy

In [None]:
# Top 10 states with the best 'Economy' (column QualityOfLifeEconomy) 

In [None]:
# Top 10 states with the worst 'Economy'

### Top 10 States with the best and worst Education and Health

In [None]:
# Top 10 states with the best 'Education and Health' (column QualityOfLifeEducationAndHealth)


In [None]:
# Top 10 states with the worst 'Education and Health'


### Top 10 States with the best and worst Safety

In [None]:
# Top 10 states with the best 'Safety'  (column QualityOfLifeSafety)


In [None]:
# Top 10 states with the worst 'Safety'


### Total Score

#### Total score for each state based on the 'QualityOfLifeTotalScore' column
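Unlike the rank columns, the total score is a continuous value, so selecting on it directly is a natural fit for `nlargest` and `nsmallest`. The sketch below assumes that a higher `QualityOfLifeTotalScore` means a better quality of life, which the source does not state explicitly; the `toy` frame holds made-up values and `n = 2` stands in for the 10 used in this notebook.

```python
import pandas as pd

# Toy stand-in for df (made-up values, not the real dataset).
toy = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "QualityOfLifeTotalScore": [52.0, 62.5, 39.8, 58.0, 45.6],
})

n = 2  # the notebook would use 10
cols = ["state", "QualityOfLifeTotalScore"]

# Assumption: higher total score = better quality of life.
best_scores = toy.nlargest(n, "QualityOfLifeTotalScore")[cols]
worst_scores = toy.nsmallest(n, "QualityOfLifeTotalScore")[cols]

print(best_scores)
print(worst_scores)
```

Note that a ranking by total score need not match the ranking by QualityOfLifeQualityOfLife shown earlier, since that column is only one of the category ranks.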

In [None]:
# Best quality of life score, top 10 states and their score


In [None]:
# Worst quality of life score: bottom 10 states and their scores
