# Distressed Communities

Exploratory data analysis of zip-code-level scores from the Distressed Communities Index (DCI), published by the Economic Innovation Group (EIG).

#### Summary

The DCI "combines seven complementary economic indicators into a single summary statistic that conveys each community's standing _relative to its peers_" (emphasis added):

1. No High School Diploma
2. Housing Vacancy Rate
3. Adults Not Working
4. Poverty Rate
5. Median Income Ratio
6. Change in Employment
7. Change in Establishments

The dataset contains 26,059 zip codes spanning the 50 U.S. states and the District of Columbia. Distress scores range from just above 0 to 100, and zip codes in the top quintile of scores are classified as "distressed" (here: 5,212 records). The relevant fields are the zip code (`Zipcode`) and the quintile indicator (`Quintile (5=Distressed)`).
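As a minimal sketch of how the quintile field can be turned into a boolean flag, the snippet below builds a tiny frame by hand instead of loading the spreadsheet. The first three rows reuse values shown in the `df.head()` output further down; the last row (zip code `99999`) and the column name `is_distressed` are hypothetical, added only so the flag has one positive case.

```python
import pandas as pd

QUINTILE_COL = "Quintile (5=Distressed)"

# Small stand-in for the real data; the 99999 row is made up for illustration.
sample = pd.DataFrame(
    {
        "Zipcode": [1001, 1002, 1007, 99999],
        QUINTILE_COL: [2, 4, 1, 5],
    }
)

# Hypothetical helper column: True only for the top (most distressed) quintile.
sample["is_distressed"] = sample[QUINTILE_COL].eq(5)
print(sample)

# Sanity check on the counts quoted above: 5,212 of 26,059 is about one fifth.
share = 5212 / 26059
print(f"{share:.1%}")  # 20.0%
```

The share check only confirms that the reported count is consistent with a quintile split; it says nothing about how EIG assigns the quintiles.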

#### Exploration

In [1]:
import pandas as pd

In [2]:
df = pd.read_excel(
    io="../data/raw/bonus/distressed/DCI-2016-2020-Academic-Non-profit-Government-Scores-Only.xlsx",
    sheet_name="Zip code"
)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26059 entries, 0 to 26058
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Zipcode                  26059 non-null  int64  
 1   Metro area               15894 non-null  object 
 2   City                     26055 non-null  object 
 3   County                   26059 non-null  object 
 4   State                    26059 non-null  object 
 5   State Abbreviation       26059 non-null  object 
 6   Census Region            26059 non-null  object 
 7   County Type              26059 non-null  object 
 8   Total Population         26059 non-null  int64  
 9   Distress Score           26059 non-null  float64
 10  Quintile (5=Distressed)  26059 non-null  int64  
dtypes: float64(1), int64(3), object(7)
memory usage: 2.2+ MB
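
The `info()` output shows that `Metro area` is populated for only 15,894 of the 26,059 rows and `City` for 26,055, so both columns contain missing values. One way to tally missing values per column is sketched below on a small hand-built frame (values taken from the first rows of the data; the variable names `sample` and `missing` are illustrative):

```python
import pandas as pd

# Three rows copied from the head of the dataset; zip code 1002 has no metro area.
sample = pd.DataFrame(
    {
        "Zipcode": [1001, 1002, 1005],
        "Metro area": ["Springfield, MA", None, "Worcester, MA-CT"],
        "City": ["Agawam Town, MA", "Amherst Center, MA", "Barre, MA"],
    }
)

# Count nulls per column and keep only the columns that have any.
missing = sample.isna().sum()
print(missing[missing > 0])
```

Running the same two lines against the full `df` should reproduce the gaps implied by the non-null counts above.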


In [3]:
df.head()

| | Zipcode | Metro area | City | County | State | State Abbreviation | Census Region | County Type | Total Population | Distress Score | Quintile (5=Distressed) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | Springfield, MA | Agawam Town, MA | Hampden County, Massachusetts | Massachusetts | MA | Northeast | Small urban | 16064 | 35.016693 | 2 |
| 1 | 1002 | NaN | Amherst Center, MA | Franklin County, Massachusetts | Massachusetts | MA | Northeast | Exurban | 30099 | 71.057984 | 4 |
| 2 | 1005 | Worcester, MA-CT | Barre, MA | Worcester County, Massachusetts | Massachusetts | MA | Northeast | Small urban | 5166 | 31.386469 | 2 |
| 3 | 1007 | Springfield, MA | Belchertown, MA | Hampshire County, Massachusetts | Massachusetts | MA | Northeast | Exurban | 15080 | 15.725853 | 1 |
| 4 | 1008 | Springfield, MA | Blandford, MA | Hampden County, Massachusetts | Massachusetts | MA | Northeast | Small urban | 1116 | 9.781649 | 1 |
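
Because `Zipcode` is stored as `int64`, the Massachusetts zip codes above appear as four digits (`1001` instead of `01001`). If the values need to be joined against other sources that keep zip codes as five-character strings, one option is to zero-pad them. The sketch below uses a short illustrative series (the `90210` entry is not from this dataset) and the hypothetical name `zip_str`:

```python
import pandas as pd

# Two zip codes from the head of the data plus one made-up five-digit example.
zips = pd.Series([1001, 1002, 90210], name="Zipcode")

# Convert to string and left-pad with zeros to a width of five characters.
zip_str = zips.astype(str).str.zfill(5)
print(zip_str.tolist())  # ['01001', '01002', '90210']
```

On the full frame, the equivalent would be `df["Zipcode"].astype(str).str.zfill(5)`.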


In [4]:
df["State"].nunique()

51

In [5]:
df.State.sort_values().unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'District of Columbia',
       'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana',
       'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland',
       'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi',
       'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire',
       'New Jersey', 'New Mexico', 'New York', 'North Carolina',
       'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania',
       'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee',
       'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
       'West Virginia', 'Wisconsin', 'Wyoming'], dtype=object)

In [6]:
df["Distress Score"].describe()

count    26059.000000
mean        50.001518
std         28.868081
min          0.003837
25%         25.002878
50%         50.001919
75%         74.999041
max        100.000000
Name: Distress Score, dtype: float64
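
These summary statistics look like those of a percentile-style rank: the minimum matches 100 / 26,059 ≈ 0.003837, the maximum is exactly 100, and the standard deviation is close to 100 / √12 ≈ 28.87, the value for a uniform distribution on [0, 100]. That fits the "relative to its peers" framing, but it is an inference from the numbers, not something the source states. The sketch below ranks made-up skewed values (generated with NumPy, unrelated to the real indicators) to show that any set of distinct values yields the same pattern once converted to percentile ranks:

```python
import numpy as np
import pandas as pd

n = 26_059  # number of zip codes in the dataset

# Synthetic, heavily skewed values standing in for some underlying raw measure.
rng = np.random.default_rng(0)
raw = pd.Series(rng.lognormal(size=n))

# Percentile rank scaled to 0-100: rank / n * 100.
pct = raw.rank(pct=True) * 100
print(pct.describe())
```

The printed minimum, maximum, and standard deviation can be compared with the `Distress Score` summary above; the small difference in the real data's mean may reflect ties, which this synthetic example does not have.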

In [7]:
len(df.query("`Quintile (5=Distressed)` == 5"))

5212