# Filtering NC Financial Health Data from a National Health Database
The steps below filter the 2014-2015 county health data down to the datapoints that relate income to health in North Carolina counties. The resulting dataset will serve to show how financial health translates into physical health in North Carolina. We first restrict the dataset to counties in North Carolina, then keep only the columns we choose.

## Working with the Raw Data

Import pandas (and NumPy) to load and work with the data

In [None]:
import numpy as np
import pandas as pd

Read the CSV file into a pandas DataFrame

In [None]:
df = pd.read_csv("CountyHealthData_2014-2015.csv")

Check the shape and a few rows to ensure the file was read correctly

In [None]:
df.shape

(6109, 64)
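
Beyond the shape, it can help to confirm that the columns used later are actually present. The following is a minimal sketch on a small stand-in DataFrame; the `sample` frame and the `required` list are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset; the real df comes from read_csv.
sample = pd.DataFrame({
    "State": ["AK", "NC"],
    "County": ["Bethel Census Area", "Alamance County"],
    "Median household income": [42876, 41394],
})

# Columns this check expects to find (illustrative list).
required = ["State", "County", "Median household income"]

# Collect any expected columns that are absent from the frame.
missing = [col for col in required if col not in sample.columns]
print(missing)
```

An empty list means every expected column was found; the same check could be run on `df` with the column names used in the later steps.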

In [None]:
df[5:8]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
5,AK,West,Pacific,Bethel Census Area,2050,2050,Insuff Data,1/1/2015,12864.0,0.211,...,,0.389,0.153,5317.0,0.2,175.0,42876,0.721,18.3,
6,AK,West,Pacific,Dillingham Census Area,2070,2070,Insuff Data,1/1/2014,9699.0,0.121,...,,0.384,0.147,6082.0,0.152,119.0,47498,0.636,,0.305
7,AK,West,Pacific,Dillingham Census Area,2070,2070,Insuff Data,1/1/2015,15057.0,0.121,...,,0.367,0.154,7204.0,0.152,120.0,47930,0.69,,


## Making a North Carolina dataset

Filter out all data except for counties from North Carolina

In [None]:
df[df["State"] == "NC"]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
3243,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2014,7123.0,0.192,...,10.48,0.259,0.073,8640.0,0.167,46.0,41394,0.444,4.94,0.202
3244,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2015,7291.0,0.192,...,12.38,0.249,0.088,9050.0,0.167,56.0,43001,0.455,4.60,
3245,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2014,7974.0,0.178,...,22.74,0.240,0.077,9316.0,0.205,30.0,39655,0.417,6.27,0.273
3246,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2015,8079.0,0.178,...,24.04,0.239,0.076,9242.0,0.205,32.0,46064,0.449,7.20,
3247,NC,South,South Atlantic,Alleghany County,37005,37005,Insuff Data,1/1/2014,8817.0,0.234,...,18.18,0.320,0.131,9585.0,0.210,55.0,34046,0.523,,0.215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3438,NC,South,South Atlantic,Wilson County,37195,37195,Region 20,1/1/2015,8028.0,0.159,...,7.31,0.262,0.079,9450.0,0.107,77.0,40772,0.556,9.60,
3439,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2014,7893.0,0.207,...,18.45,0.252,0.097,10084.0,0.158,32.0,40012,0.422,3.76,0.241
3440,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2015,7258.0,0.207,...,20.21,0.242,0.094,10998.0,0.158,32.0,40998,0.455,,
3441,NC,South,South Atlantic,Yancey County,37199,37199,Region 15,1/1/2014,6872.0,0.193,...,20.79,0.268,0.110,7707.0,0.158,79.0,36019,0.477,,0.176


Label this new dataset as "NC_subset"

In [None]:
NC_subset = df[df["State"] == "NC"].copy()
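
The filter above relies on a boolean mask, and `.copy()` makes the result independent of the original DataFrame. A minimal sketch of both ideas, assuming a tiny made-up frame (`sample` and `nc_rows` are hypothetical names):

```python
import pandas as pd

# Hypothetical miniature of the national table.
sample = pd.DataFrame({
    "State": ["AK", "NC", "NC", "VA"],
    "County": ["Bethel Census Area", "Alamance County",
               "Yancey County", "Accomack County"],
})

# The comparison yields a boolean Series: one True/False value per row.
mask = sample["State"] == "NC"

# Indexing with the mask keeps only the True rows;
# .copy() makes the result an independent DataFrame.
nc_rows = sample[mask].copy()

# Editing the copy leaves the original frame untouched.
nc_rows["State"] = "North Carolina"
print(sample["State"].tolist())
```

Note that the filtered rows keep their original index labels, which is why the NC rows in the output start at 3243 rather than 0.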

Display the new dataset to make sure the data was filtered correctly

In [None]:
NC_subset

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
3243,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2014,7123.0,0.192,...,10.48,0.259,0.073,8640.0,0.167,46.0,41394,0.444,4.94,0.202
3244,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2015,7291.0,0.192,...,12.38,0.249,0.088,9050.0,0.167,56.0,43001,0.455,4.60,
3245,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2014,7974.0,0.178,...,22.74,0.240,0.077,9316.0,0.205,30.0,39655,0.417,6.27,0.273
3246,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2015,8079.0,0.178,...,24.04,0.239,0.076,9242.0,0.205,32.0,46064,0.449,7.20,
3247,NC,South,South Atlantic,Alleghany County,37005,37005,Insuff Data,1/1/2014,8817.0,0.234,...,18.18,0.320,0.131,9585.0,0.210,55.0,34046,0.523,,0.215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3438,NC,South,South Atlantic,Wilson County,37195,37195,Region 20,1/1/2015,8028.0,0.159,...,7.31,0.262,0.079,9450.0,0.107,77.0,40772,0.556,9.60,
3439,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2014,7893.0,0.207,...,18.45,0.252,0.097,10084.0,0.158,32.0,40012,0.422,3.76,0.241
3440,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2015,7258.0,0.207,...,20.21,0.242,0.094,10998.0,0.158,32.0,40998,0.455,,
3441,NC,South,South Atlantic,Yancey County,37199,37199,Region 15,1/1/2014,6872.0,0.193,...,20.79,0.268,0.110,7707.0,0.158,79.0,36019,0.477,,0.176


Export the new dataset

In [None]:
NC_subset.to_csv("NC_subset.csv", index=False)
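
Passing `index=False` leaves the original row labels out of the file, so the rows are renumbered from 0 when the file is read back in. A small sketch of that round trip, using an in-memory buffer instead of a file on disk (the `subset` frame is a made-up example):

```python
import io
import pandas as pd

# Hypothetical filtered frame whose index still carries the original row labels.
subset = pd.DataFrame(
    {"County": ["Alamance County", "Alexander County"]},
    index=[3243, 3245],
)

# Write to an in-memory text buffer, omitting the index column.
buffer = io.StringIO()
subset.to_csv(buffer, index=False)
buffer.seek(0)

# Reading the text back assigns a fresh 0-based index.
reloaded = pd.read_csv(buffer)
print(list(reloaded.index))
```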

## Getting our desired datapoints

Read the exported file back in and define it as "nc".

In [None]:
nc = pd.read_csv("NC_subset.csv")

Filter the new dataset to include only the following columns: median household income, poor or fair health, uninsured adults, uninsured children, inadequate social support, could not see doctor due to cost, and health care costs.

In [None]:
nc[["Median household income", "Poor or fair health", "Uninsured adults", "Uninsured children", "Inadequate social support", "Could not see doctor due to cost", "Health care costs"]]

Unnamed: 0,Median household income,Poor or fair health,Uninsured adults,Uninsured children,Inadequate social support,Could not see doctor due to cost,Health care costs
0,41394,0.192,0.259,0.073,0.202,0.167,8640.0
1,43001,0.192,0.249,0.088,,0.167,9050.0
2,39655,0.178,0.240,0.077,0.273,0.205,9316.0
3,46064,0.178,0.239,0.076,,0.205,9242.0
4,34046,0.234,0.320,0.131,0.215,0.210,9585.0
...,...,...,...,...,...,...,...
195,40772,0.159,0.262,0.079,,0.107,9450.0
196,40012,0.207,0.252,0.097,0.241,0.158,10084.0
197,40998,0.207,0.242,0.094,,0.158,10998.0
198,36019,0.193,0.268,0.110,0.176,0.158,7707.0


Define this new set as "NC_subset1", adding the County column so that each row can be identified

In [None]:
NC_subset1 = nc[["County","Median household income", "Poor or fair health", "Uninsured adults", "Uninsured children", "Inadequate social support", "Could not see doctor due to cost", "Health care costs"]].copy()

Run a test to check if the data was filtered correctly

In [None]:
NC_subset1

Unnamed: 0,County,Median household income,Poor or fair health,Uninsured adults,Uninsured children,Inadequate social support,Could not see doctor due to cost,Health care costs
0,Alamance County,41394,0.192,0.259,0.073,0.202,0.167,8640.0
1,Alamance County,43001,0.192,0.249,0.088,,0.167,9050.0
2,Alexander County,39655,0.178,0.240,0.077,0.273,0.205,9316.0
3,Alexander County,46064,0.178,0.239,0.076,,0.205,9242.0
4,Alleghany County,34046,0.234,0.320,0.131,0.215,0.210,9585.0
...,...,...,...,...,...,...,...,...
195,Wilson County,40772,0.159,0.262,0.079,,0.107,9450.0
196,Yadkin County,40012,0.207,0.252,0.097,0.241,0.158,10084.0
197,Yadkin County,40998,0.207,0.242,0.094,,0.158,10998.0
198,Yancey County,36019,0.193,0.268,0.110,0.176,0.158,7707.0
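
Some cells in the output above are empty, for example "Inadequate social support" in several rows. One way to see how many values are missing per column is sketched below on a made-up two-row frame; the `sample`, `columns`, and `selected` names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows mirroring the pattern seen in the output above.
sample = pd.DataFrame({
    "County": ["Alamance County", "Alamance County"],
    "Median household income": [41394, 43001],
    "Inadequate social support": [0.202, None],
})

# Keeping the column names in a list makes the selection reusable.
columns = ["County", "Median household income", "Inadequate social support"]
selected = sample[columns].copy()

# Count missing values in each selected column.
missing_counts = selected.isna().sum()
print(missing_counts)
```

Running the same `isna().sum()` call on `NC_subset1` would show which of the chosen measures have gaps before the data is used further.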


Export the new dataset

In [None]:
NC_subset1.to_csv("NC_subset1.csv", index=False)