# Husky Enrollment by town

Analysis started Friday, March 18, 2016.

## 0 Data import

### 0.a Import Husky sheet

In [198]:
import pandas as pd

## Load HUSKY A tab, skipping header lines
husky = pd.read_excel("HUSKYA and Medicaid by Town with %.xls",
                      sheet_name=0,  # spelled `sheetname` in older pandas releases
                      skiprows=6,
                      names=["town","population","husky_enrollment","husky_pct_enrollment"])

## (Uncomment below) Check that their calculated enrollment percent matches my own (difference should equal zero)
#husky["calculated_pct"] = husky["husky_enrollment"] / husky["population"]
#husky["calculated_pct"].sum() - husky["husky_pct_enrollment"].sum()


## Add a column for a human-friendly percent
husky["husky_pct"] = husky["husky_pct_enrollment"] * 100

## Drop original percent column
husky = husky.drop(["husky_pct_enrollment"], axis=1)

## (Uncomment below) Describe the dataset
#husky.describe()

## Count rows
husky.count()

town                170
population          170
husky_enrollment    170
husky_pct           170
dtype: int64
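
The commented-out check above compares column sums, which could in principle hide errors that cancel each other out. A row-by-row comparison is one alternative. The sketch below uses a small made-up frame (the town names and numbers are illustrative, not taken from the spreadsheet) with the same column names as the import cell.

```python
import pandas as pd

# Illustrative rows only; the real values come from the HUSKY A sheet
sample = pd.DataFrame({
    "town": ["A", "B"],
    "population": [1000, 2000],
    "husky_enrollment": [100, 500],
    "husky_pct_enrollment": [0.10, 0.25],
})

# Recompute the fraction and take the largest absolute row-level difference
calculated = sample["husky_enrollment"] / sample["population"]
max_diff = (calculated - sample["husky_pct_enrollment"]).abs().max()
print(max_diff)
```

A value at or very near zero would suggest the published percentages agree with the recomputed ones for every row, not just in aggregate.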

### 0.b Import Medicaid sheet

In [199]:
## Load Sheet 2
medicaid = pd.read_excel("HUSKYA and Medicaid by Town with %.xls",
                         sheet_name="Medicaid Enrollment by Town",  # `sheetname` in older pandas releases
                         skiprows=4,
                         names=["town","population","med_enrollment","drop","med_pct"])

## Fourth column is empty
medicaid = medicaid.drop("drop", axis=1)

## Drop rows with no population value
medicaid = medicaid[medicaid["population"].notnull()]


## (Uncomment below) Check that their percentages are correct (difference should equal zero)
#medicaid["calculated_pct"] = medicaid["med_enrollment"] / medicaid["population"]
#medicaid["calculated_pct"].sum() - medicaid["med_pct"].sum()

## Convert the percent column to a human-readable percent (overwrites med_pct in place)
medicaid["med_pct"] = medicaid["med_pct"] * 100

## No separate original percent column is left to drop, since med_pct was overwritten above

## Count rows
medicaid.count()

town              170
population        170
med_enrollment    170
med_pct           170
dtype: int64

### 0.c Merge sheets

In [200]:
combined = medicaid.merge(husky,on=["town","population"])

## Count rows. Should have the same number of rows as medicaid and husky
combined.count()

town                169
population          169
med_enrollment      169
med_pct             169
husky_enrollment    169
husky_pct           169
dtype: int64

In [201]:
## Something went wrong: the merge dropped a row. Let's find the row that didn't match

## Rows in husky but not in combined
#husky[~husky["town"].isin(combined["town"])]

## Rows in medicaid but not in combined
#medicaid[~medicaid["town"].isin(combined["town"])]


In [202]:
## North Stonington is the problem. It's entered as "No. Stonington" in the Medicaid sheet and "North Stonington" in the HUSKY sheet
medicaid = medicaid.replace("No. Stonington", "North Stonington")


## Try merge again
combined = medicaid.merge(husky,on=["town","population"])

## Count rows. Should have the same number of rows as medicaid and husky
combined.count()

town                170
population          170
med_enrollment      170
med_pct             170
husky_enrollment    170
husky_pct           170
dtype: int64
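
Another way to spot rows that fail to match is an outer merge with pandas' `indicator=True` option, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The sketch below is a minimal illustration on two-row frames built from figures shown in this notebook; it merges on `town` only for brevity, whereas the notebook merges on `["town","population"]`.

```python
import pandas as pd

# Two-row stand-ins for the medicaid and husky frames
left = pd.DataFrame({"town": ["Norwich", "No. Stonington"],
                     "med_enrollment": [15467, 726]})
right = pd.DataFrame({"town": ["Norwich", "North Stonington"],
                      "husky_enrollment": [9106, 450]})

# An outer merge keeps unmatched rows from both sides; indicator=True labels them
check = left.merge(right, on="town", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["town", "_merge"]])
```

Both spellings of North Stonington show up as unmatched, which surfaces the mismatch in one step instead of two `isin` filters.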

In [203]:
## Good! Moving on...

## 1 Towns with the highest enrollment

Canaan has a population of only 1,195, so each enrolled resident moves its percentage far more than in the larger places on the list. To avoid that small-population distortion, we'll look only at towns with populations over 5,000.
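
To make the small-population effect concrete, the sketch below works out how many percentage points a single resident represents in Canaan versus Hartford, using the population figures quoted in this notebook. The variable names are illustrative.

```python
# Percentage points contributed by one enrolled resident: 100 / population
populations = {"Canaan": 1195, "Hartford": 124705}

points_per_resident = {town: 100 / pop for town, pop in populations.items()}

for town, pts in points_per_resident.items():
    print(f"{town}: {pts:.4f} percentage points per resident")
```

One resident in Canaan shifts the town's percentage roughly a hundred times more than one resident in Hartford, which is why a handful of enrollees can push a very small town to the top of a percentage ranking.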

### 1.a Top Medicaid enrollment

In [211]:
combined[combined["population"] > 5000].sort_values("med_pct", ascending=False).head(10)

Unnamed: 0,town,population,med_enrollment,med_pct,husky_enrollment,husky_pct
63,Hartford,124705,71755,57.539794,39917,32.009142
150,Waterbury,109307,55828,51.074497,33099,30.280769
88,New Britain,72878,34049,46.720547,20398,27.989242
14,Bridgeport,147612,66854,45.290356,40457,27.407663
92,New Haven,130282,56711,43.529421,31913,24.495326
93,New London,27374,11716,42.799737,6830,24.950683
162,Windham,25005,10163,40.643871,5682,22.723455
103,Norwich,40178,15467,38.496192,9106,22.664145
41,East Hartford,51033,19261,37.742245,11487,22.508965
79,Meriden,60293,22091,36.639411,13079,21.692402


### 1.b Top Husky A enrollment

In [212]:
combined[combined["population"] > 5000].sort_values("husky_pct", ascending=False).head(10)

Unnamed: 0,town,population,med_enrollment,med_pct,husky_enrollment,husky_pct
63,Hartford,124705,71755,57.539794,39917,32.009142
150,Waterbury,109307,55828,51.074497,33099,30.280769
88,New Britain,72878,34049,46.720547,20398,27.989242
14,Bridgeport,147612,66854,45.290356,40457,27.407663
93,New London,27374,11716,42.799737,6830,24.950683
92,New Haven,130282,56711,43.529421,31913,24.495326
162,Windham,25005,10163,40.643871,5682,22.723455
103,Norwich,40178,15467,38.496192,9106,22.664145
41,East Hartford,51033,19261,37.742245,11487,22.508965
79,Meriden,60293,22091,36.639411,13079,21.692402


In [220]:
# Add a column for Husky A enrollment as a percentage of total Medicaid enrollment
combined["husky_share"] = combined["husky_enrollment"] * 100 / combined["med_enrollment"]

combined.describe()

Unnamed: 0,population,med_enrollment,med_pct,husky_enrollment,husky_pct,husky_share
count,170.0,170.0,170.0,170.0,170.0,170.0
mean,42313.847059,9690.2,16.58313,5483.376471,9.188307,54.339931
std,275407.148728,63679.145467,10.220578,36057.610465,6.085087,6.209408
min,846.0,64.0,3.715826,34.0,1.90509,28.302676
25%,5473.25,683.25,9.694484,347.5,5.097916,50.191141
50%,12821.0,1477.0,13.902793,756.0,7.337486,54.551823
75%,25938.5,3960.5,19.56285,2113.5,11.331514,58.729329
max,3596677.0,823867.0,63.598326,466087.0,32.887029,69.834711
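
As a sanity check on the `husky_share` formula, the sketch below recomputes it for two towns whose enrollment figures appear in this notebook's output tables (Danbury and Southbury). The frame is a hand-built stand-in, not the real `combined` frame.

```python
import pandas as pd

# Enrollment figures copied from this notebook's output tables
rows = pd.DataFrame({
    "town": ["Danbury", "Southbury"],
    "med_enrollment": [20632, 2392],
    "husky_enrollment": [13394, 677],
})

# Husky A enrollment as a percentage of total Medicaid enrollment
rows["husky_share"] = rows["husky_enrollment"] * 100 / rows["med_enrollment"]
print(rows)
```

One thing worth checking in the `describe()` output: the maximum population (3,596,677) is far above any single town in the tables, which may mean a statewide total row is among the 170 rows and is pulling up the mean and standard deviation. That is an inference from the summary, not something the notebook confirms.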


### 1.c Towns with the largest share of Husky A enrollment vs. overall Medicaid program

In [221]:
combined[combined["population"] > 5000].sort_values("husky_share", ascending=False).head(10)

Unnamed: 0,town,population,med_enrollment,med_pct,husky_enrollment,husky_pct,husky_share
33,Danbury,83784,20632,24.625227,13394,15.986346,64.918573
116,Redding,9309,485,5.210012,311,3.340853,64.123711
71,Ledyard,15121,2192,14.496396,1405,9.291714,64.096715
168,Woodstock,7860,1062,13.51145,674,8.575064,63.46516
21,Canterbury,5088,930,18.278302,586,11.517296,63.010753
19,Burlington,9576,738,7.706767,465,4.85589,63.00813
57,Griswold,11916,3012,25.276939,1895,15.902988,62.915007
1,Ansonia,18959,6099,32.169418,3810,20.095997,62.469257
87,Naugatuck,31659,8268,26.115796,5144,16.248144,62.215772
101,North Stonington,5288,726,13.729198,450,8.509834,61.983471


### 1.d Towns with the smallest share of Husky A enrollment vs. overall Medicaid program


In [223]:
combined[combined["population"] > 5000].sort_values("husky_share", ascending=True).head(10)

Unnamed: 0,town,population,med_enrollment,med_pct,husky_enrollment,husky_pct,husky_share
130,Southbury,19881,2392,12.031588,677,3.405261,28.302676
160,Wilton,18692,1104,5.90627,436,2.332549,39.492754
3,Avon,18421,1208,6.557733,512,2.779437,42.384106
155,Westbrook,6902,1069,15.488264,454,6.577804,42.469598
7,Bethany,5531,419,7.575484,181,3.272464,43.198091
24,Cheshire,29250,2504,8.560684,1088,3.719658,43.450479
51,Farmington,25627,2726,10.637219,1193,4.655246,43.763756
118,Rocky Hill,20094,2602,12.949139,1142,5.683289,43.889316
114,Prospect,9723,1256,12.917824,560,5.759539,44.585987
10,Bloomfield,20819,4894,23.507373,2185,10.495221,44.646506


### 1.x Section conclusions

The top-10 lists for Medicaid enrollment and Husky A enrollment contain the same ten towns in nearly the same order: only New London and New Haven swap positions (fifth and sixth).
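
The comparison can also be checked mechanically. The sketch below hard-codes the two top-10 orderings from the tables in sections 1.a and 1.b and reports which ranks differ; the list names are illustrative.

```python
# Town orderings copied from the two top-10 tables
med_top10 = ["Hartford", "Waterbury", "New Britain", "Bridgeport", "New Haven",
             "New London", "Windham", "Norwich", "East Hartford", "Meriden"]
husky_top10 = ["Hartford", "Waterbury", "New Britain", "Bridgeport", "New London",
               "New Haven", "Windham", "Norwich", "East Hartford", "Meriden"]

# Same set of towns in both lists?
same_towns = set(med_top10) == set(husky_top10)

# (rank, town by med_pct, town by husky_pct) for every rank where the lists differ
moved = [(rank + 1, m, h)
         for rank, (m, h) in enumerate(zip(med_top10, husky_top10))
         if m != h]

print(same_towns)
print(moved)
```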

## 2 Output for chart

HUSKYA and Medicaid by Town with %.xls  for_chart.csv
Husky enrollment by town.ipynb
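
The cell that produced `for_chart.csv` is not shown here. One plausible way to write such a file is `DataFrame.to_csv` with `index=False`; the sketch below is an assumption about the approach, and the choice of columns is hypothetical. It writes to an in-memory buffer so it runs without touching the working directory.

```python
import io

import pandas as pd

# Hypothetical chart columns; values copied from the top-10 tables
chart = pd.DataFrame({
    "town": ["Hartford", "Waterbury"],
    "med_pct": [57.539794, 51.074497],
    "husky_pct": [32.009142, 30.280769],
})

# Passing a file path such as "for_chart.csv" instead of the buffer would write to disk
buffer = io.StringIO()
chart.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```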
