# U.S. Medical Insurance Costs

## Scope

1) Map insurance cost to each of the criteria to see how shifts in the criteria affect insurance costs in general

    Non-Binary Numeric Values
    1) Segment age and bmi into bins
    2) Provide basic statistical analysis of charges (Sum, Avg, Max, Min, Number of Rows) based on bmi, age, and number of children.
    
    Binary and Nominal Values
    1) Calculate statistical data concerning the charges (Sum, Avg, Max, Min, Number of Rows) for each nominal output (sex, smoker, region)
    
2) Create an analysis based on two columns - Age and Sex.
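
The five statistics listed in the scope can all be collected in one pass over a list. A minimal sketch, assuming a non-empty list of numbers; `summarize` and the sample values are illustrative and are not the notebook's own helpers:

```python
def summarize(values):
    """Return row count, sum, average, max, and min for a non-empty list of numbers."""
    count = 0
    total = 0.0
    largest = values[0]
    smallest = values[0]
    for value in values:
        count += 1
        total += value
        if value > largest:
            largest = value
        if value < smallest:
            smallest = value
    return {"rows": count, "sum": total, "avg": total / count,
            "max": largest, "min": smallest}


# Illustrative numbers only, not taken from insurance.csv
sample_charges = [1200.0, 3400.0, 2600.0]
stats = summarize(sample_charges)
print(stats)
```

Returning a dictionary keeps the five values together, so the same function could serve every column without repeating the sum/avg/max/min steps.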

# Import Data, Tidy Data, and Save Columns as Lists

In [2]:
import csv 
from descriptive_analysis import *

sex=[]
age=[]
bmi=[]
children=[]
smoker=[]
region=[]
charges=[]

#Import Data and create a list for each column
with open ('./insurance.csv', newline='') as insurance_data:
    
    csv_reader=csv.DictReader(insurance_data)
  
    for row in csv_reader:
        sex.append(row['sex'])
        age.append(int(row['age']))
        bmi.append(float(row['bmi']))
        children.append(int(row['children']))
        smoker.append(row['smoker'])
        region.append(row['region'])
        charges.append(round(float(row['charges']),2))
        

smoker = smoker_status(smoker)
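
`smoker_status` comes from the `descriptive_analysis` module, which is not shown in this notebook. Judging by the output further down, it relabels the smoker column as `Smoker` / `Non-Smoker`. A minimal sketch of such a helper, assuming the raw column holds the strings `yes` and `no` (an assumption, as is the function name):

```python
def smoker_status_sketch(raw_values):
    """Map raw 'yes'/'no' strings to 'Smoker'/'Non-Smoker' labels (assumed encoding)."""
    labels = []
    for value in raw_values:
        # Anything other than 'yes' is treated as a non-smoker in this sketch
        if value.strip().lower() == "yes":
            labels.append("Smoker")
        else:
            labels.append("Non-Smoker")
    return labels


relabeled = smoker_status_sketch(["yes", "no", "no"])
print(relabeled)
```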



# General Summary of the Data

In [3]:
#Find the total number of rows in the data
total_rows = len(charges)
#Find the sum of all charges
total_charges = my_sum(charges)
#Find the average of all charges
average_charges = mean(charges)

#Find the max index 
max_index = charges.index(maximum(charges))
#Describe the person with the highest premium
max_charge = "The person who pays the highest premium totaling ${charge} is a/an {age} year old {sex} ({smoker_status}), \
with a BMI of {bmi}. The individual lives in the {region} region with {num_of_children} children.".format(charge= format_number(charges[max_index]), \
                                                                                                            age=age[max_index], sex=sex[max_index], \
                                                                                                            smoker_status=smoker[max_index], \
                                                                                                            bmi=bmi[max_index], \
                                                                                                            region=region[max_index], \
                                                                                                            num_of_children=children[max_index])
#Find the min index
min_index = charges.index(minimum(charges))
#Describe the person with the lowest premium
min_charge = "The person who pays the lowest premium totaling ${charge} is a/an {age} year old {sex} ({smoker_status}), \
with a BMI of {bmi}. The individual lives in the {region} region with {num_of_children} children.".format(charge=format_number(charges[min_index]), \
                                                                                                            age=age[min_index], sex=sex[min_index], \
                                                                                                            smoker_status=smoker[min_index], \
                                                                                                            bmi=bmi[min_index], \
                                                                                                            region=region[min_index], \
                                                                                                            num_of_children=children[min_index])


print("\033[1mGeneral Summary of the US Medical Insurance Costs\033[0m")
print("Total Rows in Dataset: ", total_rows)
print("Sum of All Charges: $", format_number(total_charges))
print("Average of All Charges: $", format_number(average_charges), '\n')
print("Description of Individual with the Highest Premium: ", max_charge, '\n')
print("Description of Individual with the Lowest Premium: ", min_charge)

General Summary of the US Medical Insurance Costs
Total Rows in Dataset:  1338
Sum of All Charges: $ 17,755,825.01
Average of All Charges: $ 13,270.42 

Description of Individual with the Highest Premium:  The person who pays the highest premium totaling $63,770.43 is a/an 54 year old female (Smoker),with a BMI of 47.41. The individual lives in the southeast region with 0 children. 

Description of Individual with the Lowest Premium:  The person who pays the lowest premium totaling $1,121.87 is a/an 18 year old male (Non-Smoker), with a BMI of 23.21. The individual lives in the southeast region with 0 children.
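
The amounts in this notebook are printed through `format_number`, which is not shown. Some values further down print with a single decimal (for example `44,501.4`). A minimal sketch of a formatter that always keeps two decimals, using Python's standard format specification; the function name is hypothetical:

```python
def format_two_decimals(value):
    """Format a number with thousands separators and exactly two decimals."""
    return "{:,.2f}".format(value)


print(format_two_decimals(44501.4))
print(format_two_decimals(1121.87))
```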


# Sex vs. Charges

In [4]:
##########FEMALE##############

# Total Number of Rows
sex_count = count_category(sex)
female_count = sex_count['female']

#Total Sum of Charges for Females: 
female_indices = []
for i in range(len(sex)):
    if sex[i] == "female":
        female_indices.append(i)

female_charges = []
for i in female_indices:
    female_charges.append(charges[i])
    
total_female_charges = my_sum(female_charges)

#Average Charge for Females:
female_avg_charge = mean(female_charges)

#Max Charge for Females:
female_max_charge = maximum(female_charges)

#Min Charge for Females
female_min_charge = minimum(female_charges)

print("\033[1mSummary of the US Medical Insurance Costs for Females\033[0m")
print("Total Rows in Female Subset: ", female_count)
print("Sum of Female Charges: $", format_number(total_female_charges))
print("Average of Female Charges: $", format_number(female_avg_charge))
print("Max Female Charge: $", format_number(female_max_charge))
print("Min Female Charge: $", format_number(female_min_charge))

##########MALE##############

# Total Number of Rows
male_count = sex_count['male']

#Total Sum of Charges for Males: 
male_indices = []
for i in range(len(sex)):
    if sex[i] == "male":
        male_indices.append(i)

male_charges = []
for i in male_indices:
    male_charges.append(charges[i])
    
total_male_charges = my_sum(male_charges)

#Average Charge for Males:
male_avg_charge = mean(male_charges)

#Max Charge for Males:
male_max_charge = maximum(male_charges)

#Min Charge for Males
male_min_charge = minimum(male_charges)


print("\n\033[1mSummary of the US Medical Insurance Costs for Males\033[0m")
print("Total Rows in Male Subset: ", male_count)
print("Sum of Male Charges: $", format_number(total_male_charges))
print("Average of Male Charges: $", format_number(male_avg_charge))
print("Max Male Charge: $", format_number(male_max_charge))
print("Min Male Charge: $", format_number(male_min_charge))

Summary of the US Medical Insurance Costs for Females
Total Rows in Female Subset:  662
Sum of Female Charges: $ 8,321,061.12
Average of Female Charges: $ 12,569.58
Max Female Charge: $ 63,770.43
Min Female Charge: $ 1,607.51

Summary of the US Medical Insurance Costs for Males
Total Rows in Male Subset:  676
Sum of Male Charges: $ 9,434,763.89
Average of Male Charges: $ 13,956.75
Max Male Charge: $ 62,592.87
Min Male Charge: $ 1,121.87


### Interpretation

The dataset contains slightly more males (676) than females (662), which partly explains the higher sum for males. The average charge is also higher for males ($13,956.75 vs. $12,569.58), and an average does not grow with group size. Yet the maximum and minimum premiums are both higher for females, so the extremes point the other way and do not, on their own, show which group pays more on average. Further testing is required for an accurate analysis.
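
Dividing each group's sum by its row count reproduces the printed averages and shows that group size cancels out of the average. A quick check using only the figures printed above (the variable names are illustrative):

```python
# Sums and row counts copied from the printed female and male summaries
female_sum, female_rows = 8321061.12, 662
male_sum, male_rows = 9434763.89, 676

female_avg = round(female_sum / female_rows, 2)
male_avg = round(male_sum / male_rows, 2)
gap = round(male_avg - female_avg, 2)

print("Female average:", female_avg)
print("Male average:", male_avg)
print("Male minus female:", gap)
```

The gap in averages comes from the charges themselves, not from the 14 extra male rows.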

# Age vs. Charges

| Bin Name | Age Range |
| :-: | :-: |
| Young Adults | 15-24 |
| Adults | 25-34 |
| Middle-Aged | 35-44 |
| Middle-Older Aged | 45-54 |
| Pre-Senior Citizen | 55-64 |

In [5]:
#Add bin_labels to a list
bin_labels = ['Young Adults', 'Adults', 'Middle-Aged', 'Middle-Older Aged', 'Pre-Senior Citizen']

#Starting range for the ages
x=15
for age_bin_name in bin_labels:
    #Create a List of Bin Ranges:
    age_bin_range = list(range(x, x+10))
    
    #Create a List of Charges for each Bin:
    age_bin_name_indices = sublist_indices(age, list_range=age_bin_range)
    age_bin_name_charges = sublist_charges(charges, age_bin_name_indices)
    
    #Total Number of Rows:
    age_bin_count = len(age_bin_name_charges)
    
    #Total Sum of Charges for Bin:
    total_age_bin_charges = my_sum(age_bin_name_charges)
    
    #Average Charge for Bin:
    age_bin_avg_charge = mean(age_bin_name_charges)
    
    #Max Charge for Bin:
    age_bin_max_charge = maximum(age_bin_name_charges)

    #Min Charge for Bin:
    age_bin_min_charge = minimum(age_bin_name_charges)
    
    
    age_subtitle = "\033[1mSummary of the US Medical Insurance Costs for {age} [{start}-{end}]\033[0m".format(age=age_bin_name, \
                                                                                                              start=x, end=x+9)
    
    print(age_subtitle)
    print("Total Rows in " + age_bin_name + " Subset: ", age_bin_count)
    print("Sum of " + age_bin_name + " Charges: $", format_number(total_age_bin_charges))
    print("Average of " + age_bin_name + " Charges: $", format_number(age_bin_avg_charge))
    print("Max " + age_bin_name + " Charge: $", format_number(age_bin_max_charge))
    print("Min " + age_bin_name + " Charge: $", format_number(age_bin_min_charge) + '\n')
    
    x += 10

Summary of the US Medical Insurance Costs for Young Adults [15-24]
Total Rows in Young Adults Subset:  278
Sum of Young Adults Charges: $ 2,505,152.63
Average of Young Adults Charges: $ 9,011.34
Max Young Adults Charge: $ 44,501.4
Min Young Adults Charge: $ 1,121.87

Summary of the US Medical Insurance Costs for Adults [25-34]
Total Rows in Adults Subset:  271
Sum of Adults Charges: $ 2,805,498.35
Average of Adults Charges: $ 10,352.39
Max Adults Charge: $ 58,571.07
Min Adults Charge: $ 2,137.65

Summary of the US Medical Insurance Costs for Middle-Aged [35-44]
Total Rows in Middle-Aged Subset:  260
Sum of Middle-Aged Charges: $ 3,414,883.92
Average of Middle-Aged Charges: $ 13,134.17
Max Middle-Aged Charge: $ 48,885.14
Min Middle-Aged Charge: $ 4,399.73

Summary of the US Medical Insurance Costs for Middle-Older Aged [45-54]
Total Rows in Middle-Older Aged Subset:  287
Sum of Middle-Older Aged Charges: $ 4,550,077.24
Average of Middle-Older Aged Charges

### Interpretation

In relative terms, the premium appears to rise with age, as the averages for each category show. The minimum and maximum values do not show an equally clear pattern, which suggests that other factors might outweigh age.
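
The 10-year bins in the table above can also be assigned with integer division instead of building a list of ages per bin. A minimal sketch, assuming bins start at 15 and are 10 years wide as in the table; `age_bin_label` is a hypothetical helper, not part of `descriptive_analysis`:

```python
AGE_BIN_LABELS = ['Young Adults', 'Adults', 'Middle-Aged',
                  'Middle-Older Aged', 'Pre-Senior Citizen']


def age_bin_label(age_value, start=15, width=10, labels=AGE_BIN_LABELS):
    """Return the bin label for an age, or None if the age falls outside all bins."""
    index = (age_value - start) // width
    if index < 0 or index >= len(labels):
        return None
    return labels[index]


for sample_age in (18, 24, 25, 64):
    print(sample_age, age_bin_label(sample_age))
```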

# BMI vs. Charges

| BMI Name | BMI Ranges|
|:-:|:-:|
| Underweight | <18.5|
| Normal | 18.5-24.9 |
| Overweight | 25.0-29.9 |
| Obese | >= 30 |

In [6]:
#Add bin_labels to a list
bmi_bin_labels = ['Underweight', 'Normal', 'Overweight', 'Obese']
            
for bmi_bin_name in bmi_bin_labels:
    
    bmi_bin_name_indices = []
    if bmi_bin_name == 'Underweight':
        for i in range(len(bmi)):
            if bmi[i] < 18.5:
                bmi_bin_name_indices.append(i)
    elif bmi_bin_name == 'Normal':
        for i in range(len(bmi)):
            if bmi[i] >= 18.5 and bmi[i] < 25.0:
                bmi_bin_name_indices.append(i)
    elif bmi_bin_name == 'Overweight':
        for i in range(len(bmi)):
            if bmi[i] >= 25.0 and bmi[i] < 30.0:
                bmi_bin_name_indices.append(i)
    else:
        for i in range(len(bmi)):
            if bmi[i] >= 30.0:
                bmi_bin_name_indices.append(i)
    
    #Create a List of Charges for each Bin:
    bmi_bin_name_charges = sublist_charges(charges, bmi_bin_name_indices)
    bmi_bin_name_indices = []
    
    #Total Number of Rows:
    bmi_bin_count = len(bmi_bin_name_charges)
    
    #Total Sum of Charges for Bin:
    total_bmi_bin_charges = my_sum(bmi_bin_name_charges)
    
    #Average Charge for Bin:
    bmi_bin_avg_charge = mean(bmi_bin_name_charges)
    
    #Max Charge for Bin
    bmi_bin_max_charge = maximum(bmi_bin_name_charges)

    #Min Charge for Bin:
    bmi_bin_min_charge = minimum(bmi_bin_name_charges)
    
    
    bmi_subtitle = "\033[1mSummary of the US Medical Insurance Costs for {bmi}\033[0m".format(bmi=bmi_bin_name)
                                                                                                        
    
    print(bmi_subtitle)
    print("Total Rows in " + bmi_bin_name + " Subset: ", bmi_bin_count)
    print("Sum of " + bmi_bin_name + " Charges: $", format_number(total_bmi_bin_charges))
    print("Average of " + bmi_bin_name + " Charges: $", format_number(bmi_bin_avg_charge))
    print("Max " + bmi_bin_name + " Charge: $", format_number(bmi_bin_max_charge))
    print("Min " + bmi_bin_name + " Charge: $", format_number(bmi_bin_min_charge) + '\n')

Summary of the US Medical Insurance Costs for Underweight
Total Rows in Underweight Subset:  20
Sum of Underweight Charges: $ 177,044.03
Average of Underweight Charges: $ 8,852.2
Max Underweight Charge: $ 32,734.19
Min Underweight Charge: $ 1,621.34

Summary of the US Medical Insurance Costs for Normal
Total Rows in Normal Subset:  225
Sum of Normal Charges: $ 2,342,100.97
Average of Normal Charges: $ 10,409.34
Max Normal Charge: $ 35,069.37
Min Normal Charge: $ 1,121.87

Summary of the US Medical Insurance Costs for Overweight
Total Rows in Overweight Subset:  386
Sum of Overweight Charges: $ 4,241,178.74
Average of Overweight Charges: $ 10,987.51
Max Overweight Charge: $ 38,245.59
Min Overweight Charge: $ 1,252.41

Summary of the US Medical Insurance Costs for Obese
Total Rows in Obese Subset:  707
Sum of Obese Charges: $ 10,995,501.27
Average of Obese Charges: $ 15,552.34
Max Obese Charge: $ 63,770.43
Min Obese Charge: $ 1,131.51



### Interpretation

As BMI increases, the average charge increases. I also note that the number of individuals increases with BMI, which inflates the sum for the higher categories but not the average. I also noticed that the minimum premiums are relatively similar across BMI categories, and the maximum premiums for Underweight through Overweight are also roughly similar.
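
The four BMI ranges from the table can be expressed as a single classification function, which avoids repeating the loop for each category. A minimal sketch using the same cut-offs as the code above; `bmi_category` is a hypothetical name. The row counts printed above are also checked against the 1338 rows in the dataset:

```python
def bmi_category(bmi_value):
    """Classify a BMI value using the cut-offs 18.5, 25.0, and 30.0."""
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25.0:
        return "Normal"
    if bmi_value < 30.0:
        return "Overweight"
    return "Obese"


# Row counts copied from the printed BMI summaries
printed_counts = {"Underweight": 20, "Normal": 225, "Overweight": 386, "Obese": 707}
total_rows_in_bins = 0
for name in printed_counts:
    total_rows_in_bins += printed_counts[name]

print(bmi_category(23.21), bmi_category(47.41))
print("Rows covered by the four bins:", total_rows_in_bins)
```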

# Number of Children vs. Charges

In [7]:
#dictionary of number of children and counts
children_count = count_category(children)

num_children = []
for num in children_count.keys():
    num_children.append(num)

num_children = sorted(num_children)

for num in num_children:
    
    #Create a List of Charges for each Bin:
    children_bin_name_indices = sublist_indices(children, category=num)
    children_bin_name_charges = sublist_charges(charges, children_bin_name_indices)

    #Total Number of Rows:
    children_bin_count = children_count[num]
    
    #Total Sum of Charges for Bin:
    total_children_bin_charges = my_sum(children_bin_name_charges)
    
    #Average Charge for Bin:
    children_bin_avg_charge = mean(children_bin_name_charges)
    
    #Max Charge for Bin:
    children_bin_max_charge = maximum(children_bin_name_charges)

    #Min Charge for Bin:
    children_bin_min_charge = minimum(children_bin_name_charges)
    
    
    children_subtitle = "\033[1mSummary of the US Medical Insurance Costs for Individuals with {numofchildren} Children\033[0m".format(numofchildren=num)
    

    print(children_subtitle)
    print("Total Rows for Individuals with " + str(num) + " Children Subset: ", children_bin_count)
    print("Sum of Charges for Individuals with " + str(num) + " Children: $", format_number(total_children_bin_charges))
    print("Average Charge for Individuals with " + str(num) + " Children: $", format_number(children_bin_avg_charge))
    print("Max Charge for Individuals with " + str(num) + " Children: $", format_number(children_bin_max_charge))
    print("Min Charge for Individuals with  " + str(num) + " Children: $", format_number(children_bin_min_charge) + '\n')
    


Summary of the US Medical Insurance Costs for Individuals with 0 Children
Total Rows for Individuals with 0 Children Subset:  574
Sum of Charges for Individuals with 0 Children: $ 7,098,070.02
Average Charge for Individuals with 0 Children: $ 12,365.98
Max Charge for Individuals with 0 Children: $ 63,770.43
Min Charge for Individuals with  0 Children: $ 1,121.87

Summary of the US Medical Insurance Costs for Individuals with 1 Children
Total Rows for Individuals with 1 Children Subset:  324
Sum of Charges for Individuals with 1 Children: $ 4,124,899.63
Average Charge for Individuals with 1 Children: $ 12,731.17
Max Charge for Individuals with 1 Children: $ 58,571.07
Min Charge for Individuals with  1 Children: $ 1,711.03

Summary of the US Medical Insurance Costs for Individuals with 2 Children
Total Rows for Individuals with 2 Children Subset:  240
Sum of Charges for Individuals with 2 Children: $ 3,617,655.32
Average Charge for Individuals with 2 Children: $ 1

### Interpretation
My hypothesis was that a larger number of children would increase an individual's premium, yet the summary statistics do not support this idea.
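
`count_category` is imported from `descriptive_analysis` and is not shown. A hypothetical stand-in, assuming it returns a dictionary that maps each distinct value to its number of occurrences (which matches how `children_count` is used above):

```python
def count_category_sketch(values):
    """Count how often each distinct value appears in a list."""
    counts = {}
    for value in values:
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts


# Illustrative values only, not taken from insurance.csv
sample_children = [0, 1, 0, 2, 1, 0]
sample_counts = count_category_sketch(sample_children)
print(sample_counts)
```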

# Smoker Status vs. Charges

In [8]:
##########NON-SMOKER##############

# Total Number of Rows
smoker_count = count_category(smoker)
nonsmoke_count = smoker_count['Non-Smoker']

#Total Sum of Charges for Non-Smokers:
nonsmoke_indices = []
for i in range(len(smoker)):
    if smoker[i] == 'Non-Smoker':
        nonsmoke_indices.append(i)

nonsmoke_charges = []
for i in nonsmoke_indices:
    nonsmoke_charges.append(charges[i])
    
total_nonsmoke_charges = my_sum(nonsmoke_charges)

#Average Charge for Non-Smokers:
nonsmoke_avg_charge = mean(nonsmoke_charges)

#Max Charge for Non-Smokers:
nonsmoke_max_charge = maximum(nonsmoke_charges)

#Min Charge for Non-Smokers
nonsmoke_min_charge = minimum(nonsmoke_charges)

print("\n\033[1mSummary of the US Medical Insurance Costs for Non-Smokers\033[0m")
print("Total Rows in Non-Smoker Subset: ", nonsmoke_count)
print("Sum of Non-Smoker Charges: $", format_number(total_nonsmoke_charges))
print("Average of Non-Smoker Charges: $", format_number(nonsmoke_avg_charge))
print("Max Non-Smoker Charge: $", format_number(nonsmoke_max_charge))
print("Min Non-Smoker Charge: $", format_number(nonsmoke_min_charge))

##########SMOKER##############

# Total Number of Rows
smoker_count = count_category(smoker)
smoke_count = smoker_count['Smoker']

#Total Sum of Charges for Smokers:
smoke_indices = []
for i in range(len(smoker)):
    if smoker[i] == 'Smoker':
        smoke_indices.append(i)

smoke_charges = []
for i in smoke_indices:
    smoke_charges.append(charges[i])
    
total_smoke_charges = my_sum(smoke_charges)

#Average Charge for Smokers:
smoke_avg_charge = mean(smoke_charges)

#Max Charge for Smokers:
smoke_max_charge = maximum(smoke_charges)

#Min Charge for Smokers
smoke_min_charge = minimum(smoke_charges)

print("\n\033[1mSummary of the US Medical Insurance Costs for Smokers\033[0m")
print("Total Rows in Smoker Subset: ", smoke_count)
print("Sum of Smoker Charges: $", format_number(total_smoke_charges))
print("Average of Smoker Charges: $", format_number(smoke_avg_charge))
print("Max Smoker Charge: $", format_number(smoke_max_charge))
print("Min Smoker Charge: $", format_number(smoke_min_charge))


Summary of the US Medical Insurance Costs for Non-Smokers
Total Rows in Non-Smoker Subset:  1064
Sum of Non-Smoker Charges: $ 8,974,061.47
Average of Non-Smoker Charges: $ 8,434.27
Max Non-Smoker Charge: $ 36,910.61
Min Non-Smoker Charge: $ 1,121.87

Summary of the US Medical Insurance Costs for Smokers
Total Rows in Smoker Subset:  274
Sum of Smoker Charges: $ 8,781,763.54
Average of Smoker Charges: $ 32,050.23
Max Smoker Charge: $ 63,770.43
Min Smoker Charge: $ 12,829.46


### Interpretation

It is very apparent that smokers have higher premiums than non-smokers. There are only 274 smokers in this dataset, yet the average premium for a smoker exceeds that of a non-smoker by more than $20,000. The range for smokers is also significantly wider than that for non-smokers.
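
The size of the gap and the two ranges can be worked out directly from the printed summaries. A short check using only those figures (the variable names are illustrative):

```python
# Figures copied from the printed smoker and non-smoker summaries
smoker_avg, smoker_max, smoker_min = 32050.23, 63770.43, 12829.46
nonsmoker_avg, nonsmoker_max, nonsmoker_min = 8434.27, 36910.61, 1121.87

avg_gap = round(smoker_avg - nonsmoker_avg, 2)
smoker_range = round(smoker_max - smoker_min, 2)
nonsmoker_range = round(nonsmoker_max - nonsmoker_min, 2)

print("Average gap:", avg_gap)
print("Smoker range:", smoker_range)
print("Non-smoker range:", nonsmoker_range)
```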

# Region vs. Charges

In [9]:
#Since there are 4 regions, I decided to use a for loop instead of repeating the code 4 times.
#I also noticed that I am repeating code, so I wanted to use this experience as a foundation for a function to improve the code.

#dictionary of region names and counts
regions_count = count_category(region)

#Get region names in a list
region_names = []
for name in regions_count.keys():
    region_names.append(name) #sw, se, nw, ne

#Parse through each region
for name in region_names:
    #Each region specific count
    specific_region_count = regions_count[name]
    #Get region specific charges
    specific_region_indices = []
    for i in range(len(region)):
        if region[i] == name:
            specific_region_indices.append(i)
            
    specific_region_charges = []
    for i in specific_region_indices:
        specific_region_charges.append(charges[i])
    
    #Total Sum of Charges for Specific Region:
    total_specific_region_charges = my_sum(specific_region_charges)
    
    #Average Charge for Specific Region:
    specific_region_avg_charge = mean(specific_region_charges)
    
    #Max Charge for Specific Region:
    specific_region_max_charge = maximum(specific_region_charges)
    
    #Min Charge for Specific Region:
    specific_region_min_charge = minimum(specific_region_charges)
    
    subtitle = "\033[1mSummary of the US Medical Insurance Costs for {region} Region\033[0m".format(region=name.title())
    print(subtitle)
    print("Total Rows in " + name.title() + " Region Subset: ", specific_region_count)
    print("Sum of " + name.title() + " Region Charges: $", format_number(total_specific_region_charges))
    print("Average of " + name.title() + " Region Charges: $", format_number(specific_region_avg_charge))
    print("Max " + name.title() + " Region Charge: $", format_number(specific_region_max_charge))
    print("Min " + name.title() + " Region Charge: $", format_number(specific_region_min_charge), '\n')   

Summary of the US Medical Insurance Costs for Southwest Region
Total Rows in Southwest Region Subset:  325
Sum of Southwest Region Charges: $ 4,012,754.69
Average of Southwest Region Charges: $ 12,346.94
Max Southwest Region Charge: $ 52,590.83
Min Southwest Region Charge: $ 1,241.57 

Summary of the US Medical Insurance Costs for Southeast Region
Total Rows in Southeast Region Subset:  364
Sum of Southeast Region Charges: $ 5,363,689.78
Average of Southeast Region Charges: $ 14,735.41
Max Southeast Region Charge: $ 63,770.43
Min Southeast Region Charge: $ 1,121.87 

Summary of the US Medical Insurance Costs for Northwest Region
Total Rows in Northwest Region Subset:  325
Sum of Northwest Region Charges: $ 4,035,711.93
Average of Northwest Region Charges: $ 12,417.58
Max Northwest Region Charge: $ 60,021.4
Min Northwest Region Charge: $ 1,621.34 

Summary of the US Medical Insurance Costs for Northeast Region
Total Rows in Northeast Region Subset:  324
S

### Interpretation 

It is hard to draw a firm conclusion from the data. First, SW, NW, and NE have similar counts. Comparing those three, the Southwest Region seems to have the lowest average, max, and min premiums.

The Southeast Region has the largest count, which explains its larger sum, but not its larger average, since an average does not grow with group size. The max charge in the Southeast Region is the highest of all regions, and its minimum charge is the lowest of all regions. It therefore shows a wider range than the other regions and could even favor a specific type of person.

For example, if I were a young, single, non-smoking woman with no children and a normal BMI, I might consider living in the Southeast Region. But if I were a smoker, or had children, I might choose a more balanced, affordable option like the Southwest Region. Either way, the data needs to be analyzed further.
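
To compare how wide each region's spread is, the max and min printed above can be subtracted. A small worked check using only the three regions whose output is complete above (the Northeast figures are cut off, so they are left out); the dictionary names are illustrative:

```python
# (max, min) charges copied from the printed region summaries
region_extremes = {
    "southwest": (52590.83, 1241.57),
    "southeast": (63770.43, 1121.87),
    "northwest": (60021.40, 1621.34),
}

region_ranges = {}
for name in region_extremes:
    highest, lowest = region_extremes[name]
    region_ranges[name] = round(highest - lowest, 2)

# Find the region with the widest range without using max()
widest = None
for name in region_ranges:
    if widest is None or region_ranges[name] > region_ranges[widest]:
        widest = name

print(region_ranges)
print("Widest range:", widest)
```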

# Who Spends More on Healthcare: Men or Women

My idea is to divide female/male charges by age bin, then compare the averages for each age group to look for a trend.

In [24]:
#Female Sublist of Indices
female_indices = sublist_indices(sex, category="female")

#Male Sublist of Indices
male_indices = sublist_indices(sex, category="male")

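#Make lists for Sex_age indices
#Note: these lists are created once and never cleared inside the loop below,
#so each age bin's average also includes the rows of every younger bin (cumulative averages)
female_age_indices = []
male_age_indices = []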


#Add bin_labels to a list
bin_labels = ['Young Adults', 'Adults', 'Middle-Aged', 'Middle-Older Aged', 'Pre-Senior Citizen']


#Starting range for the ages
x=15
for age_bin_name in bin_labels:
    #Create a List of Bin Ranges:
    age_bin_range = list(range(x, x+10))
    
    #Create a List of Charges for each Bin:
    age_bin_name_indices = sublist_indices(age, list_range=age_bin_range)
    for i in age_bin_name_indices:
        if i in female_indices:
            female_age_indices.append(i)
        elif i in male_indices:
            male_age_indices.append(i)
    
    female_age_bin_name_charges = sublist_charges(charges, female_age_indices)
    male_age_bin_name_charges = sublist_charges(charges, male_age_indices)
    
    #Total Number of Rows:
    female_age_bin_count = len(female_age_bin_name_charges)
    male_age_bin_count = len(male_age_bin_name_charges)
    
    #Find Average
    female_age_mean = mean(female_age_bin_name_charges)
    male_age_mean = mean(male_age_bin_name_charges)
    
    print("\033[1m" + age_bin_name + ":\033[0m")
    print("Male: $", format_number(male_age_mean))
    print("Female: $", format_number(female_age_mean), '\n')
    
    x += 10


Young Adults:
Male: $ 9,366.23
Female: $ 8,629.97 

Adults:
Male: $ 10,358.56
Female: $ 8,944.28 

Middle-Aged:
Male: $ 11,451.6
Female: $ 10,087.52 

Middle-Older Aged:
Male: $ 12,830.07
Female: $ 11,371.55 

Pre-Senior Citizen:
Male: $ 13,956.75
Female: $ 12,569.58 



### Interpretation

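In every comparison above, men pay more than women on average in this data. The figures are cumulative, however: each one averages all ages up to and including that bin, which is why the Pre-Senior Citizen values equal the overall male and female averages ($13,956.75 and $12,569.58). They therefore do not yet compare each age group separately.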
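
Because `female_age_indices` and `male_age_indices` are created before the loop and never cleared, the loop above averages each bin together with all younger bins. A minimal sketch of a strictly per-bin comparison, assuming the same 10-year bins starting at 15; the function name and the sample rows are illustrative and not part of the notebook:

```python
def mean_by_sex_and_age_bin(sexes, ages, amounts, start=15, width=10, bins=5):
    """Average the amounts separately for every (age bin index, sex) pair."""
    totals = {}  # (bin_index, sex) -> [running_sum, row_count]
    for sex_value, age_value, amount in zip(sexes, ages, amounts):
        bin_index = (age_value - start) // width
        if bin_index < 0 or bin_index >= bins:
            continue  # age falls outside the defined bins
        key = (bin_index, sex_value)
        if key not in totals:
            totals[key] = [0.0, 0]
        totals[key][0] += amount
        totals[key][1] += 1
    averages = {}
    for key in totals:
        averages[key] = totals[key][0] / totals[key][1]
    return averages


# Illustrative rows only, not taken from insurance.csv
sample_sexes = ["female", "male", "female", "male", "female"]
sample_ages = [19, 22, 20, 40, 41]
sample_amounts = [1000.0, 2000.0, 3000.0, 5000.0, 7000.0]
per_bin = mean_by_sex_and_age_bin(sample_sexes, sample_ages, sample_amounts)
print(per_bin)
```

Keying the running totals by bin and sex means no row can leak from one age group into the next.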

# Overall Summary of Project

My goal for this Python Notebook was to analyze the HealthCare data without the help of built-in functions, pandas, numpy, matplotlib, and other libraries.

I decided to create summary analysis for each column by dividing every column into categories, then reporting the total number of rows, sum, average, minimum, and maximum values for each category. During the process, I noticed that the code was becoming repetitive, and that led me to create functions for specific processes. 

Moving through each column, I noticed that not all columns could be analyzed the same way. For instance, I could not analyze my sex column, which already has 2 defined categories, in the same way as my age column, which is a numerical variable with a wide range of values. This made me modify my functions a bit, and I do think there is room for improvement in the functions.

I did not utilize the functions I created that analyze the spread of the data, such as standard deviation, variance, and range. I felt that reporting those numbers may not be beneficial if they are not coupled with a visual plot such as a histogram. My original goal was to include visual plots, but I decided I wanted to limit myself to functions I created, and at this moment, I do not have the knowledge to create a function that builds different graphs.
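
The spread functions mentioned above are not shown in this notebook. A minimal sketch of what they might look like, assuming population variance (dividing by n rather than n - 1); the function names and sample values are illustrative:

```python
def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    n = len(values)
    total = 0.0
    for value in values:
        total += value
    average = total / n
    squared_deviations = 0.0
    for value in values:
        squared_deviations += (value - average) ** 2
    return squared_deviations / n


def standard_deviation(values):
    """Square root of the population variance."""
    return variance(values) ** 0.5


def value_range(values):
    """Difference between the largest and smallest value."""
    largest = values[0]
    smallest = values[0]
    for value in values:
        if value > largest:
            largest = value
        if value < smallest:
            smallest = value
    return largest - smallest


# Illustrative values only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(sample), standard_deviation(sample), value_range(sample))
```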

Lastly, I noticed that my summary analysis only compares one column at a time to charges, and I wanted to compare more than one column for a more complex analysis. In addition, my analysis provided a summary but did not answer a question. The first question that came to mind was: what does this data tell me about who spends more on healthcare, men or women? I did notice some setbacks, as I would have liked a visual graph to show the differences. I also asked a very general question, and although the averages suggest that men pay more than women, they do not actually answer it. So I realize I cannot say that men pay more than women, as that is not entirely true; I can only say that, in this data, the averages for men's insurance costs are higher than those for women's in each age-based comparison.

# Improvements
1) Provide visual analysis   
2) Look for anomalies in the data and do a root cause analysis  
3) Build a predictive model that, based on the information you have, attempts to predict the value of a new patient  
4) Re-analyze the Healthcare Data after learning about more libraries  
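
As one possible step toward item 1 that stays within the no-library constraint, a text bar chart can be printed with plain strings. A minimal sketch, assuming one `#` per full $1,000; `text_bar_chart` is a hypothetical helper, and the values are the BMI averages printed earlier in this notebook:

```python
def text_bar_chart(labels, values, unit=1000):
    """Return one line per label with one '#' for every full `unit` in the value."""
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * int(value // unit)
        lines.append("{:<12} {} {:,.2f}".format(label, bar, value))
    return lines


# Average charges copied from the printed BMI summaries
bmi_labels = ["Underweight", "Normal", "Overweight", "Obese"]
bmi_averages = [8852.20, 10409.34, 10987.51, 15552.34]
chart = text_bar_chart(bmi_labels, bmi_averages)
for line in chart:
    print(line)
```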