# U.S. Medical Insurance Costs

This is an analysis of the U.S. Medical Insurance Costs dataset ("insurance.csv") from https://www.kaggle.com/mirichoi0218/insurance, using Python.

## Getting Started
First, I import the modules I need. For this project I use pandas to simplify importing and analyzing the data.

Then I create a data frame and print a sample so I can familiarize myself with the data and layout.

In [None]:
# Import insurance.csv, create a data frame and print a sample.

import pandas as pd

insurance_data = pd.read_csv('Resources/insurance.csv')
print(insurance_data.head())

## Deciding What to Analyze

Looking at the sample, I see there are seven fields:
1. age
2. sex
3. bmi
4. children
5. smoker
6. region
7. charges

Based on those fields, I thought the following analyses would be interesting:
* Average age of male vs. female
* Average smoker age
* Average cost by sex and the difference
* Average age of patients with at least one child
* Percentage of male vs. female
* Percentage of male vs. female smokers
* Percentage of male vs. female with at least one child
* Percentage of smokers with children vs. without children
* Percentage of smokers by region
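
Most of the analyses listed above come down to either a grouped mean or the share of rows that match a condition. As a minimal sketch of both patterns, the block below uses a small `toy` data frame whose rows are invented for illustration (they are not taken from insurance.csv); only the column names match the dataset.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from insurance.csv.
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "sex": ["male", "female", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
    "children": [0, 1, 2, 0],
})

# Average age per sex.
avg_age_by_sex = toy.groupby("sex")["age"].mean()

# Share of smokers per sex: the mean of a boolean column is the
# fraction of rows where the condition is True.
smoker_share_by_sex = toy["smoker"].eq("yes").groupby(toy["sex"]).mean()

print(avg_age_by_sex)
print(smoker_share_by_sex)
```

The same two patterns would apply to the real `insurance_data` frame, assuming it has the columns listed earlier.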

## Structuring the Data
### Create a PatientInfo class with the data and methods
All values are rounded to 2 decimal places for readability.

### Notes
* How does smoking affect cost? Compare the cost for smokers vs. non-smokers and the percent difference.
* How does sex affect cost?
* How is cost affected by region?
* How does BMI affect cost?
* Are patients with children more or less likely to smoke, by sex?
* Are 

### Overall Stats
* Min/max ages
* Min/max costs

* Create a pivot table to show some of the analyses/stats.
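
One possible shape for the pivot table mentioned above, together with the smoker vs. non-smoker percent difference from the notes, is sketched below. The `toy` rows and their charges are made up for illustration and are not results from insurance.csv; `cost_table` and `percent_difference` are hypothetical names.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from insurance.csv.
toy = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "smoker": ["yes", "no", "yes", "no"],
    "charges": [30000.0, 10000.0, 20000.0, 5000.0],
})

# Mean charges with sex as rows and smoker status as columns.
cost_table = pd.pivot_table(
    toy, values="charges", index="sex", columns="smoker", aggfunc="mean"
).round(2)

# Percent difference in mean charges, smokers relative to non-smokers.
smoker_mean = toy.loc[toy["smoker"] == "yes", "charges"].mean()
non_smoker_mean = toy.loc[toy["smoker"] == "no", "charges"].mean()
percent_difference = round(
    (smoker_mean - non_smoker_mean) / non_smoker_mean * 100, 2
)

print(cost_table)
print(percent_difference)
```

Swapping `index="sex"` for `index="region"` would give the by-region view of the same table.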

In [None]:
# Create a class to hold the data.
class PatientInfo():
    def __init__(self, dataframe):
        self.dataframe = dataframe

# Create methods to interact with the data.
# Method to return average ages by sex and the difference.
    def avg_age_by_sex(self):
        average = round(self.dataframe.age.mean(), 2)
        male = round(self.dataframe[self.dataframe.sex=='male'].age.mean(), 2)
        female = round(self.dataframe[self.dataframe.sex=='female'].age.mean(), 2)
        difference = round(male - female, 2)
        print("The average age of patients is {average}. \nThe average age of males is {male}. \nThe average age of females is {female}. \nThe difference between male and female average ages is {difference}.".format(average=average, male=male, female=female, difference=difference))

# Method to return the average ages of smokers by sex and the difference.
    def avg_smoker_age(self):
        # Filter to smokers first so the per-sex averages describe smokers only.
        smokers = self.dataframe[self.dataframe.smoker=='yes']
        average = round(smokers.age.mean(), 2)
        male = round(smokers[smokers.sex=='male'].age.mean(), 2)
        female = round(smokers[smokers.sex=='female'].age.mean(), 2)
        difference = round(male-female, 2)
        print("The average age of smokers is {average}. \nThe average age of male smokers is {male}. \nThe average age of female smokers is {female}. \nThe difference between male and female smokers' average ages is {difference}.".format(average=average, male=male, female=female, difference=difference))

# Method to return average cost by sex and the difference.
    def avg_cost(self):
        average = round(self.dataframe.charges.mean(), 2)
        male = round(self.dataframe[self.dataframe.sex=='male'].charges.mean(), 2)
        female = round(self.dataframe[self.dataframe.sex=='female'].charges.mean(), 2)
        difference = round(male-female, 2)
        print("The average cost is {average}. \nThe average cost for males is {male}. \nThe average cost for females is {female}. \nThe difference between male and female average costs is {difference}.".format(average=average, male=male, female=female, difference=difference))

# Method to return the average age of patients with at least one child by sex and the difference.
    def avg_age_parents(self):
        # Filter to parents first so the per-sex averages describe parents only.
        parents = self.dataframe[self.dataframe.children>=1]
        average = round(parents.age.mean(), 2)
        male = round(parents[parents.sex=='male'].age.mean(), 2)
        female = round(parents[parents.sex=='female'].age.mean(), 2)
        difference = round(male-female, 2)
        print("The average age of parents is {average}. \nThe average age of male parents is {male}. \nThe average age of female parents is {female}. \nThe difference between male and female parents' average ages is {difference}.".format(average=average, male=male, female=female, difference=difference))

# Method to return the percentage of male vs. female and the difference.
    def percent_by_sex(self):
        male = round(self.dataframe[self.dataframe.sex=='male'].age.count() / self.dataframe.age.count(), 2)
        female = round(self.dataframe[self.dataframe.sex=='female'].age.count() / self.dataframe.age.count(), 2)
        difference = round(male-female, 2)
        print("The percentage of male patients is {male}. \nThe percentage of female patients is {female}. \nThe difference between the percentage of male and female patients is {difference}.".format(male=male, female=female, difference=difference))

# Method to return the percentage of male vs. female smokers and the difference.
    def percent_smokers_by_sex(self):
        percent = round(self.dataframe[self.dataframe.smoker=='yes'].age.count() / self.dataframe.age.count(), 2)
        male = round(self.dataframe[(self.dataframe.sex=='male') & (self.dataframe.smoker=='yes')].age.count() / self.dataframe[self.dataframe.sex=='male'].age.count(), 2)
        female = round(self.dataframe[(self.dataframe.sex=='female') & (self.dataframe.smoker=="yes")].age.count() / self.dataframe[self.dataframe.sex=='female'].age.count(), 2)
        difference = round(male-female, 2)
        print("The percentage of patients who smoke is {percent}. \nThe percentage of males who smoke is {male}. \nThe percentage of females who smoke is {female}. \nThe difference between the percentage of male and female smokers is {difference}.".format(percent=percent, male=male, female=female, difference=difference))

# Method to return the percentage of patients with at least one child by sex and the difference.
    def percent_parents_by_sex(self):
        percent = round(self.dataframe[self.dataframe.children>=1].age.count() / self.dataframe.age.count(), 2)
        male = round(self.dataframe[(self.dataframe.sex=='male') & (self.dataframe.children>=1)].age.count() / self.dataframe[self.dataframe.sex=='male'].age.count(), 2)
        female = round(self.dataframe[(self.dataframe.sex=='female') & (self.dataframe.children>=1)].age.count() / self.dataframe[self.dataframe.sex=='female'].age.count(), 2)
        difference = round(male-female, 2)
        print("The percentage of patients with at least one child is {percent}. \nThe percentage of males with at least one child is {male}. \nThe percentage of females with at least one child is {female}. \nThe difference between the percentage of males and females with at least one child is {difference}.".format(percent=percent, male=male, female=female, difference=difference))

# Percentage of smokers with children vs. without children
    def smokers_with_children(self):
        percent = round(self.dataframe[(self.dataframe.smoker=='yes') & (self.dataframe.children>=1)].age.count() / self.dataframe[self.dataframe.children>=1].age.count(), 2)
        male = round(self.dataframe[(self.dataframe.sex=='male') & (self.dataframe.smoker=='yes') & (self.dataframe.children>=1)].age.count() / self.dataframe[(self.dataframe.sex=='male') & (self.dataframe.children>=1)].age.count(), 2)
        female = round(self.dataframe[(self.dataframe.sex=='female') & (self.dataframe.smoker=='yes') & (self.dataframe.children>=1)].age.count() / self.dataframe[(self.dataframe.sex=='female') & (self.dataframe.children>=1)].age.count(), 2)
        difference = round(male-female, 2)
        print("The percent of parents who smoke is {percent}. \nThe percent of male parents who smoke is {male}. \nThe percent of female parents who smoke is {female}. \nThe difference in the percentage of male vs. female parents who smoke is {difference}.".format(percent=percent, male=male, female=female, difference=difference))

# Method to return the percentage of patients who smoke by region.
    def percent_smokers_by_region(self):
        southwest = round(self.dataframe[(self.dataframe.smoker=='yes') & (self.dataframe.region=='southwest')].age.count() / self.dataframe[self.dataframe.region=='southwest'].age.count(), 2)
        southeast = round(self.dataframe[(self.dataframe.smoker=='yes') & (self.dataframe.region=='southeast')].age.count() / self.dataframe[self.dataframe.region=='southeast'].age.count(), 2)
        northwest = round(self.dataframe[(self.dataframe.smoker=='yes') & (self.dataframe.region=='northwest')].age.count() / self.dataframe[self.dataframe.region=='northwest'].age.count(), 2)
        northeast = round(self.dataframe[(self.dataframe.smoker=='yes') & (self.dataframe.region=='northeast')].age.count() / self.dataframe[self.dataframe.region=='northeast'].age.count(), 2)
        print("The percent of patients who smoke by region are \nSouthwest: {sw} \nSoutheast: {se} \nNorthwest: {nw} \nNortheast: {ne}".format(sw=southwest, se=southeast, nw=northwest, ne=northeast))

# Instantiate the PatientInfo class as patient_info.
patient_info = PatientInfo(insurance_data)
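
The percentage methods above repeat the same filter-and-count expression many times. As a sketch of one way to factor that out, the hypothetical helper `share` below (not part of the original class) returns the fraction of rows matching a condition, optionally restricted to a subgroup. The `toy` rows are invented for illustration and are not taken from insurance.csv.

```python
import pandas as pd

def share(condition, within=None):
    """Fraction of rows where `condition` is True, rounded to 2 decimals.

    `condition` and `within` are boolean Series on the same index.
    When `within` is given, only rows where `within` is True are counted.
    """
    subset = condition if within is None else condition[within]
    return round(float(subset.mean()), 2)

# Toy rows invented for illustration; they are NOT taken from insurance.csv.
toy = pd.DataFrame({
    "region": ["southwest", "southwest", "northeast", "northeast"],
    "smoker": ["yes", "no", "no", "no"],
})

is_smoker = toy["smoker"] == "yes"

# Share of smokers overall, then within each region.
overall = share(is_smoker)
by_region = {
    region: share(is_smoker, toy["region"] == region)
    for region in toy["region"].unique()
}

print(overall)
print(by_region)
```

Like the class methods, this returns a fraction between 0 and 1 rather than a value scaled to 100.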

## Analysis Functions
Run each cell to see the results of the analysis. 

These methods are built within the class.

In [None]:
# Average Ages by Sex
patient_info.avg_age_by_sex()

In [None]:
# Average Age of Smokers by Sex
patient_info.avg_smoker_age()

In [None]:
# Average Cost by Sex
patient_info.avg_cost()

In [None]:
# Average Age of Parents With At Least One Child
patient_info.avg_age_parents()

In [None]:
# Percent of Patients by Sex
patient_info.percent_by_sex()

In [None]:
# Percent of Patients Who Smoke by Sex
patient_info.percent_smokers_by_sex()

In [None]:
# Percent of Parents by Sex
patient_info.percent_parents_by_sex()

In [None]:
# Percent of Parents Who Smoke by Sex
patient_info.smokers_with_children()

In [None]:
# Percent of Smokers by Region
patient_info.percent_smokers_by_region()

These functions are not built within the class.

In [None]:
# Summary of the Data
def summarize():
    patients = patient_info.dataframe.age.count()
    min_age = patient_info.dataframe.age.min()
    max_age = patient_info.dataframe.age.max()
    avg_age = patient_info.dataframe.age.mean()
    min_charges = patient_info.dataframe.charges.min()
    max_charges = patient_info.dataframe.charges.max()
    avg_charges = patient_info.dataframe.charges.mean()
    total_charges = patient_info.dataframe.charges.sum()
    percent_smoker = patient_info.dataframe[patient_info.dataframe.smoker=='yes'].age.count() / patient_info.dataframe.age.count()
    print("There are a total of {patients} patients in the file, ranging in age from {min_age} to {max_age}, with an average of {avg_age}.".format(patients=patients, min_age=min_age, max_age=max_age, avg_age=round(avg_age, 2)))
    print("Their medical charges range from {min_charges} to {max_charges}, with an average of {avg_charges}, and a total of {total_charges}.".format(min_charges=round(min_charges, 2), max_charges=round(max_charges, 2), avg_charges=round(avg_charges, 2), total_charges=round(total_charges, 2)))
    print("The percentage of patients who smoke is {percent_smoker}%".format(percent_smoker=round(percent_smoker*100, 2)))
summarize()

* Patient count
* Min/max: age, bmi, charges
* Percent smokers
* Average number of children
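
The summary items listed above could also be collected with a few pandas aggregations rather than one variable per statistic. The block below is a sketch under that assumption; the `toy` rows are invented for illustration and are not taken from insurance.csv, and the variable names are hypothetical.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from insurance.csv.
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "bmi": [22.0, 28.5, 31.0, 25.5],
    "children": [0, 1, 2, 1],
    "smoker": ["yes", "no", "no", "no"],
    "charges": [1000.0, 2000.0, 3000.0, 4000.0],
})

# Patient count is the number of rows.
patient_count = len(toy)

# Min and max of several columns in one call; rows are "min" and "max".
min_max = toy[["age", "bmi", "charges"]].agg(["min", "max"])

# Percent of smokers: scale to 100 before rounding.
percent_smokers = round(toy["smoker"].eq("yes").mean() * 100, 2)

# Average number of children per patient.
avg_children = round(toy["children"].mean(), 2)

print(patient_count)
print(min_max)
print(percent_smokers)
print(avg_children)
```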