# U.S. Medical Insurance Costs Project

In the United States, 36.5% of adults are obese and 32.5% of adults are overweight, which together account for roughly two thirds (69%) of all adults.
Obesity is linked to more than 60 chronic diseases, such as:

* Type 2 diabetes
* Heart disease
* Stroke
* Cancer

Obesity causes more deaths than being underweight. It is one of the top five leading causes of death at 2.8 million deaths a year. The other four leading causes are high blood pressure, tobacco use, high blood glucose, and physical inactivity.

For adults, WHO defines overweight and obesity as follows:

overweight is a BMI greater than or equal to 25; and
obesity is a BMI greater than or equal to 30.

** Information taken from WHO (https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight)
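
The two WHO cut-offs above can be expressed as a small helper. This is only a sketch: `bmi_category` is a hypothetical function that is not part of the original notebook, and the label `'not overweight'` is my own shorthand for any BMI below 25.

```python
def bmi_category(bmi):
    """Classify an adult BMI using the two WHO cut-offs quoted above."""
    bmi = float(bmi)  # csv.DictReader yields strings, so convert first
    if bmi >= 30:
        return 'obese'
    if bmi >= 25:
        return 'overweight'
    # Assumption: everything below 25 is lumped into one label
    return 'not overweight'


print(bmi_category('27.9'))  # overweight
print(bmi_category(30))      # obese
```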

Throughout this project, I will explore how BMI, along with other factors such as age, gender, region, and smoker status, affects a patient's insurance cost. Since a higher BMI increases the chance of developing health complications, we want to consider how to encourage our clients to adopt a healthier lifestyle and thus reduce the likelihood of paying for the treatment of diseases.

Once we have analyzed the data, we can try to answer the question of how to improve the overweight situation.

In [1]:
import csv

age = []
sex = []
bmi = []
num_children = []
smoker = []
region = []
insurance_cost = []

with open('insurance.csv', newline='') as insurance_csv:
    data = csv.DictReader(insurance_csv)
    for row in data:
        age.append(row['age'])
        sex.append(row['sex'])
        bmi.append(row['bmi'])
        num_children.append(row['children'])
        smoker.append(row['smoker'])
        region.append(row['region'])
        # Charges are truncated to whole dollars here (int drops the cents).
        insurance_cost.append(int(float(row['charges'])))

To start my project, I noticed in **insurance.csv** that there are seven columns per patient: Age, Sex, BMI, Number of Children, Smoker Status, Region, and Insurance Charges. 

I created seven empty lists to reflect those columns and imported the data into those seven lists.

Now that my data is organized into lists, I will analyze the following:
* Geographical location of patients
* Average age of patients
* Percentage of men vs. women
* Average BMI of men vs. women
* Percentage of smokers among men vs. women
* Average yearly medical charges per patient

Below I created a class called `Patients` with methods to organize and interpret the data:
* `location()` and `average_location()` (whose results are later zipped into one list to show how many patients come from each region)
* `average_age()`
* `men_vs_women()`
* `gender_bmi_average()`
* `smokers_gender()`
* `average_insurance_cost()`


In [2]:
class Patients:
    
    def __init__(self, patients_ages, patients_sexes, patients_bmis, patients_num_children, 
                 patients_smoker_statuses, patients_regions, patients_charges):
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges
        
    def location(self):
        regions = []
        for region in self.patients_regions:
            if region not in regions: 
                regions.append(region)
        return regions

    def average_location(self):
        # Count patients per region in the same order that location() returns,
        # so the two lists line up when they are zipped together.
        regions = self.location()
        total = [0] * len(regions)
        for region in self.patients_regions:
            total[regions.index(region)] += 1
        return total

    def average_age(self):
        total_age = 0
        for age in self.patients_ages:
            total_age += int(age)
        return 'Average Patient Age: ' + str(round(total_age/len(self.patients_ages), 2)) + ' years old.'

    def men_vs_women(self):
        males = 0
        females = 0
        for sex in self.patients_sexes:
            if sex == 'female':
                females += 1
            elif sex == 'male':
                males += 1
        print('Percentage of Female Patients: ' + str(females) + '(' + str(round(females/(females+males)*100)) + '%)')
        print('Percentage of Male Patients: ' + str(males) + '(' + str(round(males/(females+males)*100)) + '%)')
        
    def gender_bmi_average(self):
        female_bmi = 0
        male_bmi = 0
        femalelength = 0
        malelength = 0
        for gender, bmi in zip(self.patients_sexes, self.patients_bmis):
            if gender == 'female':
                female_bmi += float(bmi)
                femalelength += 1
            elif gender == 'male':
                male_bmi += float(bmi)
                malelength += 1
        print('Average Female BMI: ' + str(round(female_bmi/femalelength, 2)))
        print('Average Male BMI: ' + str(round(male_bmi/malelength, 2)))
    
    def smokers_gender(self):
        fsmokers = 0
        msmokers = 0
        femalelength = 0
        malelength = 0
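        # Use the instance's own data rather than the global sex/smoker lists.
        for gender, smokers in zip(self.patients_sexes, self.patients_smoker_statuses):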
            if (gender == 'female') and (smokers == 'yes'):
                fsmokers += 1
            elif (gender == 'male') and (smokers == 'yes'):
                msmokers += 1  
            elif gender == 'female':
                femalelength += 1
            elif gender == 'male':
                malelength += 1
        print('Female Smokers: ' + str(fsmokers) + '/' + str(femalelength+fsmokers) + '(' + str(round(fsmokers/(femalelength+fsmokers)*100)) + '%)')
        print('Male Smokers: ' + str(msmokers) + '/' + str(malelength+msmokers) + '(' + str(round(msmokers/(malelength+msmokers)*100)) + '%)')
    
    
    def average_insurance_cost(self):
        average_cost = 0
        for insurance in self.patients_charges:
            average_cost += float(insurance)
        print('Average Yearly Insurance Cost: $' + str(round(average_cost/len(self.patients_charges), 2)))

Next, I created an instance of the `Patients` class called `patient_info`, which I will use to see the results of my analysis.

In [3]:
patient_info = Patients(age, sex, bmi, num_children, smoker, region, insurance_cost)

In [4]:
# Pair each region name with its patient count
region_counts = list(zip(patient_info.location(), patient_info.average_location()))
print(region_counts)

[('southwest', 325), ('southeast', 364), ('northwest', 325), ('northeast', 324)]


There are four geographical regions in this dataset, all in the United States. Patients are spread fairly evenly across them, with the southeast slightly larger (364 patients versus 324 or 325 in each of the others).
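
The per-region count can also be computed without hard-coding region names. The sketch below uses `collections.Counter` from the standard library; `count_regions` and the sample values are hypothetical and only illustrate the idea.

```python
from collections import Counter


def count_regions(regions):
    """Return (region, count) pairs in order of first appearance."""
    # Counter keeps keys in insertion order (Python 3.7+)
    return list(Counter(regions).items())


# Made-up sample values, not rows from insurance.csv
sample_regions = ['southwest', 'southeast', 'southeast', 'northwest', 'northeast']
print(count_regions(sample_regions))
# [('southwest', 1), ('southeast', 2), ('northwest', 1), ('northeast', 1)]
```

In the notebook, passing the `region` list to this function should produce the same pairs as the zipped output above.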

In [5]:
print(patient_info.average_age())
# Ages were read from the CSV as strings, so compare them numerically
print("Minimum Age in Dataset: " + str(min(age, key=int)))
print("Maximum Age in Dataset: " + str(max(age, key=int)))

Average Patient Age: 39.21 years old.
Minimum Age in Dataset: 18
Maximum Age in Dataset: 64


As shown above, the average age in the dataset is about 39 years, with patients ranging from 18 to 64. When exploring a dataset, it is important to check that it is representative of a broad population. Further analysis would be needed to confirm that the age distribution in this dataset meets that standard.
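
One possible starting point for that further analysis is to count patients per age bracket and look for gaps or heavy clustering. This is a minimal sketch, assuming ten-year brackets are a reasonable granularity; `age_bracket_counts` and the sample ages are hypothetical.

```python
from collections import Counter


def age_bracket_counts(ages, width=10):
    """Count patients per age bracket of `width` years (e.g. 30-39)."""
    counts = Counter()
    for age in ages:
        start = (int(age) // width) * width  # lower bound of the bracket
        counts[f'{start}-{start + width - 1}'] += 1
    # Sort brackets by their numeric lower bound
    return dict(sorted(counts.items(), key=lambda item: int(item[0].split('-')[0])))


# Made-up sample values, not rows from insurance.csv
sample_ages = ['18', '19', '28', '33', '64']
print(age_bracket_counts(sample_ages))
# {'10-19': 2, '20-29': 1, '30-39': 1, '60-69': 1}
```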

In [6]:
patient_info.men_vs_women()

Percentage of Female Patients: 662(49%)
Percentage of Male Patients: 676(51%)


I also wanted to check whether the dataset contains a roughly equal number of men and women. As with age, it is important that the data represents a broad range of individuals. With 49% female and 51% male patients, the split is close to even.

In [7]:
patient_info.gender_bmi_average()

Average Female BMI: 30.38
Average Male BMI: 30.94


As mentioned at the beginning of this project, obesity is defined as a BMI greater than or equal to 30. According to this analysis, the average BMI of both males and females in this dataset falls in the obese range, so excess weight is a relevant concern for this group of patients.

In [8]:
patient_info.smokers_gender()

Female Smokers: 115/662(17%)
Male Smokers: 159/676(24%)


As you can see, 17% of the women and 24% of the men in the dataset are smokers. Smoker status tends to have a strong effect on insurance premiums, because smokers are more prone to diseases that require costly treatment (as are patients with a BMI over 30.0).
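
The analysis so far does not measure that effect directly. One way to check it would be to compare the average charge for smokers and non-smokers. The sketch below is an assumption about how that could be done; `average_charge_by` is a hypothetical helper, and the sample values are made up purely to show the mechanics.

```python
from collections import defaultdict


def average_charge_by(groups, charges):
    """Return the average charge for each distinct value in `groups`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, charge in zip(groups, charges):
        totals[group] += float(charge)
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Made-up sample values, not rows from insurance.csv
sample_smoker = ['yes', 'no', 'no', 'yes']
sample_charges = [20000, 3000, 5000, 30000]
print(average_charge_by(sample_smoker, sample_charges))
# {'yes': 25000.0, 'no': 4000.0}
```

In the notebook, `average_charge_by(smoker, insurance_cost)` would give the comparison for the real data, and the same function works for `sex` or `region`.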

In [9]:
print("Minimum Yearly Insurance Cost: $" + str(round(min(insurance_cost))))
# average_insurance_cost() prints its own result and returns None, so call it directly
patient_info.average_insurance_cost()

Minimum Yearly Insurance Cost: $1121
Average Yearly Insurance Cost: $13269.93


Lastly, you can see that the average yearly insurance cost ($13,269.93) is much higher than the lowest yearly insurance cost ($1,121). This could be due to several factors, such as how many children the patient has, how old the patient is, whether the patient smokes, and whether the patient's BMI is high or low.
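
A gap between the average and the minimum can also arise when a few patients have very high charges. Comparing the mean with the median and the maximum is one way to check for that. This is a sketch using the standard `statistics` module; `cost_summary` is a hypothetical helper and the sample values are made up.

```python
import statistics


def cost_summary(charges):
    """Summarize charges with min, median, mean, and max."""
    values = [float(c) for c in charges]
    return {
        'min': min(values),
        'median': statistics.median(values),
        'mean': round(statistics.mean(values), 2),
        'max': max(values),
    }


# Made-up sample values, not rows from insurance.csv
sample_costs = [1000, 2000, 3000, 4000, 40000]
print(cost_summary(sample_costs))
# {'min': 1000.0, 'median': 3000.0, 'mean': 10000.0, 'max': 40000.0}
```

In this toy sample a single large value pulls the mean well above the median. Whether the same holds for the real data would need to be checked with `cost_summary(insurance_cost)`.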

In conclusion, we want to lower insurance premiums for our clients. One opportunity is when a client first obtains insurance: they could receive information about how their premiums would decrease if they lowered their BMI to a healthy level and, if they smoke, quit smoking. Showing clients concrete examples of how their rates would decrease with a healthier lifestyle should encourage them to make those changes.