# Medical Insurance Costs

This project investigates medical insurance costs in a given **CSV file** using **Python Fundamentals**. The goal of the project is to analyze various patient attributes in the **insurance.csv** file and gain useful insights for potential use cases.

To start, all necessary libraries must be imported. The **csv** library will be used for this project, although other Python libraries are available for analyzing the dataset in different ways.

In [1]:
# importing CSV library
import csv

The next step is to inspect the `csv file` in order to plan how to load the data into Python. The following points are the most useful for the later steps:

- Column and row names
- Data types of values (numerical vs. categorical)
- Missing or unknown data

In [2]:
# open csv file using csv library
lst = []
with open('insurance.csv') as dataset:
    reader = csv.reader(dataset)
    # loop through the data and append to list
    for row in reader:
        lst.append(row)

In [348]:
lst[0]

['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']

The **insurance.csv** file opened above contains the following columns:
* Patient Age as `age`
* Patient Sex as `sex`
* Patient BMI as `bmi`
* Patient Number of Children as `children`
* Patient Smoking Status as `smoker`
* Patient U.S. Geographical Region as `region`
* Patient Yearly Medical Insurance Cost as `charges`

There is no missing or unknown data in the file. To store this information, seven empty lists are created below, one for each column of data from **insurance.csv**.
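One way to back up the claim about missing data is a small check over the rows already loaded into `lst`. The sketch below is only an illustration: the function name `count_missing` and the set of markers treated as "missing" are assumptions, and the sample rows are made up rather than taken from **insurance.csv**.

```python
def count_missing(rows, missing_markers=("", "NA", "N/A", "?")):
    """Count cells per column whose value looks missing.

    `rows` is a list of lists whose first element is the header row,
    the same shape as `lst` built earlier.
    """
    header, data = rows[0], rows[1:]
    missing = {name: 0 for name in header}
    for row in data:
        for name, value in zip(header, row):
            # treat empty strings and common placeholder markers as missing
            if value.strip() in missing_markers:
                missing[name] += 1
    return missing


# Small made-up sample, not taken from insurance.csv
sample = [
    ["age", "sex", "bmi"],
    ["19", "female", "27.9"],
    ["", "male", "33.77"],
    ["28", "male", "NA"],
]
print(count_missing(sample))
```

Calling `count_missing(lst)` on the real data should return zero for every column if the file is indeed complete.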

In [349]:
# creating empty lists for the values in the csv file's columns
ages = []
sexes = []
bmis = []
num_of_children = []
smokers = []
regions = []
insurance_costs = []

After inspecting the `CSV file` and creating the empty lists, the `csv_into_lists` helper function below was written to make loading data into the lists as convenient as possible. The helper function is called seven times, once per column, and each call reads the file and stores that column's values in the corresponding list.

In [4]:
# defining helper function
def csv_into_lists(lst, csv_file, column_name):
    # opening csv file as dictionary
    with open(csv_file) as insurance_data:
        reader = csv.DictReader(insurance_data)
        # loop through the data in each row of csv
        for row in reader:
            # add the data from each row to a list
            lst.append(row[column_name])
        return lst

In [None]:
csv_into_lists(ages, 'insurance.csv', 'age')
csv_into_lists(sexes, 'insurance.csv', 'sex')
csv_into_lists(bmis, 'insurance.csv', 'bmi')
csv_into_lists(num_of_children, 'insurance.csv', 'children')
csv_into_lists(smokers, 'insurance.csv', 'smoker')
csv_into_lists(regions, 'insurance.csv', 'region')
csv_into_lists(insurance_costs, 'insurance.csv', 'charges')
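The seven calls above each reopen and reread the file. As an alternative, the file could be read a single time and every column collected in one pass. This is a minimal sketch, not the approach used in this project: `csv_into_columns` is a hypothetical name, and the in-memory sample stands in for **insurance.csv** with made-up values.

```python
import csv
import io


def csv_into_columns(file_obj):
    """Read a CSV file object once and return {column_name: [values]}."""
    reader = csv.DictReader(file_obj)
    # fieldnames is taken from the header row of the file
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Made-up two-row sample standing in for insurance.csv
sample_file = io.StringIO("age,sex,charges\n19,female,100.5\n33,male,200.25\n")
columns = csv_into_columns(sample_file)
print(columns["age"])
```

With the real file, the same function could be used as `with open('insurance.csv') as f: columns = csv_into_columns(f)`, after which `columns['age']` plays the role of `ages`.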

Now that all the data from `insurance.csv` is neatly organized into labeled lists, the analysis can begin. There are many aspects of the data that could be explored. The following operations will be implemented:

* find the average age of the patients
* count the males and females in the dataset
* find the unique geographical regions of the patients
* return the average yearly medical charges of the patients
* create a dictionary that contains all patient information

To perform these inspections, a class called `PatientsInfo` is defined below with five methods:
* `analyze_ages()`
* `analyze_sexes()`
* `unique_regions()`
* `average_charges()`
* `create_dictionary()`

In [351]:
class PatientsInfo:
    # init method that takes in each list as parameter
    def __init__(self, ages, sexes, bmis, num_of_children, smokers, regions, insurance_costs):
        self.ages = ages
        self.sexes = sexes 
        self.bmis = bmis
        self.num_of_children = num_of_children
        self.smokers = smokers
        self.regions = regions
        self.insurance_costs = insurance_costs
    
    # method that calculates the average ages of the patients in insurance.csv
    def analyze_ages(self):
        # initialize total age at zero
        total_age = 0
        # iterate through all ages in the ages list
        for i in self.ages:
            #sum of the total age
            total_age += int(i)
        # return a message with the average age (total age divided by the number of patients), rounded to two decimals
        return '{} {} {} {}'.format('Patients', 'average age is',  str(round(total_age/len(self.ages), 2)), 'years old')
    
    
    # method that calculates the number of males and females in insurance.csv
    def analyze_sexes(self):
        # initialize number of males and females to zero
        male_counts = 0
        female_counts = 0
        # iterate through each sex in the sexes list
        for i in self.sexes:
            if i == 'male':
                male_counts += 1
            else:
                female_counts += 1
                
        print('Male number is :', male_counts)
        print('Female number is :', female_counts)
        
        
    # method to find each unique region patients are from
    def unique_regions(self):
        # initialize empty list
        unique_regions = []
        # iterate through each region in regions list
        for region in self.regions:
            # if the region is not already in the unique regions list
            # then add it to the unique regions list
            if region not in unique_regions: 
                unique_regions.append(region)
        # return unique regions list
        return unique_regions
    
    
    # method to find average yearly medical charges for patients in insurance.csv
    def average_charges(self):
        total_charges = 0
        # iterate through charges in patients insurance_costs list
        # add each charge to total_charge
        for charges in self.insurance_costs:
            total_charges += float(charges)
        # return the average charges rounded to the hundredths place
        return ('Average yearly medical insurance charges: ' 
                + str(round(total_charges/len(self.insurance_costs), 2)) + " dollars.")
       
        
    # method to create dictionary with all patients information
    def create_dictionary(self):
        #defining a dictionary
        self.patients_dictionary = {}
        #creating items of dictionary using lists above
        self.patients_dictionary['age'] = [int(ages) for ages in self.ages]
        self.patients_dictionary['sex'] = self.sexes
        self.patients_dictionary['bmis'] = self.bmis
        self.patients_dictionary['number_of_children'] = self.num_of_children
        self.patients_dictionary['smokers'] = self.smokers
        self.patients_dictionary['regions'] = self.regions
        self.patients_dictionary['insurance_costs'] = self.insurance_costs
        
        return self.patients_dictionary

The next step is to create an instance of the class, called `patient_info`. With this instance, each method can be called to see the results of the analysis.

In [355]:
patient_info = PatientsInfo(ages, sexes, bmis, num_of_children, smokers, regions, insurance_costs)

The average age of the patients in **insurance.csv** is about 39 years, as shown below.

In [356]:
patient_info.analyze_ages()

'Patients average age is 39.21 years old'

The next step of the analysis is to check the balance of males vs. females in the file.

In [329]:
patient_info.analyze_sexes()

Male number is : 676
Female number is : 662
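Note that `analyze_sexes()` counts every value other than `'male'` as female, which is only safe when the column holds exactly those two labels. A hedged alternative is to tally each distinct label with `collections.Counter` from the standard library, so that unexpected values show up on their own. The sample list below is made up for illustration.

```python
from collections import Counter

# Made-up sample values, including one unexpected label
sample_sexes = ["male", "female", "female", "male", "unknown"]

# Counter tallies each distinct value separately
counts = Counter(sample_sexes)
print(counts["male"], counts["female"], counts["unknown"])
```

Applied to the real data, `Counter(sexes)` would list every label present in the `sex` column together with its count.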


There are four unique geographical regions in this dataset.

In [319]:
patient_info.unique_regions()

['southwest', 'southeast', 'northwest', 'northeast']

The average yearly medical insurance charge per individual is about 13,270 US dollars.

In [328]:
patient_info.average_charges()

'Average yearly medical insurance charges: 13270.42 dollars.'

Finally, all patient data can be organized into a dictionary. If further analysis of the `insurance.csv` file is needed, the dictionary returned below will be a more convenient form of the data to work with.

In [None]:
patient_info.create_dictionary()
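As one example of how such a dictionary of columns might be used later, the sketch below averages the costs within each group of a categorical column. The function `average_cost_by_group` is hypothetical and the sample values are made up; only the key names `'smokers'` and `'insurance_costs'` follow the dictionary built by `create_dictionary()`.

```python
def average_cost_by_group(patients, group_key, cost_key="insurance_costs"):
    """Average the cost column within each distinct value of group_key."""
    totals = {}
    counts = {}
    for group, cost in zip(patients[group_key], patients[cost_key]):
        # costs are stored as strings, so convert before summing
        totals[group] = totals.get(group, 0.0) + float(cost)
        counts[group] = counts.get(group, 0) + 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Made-up sample in the same shape as the dictionary from create_dictionary()
sample_patients = {
    "smokers": ["yes", "no", "no", "yes"],
    "insurance_costs": ["300.0", "100.0", "200.0", "500.0"],
}
print(average_cost_by_group(sample_patients, "smokers"))
```

The same call with `"regions"` as `group_key` would give a per-region average instead.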