# U.S. Medical Insurance Costs

The goal of this project is to analyze a U.S. medical insurance dataset to uncover insights about patient demographics, regional variation, and the factors that influence medical costs.

The dataset has seven columns:

1. Age: Age of the individual.
2. Sex: Gender of the individual.
3. BMI: Body Mass Index.
4. Children: Number of dependents.
5. Smoker: Indicates whether the individual is a smoker (yes/no).
6. Region: Geographic region.
7. Charges: Medical insurance charges (in dollars).
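
Before any analysis it can help to confirm that a loaded table actually has these seven columns and that the numeric ones are numeric. The following is a minimal sketch, not part of the original notebook: the helper names `missing_columns` and `non_numeric_columns` are hypothetical, and the two sample rows are copied from the first rows the notebook prints.

```python
import pandas as pd

# Column names as described in the list above (lowercase, as in the CSV).
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
NUMERIC_COLUMNS = ["age", "bmi", "children", "charges"]


def missing_columns(df):
    """Hypothetical helper: expected column names that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def non_numeric_columns(df):
    """Hypothetical helper: numeric columns that were not parsed as numbers."""
    return [
        col
        for col in NUMERIC_COLUMNS
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col])
    ]


# Two rows taken from the notebook's printed head() output.
sample = pd.DataFrame({
    "age": [19, 18],
    "sex": ["female", "male"],
    "bmi": [27.9, 33.77],
    "children": [0, 1],
    "smoker": ["yes", "no"],
    "region": ["southwest", "southeast"],
    "charges": [16884.924, 1725.5523],
})

print(missing_columns(sample))
print(non_numeric_columns(sample))
```

Both helpers return an empty list when the table matches the description, so they can be used as a quick guard right after `pd.read_csv`.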


In [8]:

import pandas as pd

# Loads the dataset
file_path = '/Users/tetianakravchuk/Desktop/CodeAcademyProjects/DataScience/MedicalInsuranceCosts/Medical-Insurance-Costs/insurance.csv'
insurance_data = pd.read_csv(file_path)

# Displays a summary of the dataset
insurance_data.info()

# Saves dataset features into variables
age = insurance_data['age']
sex = insurance_data['sex']
bmi = insurance_data['bmi']
children = insurance_data['children']
smoker = insurance_data['smoker']
region = insurance_data['region']
charges = insurance_data['charges']


# Displays the first few rows to inspect the contents
insurance_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


   age     sex     bmi  children  smoker     region      charges
0   19  female    27.9         0     yes  southwest    16884.924
1   18    male   33.77         1      no  southeast    1725.5523
2   28    male    33.0         3      no  southeast     4449.462
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male   28.88         0      no  northwest    3866.8552
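
The `info()` output reports 1338 non-null values in every column, and three columns (`sex`, `smoker`, `region`) hold text. A short sketch of how those two observations could be checked directly is shown here. It is an illustration only: `null_counts` and `category_counts` are hypothetical helpers, and the sample frame is just the five rows printed above, so that the block runs without the CSV file.

```python
import pandas as pd

# First five rows as printed by insurance_data.head() in the notebook.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "sex": ["female", "male", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "children": [0, 1, 3, 0, 0],
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})


def null_counts(df):
    """Hypothetical helper: number of missing values in each column."""
    return df.isna().sum()


def category_counts(df, columns=("sex", "smoker", "region")):
    """Hypothetical helper: how often each value occurs in the text columns."""
    return {col: df[col].value_counts().to_dict() for col in columns}


print(null_counts(sample))
print(category_counts(sample))
```

On the full dataset the same calls would be made on `insurance_data` instead of `sample`.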


In [15]:
# pandas is a Python library for data manipulation and analysis.
# It provides DataFrame objects for storing and analyzing tabular data.
import pandas as pd
file_path = '/Users/tetianakravchuk/Desktop/CodeAcademyProjects/DataScience/MedicalInsuranceCosts/Medical-Insurance-Costs/insurance.csv'
insurance_data = pd.read_csv(file_path)

# Define the class to encapsulate the analysis
class InsuranceAnalysis:
    # Constructor. Parameter: filepath, the path to the CSV file. Action: reads the file into a DataFrame.
    def __init__(self, filepath):
        self.data = pd.read_csv(filepath)

    # Method to display the summary statistics of the data
    def summary_statistics(self):
        print("Summary Statistics:")
        print(self.data.describe())
        
    # Method to compare the average insurance costs between smokers and non-smokers
    def smoker_vs_non_smoker_costs(self):
        print("Smoker vs Non-Smoker Costs:")
        smoker_data = self.data[self.data['smoker'] == 'yes']
        non_smoker_data = self.data[self.data['smoker'] == 'no']

        avg_smoker_cost = smoker_data['charges'].mean()
        avg_non_smoker_cost = non_smoker_data['charges'].mean()

        print(f"Average cost for smokers: {avg_smoker_cost}")
        print(f"Average cost for non-smokers: {avg_non_smoker_cost}")
        
    # Method to analyze the average insurance costs based on region
    def region_wise_analysis(self):
        print("Region-Wise Analysis:")
        region_data = self.data.groupby('region')['charges'].mean()
        print(region_data)


    # Method to compute the average age of individuals who have at least one child
    def average_age_with_children(self):
        print("Average age with children:")
        average_age_w_children = self.data[self.data['children'] > 0]['age'].mean()
        print(average_age_w_children)

# Example usage (reuses the file_path defined above so the same file is read)
analysis = InsuranceAnalysis(file_path)
analysis.summary_statistics()
analysis.smoker_vs_non_smoker_costs()
analysis.region_wise_analysis()
analysis.average_age_with_children()

Summary Statistics:
               age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.912515
max      64.000000    53.130000     5.000000  63770.428010
Smoker vs Non-Smoker Costs:
Average cost for smokers: 32050.23183153284
Average cost for non-smokers: 8434.268297856204
Region-Wise Analysis:
region
northeast    13406.384516
northwest    12417.575374
southeast    14735.411438
southwest    12346.937377
Name: charges, dtype: float64
Average age with children:
39.78010471204188
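
The two smoker averages printed above are easier to compare as a ratio, and the separate filtering for smokers and non-smokers can also be written as a single `groupby`. The sketch below shows both under stated assumptions: `mean_charges_by` is a hypothetical helper that is not in the original class, the two averages are copied from the printed output, and the small sample frame holds only the five rows from the notebook's `head()` output.

```python
import pandas as pd


def mean_charges_by(df, column):
    """Hypothetical helper: average charges for each value of `column`."""
    return df.groupby(column)["charges"].mean()


# Averages copied from the notebook's printed output.
avg_smoker_cost = 32050.23183153284
avg_non_smoker_cost = 8434.268297856204

# How many times larger the smokers' average is.
ratio = avg_smoker_cost / avg_non_smoker_cost
print(f"Smokers' average charges are about {ratio:.1f} times those of non-smokers")

# Five-row sample (from head()) to show the groupby form without the CSV file.
sample = pd.DataFrame({
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

by_smoker = mean_charges_by(sample, "smoker")
by_region = mean_charges_by(sample, "region")
print(by_smoker)
print(by_region)
```

The same helper covers the region-wise analysis, since only the grouping column changes. Note that a difference in group averages describes the data and does not by itself show that smoking or region causes the difference.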
