
# Unveiling Heart Attack Risk: A Comprehensive Analysis of BMI and its Implications

# Introduction
Myocardial infarctions, more commonly known as heart attacks, continue to pose a significant global health challenge. Recent research has highlighted the role of higher Body Mass Index (BMI) values as a robust predictor of heart attack risk, contributing to the multifaceted challenge of cardiovascular disease (Adams, 2020). These events, not only a leading cause of mortality but also a substantial contributor to the global burden of disease, underscore the need for a comprehensive understanding of factors contributing to heart attack risk, with specific attention to BMI.

The Heart Attack Risk Prediction Dataset is a valuable repository, housing a wealth of patient-specific data that spans demographic information, lifestyle choices, medical history, and socio-economic factors. It represents the culmination of extensive efforts to unravel the intricate dynamics of heart health, with a primary focus on the factors influencing it, most notably BMI. The dataset holds immense potential to transform heart disease prevention and management, placing a pronounced emphasis on the significance of BMI in predicting heart attacks.

Cardiovascular diseases, particularly heart attacks, are often preventable through lifestyle adjustments and early interventions (Renninger, 2018). Through a meticulous analysis of the dataset, with a specific focus on BMI values, we aim to take a substantial step toward practical strategies for heart disease prevention. The primary aim of this research project is to harness the Heart Attack Risk Prediction Dataset, constructing a robust predictive model that accurately assesses an individual's heart attack risk based on the diverse attributes within the dataset. This model places a strong emphasis on BMI values as a key predictor of heart attack risk, enabling the timely identification of individuals at higher risk, particularly those with elevated BMI values, and facilitating proactive interventions and preventive measures.

Addressing heart attack risk, specifically focusing on BMI management, is crucial for enhancing public health and reducing escalating healthcare costs. Furthermore, the identification of heart attack risk factors, particularly higher BMI values, empowers individuals to make informed lifestyle choices and seek medical assistance when necessary. Thus, this research project has the potential to promote preventive healthcare globally, with a pronounced emphasis on BMI-related interventions. The analysis of this dataset, with a specific focus on BMI, fosters collaboration between the fields of medicine, data science, and public health. This project aims to generate actionable recommendations and strategies for individuals and healthcare providers to mitigate heart attack risk, with a strong emphasis on addressing higher BMI as a significant predictor of heart attacks.

# Method
We used the Heart Attack Risk Prediction dataset, obtained from the internet. To answer the
question of how BMI relates to the risk of having a heart attack, our population of interest
is people at risk of heart attack, and our parameter of interest is BMI. To better answer our
question, we treated our target population as both a continuous and a binary population. For
the continuous population, we were interested in estimating the mean BMI of patients who are
at risk of having a heart attack. For the binary population, we set a threshold of
BMI > 30 kg/m² to classify each patient as having a high BMI or not, since studies have
reported that individuals with a BMI greater than 30 kg/m² are considered obese. Carbone et
al. (2019) found that obesity and being overweight are major risk factors for developing
heart diseases and conditions, such as heart attacks.

Two sampling methods, Simple Random Sampling (SRS) and Stratified Sampling, were selected
for comparison and form the basis of our research. For SRS, we found the recommended sample
size by assuming the worst-case proportion of 0.5. At a 95% confidence level and ignoring the
Finite Population Correction (FPC), which is reasonable when the population is large, the
recommended sample size is 385 [1]. However, since we know the population total, applying the
FPC yields a more accurate result. Hence, we needed a sample size of 343 [2] with the FPC at
a 95% confidence level. Because the Central Limit Theorem (CLT) is assumed when constructing
the confidence interval for the binary population, the conditions np ≥ 10 and n(1 − p) ≥ 10
must be satisfied with our recommended sample size and worst-case proportion. Both conditions
are met, so we can apply the CLT in our calculations.

For stratified sampling, the choice of stratification variable is critical because different
variables give different estimates and different standard errors. For a precise estimate of
the mean BMI, the standard error should be small. Since each stratum is formed from
individuals who share a characteristic, we considered five candidate variables: sex, diet,
obesity, diabetes status, and family history of heart-related problems. To compare the five
study designs, we computed the within-strata variance for each candidate variable and found
that stratifying by sex resulted in the lowest variance, 39.09994 [3]. Stratification
performs best when the within-strata variance is smallest and, correspondingly, the
between-strata variance is largest, so we decided to stratify by sex. Under optimal
allocation, the resulting sample sizes for the two strata, male and female, are 235 and
108 [4], respectively.
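
The sample-size and allocation steps described above can be sketched in Python. This is a minimal illustration, not the original analysis code. The 0.05 margin of error is an assumption (it is consistent with the reported 385 but is not stated in the text), the population size and stratum inputs in the usage lines are hypothetical, and optimal allocation is implemented here as Neyman allocation, which assumes the same sampling cost per unit in every stratum.

```python
import math


def srs_sample_size(p=0.5, margin=0.05, z=1.96, population=None):
    """Sample size for estimating a proportion under SRS.

    margin=0.05 is an assumed margin of error; p=0.5 is the worst case.
    If population is given, the finite population correction is applied.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + n0 / population)
    return math.ceil(n0)


def clt_conditions_met(n, p, minimum=10):
    """Rule of thumb for the normal approximation: n*p and n*(1-p) >= minimum."""
    return n * p >= minimum and n * (1 - p) >= minimum


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample of n across strata in proportion to N_h * S_h."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    exact = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in exact]
    # Hand out the units lost to flooring, largest fractional part first.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc


print(srs_sample_size())                  # without FPC
print(srs_sample_size(population=3000))   # with FPC; N = 3000 is hypothetical
print(clt_conditions_met(343, 0.5))       # worst-case proportion, n = 343
print(neyman_allocation(300, [2000, 1000], [6.0, 6.0]))  # hypothetical strata
```

With the assumed 0.05 margin, the first call gives 385, matching the figure quoted without the FPC. The FPC result depends on the actual population total, which is why the second call uses a placeholder.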

# Results and Data Analysis

For the continuous population, with a sample size of 343 using SRS, we estimated the
mean BMI to be 29.09 kg/m² [5] with a standard error of 0.324 [7] and a 95% confidence
interval of (28.46, 29.73) [9]. Hence, based on SRS, we are 95% confident that the true mean
BMI of people at risk of heart attack is between 28.46 kg/m² and 29.73 kg/m². With
stratified sampling, using a sample size of 235 for the male stratum and 108 for the female
stratum, we estimated the mean BMI within each stratum and then summed the weighted stratum
means. The stratified estimate of the mean was 29.14 kg/m² [6] with a standard error of
0.31 [8] and a 95% confidence interval of (28.52, 29.75) [9]. Thus, based on stratified
sampling, we are 95% confident that the true mean BMI of people at risk of heart attack is
between 28.52 kg/m² and 29.75 kg/m². A BMI of 25 kg/m² or more is considered overweight
(Body Mass Index (BMI) Calculator, n.d.). Because both 95% confidence intervals lie entirely
above 25 kg/m², our results suggest that the population of interest, patients at risk of
heart attack, is overweight on average.
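
The two mean estimators compared above can be written out as follows. This is a hedged sketch rather than the code behind the reported figures: it assumes the usual textbook formulas with a finite population correction, and the stratum sizes and BMI values in the usage lines are made up for illustration.

```python
import math
from statistics import mean, variance


def srs_mean_ci(sample, population_size=None, z=1.96):
    """SRS estimate of a mean, its standard error, and a normal-approximation CI."""
    n = len(sample)
    estimate = mean(sample)
    # Finite population correction; skipped when the population size is unknown.
    fpc = 1.0 if population_size is None else 1 - n / population_size
    se = math.sqrt(fpc * variance(sample) / n)
    return estimate, se, (estimate - z * se, estimate + z * se)


def stratified_mean_ci(strata, z=1.96):
    """Stratified estimate of a mean.

    strata is a list of (N_h, sample_h) pairs: the stratum population size
    and the values sampled from that stratum.
    """
    total = sum(N_h for N_h, _ in strata)
    # Weighted sum of stratum means, with weights N_h / N.
    estimate = sum(N_h / total * mean(s) for N_h, s in strata)
    var = sum(
        (N_h / total) ** 2 * (1 - len(s) / N_h) * variance(s) / len(s)
        for N_h, s in strata
    )
    se = math.sqrt(var)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical inputs: (stratum size, sampled BMI values).
male = (200, [27.1, 31.4, 29.8, 25.6])
female = (100, [28.3, 30.9, 26.7])
print(srs_mean_ci(male[1] + female[1], population_size=300))
print(stratified_mean_ci([male, female]))
```

The stratified standard error shrinks when values within each stratum are similar to one another, which is the reason for choosing the stratification variable with the lowest within-strata variance.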

For the binary population, with a sample size of 343 using SRS, we estimated the
proportion of patients with high BMI (BMI greater than 30 kg/m²) to be 0.458 [10] with a
standard error of 0.027 [11]. The corresponding 95% confidence interval was
(0.405, 0.510) [12]. Therefore, we are 95% confident that the true proportion of patients
with high BMI is between 0.405 and 0.510. With stratified sampling, using a sample size of
235 for the male stratum and 108 for the female stratum, we summed the weighted stratum
proportions of patients with high BMI to obtain an overall stratified estimate of
0.457 [13], with a standard error of 0.025 [8]. Its 95% confidence interval was
(0.407, 0.506) [12], so we are 95% confident that the true proportion of patients with high
BMI is between 0.407 and 0.506. Since both 95% confidence intervals include 0.5, we cannot
conclude that fewer than half of the people at risk of heart attack have a high BMI.
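
The proportion estimate and the check against 0.5 can be illustrated with a short Python sketch. It assumes the simple normal-approximation standard error sqrt(p(1 − p)/n), with an optional finite population correction. The function names and the sample values are hypothetical, and the original analysis may have used a slightly different variance formula.

```python
import math


def high_bmi_proportion_ci(bmi_values, threshold=30.0, population_size=None, z=1.96):
    """Share of BMI values above threshold, with its SE and a normal-approximation CI."""
    n = len(bmi_values)
    p_hat = sum(1 for bmi in bmi_values if bmi > threshold) / n
    # Finite population correction; skipped when the population size is unknown.
    fpc = 1.0 if population_size is None else 1 - n / population_size
    se = math.sqrt(fpc * p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


def covers(interval, value):
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


# Hypothetical BMI sample, for illustration only.
sample = [31.2, 28.4, 35.0, 22.9, 30.5, 26.1, 33.3, 24.8]
p_hat, se, ci = high_bmi_proportion_ci(sample)
print(p_hat, se, ci, covers(ci, 0.5))

# The two intervals reported in the text both contain 0.5.
print(covers((0.405, 0.510), 0.5), covers((0.407, 0.506), 0.5))
```

An interval that contains 0.5 is compatible with a true proportion on either side of one half, so it supports neither "fewer than half" nor "more than half".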

Stratified sampling can provide better coverage of the population and may be easier and
less costly to administer than SRS. Consistent with this, the stratified estimates had lower
standard errors than the SRS estimates for both the continuous and binary populations (0.31
versus 0.324 for mean BMI, and 0.025 versus 0.027 for the proportion with high BMI),
indicating more precise and efficient estimates.
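
One way to quantify this comparison is the ratio of estimated variances, that is, the squared ratio of the standard errors, often called the design effect. The sketch below applies it to the rounded standard errors reported above. Because those inputs are rounded, the ratios are only approximate, and the function name is illustrative.

```python
def variance_ratio(se_design, se_srs):
    """Estimated variance under the design divided by the variance under SRS.

    Values below 1 mean the design is more precise than SRS for the same
    total sample size.
    """
    return (se_design / se_srs) ** 2


print(round(variance_ratio(0.31, 0.324), 3))    # mean BMI
print(round(variance_ratio(0.025, 0.027), 3))   # proportion with BMI > 30
```

Both ratios come out below 1 (roughly 0.92 and 0.86), which is the sense in which stratifying by sex was more efficient than SRS here.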

# Final Conclusion and Discussion
In conclusion, our investigation used Simple Random Sampling (SRS) and Stratified
Sampling to estimate parameters of the population at risk of heart attack, focusing on
Body Mass Index (BMI) and the proportion of individuals with high BMI. The comparison of
sampling methods showed that Stratified Sampling performed better, as evidenced by lower
standard errors, and therefore more precise estimates, for both the continuous and binary
populations. This points to the effectiveness of stratifying by sex in improving the
reliability of the results. The confidence intervals for mean BMI from both sampling
methods lay entirely above 25 kg/m², indicating that this population is overweight on
average. The confidence intervals for the proportion with high BMI both included 0.5, so we
could not conclude that fewer than half of the at-risk population has a high BMI.

Despite these contributions, several limitations merit consideration. Foremost is our
reliance on a dataset not collected by our research team, which raises concerns about data
quality, accuracy, and potential biases in the original data source. Moreover, our study did
not account for other factors that may influence BMI and heart attack risk, such as high
blood pressure, physical inactivity, and smoking history, so future research should
incorporate a more comprehensive set of variables. The use of an AI-generated dataset also
introduces uncertainty about real-world applicability. Additionally, we had no prior study to
inform our choice of sample size, which has implications for the precision of our continuous
estimates.

In terms of generalizability, our findings are restricted by the specific nature of our
dataset and the characteristics of the population under study. Consequently, our results
should not be extrapolated to broader or dissimilar populations without caution; they are
most relevant to populations sharing characteristics with those in our study. To broaden the
applicability of our conclusions, future research should examine a wider array of features
and carry out validation studies across diverse datasets. In summary, our study aims to
balance methodological rigour with practical constraints, and its findings are best suited
to populations resembling the group under investigation.