# **Analysis of Heart Disease Mortality Data**

#### **Samarth Tuli, Ndenko Fontem, Yi Zhu, Kimia Samieinejad**    


### **Importance of Heart Disease Mortality Research**    
Over the past two decades, people across the world have experienced physical diseases ranging from diabetes to arthritis and mental health problems ranging from depression to suicide, which has made global health a significant priority for the research community. One of the most important disease types under study is heart disease, especially coronary (ischemic) artery disease. According to the [World Health Organization](https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1), heart disease is the leading cause of death globally, responsible for 17.9 million deaths each year. Four out of every five heart disease deaths are caused by heart attacks and strokes, which can be brought on by a range of mental and physical health factors, and one third of heart disease deaths occur prematurely in people under the age of 70, which shows that the risk is not limited to the elderly.

Research efforts led by NIH and WHO have shown that a wide variety of behavioral, medical, and socioeconomic risk factors can underlie heart disease mortality, including tobacco and alcohol consumption, obesity, lack of physical activity, malnutrition, high blood pressure, and restricted access to primary healthcare facilities. Comorbidities (pre-existing conditions) such as diabetes, arthritis, chronic kidney disease, and anxiety problems can contribute as well. Given the massive amount of data collected on these factors and how much heart disease deaths vary across each country's population, data scientists have a significant role in breaking this data down into insights that can guide future heart disease mortality research. In our dataset, we define the heart disease mortality rate as the number of heart disease related deaths per 100,000 people. We cover it for all countries from 2012 to 2017 and focus on countries within North America and Europe.
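The rate definition above can be made concrete with a small sketch. The function name `mortality_rate_per_100k` and the death and population counts are hypothetical, chosen only to illustrate the arithmetic; they are not taken from our dataset.

```python
def mortality_rate_per_100k(deaths: int, population: int) -> float:
    """Return the number of deaths per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number results stay exact
    return deaths * 100_000 / population


# Hypothetical country: 1,500 heart disease deaths in a population of 2,000,000
rate = mortality_rate_per_100k(1_500, 2_000_000)
print(rate)  # → 75.0
```

Expressing deaths per 100,000 people rather than as raw counts is what makes countries with very different population sizes comparable.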

### **What is coronary heart disease?**

According to the NIH, [coronary heart disease](https://www.nhlbi.nih.gov/health/coronary-heart-disease) (CHD) is a cardiovascular disease in which the arteries of the heart cannot deliver enough oxygen-rich blood to the heart. The primary cause of CHD is high cholesterol forming plaque along the lining of the arteries, which can constrict blood flow, stop blood vessels from functioning normally, and increase the chances of severe chest pain, heart attack, stroke, or cardiac arrest. Although the risk of coronary heart disease can be reduced through lifestyle changes, many people don't take immediate action. As a result, the disease has become widespread: in the US alone, heart disease in general accounts for 650,000 deaths per year, 11% of adults have been diagnosed with heart disease, and CHD specifically causes 366,000 deaths annually. This background shows why heart disease mortality data needs to be analyzed by data scientists, whose insights can help mitigate risk for future heart patients.

### **Tutorial Purpose**

The objective of this tutorial is to evaluate factors that may positively or negatively affect the heart disease mortality rates of populations across different countries, so that we can better understand which factors the research community should prioritize to reduce the overall risk of heart disease-related deaths in North American and European countries. Data science is the right tool for this because it allows us to break complex heart disease mortality data down into specific insights and recommendations that heart disease research leaders and policymakers can act on. We follow a 5-stage pipeline: data collection, data processing, exploratory data analysis and visualization, analysis with hypothesis testing and ML models, and insights and policy decisions.

## **Data Collection and Processing**

In [4]:
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import re

# Import and process crude suicide rates for both sexes (deaths per 100,000 people)
suicide_rates_all_countries = pd.read_csv('data/suicide_rates_all_countries2012-2017.csv')

# Keep only the combined "Both sexes" rows, then only the three columns we need
suicide_rates_all_countries = suicide_rates_all_countries[suicide_rates_all_countries["Dim1"] == "Both sexes"]
suicide_rates_all_countries = suicide_rates_all_countries[['Location', 'Period', 'FactValueNumeric']]

# NOTE: the value label below says "per 10000", but per the comment above the rates are per 100,000 people
suicide_rates_all_countries = suicide_rates_all_countries.rename(columns = {"Location": "Country", "Period": "Year", "FactValueNumeric":"Suicides per 10000 people"})

# Sort by year, then by country, and rebuild a clean 0-based index
suicide_rates_all_countries = suicide_rates_all_countries.sort_values(["Year", "Country"], ascending = True)
suicide_rates_all_countries = suicide_rates_all_countries.reset_index(drop = True)
suicide_rates_all_countries



| | Country | Year | Suicides per 10000 people |
|---|---|---|---|
| 0 | Afghanistan | 2012 | 4.01 |
| 1 | Albania | 2012 | 5.18 |
| 2 | Algeria | 2012 | 2.90 |
| 3 | Angola | 2012 | 6.93 |
| 4 | Antigua and Barbuda | 2012 | 0.00 |
| ... | ... | ... | ... |
| 1093 | Venezuela (Bolivarian Republic of) | 2017 | 2.16 |
| 1094 | Viet Nam | 2017 | 7.68 |
| 1095 | Yemen | 2017 | 5.58 |
| 1096 | Zambia | 2017 | 8.53 |
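
As a sanity check on the cleaning steps in the cell above, the same filter, select, rename, and sort can be run on a tiny in-memory frame. This is only a sketch: the helper name `clean_rates`, the value label, and the toy rows are hypothetical, and only the raw column names (`Location`, `Period`, `Dim1`, `FactValueNumeric`) come from the cell above.

```python
import pandas as pd


def clean_rates(raw: pd.DataFrame, value_label: str) -> pd.DataFrame:
    """Filter to 'Both sexes', keep three columns, rename, sort, and reindex."""
    both = raw[raw["Dim1"] == "Both sexes"]
    tidy = both[["Location", "Period", "FactValueNumeric"]].rename(
        columns={
            "Location": "Country",
            "Period": "Year",
            "FactValueNumeric": value_label,
        }
    )
    return tidy.sort_values(["Year", "Country"]).reset_index(drop=True)


# Toy rows (made up for illustration, not real WHO values)
toy = pd.DataFrame(
    {
        "Location": ["B", "A", "A", "B"],
        "Period": [2013, 2012, 2012, 2012],
        "Dim1": ["Both sexes", "Both sexes", "Male", "Both sexes"],
        "FactValueNumeric": [3.0, 1.0, 9.0, 2.0],
    }
)

tidy = clean_rates(toy, "Rate per 100,000 people")

# After cleaning, each (Country, Year) pair should appear at most once
has_duplicates = bool(tidy.duplicated(["Country", "Year"]).any())
print(tidy)
print("duplicates:", has_duplicates)
```

Wrapping the steps in one function is a design choice that would let the same cleaning be reused for other indicator files that share this column layout, if the notebook loads any.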
