# Heart Rate Diagnosis Analysis

### Data Processing

_Dataset Attribute Information:_<br>
- age: Age of the patient in years <br/>
- sex: Sex of the patient, encoded as 0 for female and 1 for male <br/>
- cp: Chest pain type, a categorical variable with 4 values (0-3), each indicating a different type of chest pain <br/>
- trestbps: Resting blood pressure in mm Hg (millimeters of mercury) <br/>
- chol: Serum cholesterol in mg/dl (milligrams per deciliter) <br/>
- fbs: Whether fasting blood sugar exceeds 120 mg/dl, encoded as 0 for false and 1 for true <br/>
- restecg: Resting electrocardiographic result, a categorical variable with values 0, 1, and 2 <br/>
- thalach: Maximum heart rate achieved during exercise <br/>
- exang: Whether exercise-induced angina is present, encoded as 0 for false and 1 for true <br/>
- oldpeak: ST depression induced by exercise relative to rest <br/>
- slope: Slope of the peak exercise ST segment, a categorical variable with values 0, 1, and 2 <br/>
- ca: Number of major vessels colored by fluoroscopy, documented as 0-3 (the statistical summary of this dataset shows a maximum of 4) <br/>
- thal: Result of thallium scintigraphy, a diagnostic technique; the categories are normal, fixed defect, and reversible defect (values in this dataset range from 0 to 3) <br/>
- target: Presence of heart disease, encoded as 0 for no heart disease and 1 for heart disease <br/>
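The documented encodings can be checked against the data itself. The following is a minimal sketch, assuming a small hypothetical frame (`sample`) and a hand-written `expected_ranges` dictionary; neither is part of the original notebook, and the values are illustrative only.

```python
import pandas as pd

# Hypothetical sample using the original column names; values are illustrative only.
sample = pd.DataFrame({
    "sex": [1, 0, 1, 0],
    "cp": [0, 3, 2, 1],
    "ca": [0, 2, 4, 1],      # 4 lies outside the documented 0-3 range
    "target": [0, 1, 1, 0],
})

# Assumed (min, max) bounds for each encoded column, based on the attribute list.
expected_ranges = {"sex": (0, 1), "cp": (0, 3), "ca": (0, 3), "target": (0, 1)}


def out_of_range_counts(df, ranges):
    """Return, per column, how many values fall outside the given inclusive range."""
    counts = {}
    for column, (low, high) in ranges.items():
        counts[column] = int((~df[column].between(low, high)).sum())
    return counts


violations = out_of_range_counts(sample, expected_ranges)
print(violations)
```

A non-zero count does not necessarily mean the data is wrong; it may simply mean the documentation and the file use different encodings.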

In [2]:
# importing python libraries

import pandas as pd

In [3]:
# importing and reading data stored in heart-disease-dataset.csv

dataFrame = pd.read_csv('./csv/heart-disease-dataset.csv')

In [4]:
# printing dataset overview stored in dataframe

dataFrame

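| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1020 | 59 | 1 | 1 | 140 | 221 | 0 | 1 | 164 | 1 | 0.0 | 2 | 0 | 2 | 1 |
| 1021 | 60 | 1 | 0 | 125 | 258 | 0 | 0 | 141 | 1 | 2.8 | 1 | 1 | 3 | 0 |
| 1022 | 47 | 1 | 0 | 110 | 275 | 0 | 0 | 118 | 1 | 1.0 | 1 | 1 | 2 | 0 |
| 1023 | 50 | 0 | 0 | 110 | 254 | 0 | 0 | 159 | 0 | 0.0 | 2 | 0 | 2 | 1 |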
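To inspect how pandas parses this layout without needing the CSV file on disk, the first five displayed rows can be loaded from an in-memory string. This is only a sketch: `csv_text` and `preview` are names introduced here, and five rows say nothing about the full dataset.

```python
import io

import pandas as pd

# First five rows of the dataset as displayed above, written out as CSV text.
csv_text = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
preview = pd.read_csv(io.StringIO(csv_text))

print(preview.shape)    # (rows, columns)
print(preview.dtypes)   # inferred type of each column
```

Checking `shape` and `dtypes` right after loading is a quick way to confirm that the column count matches the attribute list and that numeric columns were not read as text.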


Q. What does ST depression mean in 'oldpeak'? <br><br>
A. ST depression refers to a change seen on an electrocardiogram (ECG or EKG) in which the ST segment of the ECG wave is lower or depressed compared to the baseline. The ST segment represents the interval between the end of the QRS complex and the beginning of the T wave.

"oldpeak" represents the amount of ST depression induced by exercise relative to rest. It's a measure of how much the ST segment of the ECG deviates downwards during exercise compared to the resting state. This is often indicative of myocardial ischemia, which means there's reduced blood flow to the heart muscle during exercise.

Measuring the amount of ST depression (oldpeak) can be an important diagnostic indicator for coronary artery disease and other heart conditions. Greater ST depression during exercise is generally associated with a higher likelihood of coronary artery disease or other heart problems. It's one of the many metrics used by healthcare professionals to assess the cardiovascular health of a patient.

In [5]:
# changing column names

dataFrame.columns = ['Age', 'Sex', 'Chest Pain Type', 'Resting BP', 'Serum Cholestoral (mg/dl)', 'Fasting Blood Sugar > 120 mg/dl', 'Resting Electro-Cardiographic Results', 'Max Heart Rate Achieved', 'Exercise Induced Angina', 'Old-Peak', 'Slope', 'No. of Major Vessels (fluoroscopy)', 'Thalium Scintigraphy', 'Presense of Heart Disease']
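Assigning a list to `dataFrame.columns` relabels columns purely by position, so a reordered column in the CSV would be mislabelled without any error. A possible alternative is an explicit mapping passed to `rename`. The sketch below uses a hypothetical three-column frame (`raw`) and a shortened mapping (`column_labels`); the label spellings follow the notebook's own.

```python
import pandas as pd

# Hypothetical two-row frame with three of the original column names.
raw = pd.DataFrame({"age": [52, 53], "sex": [1, 1], "target": [0, 0]})

# Explicit old-name -> new-name mapping; spellings follow the notebook's labels.
column_labels = {
    "age": "Age",
    "sex": "Sex",
    "target": "Presense of Heart Disease",
}

# Fail early if the frame lacks a column that the mapping expects.
missing = set(column_labels) - set(raw.columns)
if missing:
    raise KeyError(f"Columns not found: {sorted(missing)}")

# rename returns a new frame and leaves `raw` untouched.
renamed = raw.rename(columns=column_labels)
print(list(renamed.columns))
```

Because the mapping is keyed by name, the result does not depend on the order in which columns appear in the file.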

In [6]:
# printing dataset overview stored in dataframe

dataFrame

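| | Age | Sex | Chest Pain Type | Resting BP | Serum Cholestoral (mg/dl) | Fasting Blood Sugar > 120 mg/dl | Resting Electro-Cardiographic Results | Max Heart Rate Achieved | Exercise Induced Angina | Old-Peak | Slope | No. of Major Vessels (fluoroscopy) | Thalium Scintigraphy | Presense of Heart Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1020 | 59 | 1 | 1 | 140 | 221 | 0 | 1 | 164 | 1 | 0.0 | 2 | 0 | 2 | 1 |
| 1021 | 60 | 1 | 0 | 125 | 258 | 0 | 0 | 141 | 1 | 2.8 | 1 | 1 | 3 | 0 |
| 1022 | 47 | 1 | 0 | 110 | 275 | 0 | 0 | 118 | 1 | 1.0 | 1 | 1 | 2 | 0 |
| 1023 | 50 | 0 | 0 | 110 | 254 | 0 | 0 | 159 | 0 | 0.0 | 2 | 0 | 2 | 1 |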


In [7]:
# checking brief statistical summary of dataset

dataFrame.describe()

| | Age | Sex | Chest Pain Type | Resting BP | Serum Cholestoral (mg/dl) | Fasting Blood Sugar > 120 mg/dl | Resting Electro-Cardiographic Results | Max Heart Rate Achieved | Exercise Induced Angina | Old-Peak | Slope | No. of Major Vessels (fluoroscopy) | Thalium Scintigraphy | Presense of Heart Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 | 1025.0 |
| mean | 54.434146 | 0.69561 | 0.942439 | 131.611707 | 246.0 | 0.149268 | 0.529756 | 149.114146 | 0.336585 | 1.071512 | 1.385366 | 0.754146 | 2.323902 | 0.513171 |
| std | 9.07229 | 0.460373 | 1.029641 | 17.516718 | 51.59251 | 0.356527 | 0.527878 | 23.005724 | 0.472772 | 1.175053 | 0.617755 | 1.030798 | 0.62066 | 0.50007 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 48.0 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 132.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 56.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 152.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 275.0 | 0.0 | 1.0 | 166.0 | 1.0 | 1.8 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [8]:
# checking NaN values by each column

dataFrame.isnull().sum()

Age                                      0
Sex                                      0
Chest Pain Type                          0
Resting BP                               0
Serum Cholestoral (mg/dl)                0
Fasting Blood Sugar > 120 mg/dl          0
Resting Electro-Cardiographic Results    0
Max Heart Rate Achieved                  0
Exercise Induced Angina                  0
Old-Peak                                 0
Slope                                    0
No. of Major Vessels (fluoroscopy)       0
Thalium Scintigraphy                     0
Presense of Heart Disease                0
dtype: int64

### Data Cleaning

There are no NaN values in the dataset, so no cleaning of missing values is required.
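Missing values are only one thing worth checking before analysis; repeated rows are another. The sketch below shows both checks side by side on a hypothetical four-row frame (`records`). It makes no claim about whether the actual dataset contains duplicates.

```python
import pandas as pd

# Hypothetical frame: the second and third rows are identical.
records = pd.DataFrame({
    "Age": [52, 53, 53, 70],
    "Resting BP": [125, 140, 140, 145],
})

missing_per_column = records.isnull().sum()        # NaN count per column
duplicate_rows = int(records.duplicated().sum())   # rows repeating an earlier row

print(missing_per_column.to_dict())
print(f"Duplicate rows: {duplicate_rows}")

# Keep the first occurrence of each repeated row and renumber the index.
deduplicated = records.drop_duplicates().reset_index(drop=True)
print(len(deduplicated))
```

Whether dropping duplicates is appropriate depends on the data: identical rows may be genuine repeat measurements rather than copy errors.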

### Data Analyzing

KPIs<br>
- Heart Disease Prevalence Rate = (No. of individuals with heart disease / Total no. of individuals in dataset) × 100
- Average Age
- Average Resting Blood Pressure
- Average Serum Cholesterol Level
- Percentage of individuals with High Fasting Blood Sugar
- Maximum Heart Rate Achieved by males and females
- Percentage of individuals with Exercise-Induced Angina
- Average ST Depression
- Percentage of individuals with Abnormal Resting Electrocardiographic Results
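Several of these KPIs are simple percentages of the form part / total × 100. As a worked example, the helper below (a hypothetical `percentage` function, not part of the notebook) applies that formula to the counts this notebook reports for heart disease: 526 of 1025 records.

```python
def percentage(part, total, digits=1):
    """Return part/total as a percentage rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(part / total * 100, digits)


# Counts reported in this notebook: 526 of 1025 records have target == 1.
prevalence = percentage(526, 1025)
print(f"Heart Disease Prevalence Rate: {prevalence} %")
```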

In [28]:
# finding heart disease prevalence rate throughout dataset

total_patients = len(dataFrame)
# count rows where the target flag equals 1 (heart disease present)
count = int((dataFrame['Presense of Heart Disease'] == 1).sum())

print(f"Heart Patient: {count}")
print(f"Non Heart Patient: {total_patients - count}")
# derive the total from the data instead of hard-coding 526 + 499
print(f"Total Patients: {total_patients}")
heart_disease_prevalence_rate = round((count / total_patients) * 100, 1)
print(f"Heart Disease Prevalence Rate: {heart_disease_prevalence_rate} %")


Heart Patient: 526
Non Heart Patient: 499
Total Patients: 1025
Heart Disease Prevalence Rate: 51.3 %
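The same class counts and percentages can also be obtained with `Series.value_counts`. This is a sketch on a hypothetical five-value column (`target`), so the numbers below are not the dataset's.

```python
import pandas as pd

# Hypothetical target column: three positives out of five records.
target = pd.Series([1, 0, 1, 1, 0], name="Presense of Heart Disease")

# Absolute number of rows per class.
counts = target.value_counts()

# normalize=True returns fractions; scale to percentages and round.
shares = (target.value_counts(normalize=True) * 100).round(1)

print(counts.to_dict())
print(shares.to_dict())
```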


In [10]:
# finding average age throughout dataset

average_age = dataFrame['Age'].mean()
print(f"Average age: {average_age.astype(int)} years")


Average age: 54 years


In [11]:
# finding average resting blood pressure throughout dataset

average_resting_blood_pressure = dataFrame['Resting BP'].mean()
print(f"Average resting blood pressure: {average_resting_blood_pressure.astype(int)} mm Hg")

Average resting blood pressure: 131 mm Hg


In [12]:
# finding average serum cholesterol level throughout dataset

average_serum_cholesterol = dataFrame['Serum Cholestoral (mg/dl)'].mean()
print(f"Average serum cholesterol: {average_serum_cholesterol.astype(int)} mg/dl")

Average serum cholesterol: 246 mg/dl
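The age, blood pressure, and cholesterol averages can be computed in a single call by taking the mean of several columns at once. The sketch below uses only the first five rows shown in the dataset overview (in a frame named `subset` here), so its results differ from the full-dataset averages.

```python
import pandas as pd

# First five rows shown in the dataset overview, using the renamed columns.
subset = pd.DataFrame({
    "Age": [52, 53, 70, 61, 62],
    "Resting BP": [125, 140, 145, 148, 138],
    "Serum Cholestoral (mg/dl)": [212, 203, 174, 203, 294],
})

# Column-wise means, rounded to one decimal rather than truncated to an integer.
averages = subset.mean().round(1)
print(averages)
```

Note that `astype(int)` truncates instead of rounding: the statistical summary lists a mean resting blood pressure of 131.611707, which truncates to 131 but is 131.6 at one decimal.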


In [13]:
# finding percentage of individuals with high fasting blood sugar throughout dataset

# count rows where the flag equals 1 without looping over the index
count = int((dataFrame['Fasting Blood Sugar > 120 mg/dl'] == 1).sum())

high_fasting_blood_sugar_rate = ((count/len(dataFrame)) * 100)
high_fasting_blood_sugar_rate = round(high_fasting_blood_sugar_rate, 1)
print(f"High fasting blood sugar rate: {high_fasting_blood_sugar_rate} %")

High fasting blood sugar rate: 14.9 %
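For a column that holds only 0 and 1, the mean equals the fraction of rows set to 1, so the rate can be read off the mean directly. A minimal sketch with a hypothetical four-value flag column (`fasting_flag`):

```python
import pandas as pd

# Hypothetical 0/1 flag column: one of four records has the flag set.
fasting_flag = pd.Series([0, 0, 1, 0], name="Fasting Blood Sugar > 120 mg/dl")

# For a 0/1 column, the mean is the fraction of rows equal to 1.
rate = round(float(fasting_flag.mean()) * 100, 1)
print(f"High fasting blood sugar rate: {rate} %")
```

This agrees with the statistical summary, which lists a mean of 0.149268 for this column, i.e. the 14.9 % printed by the cell.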


In [14]:
# finding max heart rate achieved by male & female throughout dataset

# sex == 0 female, 1 male; count the rows encoded as female
female_count = int((dataFrame['Sex'] == 0).sum())

print(f"Females in dataset: {female_count}")
print(f"Males in dataset: {len(dataFrame) - female_count}")

max_heart_rate = dataFrame.groupby('Sex')['Max Heart Rate Achieved'].max()
print(f"Max Heart Rate (female): {max_heart_rate[0]} BPM")
print(f"Max Heart Rate (male): {max_heart_rate[1]} BPM")

Females in dataset: 312
Males in dataset: 713
Max Heart Rate (female): 192 BPM
Max Heart Rate (male): 202 BPM
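The per-sex counts and maxima can be produced together with a single `groupby(...).agg(...)` call. The sketch below uses hypothetical records (`records`) and relabels the 0/1 index for readability; it assumes the dataset's encoding of 0 for female and 1 for male.

```python
import pandas as pd

# Hypothetical records; Sex follows the dataset's encoding (0 = female, 1 = male).
records = pd.DataFrame({
    "Sex": [1, 1, 0, 0, 1],
    "Max Heart Rate Achieved": [168, 155, 106, 159, 125],
})

# One row per sex, with the group size and the highest heart rate in the group.
summary = (
    records.groupby("Sex")["Max Heart Rate Achieved"]
    .agg(["count", "max"])
    .rename(index={0: "Female", 1: "Male"})
)
print(summary)
```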


In [24]:
# finding percentage of individuals with Exercise-Induced Angina

# count rows where the flag equals 1 without looping over the index
count = int((dataFrame['Exercise Induced Angina'] == 1).sum())

exercise_induced_angina_rate = ((count/len(dataFrame)) * 100)
exercise_induced_angina_rate = round(exercise_induced_angina_rate, 1)
print(f"Exercise Induced Angina Rate: {exercise_induced_angina_rate} %")

Exercise Induced Angina Rate: 33.7 %


In [16]:
# finding average ST depression throughout dataset

average_st_depression = dataFrame['Old-Peak'].mean()
print(f"Average ST depression: {average_st_depression.astype(int)}")

Average ST depression: 1


In [17]:
# finding percentage of individuals with abnormal resting electrocardiographic results

# values 1 and 2 are treated as abnormal; count them without looping over the index
count = int(dataFrame['Resting Electro-Cardiographic Results'].isin([1, 2]).sum())

abnormal_resting_electrocardiographic_rate = ((count/len(dataFrame)) * 100)
abnormal_resting_electrocardiographic_rate = round(abnormal_resting_electrocardiographic_rate, 1)
print(f"Abnormal Resting Electrocardiographic Rate: {abnormal_resting_electrocardiographic_rate} %")

Abnormal Resting Electrocardiographic Rate: 51.5 %


In [20]:
# creating new column 'Gender' on the basis of sex column's 0 & 1 values

dataFrame['Gender'] = dataFrame['Sex'].apply(lambda x: 'Male' if x == 1 else 'Female')
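The lambda above labels every value other than 1 as 'Female', including any unexpected code. A dictionary passed to `Series.map` is one possible alternative that makes such values visible as NaN. The sketch below uses a hypothetical column (`sex`) containing an out-of-range code.

```python
import pandas as pd

# Hypothetical Sex column containing an unexpected code (2).
sex = pd.Series([1, 0, 2], name="Sex")

# Series.map leaves values that are not keys of the mapping as NaN.
gender = sex.map({0: "Female", 1: "Male"})
unmapped = int(gender.isna().sum())

# The lambda approach, for comparison, labels the unexpected code as 'Female'.
gender_lambda = sex.apply(lambda x: "Male" if x == 1 else "Female")

print(gender.tolist())
print(gender_lambda.tolist())
print(f"Unmapped values: {unmapped}")
```

For this dataset the two approaches should give the same result, since the statistical summary shows `Sex` ranging only from 0 to 1.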

In [21]:
dataFrame

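| | Age | Sex | Chest Pain Type | Resting BP | Serum Cholestoral (mg/dl) | Fasting Blood Sugar > 120 mg/dl | Resting Electro-Cardiographic Results | Max Heart Rate Achieved | Exercise Induced Angina | Old-Peak | Slope | No. of Major Vessels (fluoroscopy) | Thalium Scintigraphy | Presense of Heart Disease | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 | Male |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 | Male |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 | Male |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 | Male |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 | Female |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1020 | 59 | 1 | 1 | 140 | 221 | 0 | 1 | 164 | 1 | 0.0 | 2 | 0 | 2 | 1 | Male |
| 1021 | 60 | 1 | 0 | 125 | 258 | 0 | 0 | 141 | 1 | 2.8 | 1 | 1 | 3 | 0 | Male |
| 1022 | 47 | 1 | 0 | 110 | 275 | 0 | 0 | 118 | 1 | 1.0 | 1 | 1 | 2 | 0 | Male |
| 1023 | 50 | 0 | 0 | 110 | 254 | 0 | 0 | 159 | 0 | 0.0 | 2 | 0 | 2 | 1 | Female |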


Exporting the dataset to CSV format for visualization

In [23]:
dataFrame.to_csv('./csv/cleaned-heart-disease-dataset.csv', index = False)
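A quick way to confirm that an export is usable is to read the file back and compare it with the frame that was written. The sketch below does this in a temporary directory with a hypothetical two-row frame (`cleaned`) and a made-up file name, so it does not touch `./csv/`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned dataset.
cleaned = pd.DataFrame({"Age": [52, 62], "Gender": ["Male", "Female"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned-sample.csv")  # hypothetical file name
    cleaned.to_csv(path, index=False)   # index=False keeps the row index out of the file
    reloaded = pd.read_csv(path)        # read back while the temp directory still exists

print(list(reloaded.columns))
print(reloaded.shape == cleaned.shape)
```

Without `index=False`, the row index is written as an unnamed first column, which reappears as `Unnamed: 0` when the file is read back.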

### Insights

- Heart Patient: 526
- Non Heart Patient: 499
- Total Patients: 1025
- Heart Disease Prevalence Rate: 51.3 %
- Average age in dataset: 54 years
- Average resting blood pressure: 131 mm Hg
- Average serum cholesterol: 246 mg/dl
- High fasting blood sugar rate: 14.9 %
- Females in dataset: 312
- Males in dataset: 713
- Max Heart Rate (female): 192 BPM
- Max Heart Rate (male): 202 BPM
- Exercise Induced Angina Rate: 33.7 %
- Average ST depression: 1
- Abnormal Resting Electrocardiographic Rate: 51.5 %