# Heart Failure Prediction – Data Exploration
This notebook performs a basic exploration of the `heart.csv` heart failure prediction dataset from Kaggle.

In [1]:
import pandas as pd

# Load the heart failure dataset
df = pd.read_csv('heart.csv')

# Preview the first few rows
print("Sample data:")
print(df.head())

# Check shape and column info
print("\nNumber of rows and columns:", df.shape)
print("\nColumn names and data types:")
df.info()  # info() prints its own report and returns None, so no print() wrapper is needed

Sample data:
   Age Sex ChestPainType  RestingBP  Cholesterol  FastingBS RestingECG  MaxHR  \
0   40   M           ATA        140          289          0     Normal    172   
1   49   F           NAP        160          180          0     Normal    156   
2   37   M           ATA        130          283          0         ST     98   
3   48   F           ASY        138          214          0     Normal    108   
4   54   M           NAP        150          195          0     Normal    122   

  ExerciseAngina  Oldpeak ST_Slope  HeartDisease  
0              N      0.0       Up             0  
1              N      1.0     Flat             1  
2              N      0.0       Up             0  
3              Y      1.5     Flat             1  
4              N      0.0       Up             0  

Number of rows and columns: (918, 12)

Column names and data types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column         

In [2]:
print(df.shape)           # (rows, columns)
df.info()                 # prints its own report and returns None, so no print() wrapper
print(df.isnull().sum())  # number of missing values per column

(918, 12)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             918 non-null    int64  
 1   Sex             918 non-null    object 
 2   ChestPainType   918 non-null    object 
 3   RestingBP       918 non-null    int64  
 4   Cholesterol     918 non-null    int64  
 5   FastingBS       918 non-null    int64  
 6   RestingECG      918 non-null    object 
 7   MaxHR           918 non-null    int64  
 8   ExerciseAngina  918 non-null    object 
 9   Oldpeak         918 non-null    float64
 10  ST_Slope        918 non-null    object 
 11  HeartDisease    918 non-null    int64  
dtypes: float64(1), int64(6), object(5)
memory usage: 86.2+ KB
Age               0
Sex               0
ChestPainType     0
RestingBP         0
Cholesterol       0
FastingBS         0
RestingECG        0
MaxHR             0
ExerciseAngina    0
Oldpeak           0
ST_Slope          0
HeartDisease      0
dtype: int64

In [3]:
# Count patients with (1) and without (0) heart disease
print(df['HeartDisease'].value_counts())

HeartDisease
1    508
0    410
Name: count, dtype: int64
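
The raw counts above (508 of 918 patients, about 55.3%, have heart disease) are easier to compare as percentages. The following is a minimal sketch of one way to show both side by side; the helper name `class_balance` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def class_balance(frame, target="HeartDisease"):
    """Return the count and percentage of each class in the target column."""
    counts = frame[target].value_counts()
    # normalize=True returns fractions that sum to 1; convert them to percent
    percent = frame[target].value_counts(normalize=True).mul(100).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})


# Toy stand-in so the sketch runs without heart.csv (hypothetical values)
toy = pd.DataFrame({"HeartDisease": [1, 1, 1, 0]})
print(class_balance(toy))
```

In the notebook itself, `class_balance(df)` would apply the same summary to the loaded dataset.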


In [7]:
# Mean patient age, rounded to the nearest whole year
average_age = round(df['Age'].mean())
print(f"The average age of the patients is {average_age} years")

The average age of the patients is 54 years


Next, check whether older patients have a higher rate of heart disease.

In [9]:
# Create two groups
older = df[df['Age'] >= 60]
younger = df[df['Age'] < 60]

# Calculate heart disease rate for each group
older_rate = (older['HeartDisease'].sum() / len(older)) * 100
younger_rate = (younger['HeartDisease'].sum() / len(younger)) * 100

# Print the results, rounded to 1 decimal
print(f"Heart disease rate in older patients (60+): {older_rate:.1f}%")
print(f"Heart disease rate in younger patients (<60): {younger_rate:.1f}%")

Heart disease rate in older patients (60+): 73.1%
Heart disease rate in younger patients (<60): 48.6%
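
The single cut-off at age 60 is one possible split. As a rough sketch of how the same rate could be computed over several age bands at once, the example below uses `pd.cut` and `groupby`. The band edges, the helper name `disease_rate_by_age_band`, and the toy frame are assumptions made for illustration, not choices taken from the notebook.

```python
import pandas as pd


def disease_rate_by_age_band(frame, edges=(0, 40, 50, 60, 70, 120)):
    """Patients and heart disease rate (%) per age band.

    Bands are left-closed, e.g. [60, 70) holds ages 60 through 69.
    """
    bands = pd.cut(frame["Age"], bins=list(edges), right=False)
    # observed=True drops bands that contain no patients
    grouped = frame.groupby(bands, observed=True)["HeartDisease"]
    return pd.DataFrame({
        "patients": grouped.size(),
        # HeartDisease is 0/1, so its mean is the share of positive cases
        "rate_percent": (grouped.mean() * 100).round(1),
    })


# Toy stand-in so the sketch runs without heart.csv (hypothetical values)
toy = pd.DataFrame({"Age": [35, 45, 62, 65], "HeartDisease": [0, 1, 1, 0]})
print(disease_rate_by_age_band(toy))
```

Calling `disease_rate_by_age_band(df)` in the notebook would show whether the rate rises steadily with age or jumps at a particular band, which a two-group split cannot reveal.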


## Do older patients have a higher risk of heart disease?

To explore this, we split the patients into two groups:
- Age 60 and above
- Under age 60

We then calculated the percentage of patients with heart disease in each group:

- **Older patients (60+): 73.1%**
- **Younger patients (<60): 48.6%**

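🔍 **Conclusion**: Older patients (60+) in this dataset have a markedly higher rate of heart disease than younger patients: 73.1% versus 48.6%, a gap of 24.5 percentage points. This is a descriptive comparison within this sample; no statistical test was run.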
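
One way to check whether a gap between two group rates is larger than chance would explain is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_z_test` and the example counts are hypothetical, and the notebook does not report the group sizes needed to apply it to the figures above.

```python
from math import erfc, sqrt


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled proportion for the standard error, which assumes
    both groups share one rate under the null hypothesis.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (not taken from heart.csv)
z, p = two_proportion_z_test(30, 40, 50, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In the notebook, the inputs would be `older['HeartDisease'].sum()`, `len(older)`, `younger['HeartDisease'].sum()`, and `len(younger)`.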