
![image](https://storage.googleapis.com/kaggle-datasets-images/4134888/7159329/8685cd8fb7c162e34269921f17687cbe/dataset-cover.jpeg?t=2023-12-09-07-27-45)


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Introduction</p></div>

This heart disease dataset, collected at a multispecialty hospital in India, contains records for 1000 subjects across 14 columns: a patient identifier, 12 clinical features, and a binary target. It is intended for developing early-stage heart disease detection methods and building predictive machine-learning models.

<h2 style='border:0; border-radius: 15px; font-weight: 150; color:#9b006e; font-size:250%'><center> Cardiovascular Disease Dataset Description
</center></h2>

|S.No|Attribute|Column Name|Type of Data|Unit / Values|
|----|---------|-----------|------------|-------------|
|1|**Patient Identification Number**|patientid|Numeric|Number|
|2|**Age**|age|Numeric|In years|
|3|**Gender**|gender|Binary|0 (female) / 1 (male)|
|4|**Resting blood pressure**|restingBP|Numeric|94-200 (in mm Hg)|
|5|**Serum cholesterol**|serumcholestrol|Numeric|126-564 (in mg/dl)|
|6|**Fasting blood sugar**|fastingbloodsugar|Binary|1 (true) if > 120 mg/dl, otherwise 0 (false)|
|7|**Chest pain type**|chestpain|Nominal|0 (typical angina), 1 (atypical angina), 2 (non-anginal pain), 3 (asymptomatic)|
|8|**Resting electrocardiogram results**|restingrelectro|Nominal|0 (normal), 1 (ST-T wave abnormality), 2 (probable or definite left ventricular hypertrophy)|
|9|**Maximum heart rate achieved**|maxheartrate|Numeric|71-202|
|10|**Exercise induced angina**|exerciseangia|Binary|0 (no) / 1 (yes)|
|11|**Oldpeak (ST depression induced by exercise relative to rest)**|oldpeak|Numeric|0-6.2|
|12|**Slope of the peak exercise ST segment**|slope|Nominal|1 (upsloping), 2 (flat), 3 (downsloping)|
|13|**Number of major vessels**|noofmajorvessels|Numeric|0, 1, 2, 3|
|14|**Classification (target)**|target|Binary|0 (Absence of Heart Disease), 1 (Presence of Heart Disease)|


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Import Modules</p></div>


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
from termcolor import colored

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

print(colored('\nAll libraries imported successfully.', 'blue'))

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Load the Data</p></div>


In [None]:
df = pd.read_csv('Cardiovascular_Disease_Dataset/Cardiovascular_Disease_Dataset.csv')
df.head().style.set_properties(**{'background-color':'blue','color':'white','border-color':'#8b8c8c'})

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Data Information</p></div>

In [None]:
shape_of_dataframe = df.shape

# df.shape is a (rows, columns) tuple
print("No. of samples:", shape_of_dataframe[0])
print("No. of columns:", shape_of_dataframe[1])



In [None]:
df.info()

## Summary Statistics
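
A minimal sketch of a summary-statistics step, assuming pandas' `describe()` is the intended tool here. The toy frame and its values are hypothetical; on the real data the equivalent call would be `df.describe()`.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration only
toy = pd.DataFrame({
    "age": [40, 50, 60],
    "restingBP": [120, 130, 140],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column
summary = toy.describe()
print(summary)
```

For binary or nominal columns such as `gender` or `chestpain`, the mean and quartiles are less informative than `value_counts()`.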

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0">ﮩ٨ـ❤️ﮩ٨ـﮩﮩ<b> </b>Data Preprocessing</p></div>

## Handling Missing Values

In [None]:
df.columns
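
The cell above only lists the column names. A minimal sketch of an actual missing-value check follows, using a small made-up frame (the values and the median-fill strategy are assumptions for illustration, not taken from the dataset). On the real data the first step would be `df.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df; values are invented
toy = pd.DataFrame({
    "age": [52, 61, np.nan, 45],
    "restingBP": [130, np.nan, 142, 118],
    "target": [1, 0, 1, 0],
})

# Count missing entries per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# One simple option: fill numeric gaps with the column median
toy_filled = toy.fillna(toy.median(numeric_only=True))
print("Missing after fill:", toy_filled.isnull().sum().sum())
```

If the check reports zero missing values for every column, no imputation is needed and this step can be skipped.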

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Exploratory Data Analysis (EDA)📊</p></div>


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 1. What is the age range of patients in the dataset?</b></font>

**Answer: Age Range: 20 - 80**

**Explanation: The age range is determined by finding the minimum and maximum age values in the dataset. In this case, patients' ages range from 20 to 80 years. Use df['age'].min() and df['age'].max()**

In [None]:
min_age = df['age'].min()
max_age = df['age'].max()
age_range = f"Age Range: {min_age} - {max_age}"
print(age_range)


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 2. How many males and females are represented in the dataset?</b></font>

**Answer: The number of female (0) and male (1) patients is given by the value counts of the 'gender' column.**

**Explanation: In the 'gender' column, 0 represents female and 1 represents male, matching the dataset description. Use df['gender'].value_counts()**


In [None]:
gender_count = df['gender'].value_counts()
print(gender_count)


## Visualise Age Distribution by Gender:

What do you interpret from this plot?

In [None]:
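# Visualization:
plt.figure(figsize=(10, 6))
# Map the 0/1 codes to names so seaborn builds the legend itself;
# overriding labels with plt.legend() can mismatch colours and labels
gender_labels = df['gender'].map({0: 'Female', 1: 'Male'}).rename('Gender')
sns.histplot(x='age', hue=gender_labels, data=df, palette='muted', multiple='stack', bins=15)
plt.title('Age Distribution by Gender')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()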

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 3. What is the most common type of chest pain observed in the patients?</b></font>

**chestpain --> 0 (typical angina), 1 (atypical angina), 2 (non-anginal pain), 3 (asymptomatic)**

**Answer: Chest Pain Type 0**

**Explanation: Chest pain type 0 is the most common among the patients, as determined by counting the occurrences in the 'chestpain' column. Use df['chestpain'].value_counts()**


In [None]:
chest_pain_counts = df['chestpain'].value_counts()
print(chest_pain_counts)

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 4. What is the average resting blood pressure among the patients?</b></font>

**Answer: Average Resting Blood Pressure: 151.75 mm Hg**

**Explanation: The average resting blood pressure is calculated by taking the mean of the values in the 'restingBP' column. Use df['restingBP'].mean()**


In [None]:
average_resting_bp = df['restingBP'].mean()
print(f"Average Resting Blood Pressure: {average_resting_bp:.2f} mm Hg")


## Distribution of resting blood pressure:

What do you interpret from this plot?

In [None]:
sns.histplot(df['restingBP'], color='mediumseagreen') 
plt.title('Distribution of Resting Blood Pressure')
plt.show()

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 5. How does serum cholesterol vary across different patients?</b></font>

**Explanation: Serum cholesterol distribution is visualized using both histogram and a boxplot, providing insights into the spread and central tendency of cholesterol levels among patients.**


In [None]:
sns.histplot(df['serumcholestrol'], color='blue') 
plt.title('Distribution of Serum Cholesterol')
plt.xlabel('Serum Cholesterol')
plt.ylabel('Frequency')
plt.show()

In [None]:
plt.figure(figsize=(10, 6))
sns.boxplot(x='serumcholestrol', data=df, color='royalblue')
plt.title('Distribution of Serum Cholesterol')
plt.xlabel('Serum Cholesterol')
plt.show()

## Annotating the Box Plot

plt.text(x, y, text) adds text at position x, y. Here, x is the median value, and y is arbitrarily set to 0.1 to position the text on the plot. The text displays the median value, formatted to two decimal places.

ha='center' and va='center' set the horizontal and vertical alignment of the text, respectively.

fontweight='bold', color='white', and backgroundcolor='green' style the text, making it bold with white letters on a green background.

In [None]:
serumcholestrol_data=df['serumcholestrol']
# Recreating the boxplot for serum cholesterol data
plt.figure(figsize=(12, 8))
boxplot = sns.boxplot(x=serumcholestrol_data, color='royalblue')

# Calculating statistics for annotations
median = np.median(serumcholestrol_data)
quartile1 = np.percentile(serumcholestrol_data, 25)
quartile3 = np.percentile(serumcholestrol_data, 75)
iqr = quartile3 - quartile1  # Interquartile range
upper_whisker = quartile3 + 1.5 * iqr
lower_whisker = quartile1 - 1.5 * iqr

# Annotating the median
plt.text(median, 0.1, f'Median: {median:.2f}', ha='center', va='center', fontweight='bold', color='white', backgroundcolor='green')

# Annotating the quartiles
plt.text(quartile1, 0.2, f'Q1: {quartile1:.2f}', ha='center', va='center', fontweight='bold', color='white', backgroundcolor='blue')
plt.text(quartile3, 0.2, f'Q3: {quartile3:.2f}', ha='center', va='center', fontweight='bold', color='white', backgroundcolor='blue')

# Annotating the whiskers
plt.text(upper_whisker, 0.1, f'Upper Whisker: {upper_whisker:.2f}', ha='center', va='center', fontweight='bold', color='white', backgroundcolor='red')
plt.text(lower_whisker, 0.1, f'Lower Whisker: {lower_whisker:.2f}', ha='center', va='center', fontweight='bold', color='white', backgroundcolor='red')

plt.title('Annotated Distribution of Serum Cholesterol')
plt.xlabel('Serum Cholesterol')
plt.show()


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 6. What percentage of patients have fasting blood sugar greater than 120 mg/dl? </b></font>

**Answer: Percentage of patients with fasting blood sugar > 120 mg/dl: 29.60%**

**Explanation: The percentage is calculated by dividing the number of patients with fasting blood sugar greater than 120 mg/dl by the total number of patients. Use (df['fastingbloodsugar'].sum() / len(df)) * 100**


In [None]:
percentage_high_fasting_sugar = (df['fastingbloodsugar'].sum() / len(df)) * 100
print(f"Percentage of patients with fasting blood sugar > 120 mg/dl: {percentage_high_fasting_sugar:.2f}%")
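
Because the column is coded 0/1, summing it counts the patients flagged as 1, so the sum divided by the number of rows is the same as the column mean. A small sketch with made-up flags (hypothetical values, not from the dataset) shows that both routes agree:

```python
import pandas as pd

# Hypothetical 0/1 flags, invented for illustration
# (1 = fasting blood sugar > 120 mg/dl)
flags = pd.Series([1, 0, 0, 1, 0], name="fastingbloodsugar")

# Route 1: count of ones divided by the number of rows
pct_from_sum = flags.sum() / len(flags) * 100

# Route 2: the mean of a 0/1 column is the share of ones
pct_from_mean = flags.mean() * 100

print(f"{pct_from_sum:.2f}% vs {pct_from_mean:.2f}%")
```

The same shortcut applies to any binary column in this notebook, including `target` and `exerciseangia`.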


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 7. What are the predominant resting electrocardiogram results in the dataset?</b></font>

**Answer: 0 (normal), 1 (ST-T wave abnormality), 2 (probable or definite left ventricular hypertrophy)**

**Explanation: The counts of normal and abnormal resting electrocardiogram results are determined from the 'restingrelectro' column. Use df['restingrelectro'].value_counts()**


In [None]:
resting_electro_counts = df['restingrelectro'].value_counts()
print(resting_electro_counts)


## Visualization:

What do you interpret from this plot?

In [None]:
# Descriptive labels for the resting ECG codes
restecg_labels = {0: 'normal', 1: 'ST-T wave abnormality', 2: 'probable or definite LV hypertrophy'}
counts = df['restingrelectro'].value_counts()

# Replace the numeric index with the descriptive labels
counts.index = counts.index.map(restecg_labels)

# Plotting
ax = counts.plot(kind='bar', color='skyblue', title='Distribution of Resting Electrocardiogram Results')
ax.set_xlabel('Resting Electrocardiogram Results')
ax.set_ylabel('Count')

# Annotating bars with counts
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='bottom', color='black')

plt.show()

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 8. What is the average maximum heart rate achieved by the patients?</b></font>

**Answer: Average Maximum Heart Rate: 145.48**

**Explanation: The average maximum heart rate is calculated by taking the mean of values in the 'maxheartrate' column. Use df['maxheartrate'].mean()**


In [None]:
average_max_heart_rate = df['maxheartrate'].mean()
print(f"Average Maximum Heart Rate: {average_max_heart_rate:.2f}")


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 9. How many patients experienced exercise-induced angina?</b></font>

**Answer: Number of Patients with Exercise-Induced Angina: 498**

**Explanation: The count of patients with exercise-induced angina is obtained from the 'exerciseangia' column. Use df['exerciseangia'].sum()**


In [None]:
exercise_angina_count = df['exerciseangia'].sum()
print(f"Number of Patients with Exercise-Induced Angina: {exercise_angina_count}")


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 10. What is the average oldpeak (ST depression induced by exercise relative to rest) among the patients?</b></font>

**Answer: Average Oldpeak: 2.71**

**Explanation: The average oldpeak is calculated by taking the mean of values in the 'oldpeak' column. Use df['oldpeak'].mean()**

In [None]:
average_oldpeak = df['oldpeak'].mean()
print(f"Average Oldpeak: {average_oldpeak:.2f}")


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 11. How is the slope of the peak exercise ST segment distributed in the dataset?</b></font>

**Explanation: The distribution of the slope is visualized using a countplot, showing the frequency of each slope type in the 'slope' column.**

**1 (upsloping), 2 (flat), 3 (downsloping)**


In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='slope', data=df, palette='viridis')
plt.title('Distribution of Slope of Peak Exercise ST Segment')
plt.xlabel('Slope')
plt.ylabel('Count')
plt.show()

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 12. What is the range of the number of major vessels in the patients?</b></font>

**Answer: Number of Major Vessels Range: 0 - 3**

**Explanation: The range is determined by finding the minimum and maximum values in the 'noofmajorvessels' column. Use df['noofmajorvessels'].min() and df['noofmajorvessels'].max()**


In [None]:
min_vessels = df['noofmajorvessels'].min()
max_vessels = df['noofmajorvessels'].max()
vessels_range = f"Number of Major Vessels Range: {min_vessels} - {max_vessels}"
print(vessels_range)


## Visualization:

What do you interpret from this plot?

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='noofmajorvessels', data=df, palette='viridis')
plt.title('Number of major vessels')
plt.xlabel('Major Vessels')
plt.ylabel('Count')
plt.show()



<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 13. What percentage of patients in the dataset have heart disease (target = 1)?</b></font>

**Answer: Percentage of Patients with Heart Disease: 58.00%**

**Explanation: The percentage is calculated by dividing the number of patients with heart disease (target = 1) by the total number of patients. Use (df['target'].sum() / len(df)) * 100**

In [None]:
percentage_heart_disease = (df['target'].sum() / len(df)) * 100
print(f"Percentage of Patients with Heart Disease: {percentage_heart_disease:.2f}%")


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 14. Can you identify the patient with the highest age in the dataset?</b></font>

**Answer:**
* Patient ID: 1160678
* Age: 80
* Gender: Female
* Chest Pain: 1
* Target: 1 (Heart Disease)

**Explanation: The patient with the highest age is identified by finding the maximum value in the 'age' column and extracting other details.**

In [None]:
# Find the maximum age in the dataset
max_age = df['age'].max()

# Filter the DataFrame to only include rows where the age is equal to the maximum age
oldest_patients = df[df['age'] == max_age]

print("Details of the Oldest Patients:")
print(oldest_patients)



<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 15. Who is the patient with the lowest resting blood pressure?</b></font>

**Answer:**
* Patient ID: 119250
* Age: 40
* Gender: Female
* Chest Pain: 0
* Target: 0 (No Heart Disease)

**Explanation: The patient with the lowest resting blood pressure is identified by finding the minimum value in the 'restingBP' column and extracting other details.**


In [None]:
min_BP = df['restingBP'].min()
print("Lowest resting BP is", min_BP)

lowest_BP = df[df['restingBP'] == min_BP]

print("Details of the patient with the lowest resting blood pressure:")
print(lowest_BP)

<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 16. What is the correlation between age and maximum heart rate?</b></font>

**Answer: Correlation between Age and Maximum Heart Rate: -0.04**

**Explanation: The correlation coefficient is calculated to quantify the relationship between age and maximum heart rate. Use df['age'].corr(df['maxheartrate'])**


In [None]:
correlation_age_maxheartrate = df['age'].corr(df['maxheartrate'])
print(f"Correlation between Age and Maximum Heart Rate: {correlation_age_maxheartrate:.2f}")
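
By default, `Series.corr` computes the Pearson coefficient: the sum of the products of the two columns' deviations from their means, divided by the square root of the product of their summed squared deviations. The sketch below checks this by hand on a made-up frame (the values are hypothetical and chosen only to give a clear negative trend):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({
    "age": [30, 40, 50, 60, 70],
    "maxheartrate": [180, 172, 165, 150, 148],
})

# Pearson correlation via pandas (the default method)
r_pandas = toy["age"].corr(toy["maxheartrate"])

# The same coefficient computed from its definition
dx = toy["age"] - toy["age"].mean()
dy = toy["maxheartrate"] - toy["maxheartrate"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"pandas: {r_pandas:.4f}, manual: {r_manual:.4f}")
```

A value near 0, like the -0.04 reported above, indicates almost no linear relationship; it does not rule out a non-linear one.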


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 17. Is there a relationship between chest pain type and the presence of heart disease?</b></font>

**Explanation: The relationship is visualized using a countplot, showing the distribution of heart disease (yes/no) for each chest pain type.
What do you interpret from the plot?**


In [None]:
plt.figure(figsize=(10, 6))
ax = sns.countplot(x=df['chestpain'].astype(str), hue=df['target'].astype(str), data=df, palette='Set1')
plt.title('Heart Disease Presence by Chest Pain Type')
# Custom legend
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=['No', 'Yes'], title='Heart Disease')
# Set the x-tick labels without modifying the dataset
ax.set_xticklabels(['Typical Angina', 'Atypical Angina', 'Non-Anginal Pain', 'Asymptomatic'], rotation=45)
plt.xlabel('Chest Pain Type')
plt.ylabel('Count')
plt.tight_layout()  # Adjust layout to make room for the rotated x-tick labels
plt.show()


<div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 18. How does serum cholesterol differ between patients with and without heart disease?</b></font>

**Explanation: The difference in serum cholesterol levels is visualized using a boxplot, comparing patients with and without heart disease. What do you observe from the boxplot?**


In [None]:
plt.figure(figsize=(10, 6))
sns.boxplot(x='target', y='serumcholestrol', data=df, palette='pastel')
plt.title('Serum Cholesterol Distribution by Heart Disease Presence')
plt.xlabel('Heart Disease Presence')
plt.ylabel('Serum Cholesterol')
plt.xticks(ticks=[0, 1], labels=['No Heart Disease', 'Heart Disease'])
plt.show()

<!-- <div style="border-radius: 10px; border: 2px solid #FFD700; padding: 15px; background-color:#FDF5E6; font-size: 100%; text-align: left;">
    
<font size="+1" color="#059c99"><b>💞 19. What is the distribution of oldpeak values for patients with heart disease?</b></font>

**Explanation: The distribution is visualized using a histogram with kernel density estimation, providing insights into the distribution of oldpeak values among patients with heart disease.**
 -->

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Target Categorizing</p></div>

In [None]:
# target classes :
df.target.unique()

In [None]:
df.head()

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Standardization</p></div>
 

In [None]:
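# Create the feature matrix and the target vector.
# 'patientid' is an identifier rather than a clinical measurement, so it is dropped too
X_disease = df.drop(columns=['patientid', 'target'])
y = df.target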

In [None]:
from sklearn.preprocessing import StandardScaler


# Standardize the features in X_disease (zero mean, unit variance).
# fit_transform returns a NumPy array, so wrap it back into a DataFrame
X_scaled = StandardScaler().fit_transform(X_disease)
X = pd.DataFrame(X_scaled, columns=X_disease.columns)

# Display the descriptive statistics of the scaled features
X.describe().T.style.background_gradient(axis=0, cmap='plasma')
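
The cell above standardizes all rows before the train/test split, so the test rows contribute to the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data; the frame, its values, and the variable names are assumptions for illustration and are not part of this lab's pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (invented values)
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "age": rng.integers(20, 81, size=50),
    "restingBP": rng.integers(94, 201, size=50),
})
y_toy = pd.Series(rng.integers(0, 2, size=50), name="target")

# 1) Split first
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

# 2) Fit the scaler on the training rows only
scaler = StandardScaler().fit(X_tr)

# 3) Apply the same training statistics to both parts
X_tr_scaled = pd.DataFrame(scaler.transform(X_tr), columns=X_tr.columns, index=X_tr.index)
X_te_scaled = pd.DataFrame(scaler.transform(X_te), columns=X_te.columns, index=X_te.index)

print(X_tr_scaled.mean().round(6))
```

With this order, the training columns have mean 0 exactly, while the test columns are only approximately centred, which is expected.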


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Modelling</p></div>


### Train-Test Split

In [None]:
df.target.value_counts()

In [None]:
from sklearn.model_selection import train_test_split

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Ridge Regression</p></div>


## Performing Ridge Regression with Cross-Validation

In [None]:
from sklearn.linear_model import RidgeCV
from sklearn.metrics import confusion_matrix


In [None]:
# Define alphas for cross-validation
alphas = [0.01, 0.1, 1, 10, 100]

# Setup the ridge regression with built-in cross-validation
ridge_cv = RidgeCV(alphas=alphas, cv=5)

# Fit the model
ridge_cv.fit(X_train, y_train)

# ridge_cv.alpha_ gives you the best alpha value found during CV
print("Best alpha value:", ridge_cv.alpha_)


In [None]:
# Predict on the test set
y_pred = ridge_cv.predict(X_test)
print(y_pred)

In [None]:
# Apply manual threshold of 0.5

y_pred_ridge = (y_pred > 0.5).astype(int)

In [None]:
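# Generate the confusion matrix for the thresholded Ridge predictions
conf_matrix_ridge = confusion_matrix(y_test, y_pred_ridge)
print(conf_matrix_ridge)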
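
Ridge and Lasso are regression models, so they output continuous scores rather than class labels. The 0.5 threshold turns those scores into 0/1 predictions, which a confusion matrix can then summarize. A minimal sketch with made-up labels and scores (hypothetical values, not model output from this lab):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and continuous regression scores
y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.12, 0.61, 0.48, 0.93, 0.75, 0.30])

# Scores at or above the threshold are classified as 1
threshold = 0.5
y_hat = (scores >= threshold).astype(int)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

Lowering the threshold trades false negatives for false positives, which may matter in a screening setting.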



# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Lasso</p></div>


In [None]:
from sklearn.linear_model import LassoCV

In [None]:
# Define a range of alpha values for Lasso
alphas = [0.001, 0.01, 0.1, 1, 10, 100]

# Fit Lasso model with cross-validation to select the best alpha
lasso = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train, y_train)

# After fitting, lasso.alpha_ will give you the best alpha value found
print("Best alpha value found:", lasso.alpha_)

In [None]:
# Predict with the Lasso model
predictions = lasso.predict(X_test)

# print(predictions)

In [None]:
# Define a threshold for classification
threshold = 0.5 
y_pred_lasso = (predictions >= threshold).astype(int)

In [None]:
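# Generate the confusion matrix for the thresholded Lasso predictions
conf_matrix_lasso = confusion_matrix(y_test, y_pred_lasso)
print(conf_matrix_lasso)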


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Feature Importance Using Lasso</p></div>


In [None]:
# 'lasso' is your fitted LassoCV model and 'X' is your DataFrame of features

# Extract all coefficients
coefficients = lasso.coef_

# Get column names from the DataFrame
feature_names = X.columns

# Create a Series for the coefficients for easier plotting, including zeros
coeff_series_all = pd.Series(coefficients, index=feature_names)

# Sort the coefficients for better visualization
sorted_coeffs_all = coeff_series_all.sort_values()

# Plotting
plt.figure(figsize=(10, 6))
sorted_coeffs_all.plot(kind='barh')
plt.title('Importance of features for heart disease')
plt.xlabel('Coefficient Value')
plt.ylabel('Features')
plt.show()
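
Lasso coefficients can serve as an importance measure because the L1 penalty shrinks the coefficients of uninformative features to exactly zero. The sketch below illustrates this on synthetic data where only the first feature drives the target; the data, the `alpha` value, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only feature 0 influences the target (invented setup)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = 2.0 * X_toy[:, 0] + 0.1 * rng.normal(size=200)

# Fit a Lasso model with a fixed, hand-picked penalty strength
model = Lasso(alpha=0.1).fit(X_toy, y_toy)

# The informative feature keeps a large coefficient;
# the two noise features are shrunk to zero
print(model.coef_)
```

Comparing coefficient magnitudes is only meaningful when the features share a scale, which is why the standardization step comes first in this notebook.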


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Logistic Regression</p></div>


In [None]:
# Library of logistic regression with cross validation integrated:
from sklearn.linear_model import LogisticRegressionCV

In [None]:
# Initialize and fit the logistic regression model with cross-validation
log_reg_cv = LogisticRegressionCV(cv=5, max_iter=10000).fit(X_train, y_train)

In [None]:
# Predict on the test set
y_pred_log = log_reg_cv.predict(X_test)

In [None]:
# Generate the confusion matrix for the logistic regression predictions
conf_matrix = confusion_matrix(y_test, y_pred_log)
print(conf_matrix)


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Comparing Methods Based on Accuracy</p></div>

## Results :

### Question: Based on the accuracy plot below, which algorithm achieves the best score?


# <div style="color:white;display:inline-block;border-radius:5px;background-image: url(https://i.postimg.cc/fyD3nrX4/cardiovas-jcdumlao.png);font-family:Nexa;overflow:hidden"><p style="padding:15px;color:white;overflow:hidden;font-size:95%;letter-spacing:0.5px;margin:0"><b>ﮩ٨ـ❤️ﮩ٨ـﮩﮩ</b>Final Modeling</p></div>


In [None]:
from sklearn.metrics import accuracy_score

# y_pred_ridge, y_pred_lasso, and y_pred_log are the predictions from Ridge, Lasso, and Logistic Regression models, respectively

accuracy_ridge = accuracy_score(y_test, y_pred_ridge)
accuracy_lasso = accuracy_score(y_test, y_pred_lasso)
accuracy_logistic = accuracy_score(y_test, y_pred_log)

In [None]:
# Model names
models = ['Ridge', 'Lasso', 'Logistic Regression']

# Corresponding accuracies
accuracies = [accuracy_ridge, accuracy_lasso, accuracy_logistic]

plt.figure(figsize=(10, 6))
bars = plt.bar(models, accuracies, color=['blue', 'green', 'orange'])

# Adding accuracy value on top of each bar
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 0.01, round(yval, 2), ha='center', va='bottom')

plt.title('Model Comparison based on Accuracy')
plt.xlabel('Model')
plt.ylabel('Accuracy')
plt.ylim([0, 1.1])  # Extend y-axis to make room for text
plt.show()

<div class="alert alert-block alert-info">In the next lab, we will do Kernel Ridge Regression with various kernels. Dataset: yet to be decided. 😊📌</div>