
Write a program to construct a Bayesian network from medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. (You can use Java/Python ML library classes/APIs.)

In this step, we install `pgmpy` and import the libraries required for data manipulation and Bayesian Network modeling.
- `pandas` will help us load and manipulate the dataset.
- `pgmpy` is the main library for working with probabilistic graphical models, such as Bayesian Networks.


In [None]:
!pip install pgmpy

Collecting pgmpy
  Downloading pgmpy-0.1.26-py3-none-any.whl.metadata (9.1 kB)
Downloading pgmpy-0.1.26-py3-none-any.whl (2.0 MB)
Installing collected packages: pgmpy
Successfully installed pgmpy-0.1.26


In [None]:
# Importing required libraries
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from sklearn.preprocessing import LabelEncoder
import numpy as np

In [None]:
# Load the dataset
data = pd.read_csv('/content/heart (1).csv')
data.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
1,53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
2,70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
3,61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
4,62,0,0,138,294,1,1,106,0,1.9,1,3,2,0


#### Observation:
Here we see the first five rows of the dataset. Each row represents a patient, and the columns represent various factors like age, sex, chest pain type, cholesterol level, etc. The last column (`target`) indicates whether the patient has heart disease or not.


Before building the Bayesian Network, we must ensure that the data is suitable. This involves:
- Checking for missing values
- Discretizing continuous columns like age and cholesterol

In [None]:
# Check for missing values
print(f"Missing Values: {data.isnull().sum()}")

# Discretize age and cholesterol (as an example, more features can be discretized if needed)
data['age'] = pd.cut(data['age'], bins=3, labels=['Young', 'Middle', 'Old'])
data['chol'] = pd.cut(data['chol'], bins=3, labels=['Low', 'Normal', 'High'])

data.head()

Missing Values: age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,Middle,1,0,125,Low,0,1,168,0,1.0,2,2,3,0
1,Middle,1,0,140,Low,1,0,155,1,3.1,0,0,3,0
2,Old,1,0,145,Low,0,1,125,1,2.6,0,0,3,0
3,Middle,1,0,148,Low,0,1,161,0,0.0,2,1,3,0
4,Old,0,0,138,Normal,1,1,106,0,1.9,1,3,2,0


#### Observation:
- The check reports zero missing values in every column, so no imputation is needed.
- We have discretized `age` and `chol` into three categories each: `Young`, `Middle`, `Old` for age, and `Low`, `Normal`, `High` for cholesterol. Because `pd.cut` with `bins=3` creates equal-width bins over the observed range, these labels describe relative position in this dataset, not clinical thresholds.
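To see where the category boundaries actually fall, `pd.cut` can return the bin edges it computed. The following is a minimal sketch on a handful of made-up ages (the values are assumptions for illustration, not rows from the heart dataset).

```python
import pandas as pd

# Toy ages (assumed values, not taken from the heart dataset)
ages = pd.Series([29, 40, 52, 61, 77])

# retbins=True also returns the bin edges that pd.cut computed
binned, edges = pd.cut(
    ages, bins=3, labels=['Young', 'Middle', 'Old'], retbins=True
)

print(list(binned))  # category assigned to each age
print(edges)         # four edges delimiting the three equal-width bins
```

Each bin is closed on the right, so an age equal to an inner edge falls into the lower category. If balanced category sizes matter more than equal widths, `pd.qcut` is an alternative worth considering.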


In this step, we build the Bayesian Network structure using the column names from the dataset:
- The `target` column is the outcome for heart disease (0 for no disease, 1 for disease).
- We make age (`age`), cholesterol (`chol`), and chest pain type (`cp`) parents of `target`, and we make maximum heart rate achieved (`thalach`) a child of `target`.


In [None]:
# Defining the structure of the Bayesian Network using correct column names
model = BayesianNetwork([('age', 'target'),
                         ('chol', 'target'),
                         ('cp', 'target'),  # cp: chest pain type
                         ('target', 'thalach')])  # thalach: maximum heart rate achieved

# Printing the model structure to verify the correct edges
model.edges()


OutEdgeView([('age', 'target'), ('target', 'thalach'), ('chol', 'target'), ('cp', 'target')])

#### Observation:
The structure of the Bayesian Network has been defined. The model has the following edges:
- Age, cholesterol, and chest pain type (`cp`) affect heart disease.
- Heart disease affects the maximum heart rate achieved (`thalach`).
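The structure also determines how large each conditional probability table is. The sketch below counts the parent configurations of `target` in plain Python. The number of states per node is an assumption (three bins each for `age` and `chol`, four chest pain types, a binary `target`), and `parents_of` and `cpt_shape` are hypothetical helpers, not pgmpy functions.

```python
from math import prod

# Edges as defined in the notebook
edges = [('age', 'target'), ('chol', 'target'),
         ('cp', 'target'), ('target', 'thalach')]

# Assumed number of states per node; thalach is left out because its
# cardinality depends on how many distinct heart rates appear in the data
cardinality = {'age': 3, 'chol': 3, 'cp': 4, 'target': 2}

def parents_of(node):
    """Return the parents of a node, in edge-list order."""
    return [parent for parent, child in edges if child == node]

def cpt_shape(node):
    """Return (number of parent configurations, number of node states)."""
    n_parent_configs = prod(cardinality[p] for p in parents_of(node))
    return n_parent_configs, cardinality[node]

print(parents_of('target'))
print(cpt_shape('target'))
```

Under these assumptions `target` has 36 parent configurations, each needing its own distribution, so some combinations may be backed by very few patients.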


We will now fit the model to the heart disease dataset to estimate the conditional probability distributions (CPDs) of each node. We'll use `MaximumLikelihoodEstimator` to learn these parameters from the data.


In [None]:
# Fitting the model using Maximum Likelihood Estimation
model.fit(data, estimator=MaximumLikelihoodEstimator)


#### Observation:
The Bayesian Network has been successfully trained, and the CPDs for each node have been estimated using MLE.
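For discrete data, maximum likelihood estimation of a CPD amounts to computing relative frequencies within each parent configuration. Here is a minimal sketch of that idea for a single parent, using made-up `(cp, target)` records; `mle_cpd` is a hypothetical helper and not part of pgmpy.

```python
from collections import Counter, defaultdict

# Toy (cp, target) records; assumed values for illustration only
records = [(0, 0), (0, 0), (0, 1), (2, 1), (2, 1), (2, 0), (2, 1), (1, 1)]

def mle_cpd(rows):
    """Estimate P(target | cp) as relative frequencies within each cp value."""
    counts = defaultdict(Counter)
    for cp, target in rows:
        counts[cp][target] += 1
    return {
        cp: {t: c / sum(tally.values()) for t, c in tally.items()}
        for cp, tally in counts.items()
    }

cpd = mle_cpd(records)
print(cpd[2][1])  # 3 of the 4 rows with cp == 2 have target == 1
```

Note that `cp == 1` appears only once in the toy data, so its estimate is based on a single row. The same effect can occur in the real model when a parent configuration is rare.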


In this step, we will use the `VariableElimination` algorithm to infer the probability of heart disease based on given patient information.
For example, we will infer the probability of a patient having heart disease when the age is "Old" and cholesterol is "High".


In [None]:
# Creating an inference object
infer = VariableElimination(model)

# Performing inference with given evidence
result = infer.query(variables=['target'], evidence={'age': 'Old', 'chol': 'High'})
print(result)


NameError: name 'VariableElimination' is not defined

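#### Observation:
The recorded run failed with a `NameError`, which indicates that the imports cell had not been executed in this session before the inference cell was run. `VariableElimination` is imported from `pgmpy.inference` in that cell, so re-running it first resolves the error. Once the cell runs, the output is the probability distribution of the `target` (heart disease) variable given the evidence (age: Old, cholesterol: High), and we can interpret the probability values to assess the likelihood of heart disease for this patient.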
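With evidence on `age` and `chol` only, the query has to sum over the unobserved parent `cp`. In this network `cp` is a root node, so it is independent of `age` and `chol`, and the result is a weighted average of the CPD entries. The sketch below spells out that calculation by hand. All probabilities are assumed numbers chosen for illustration, and `query_target` is a hypothetical helper rather than a pgmpy call.

```python
# Assumed prior over chest pain type, P(cp)
p_cp = {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1}

# Assumed P(target = 1 | age = 'Old', chol = 'High', cp) for each cp value
p_disease_given_cp = {0: 0.2, 1: 0.8, 2: 0.7, 3: 0.6}

def query_target(p_cp, p_disease_given_cp):
    """Marginalize cp out of P(target | age, chol, cp) for fixed age and chol."""
    p1 = sum(p_cp[cp] * p_disease_given_cp[cp] for cp in p_cp)
    return {0: 1.0 - p1, 1: p1}

result = query_target(p_cp, p_disease_given_cp)
print(result)
```

The unobserved child `thalach` does not enter the calculation, because summing over all of its states contributes a factor of one.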


We will now test the model by adding more pieces of evidence (age, cholesterol, and chest pain type) to refine our prediction of heart disease probability.


In [None]:
# Inference with multiple pieces of evidence
result = infer.query(variables=['target'], evidence={'age': 'Middle', 'chol': 'Normal', 'cp': 2})
print(result)


+-----------+---------------+
| target    |   phi(target) |
+===========+===============+
| target(0) |        0.0000 |
+-----------+---------------+
| target(1) |        1.0000 |
+-----------+---------------+


#### Observation:
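The output shows the probability of heart disease when all three parents of `target` are observed (age: Middle, cholesterol: Normal, chest pain type: 2). Because every parent is fixed, the query reads one column of the learned CPD directly, so a value of exactly 1.0 means that every training row with this combination has `target = 1`. It should be read as a sign that the data for this combination is sparse or one-sided, not as a certain diagnosis.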
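One common way to soften hard 0 and 1 estimates like the ones in the output above is additive (Laplace) smoothing, which adds a pseudo-count to every state before normalizing. The following is a minimal sketch with assumed counts; `smoothed_distribution` is a hypothetical helper. pgmpy's `BayesianEstimator` applies priors in a similar spirit, but this code does not reproduce its API.

```python
def smoothed_distribution(counts, pseudo_count=1.0):
    """Normalize counts after adding pseudo_count to every state."""
    total = sum(counts.values()) + pseudo_count * len(counts)
    return {state: (n + pseudo_count) / total for state, n in counts.items()}

# Assumed counts: every matching training row has target == 1
counts = {0: 0, 1: 5}

mle = smoothed_distribution(counts, pseudo_count=0.0)       # plain frequencies
smoothed = smoothed_distribution(counts, pseudo_count=1.0)  # Laplace smoothing

print(mle)
print(smoothed)
```

With a pseudo-count of zero the function reduces to the maximum likelihood estimate, which makes the two results easy to compare side by side.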


In this notebook, we successfully constructed a Bayesian Network to model heart disease diagnosis using the UCI Heart Disease dataset.
We followed the steps of loading the data, discretizing variables, building the network, learning parameters, and performing inference using the trained model.
The Bayesian Network provides a probabilistic way to diagnose heart disease based on multiple patient factors.
