<a href="https://colab.research.google.com/github/zxn16/CN6005-2526-T1-Artifical-Intelligence-/blob/main/lab7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Naive Bayes Tasks 1–4 (Manual Calculations)

### Task 1 – Accident Prediction

**Objective**: Classify X = (Rain, Good, Normal, No)

*   P(Accident = Yes) = 0.5
*   P(Accident = No)  = 0.5

**Conditional Probabilities (Accident = Yes):**
*   P(Rain|Yes)=0.2, P(Good|Yes)=0.2, P(Normal|Yes)=0.2, P(No|Yes)=0.4

**Conditional Probabilities (Accident = No):**
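*   P(Rain|No)=0.4, P(Good|No)=0.6, P(Normal|No)=0.4, P(No|No)=0.8

In P(No|Yes) and P(No|No), the "No" before the bar is the value of the fourth attribute, not the class label.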

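**Naive Bayes Scores** (prior × product of the four likelihoods):
*   Score_Yes = 0.5 × 0.2 × 0.2 × 0.2 × 0.4 = 0.0016
*   Score_No  = 0.5 × 0.4 × 0.6 × 0.4 × 0.8 = 0.0384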

**Normalised** (each score divided by their sum, 0.0016 + 0.0384 = 0.04):
*   P(Yes|X) = 0.0016 / 0.04 = 0.04
*   P(No|X) = 0.0384 / 0.04 = 0.96

**Classification**: Accident = NO
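
The Task 1 arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: `naive_bayes_scores` is a hypothetical helper written for this example, not a library function, and the likelihood lists simply copy the values listed above in attribute order.

```python
import math


def naive_bayes_scores(priors, likelihoods):
    """Hypothetical helper (not a library function).

    Returns the unnormalised scores P(C) * prod_i P(x_i | C) and the
    posteriors obtained by dividing each score by the sum of all scores.
    """
    scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return scores, posteriors


# Task 1: X = (Rain, Good, Normal, No)
priors = {"Yes": 0.5, "No": 0.5}
likelihoods = {
    "Yes": [0.2, 0.2, 0.2, 0.4],  # P(Rain|Yes), P(Good|Yes), P(Normal|Yes), P(No|Yes)
    "No": [0.4, 0.6, 0.4, 0.8],   # P(Rain|No), P(Good|No), P(Normal|No), P(No|No)
}

scores, posteriors = naive_bayes_scores(priors, likelihoods)
prediction = max(posteriors, key=posteriors.get)

print({c: round(s, 4) for c, s in scores.items()})      # {'Yes': 0.0016, 'No': 0.0384}
print({c: round(p, 2) for c, p in posteriors.items()})  # {'Yes': 0.04, 'No': 0.96}
print("Accident =", prediction)                          # Accident = No
```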

### Task 2 – Spam or Not Spam

**New email**: (Free=Yes, Win=No, Hello=No)

*   P(Spam)=0.5
*   P(NotSpam)=0.5

**Spam Likelihoods:**
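*   P(Free=Yes|Spam)=0.5, P(Win=No|Spam)=0.5, P(Hello=No|Spam)=0.5

**Not-Spam Likelihoods:**
*   P(Free=Yes|NotSpam)=0.5, P(Win=No|NotSpam)=1, P(Hello=No|NotSpam)=0

**Scores** (prior × product of likelihoods):
*   Score_Spam = 0.5 × 0.5 × 0.5 × 0.5 = 0.0625
*   Score_NotSpam = 0.5 × 0.5 × 1 × 0 = 0

**Classification**: SPAM, because the zero likelihood P(Hello=No|NotSpam)=0 forces Score_NotSpam to 0.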
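
A minimal sketch of the Task 2 computation, using only the probabilities listed above. In practice, Laplace (add-one) smoothing is usually applied so that a single unseen attribute value cannot eliminate a class outright; the lab's unsmoothed values are kept here so the result matches the hand calculation.

```python
import math

# Task 2: new email = (Free=Yes, Win=No, Hello=No)
priors = {"Spam": 0.5, "NotSpam": 0.5}
likelihoods = {
    "Spam": [0.5, 0.5, 0.5],     # P(Free=Yes|Spam), P(Win=No|Spam), P(Hello=No|Spam)
    "NotSpam": [0.5, 1.0, 0.0],  # P(Free=Yes|NotSpam), P(Win=No|NotSpam), P(Hello=No|NotSpam)
}

# Unnormalised Naive Bayes score: prior times the product of the likelihoods
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
prediction = max(scores, key=scores.get)

print(scores)      # {'Spam': 0.0625, 'NotSpam': 0.0}
print(prediction)  # Spam
```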

### Task 3 – Play Tennis

**Instance**: (Sunny, Cool, High, Strong)

**Scores** (prior × product of likelihoods):
*   Score_Yes ≈ 0.0053
*   Score_No  ≈ 0.0206

**Normalised:**
*   P(Yes|X)=0.205
*   P(No|X)=0.795

**Classification**: Do NOT play tennis
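
The sheet reports only the final Task 3 scores. Assuming the classic 14-day PlayTennis training set (9 Yes days, 5 No days), whose counts reproduce exactly the scores above, the calculation can be sketched as follows. The counts come from that standard dataset, not from this lab sheet, and exact fractions are used so rounding happens only when printing.

```python
import math
from fractions import Fraction as F

# ASSUMPTION: counts from the classic 14-day PlayTennis dataset (9 Yes, 5 No);
# the lab sheet itself only reports the resulting scores.
priors = {"Yes": F(9, 14), "No": F(5, 14)}
likelihoods = {
    # P(Outlook=Sunny|C), P(Temperature=Cool|C), P(Humidity=High|C), P(Wind=Strong|C)
    "Yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "No": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}

# Prior times product of likelihoods, then normalise by the sum of the scores
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print({c: round(float(s), 4) for c, s in scores.items()})      # {'Yes': 0.0053, 'No': 0.0206}
print({c: round(float(p), 3) for c, p in posteriors.items()})  # {'Yes': 0.205, 'No': 0.795}
print("Play tennis:", prediction)                               # Play tennis: No
```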

### Task 4 – Buy Computer

**Instance**: (age ≤ 30, income=medium, student=yes, credit_rating=fair)

**Likelihoods:**
*   P(X|Yes)=0.044
*   P(X|No)=0.019

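**Scores** (likelihood × prior, i.e. unnormalised posteriors):
*   Score_Yes ≈ 0.028
*   Score_No  ≈ 0.007

**Normalised:**
*   P(Yes|X) ≈ 0.028 / 0.035 ≈ 0.80
*   P(No|X) ≈ 0.007 / 0.035 ≈ 0.20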

**Classification**: buys_computer = YES
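
Task 4 can be checked the same way. The sketch below assumes the classic 14-row AllElectronics `buys_computer` training set (9 yes, 5 no); its class-conditional counts reproduce the likelihoods 0.044 and 0.019 listed above, but they are an assumption about the underlying data rather than values given on this sheet.

```python
import math
from fractions import Fraction as F

# ASSUMPTION: counts from the classic AllElectronics buys_computer dataset (9 yes, 5 no)
priors = {"yes": F(9, 14), "no": F(5, 14)}
likelihoods = {
    # P(age<=30|C), P(income=medium|C), P(student=yes|C), P(credit_rating=fair|C)
    "yes": [F(2, 9), F(4, 9), F(6, 9), F(6, 9)],
    "no": [F(3, 5), F(2, 5), F(1, 5), F(2, 5)],
}

p_x_given_c = {c: math.prod(likelihoods[c]) for c in priors}  # P(X|C)
scores = {c: p_x_given_c[c] * priors[c] for c in priors}      # P(X|C) * P(C)
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}        # P(C|X)

print({c: round(float(v), 3) for c, v in p_x_given_c.items()})  # {'yes': 0.044, 'no': 0.019}
print({c: round(float(v), 3) for c, v in scores.items()})       # {'yes': 0.028, 'no': 0.007}
print({c: round(float(v), 2) for c, v in posteriors.items()})   # {'yes': 0.8, 'no': 0.2}
print("buys_computer =", max(posteriors, key=posteriors.get))   # buys_computer = yes
```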

### Task 5 – Loan Approval (scikit-learn)

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
```


```python
# Toy loan-application dataset: three categorical features and the target LoanApproved
data = {
    'EmploymentStatus': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed'],
    'CreditHistory': ['Good', 'Bad', 'Good', 'Bad', 'Good'],
    'IncomeLevel': ['High', 'Low', 'Medium', 'Medium', 'Low'],
    'LoanApproved': ['Yes', 'No', 'Yes', 'No', 'Yes']
}

df = pd.DataFrame(data)
df
```

|   | EmploymentStatus | CreditHistory | IncomeLevel | LoanApproved |
|---|---|---|---|---|
| 0 | Employed | Good | High | Yes |
| 1 | Unemployed | Bad | Low | No |
| 2 | Employed | Good | Medium | Yes |
| 3 | Employed | Bad | Medium | No |
| 4 | Unemployed | Good | Low | Yes |


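```python
# df.apply calls fit_transform column by column, so this single shared encoder
# is refit each time and ends up fitted only to the last column (LoanApproved)
encoder = LabelEncoder()
df_encoded = df.apply(encoder.fit_transform)
df_encoded
```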


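|   | EmploymentStatus | CreditHistory | IncomeLevel | LoanApproved |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 2 | 1 |
| 3 | 0 | 0 | 2 | 0 |
| 4 | 1 | 1 | 1 | 1 |

LabelEncoder assigns codes in sorted (alphabetical) order: Employed=0, Unemployed=1; Bad=0, Good=1; High=0, Low=1, Medium=2; No=0, Yes=1.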


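```python
# Separate the features from the target column
X = df_encoded.drop('LoanApproved', axis=1)
y = df_encoded['LoanApproved']

# test_size=0.4 on 5 rows gives 3 training rows and 2 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

X_train, X_test, y_train, y_test
```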


```text
(   EmploymentStatus  CreditHistory  IncomeLevel
 2                 0              1            2
 0                 0              1            0
 3                 0              0            2,
    EmploymentStatus  CreditHistory  IncomeLevel
 1                 1              0            1
 4                 1              1            1,
 2    1
 0    1
 3    0
 Name: LoanApproved, dtype: int64,
 1    0
 4    1
 Name: LoanApproved, dtype: int64)
```

```python
# Train a Gaussian Naive Bayes classifier on the 3 training rows
model = GaussianNB()
model.fit(X_train, y_train)
```
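
`GaussianNB` treats each feature as a continuous value that is normally distributed within each class, whereas the encoded columns here are categories whose integer codes have no meaningful order or spacing. scikit-learn's `CategoricalNB` (available since version 0.22) instead estimates a probability for each category value per class, much like the hand calculations in Tasks 1–4, and applies Laplace smoothing through its `alpha` parameter. The sketch below is an alternative, not part of the original lab: it fits on all five rows purely for illustration, and the variable names are arbitrary.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

data = {
    'EmploymentStatus': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed'],
    'CreditHistory': ['Good', 'Bad', 'Good', 'Bad', 'Good'],
    'IncomeLevel': ['High', 'Low', 'Medium', 'Medium', 'Low'],
    'LoanApproved': ['Yes', 'No', 'Yes', 'No', 'Yes'],
}
df = pd.DataFrame(data)
features = ['EmploymentStatus', 'CreditHistory', 'IncomeLevel']

# Integer-code each feature column (categories are sorted, as with LabelEncoder)
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[features])
y = df['LoanApproved']

# Sketch: fit on all five rows for illustration, not on the notebook's train split.
# CategoricalNB counts category frequencies per class; alpha=1.0 is add-one smoothing.
cat_model = CategoricalNB(alpha=1.0)
cat_model.fit(X, y)

new_applicant = pd.DataFrame({
    'EmploymentStatus': ['Employed'],
    'CreditHistory': ['Good'],
    'IncomeLevel': ['Medium'],
})
cat_prediction = cat_model.predict(feature_encoder.transform(new_applicant))
print(cat_prediction)  # ['Yes']
```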


```python
# Predict the loan decision for the two held-out rows
y_pred = model.predict(X_test)
y_pred
```

Output: `array([0, 1])`

```python
# Compare the predictions with the true labels of the test rows
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", acc)
print("\nClassification Report:\n", report)
```


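```text
Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
```

With only two test rows (one per class), this perfect score says little about how well the model generalises.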
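
On a dataset this small, one common alternative to a single train/test split is leave-one-out cross-validation, sketched below with scikit-learn's `LeaveOneOut` splitter and `cross_val_score`. It reuses the notebook's label encoding and `GaussianNB` model; no particular score is claimed here, since the point is only that every row gets used once as test data.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

data = {
    'EmploymentStatus': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed'],
    'CreditHistory': ['Good', 'Bad', 'Good', 'Bad', 'Good'],
    'IncomeLevel': ['High', 'Low', 'Medium', 'Medium', 'Low'],
    'LoanApproved': ['Yes', 'No', 'Yes', 'No', 'Yes'],
}
# Same label encoding as the notebook (codes are assigned per column in sorted order)
df_encoded = pd.DataFrame(data).apply(LabelEncoder().fit_transform)

X = df_encoded.drop('LoanApproved', axis=1)
y = df_encoded['LoanApproved']

# Leave-one-out: each of the 5 rows is the test set once, with the model refit on the other 4
loo_scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())

print(loo_scores)         # one accuracy (0.0 or 1.0) per held-out row
print(loo_scores.mean())  # leave-one-out accuracy estimate
```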



```python
# Fit one LabelEncoder per column so each column's mapping can be reused later
encoders = {}
df_encoded = pd.DataFrame()

for column in df.columns:
    le = LabelEncoder()
    df_encoded[column] = le.fit_transform(df[column])
    encoders[column] = le  # keep the fitted encoder to encode new data and decode predictions

df_encoded
```



|   | EmploymentStatus | CreditHistory | IncomeLevel | LoanApproved |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 2 | 1 |
| 3 | 0 | 0 | 2 | 0 |
| 4 | 1 | 1 | 1 | 1 |
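
The per-column encoders matter because `df.apply(encoder.fit_transform)` refits one shared `LabelEncoder` on every column, so afterwards it only knows the last column's classes. The sketch below demonstrates this and then shows scikit-learn's `OrdinalEncoder`, which keeps a separate sorted category list per feature column inside a single fitted object; using it here is a suggested alternative, not what the lab did.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

data = {
    'EmploymentStatus': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed'],
    'CreditHistory': ['Good', 'Bad', 'Good', 'Bad', 'Good'],
    'IncomeLevel': ['High', 'Low', 'Medium', 'Medium', 'Low'],
    'LoanApproved': ['Yes', 'No', 'Yes', 'No', 'Yes'],
}
df = pd.DataFrame(data)

# A single LabelEncoder reused across columns keeps only the last column's classes
shared = LabelEncoder()
encoded = df.apply(shared.fit_transform)
print(shared.classes_)  # ['No' 'Yes']

# OrdinalEncoder stores one sorted category list per feature column
features = ['EmploymentStatus', 'CreditHistory', 'IncomeLevel']
ordinal = OrdinalEncoder()
codes = ordinal.fit_transform(df[features])
print(ordinal.categories_)

# Encode a new applicant with the same fitted object
new_codes = ordinal.transform(pd.DataFrame({
    'EmploymentStatus': ['Employed'],
    'CreditHistory': ['Good'],
    'IncomeLevel': ['Medium'],
}))
print(new_codes)  # [[0. 1. 2.]]
```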


```python
# Encode a new applicant (Employed, Good credit history, Medium income) with the fitted encoders
new_data = pd.DataFrame({
    'EmploymentStatus': [encoders['EmploymentStatus'].transform(['Employed'])[0]],
    'CreditHistory': [encoders['CreditHistory'].transform(['Good'])[0]],
    'IncomeLevel': [encoders['IncomeLevel'].transform(['Medium'])[0]]
})

prediction = model.predict(new_data)

# Decode the predicted class code back to 'Yes' / 'No'
encoders['LoanApproved'].inverse_transform(prediction)
```

Output: `array(['Yes'], dtype=object)`
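
Tasks 1–4 report normalised posteriors as well as the winning class; for the fitted model the corresponding quantity is available from `predict_proba`. The sketch below rebuilds the notebook's per-column encoders, split and `GaussianNB` model so that it runs on its own, then prints the posterior for the same new applicant. With only three training rows, several per-class variances reduce to the small floor added by `GaussianNB`'s `var_smoothing` parameter, so the probabilities are likely to sit very close to 0 and 1 rather than being well calibrated.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

data = {
    'EmploymentStatus': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed'],
    'CreditHistory': ['Good', 'Bad', 'Good', 'Bad', 'Good'],
    'IncomeLevel': ['High', 'Low', 'Medium', 'Medium', 'Low'],
    'LoanApproved': ['Yes', 'No', 'Yes', 'No', 'Yes'],
}
df = pd.DataFrame(data)

# Rebuild the notebook's pipeline: one encoder per column, same split, same model
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}
df_encoded = pd.DataFrame({col: encoders[col].transform(df[col]) for col in df.columns})

X = df_encoded.drop('LoanApproved', axis=1)
y = df_encoded['LoanApproved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
model = GaussianNB().fit(X_train, y_train)

# Encode the new applicant column by column, keeping the training column order
new_applicant = {'EmploymentStatus': 'Employed', 'CreditHistory': 'Good', 'IncomeLevel': 'Medium'}
new_data = pd.DataFrame({col: encoders[col].transform([value]) for col, value in new_applicant.items()})

# Posterior P(class | applicant) for each class, decoded back to 'No' / 'Yes'
proba = model.predict_proba(new_data)[0]
for label, p in zip(encoders['LoanApproved'].inverse_transform(model.classes_), proba):
    print(label, round(float(p), 4))
```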

In this week’s lab we learned how the Naive Bayes classifier works using probability, conditional likelihoods, and Bayes’ theorem. We applied the full classification process manually on several example datasets, calculating priors, conditional probabilities, and posterior scores to determine the correct class for each instance. After understanding the theory, we implemented Naive Bayes in Python using Google Colab by creating a dataset, encoding categorical variables, training a Gaussian Naive Bayes model, evaluating its accuracy, and making predictions on new unseen data. This helped reinforce both the mathematical foundations and the practical coding skills required to build and test a probabilistic classifier.
