### **Name** : Manu Mathew
### **Student Id**: 8990691

1. Use wine dataset from sklearn.datasets to classify wines into 3 categories. Load the dataset and split it into test and train. Train the model using Gaussian and Multinominal classifiers and post which model performs better. Use the trained model to perform some predictions on test data.

   Record your observations/reflections in the Python notebook.

# (Optional) Install libraries, if not installed

In [None]:
%pip install pandas scikit-learn jupyter

In [None]:
# Adding the required imports

In [None]:
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the wine dataset which is a classic dataset for the classification

In [None]:
theWine = load_wine()
# Features or the columns
Xfeatures = theWine.data
# Target variables
Ytarget = theWine.target

# Convert to the Dataframe for visualizing the data structures and column names

In [None]:
df = pd.DataFrame(Xfeatures, columns = theWine.feature_names)
df['target'] = Ytarget

In [None]:
print("Data sets loaded sucessfully \n")
print("Wine Shape of features:", Xfeatures.shape, "Shape of target:", Ytarget.shape)
print("\nDisplaying first 5 rows of the dataset:")
print(df.head(5))
print("\nChecking for null or empty values")
print(df.isnull().sum())
print("\nTarget names:", theWine.target_names)
print("-" * 50)

# Split the data

In [None]:
"""
Split the dataset into training and testing sets.
X_train, y_train --> data used to train the model.
X_test, y_test --> data used to evaluate the model's performance.
test_size=0.3 means 30% of the data will be used for testing.
random_state ensures reproducibility of the split.
"""
X_train, X_test, y_train, y_test = train_test_split(Xfeatures, Ytarget, test_size=0.3, random_state=42)

print("\nThe Data Splitting phase:")
print(f"The Data split into training (X_train: {X_train.shape}, y_train: {y_train.shape})")
print(f"and Testing (X_test: {X_test.shape}, y_test: {y_test.shape}) sets.")
print("-" * 50)

# Train Gaussian Naive Bayes

In [None]:
print("\nTraining Gaussian Naive Bayes")
# Initializing the Gaussian Naive Bayes classifier.
gaussian_m = GaussianNB()

# Train the model using the training data.

In [None]:
gaussian_m.fit(X_train, y_train)

# Make predictions on the test set.

In [None]:
gaussian_predictions = gaussian_m.predict(X_test)

# Evaluate the model's performance.

In [None]:
gaussian_accuracy = accuracy_score(y_test, gaussian_predictions)
print(f"The Accuracy for Gaussian Naive Bayes: {gaussian_accuracy:.4f}")

# Print a detailed classification report, including precision, recall, and f1-score for each class.

In [None]:
print("\nClassification Report for Gaussian Naive Bayes:")
print(classification_report(y_test, gaussian_predictions, target_names=theWine.target_names))
print("-" * 50)

# Train Multinomial Naive Bayes

In [None]:
print("\n Training Multinomial Naive Bayes")
# Initialize the Multinomial Naive Bayes classifier.
multinomial_mo = MultinomialNB()


# Train the model using the training data.

In [None]:
multinomial_mo.fit(X_train, y_train)

# Make predictions on the test set

In [None]:
multinomial_predictions = multinomial_mo.predict(X_test)

# Evaluate the model's performance.

In [None]:
multinomial_accuracy = accuracy_score(y_test, multinomial_predictions)
print(f"Multinomial Naive Bayes Accuracy: {multinomial_accuracy:.4f}")

# Print a detailed classification report.

In [None]:
print("\nClassification Report for Multinomial Naive Bayes:")
print(classification_report(y_test, multinomial_predictions, target_names=theWine.target_names))
print("-" * 50)

# Compare models

In [None]:
print("\nModel Comparison:")
# Comparing the accuracy of the two models.
if gaussian_accuracy > multinomial_accuracy:
    print(f"Gaussian Naive Bayes performed better with an accuracy of {gaussian_accuracy:.4f} "
          f"compared to Multinomial Naive Bayes' accuracy of {multinomial_accuracy:.4f}.")
    best_model = gaussian_m
    best_model_name = "Gaussian Naive Bayes"
else:
    print(f"Multinomial Naive Bayes performed better** with an accuracy of {multinomial_accuracy:.4f} "
          f"compared to Gaussian Naive Bayes' accuracy of {gaussian_accuracy:.4f}.")
    best_model = multinomial_mo
    best_model_name = "Multinomial Naive Bayes"
print("-" * 50)

# Predictions on test data using the better model


In [None]:
print(f"\nPerforming Predictions using the {best_model_name} model")
# Select a small subset of the test data to demonstrate predictions.
num_of_predictions_to_show = 5
sample_X_test = X_test[:num_of_predictions_to_show]
sample_y_test = y_test[:num_of_predictions_to_show]

# Get predictions from the best performing model.
sample_predictions = best_model.predict(sample_X_test)

print(f"\nFirst {num_of_predictions_to_show} actual labels (0=class_0, 1=class_1, 2=class_2):")
print(sample_y_test)
print(f"First {num_of_predictions_to_show} predicted labels by {best_model_name}:")
print(sample_predictions)

print("\nDetailed comparison for the first few samples:")
for i in range(num_of_predictions_to_show):
    actual_class_name = theWine.target_names[sample_y_test[i]]
    predicted_class_name = theWine.target_names[sample_predictions[i]]
    print(f"Sample {i+1}: Actual: {actual_class_name} (Label: {sample_y_test[i]}), "
          f"Predicted: {predicted_class_name} (Label: {sample_predictions[i]})")
print("-" * 50)

# Record Observations/Reflections

### 7. Observations/Reflections

1. **Data Loading and Splitting**: The wine dataset was loaded successfully and was then split into training and testing sets. The features or columns are all numerical and non-negative, making them suitable for both Gaussian and Multinomial Naive Bayes classifiers.

2. **Gaussian Naive Bayes Performance**: Gaussian Naive Bayes performed very well on this dataset and the accuracy was higher 1.0000.

3. **Multinomial Naive Bayes Performance**: Multinomial Naive Bayes, while applicable to non-negative continuous data, generally performed slightly worse than Gaussian Naive Bayes for this specific dataset. This is often because Multinomial Naive Bayes is inherently designed for discrete count data and while it can be adapted, it might not capture the nuances of continuous, non-count data as effectively as Gaussian Naive Bayes which explicitly models continuous distributions.

4. **Model Comparison**: For the wine dataset, **Gaussian Naive Bayes proved to be the superior model** in terms of accuracy. 

5. **Predictions**: The better-performing model (Gaussian Naive Bayes in this case) demonstrated good predictive capability on unseen test data, correctly classifying most of the sample predictions shown.

6. **Choosing the Right Naive Bayes Variant**: This exercise underscores the importance of considering the nature of your features when selecting a Naive Bayes algorithm. For continuous data, `GaussianNB` is generally the more appropriate choice, whereas for discrete count data, `MultinomialNB` is typically preferred.

---

