# **SVM & Naive Bayes | Assignment**

**1. What is a Support Vector Machine (SVM), and how does it work?**

A Support Vector Machine (SVM) is a supervised machine learning algorithm primarily used for classification tasks, though it can also be applied to regression (SVR). It works by finding the optimal hyperplane that separates data points of different classes with the maximum margin.

How SVM Works:

1.Hyperplane Concept:

- In 2D, the hyperplane is a line that separates the two classes; in 3D it is a plane, and in higher dimensions it is called a hyperplane.

- The goal is to position the hyperplane so that the distance between the closest points of each class (called support vectors) and the hyperplane is maximized. This distance is called the margin.

2.Support Vectors:

- Support vectors are the data points that lie closest to the hyperplane.

- These points are crucial because they define the position and orientation of the hyperplane.
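The margin and support-vector ideas above can be checked numerically. The sketch below is illustrative and assumes a synthetic, clearly separable two-cluster dataset from `make_blobs` (the cluster centers and the large `C=1000`, used to approximate a hard margin, are arbitrary choices). For a linear SVM with decision function $f(x) = w^\top x + b$, the distance from the hyperplane to the closest points is $1/\|w\|$, and the support vectors should satisfy $|f(x)| \approx 1$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated Gaussian clusters (assumed toy data), so a separating
# hyperplane is guaranteed to exist.
X, y = make_blobs(n_samples=50, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.6, random_state=0)

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]           # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
half_margin = 1 / np.linalg.norm(w)   # hyperplane-to-closest-point distance

print("w =", w, "b =", b)
print("distance from hyperplane to closest points:", half_margin)
print("full margin width:", 2 * half_margin)
print("number of support vectors:", len(clf.support_vectors_))

# Support vectors sit on the margin boundaries, where |w.x + b| = 1.
sv_scores = clf.decision_function(clf.support_vectors_)
print("decision values at support vectors:", np.round(sv_scores, 3))
```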

3.Linear vs Non-Linear SVM:

- Linear SVM: Used when data is linearly separable.

- Non-linear SVM: Uses kernel functions (like RBF, polynomial, or sigmoid) to map data into a higher-dimensional space where it can be linearly separated.

4.Kernel Trick:

- SVM uses kernels to handle non-linear data efficiently without explicitly computing the coordinates in higher dimensions.

5.Common kernels:

- Linear Kernel: Simple linear separation.

- Polynomial Kernel: Handles polynomial relationships.

- RBF (Radial Basis Function) Kernel: Implicitly maps data into an infinite-dimensional space to handle complex boundaries.

6.Optimization:

- SVM solves a convex optimization problem to maximize the margin while minimizing classification errors.

- A regularization parameter C controls the trade-off between maximizing the margin and minimizing misclassification.

Summary:

- SVM finds the best separating boundary (hyperplane) between classes using support vectors, maximizing the margin, and optionally applying kernels for non-linear data. It is powerful for both high-dimensional data and small-to-medium datasets.

**2. Explain the difference between Hard Margin and Soft Margin SVM.**

Support Vector Machines (SVM) can be implemented using Hard Margin or Soft Margin, depending on the data and tolerance for misclassification. Here’s the difference:

1.Hard Margin SVM

- Definition: Assumes the data is perfectly linearly separable. The algorithm finds a hyperplane that separates the classes without any errors.

Key Characteristics:

- No data points are allowed to lie inside the margin or on the wrong side of the hyperplane.

- Maximizes the margin strictly between the two classes.

- Works well only if the data is clean and noise-free.

Limitation:

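- Sensitive to outliers and noisy data. A single mislabeled point on the wrong side makes the data non-separable, so no hard-margin solution exists at all; an outlier close to the other class can also shrink the margin drastically.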

2.Soft Margin SVM

- Definition: Allows some misclassifications or violations of the margin to handle non-linearly separable or noisy data.

Key Characteristics:

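- Introduces a slack variable $\xi_i \ge 0$ for each training point, allowing some points to lie inside the margin ($0 < \xi_i \le 1$) or on the wrong side of the hyperplane ($\xi_i > 1$). The optimization problem becomes $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i(w^\top x_i + b) \ge 1 - \xi_i$.

- Uses a regularization parameter (C) to balance margin maximization and classification error: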

- High C → less tolerance for misclassification (behaves more like Hard Margin).

- Low C → more tolerance, wider margin, more misclassifications allowed.

Advantage:

- More robust to noise and outliers.

- Can handle overlapping or non-linearly separable data.
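A minimal sketch of the slack variables and the effect of C, assuming overlapping synthetic clusters from `make_blobs` (the centers, spread and C values are arbitrary illustration choices). Each point's slack is recovered from the fitted decision function as $\xi_i = \max(0, 1 - y_i f(x_i))$ with labels in $\{-1, +1\}$. On data like this, smaller C usually gives a wider margin and more support vectors, though the exact counts depend on the data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters (assumed toy data): no hyperplane separates them perfectly.
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=42)
y_signed = np.where(y == 1, 1, -1)   # SVM labels in {-1, +1}

for C in [0.01, 1, 10]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Slack xi_i = max(0, 1 - y_i f(x_i)): > 0 inside the margin, > 1 misclassified
    slack = np.maximum(0, 1 - y_signed * clf.decision_function(X))
    margin = 2 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:>6}: margin width={margin:.3f}, "
          f"support vectors={len(clf.support_)}, "
          f"margin violations={np.sum(slack > 0)}, "
          f"misclassified={np.sum(slack > 1)}")
```

Every point with positive slack is itself a support vector, which is why lowering C (tolerating more violations) tends to increase the support-vector count.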

Summary Table:

| Feature             | Hard Margin SVM    | Soft Margin SVM               |
| ------------------- | ------------------ | ----------------------------- |
| Data requirement    | Linearly separable | Can be non-linearly separable |
| Misclassification   | Not allowed        | Allowed (controlled by C)     |
| Margin flexibility  | Fixed              | Flexible                      |
| Robustness to noise | Low                | High                          |


**3. What is the Kernel Trick in SVM? Give one example of a kernel and
explain its use cases.**

Kernel Trick in SVM:

- Support Vector Machines (SVM) work by finding a hyperplane that separates data points of different classes. However, not all datasets are linearly separable in their original feature space. The kernel trick is a technique that allows SVM to operate in a higher-dimensional space without explicitly computing the coordinates of the data in that space. Instead, it uses a kernel function to compute the inner product of transformed features directly. This makes it computationally efficient to handle non-linear decision boundaries.

- Mathematically, if $\phi(x)$ is a mapping to a higher-dimensional space, the kernel trick uses:

- $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$

- where $K$ is the kernel function.

- Example of a Kernel: Radial Basis Function (RBF) Kernel

- The RBF kernel is defined as:

- $K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$, where $\gamma > 0$ controls how quickly the similarity decays as two points move apart.

- Use Case: The RBF kernel is useful when the relationship between class labels and features is non-linear. It maps the data into an infinite-dimensional space where a linear separation becomes possible.

- Example Scenario: In the 2D XOR problem, each class occupies diagonally opposite quadrants, so no straight line in the original 2D space can separate them. With the RBF kernel, SVM can learn a non-linear boundary that separates the classes effectively.
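The kernel identity can be verified on small numbers. This sketch uses a degree-2 polynomial kernel $K(x, z) = (x^\top z)^2$ in 2D, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and then checks a hand-written RBF kernel matrix against scikit-learn's `rbf_kernel`. The input points and `gamma = 0.5` are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Explicit feature map for the degree-2 polynomial kernel K(x, z) = (x . z)^2 in 2D
def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_trick = np.dot(x, z) ** 2            # computed entirely in the original 2D space
k_explicit = np.dot(phi(x), phi(z))    # computed in the 3D feature space
print(k_trick, k_explicit)             # both approximately 121

# RBF kernel matrix by hand vs scikit-learn's rbf_kernel
gamma = 0.5
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise ||xi - xj||^2
K_manual = np.exp(-gamma * sq_dists)
K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.round(K_manual, 4))
```

The kernel never builds $\phi(x)$; for the RBF kernel that is essential, because its feature space is infinite-dimensional.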

Example Use Case: RBF Kernel

- Imagine a 2D dataset with points forming concentric circles:

- Inner circle = Class A

- Outer circle = Class B

- A linear SVM cannot separate these classes with a straight line.

- Using the RBF kernel, the data is implicitly mapped into a higher-dimensional space where it becomes linearly separable, so SVM can find a hyperplane there that corresponds to a closed, roughly circular boundary in the original 2D space.
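The concentric-circles example can be reproduced with scikit-learn's `make_circles` generator. In this sketch the sample count, `factor`, `noise` and split are illustrative assumptions; the point is the gap between the two kernels, not the exact scores.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: inner ring = one class, outer ring = the other
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"Linear kernel test accuracy: {linear_acc:.2f}")
print(f"RBF kernel test accuracy:    {rbf_acc:.2f}")
```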

Intuitive Analogy

- Think of the kernel trick as lifting a tangled string off a table:

- On the table (2D space), the string loops around itself, impossible to cut with a straight line.

- Lift it into 3D space (kernel mapping), and suddenly the string becomes untangled.

- Now you can “cut” it with a flat plane (hyperplane) — which corresponds to a non-linear separation in 2D.


**4. What is a Naïve Bayes Classifier, and why is it called “naïve”?**

Naïve Bayes Classifier

- A Naïve Bayes (NB) classifier is a probabilistic machine learning algorithm used for classification tasks. It is based on Bayes’ Theorem, which calculates the probability of a class given the observed features.

- Bayes’ Theorem formula:

- $P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$

Where:

- $P(C \mid X)$ = Posterior probability of class $C$ given feature vector $X$

- $P(X \mid C)$ = Likelihood of feature vector $X$ given class $C$

- $P(C)$ = Prior probability of class $C$

- $P(X)$ = Probability of feature vector $X$ (can be ignored for classification since it is the same for every class)

- The Naïve Bayes classifier predicts the class with the highest posterior probability.

Why is it called “Naïve”?

- It is called naïve because it assumes that all features are independent of each other given the class.

- In real-world data, features are often correlated, but NB ignores these correlations.

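- This “naïve” assumption factorizes the likelihood as $P(X \mid C) = \prod_i P(x_i \mid C)$, so each feature's distribution is estimated separately. This simplifies computation and lets the classifier train and predict efficiently on large datasets.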
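A worked version of this calculation, with made-up probabilities for two words (the numbers are hypothetical, chosen only to keep the arithmetic easy to follow):

```python
# Hypothetical probabilities for a toy spam filter (assumed, not estimated from data)
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {
    "spam": {"free": 0.5, "win": 0.4},
    "ham": {"free": 0.05, "win": 0.1},
}
email_words = ["free", "win"]

# Naive assumption: P(free, win | C) = P(free | C) * P(win | C)
scores = {}
for c in prior:
    score = prior[c]
    for w in email_words:
        score *= likelihood[c][w]
    scores[c] = score                      # unnormalised P(C) * P(X | C)

evidence = sum(scores.values())            # P(X), identical for every class
posterior = {c: s / evidence for c, s in scores.items()}
print(posterior)
```

Here the spam score is $0.2 \times 0.5 \times 0.4 = 0.04$ and the ham score is $0.8 \times 0.05 \times 0.1 = 0.004$, so $P(\text{spam} \mid X) = 0.04 / 0.044 \approx 0.909$.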

Types of Naïve Bayes Classifiers

- Gaussian Naïve Bayes: Assumes features follow a Gaussian (normal) distribution.

- Multinomial Naïve Bayes: Used for discrete data, e.g., word counts in text classification.

- Bernoulli Naïve Bayes: Used for binary features (0 or 1), e.g., presence/absence of a word.

Example Use Case

Spam Email Detection:

- Features: Words in the email (e.g., “free”, “offer”, “win”)

- Classes: Spam or Not Spam

- Naïve Bayes calculates the probability of an email being spam based on the presence of words.

Even though the words may not be fully independent, NB often performs very well in text classification due to its simplicity and efficiency.

**5. Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?**

1.Gaussian Naïve Bayes (GNB)

Description:

- Assumes that the continuous features of the data follow a Gaussian (normal) distribution.

- The likelihood $P(x_i \mid C)$ for a feature $x_i$ is calculated using the Gaussian probability density function:

- $P(x_i \mid C) = \frac{1}{\sqrt{2\pi \sigma_C^2}} \exp\left( -\frac{(x_i - \mu_C)^2}{2\sigma_C^2} \right)$

where

- $\mu_C$ and $\sigma_C^2$ are the mean and variance of feature $x_i$ for class $C$.
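To make the formula concrete, this sketch recomputes GaussianNB's class probabilities by hand on the Iris data and compares them with `predict_proba`. One detail is a scikit-learn implementation choice rather than part of the textbook formula: GaussianNB adds `var_smoothing` (default `1e-9`) times the largest feature variance to every class variance for numerical stability, so the sketch does the same.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)

# Variance smoothing as applied by scikit-learn's GaussianNB
eps = 1e-9 * X.var(axis=0).max()

log_joint = []
for c in np.unique(y):
    Xc = X[y == c]
    mu, var = Xc.mean(axis=0), Xc.var(axis=0) + eps   # per-class mean and variance
    log_prior = np.log(len(Xc) / len(X))
    # Sum of per-feature Gaussian log-densities (the naive independence step)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
    log_joint.append(log_prior + log_lik)

log_joint = np.array(log_joint).T                      # shape (n_samples, n_classes)
manual_proba = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
manual_proba /= manual_proba.sum(axis=1, keepdims=True)
print(np.round(manual_proba[:3], 4))
```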

Use Case:

- Used when features are continuous numerical values.

- Example: Predicting whether a patient has a disease based on continuous features like blood pressure, cholesterol level, or age.

2.Multinomial Naïve Bayes (MNB)

Description:

- Designed for discrete count data, especially feature vectors representing frequencies.

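- Estimates the probability of each feature (e.g., a word) given a class, and models a document's feature counts with a multinomial distribution:

- $P(\text{feature } i \mid C) = \hat{\theta}_{Ci} = \frac{\text{count of feature } i \text{ in class } C + \alpha}{\text{total count of all features in class } C + \alpha \cdot n}$

where

- $\alpha$ is the smoothing parameter ($\alpha = 1$ gives Laplace smoothing) and $n$ is the number of features. A document with counts $x_1, \dots, x_n$ then has a likelihood proportional to $\prod_i \hat{\theta}_{Ci}^{x_i}$.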

Use Case:

- Commonly used in text classification tasks where features represent word counts.

- Example: Spam detection in emails, sentiment analysis, or document categorization.

3.Bernoulli Naïve Bayes (BNB)

Description:

- Designed for binary/boolean features (0 or 1).

- Focuses on presence or absence of a feature rather than frequency.

- $P(x_i \mid C) = P(f_i = 1 \mid C)^{x_i} \cdot (1 - P(f_i = 1 \mid C))^{1 - x_i}$

Use Case:

- Used when data features are binary indicators.

- Example: Text classification based on whether a word appears in a document or not, click prediction (clicked/not clicked), or presence/absence of certain symptoms.
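A minimal sketch of the Bernoulli likelihood on hypothetical binary word-presence features. It estimates $P(f_i = 1 \mid C)$ with add-one smoothing, $(\text{count} + \alpha)/(N_C + 2\alpha)$, which is how scikit-learn's `BernoulliNB` smooths by default, and compares the result with `predict_proba`. Unlike Multinomial NB, an absent word contributes a factor $1 - P(f_i = 1 \mid C)$ rather than being ignored.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features: does the email contain "free", "win", "meeting"?
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 1],
              [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])   # 1 = spam, 0 = not spam
alpha = 1.0
bnb = BernoulliNB(alpha=alpha).fit(X, y)

x_new = np.array([[1, 1, 0]])      # contains "free" and "win", not "meeting"
log_joint = []
for c in bnb.classes_:
    Xc = X[y == c]
    p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)   # smoothed P(f_i = 1 | C)
    # Bernoulli likelihood: present features contribute p, absent ones (1 - p)
    log_lik = np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p), axis=1)
    log_joint.append(np.log(len(Xc) / len(X)) + log_lik)

log_joint = np.array(log_joint).T
manual_proba = np.exp(log_joint) / np.exp(log_joint).sum(axis=1, keepdims=True)
print("manual:", manual_proba, "sklearn:", bnb.predict_proba(x_new))
```

For this toy data the new email comes out as spam with probability about 0.95 ($0.288 / (0.288 + 0.016)$).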

Summary Table:

| Variant        | Feature Type    | Use Case Example                     |
| -------------- | --------------- | ------------------------------------ |
| Gaussian NB    | Continuous      | Predicting disease from medical data |
| Multinomial NB | Discrete counts | Spam detection, sentiment analysis   |
| Bernoulli NB   | Binary features | Text classification (word presence)  |


**6. Write a Python program to:**

**● Load the Iris dataset**

**● Train an SVM Classifier with a linear kernel**

**● Print the model's accuracy and support vectors.**

Explanation:

1. Load Dataset: load_iris() provides 150 samples with 4 features each (sepal/petal length and width).

2. Train/Test Split: We split data into 80% training and 20% testing.

3. SVM Training: The task asks for a linear kernel, which suits Iris well: setosa is linearly separable from the other two species, and versicolor and virginica overlap only slightly.

4. Accuracy: Evaluates how well the model predicts unseen test data.

5. Support Vectors: SVM identifies the data points that lie closest to the decision boundary — these are critical for defining the hyperplane.

```python
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data       # Features
y = iris.target     # Labels

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Train an SVM classifier with a linear kernel
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = svm_model.predict(X_test)

# Step 5: Print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Step 6: Print the support vectors
print("Support Vectors:")
print(svm_model.support_vectors_)
```

```
Model Accuracy: 1.00
Support Vectors:
[[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]
```
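`support_vectors_` alone does not say which class each vector belongs to or which training row it came from. As a small follow-up sketch (repeating the setup above so it runs on its own), the fitted `SVC` also exposes `n_support_`, the number of support vectors per class, and `support_`, their row indices in `X_train`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same setup as the program above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
svm_model = SVC(kernel="linear").fit(X_train, y_train)

# Number of support vectors per class (setosa, versicolor, virginica)
print("Support vectors per class:", svm_model.n_support_)
# Row indices of the support vectors within X_train
print("Indices in X_train:", svm_model.support_)
```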


**7. Write a Python program to:**

**● Load the Breast Cancer dataset**

**● Train a Gaussian Naïve Bayes model**

**● Print its classification report including precision, recall, and F1-score.**

Explanation:

1. Load Dataset: load_breast_cancer() provides 569 samples with 30 numerical features describing tumor properties.

2. Train/Test Split: 80% for training and 20% for testing.

3. Gaussian Naïve Bayes: Assumes continuous features follow a Gaussian distribution.

4. Classification Report: Includes precision, recall, F1-score, and support for each class.

5. Precision: Of the samples predicted as a given class, the fraction that truly belong to it.

6. Recall: Of the samples that truly belong to a class, the fraction the model correctly identifies.

7. F1-score: Harmonic mean of precision and recall.
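The three formulas can be checked against a confusion matrix. The labels below are hypothetical; the sketch computes precision $= TP/(TP+FP)$, recall $= TP/(TP+FN)$ and F1 $= 2PR/(P+R)$ by hand.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels for illustration (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)   # of everything predicted positive, the share that is right
recall = tp / (tp + fn)      # of all real positives, the share that was found
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, FP={fp}, FN={fn}")
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```

Here TP = 3, FP = 2 and FN = 1, giving precision 0.60, recall 0.75 and F1 ≈ 0.67.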

```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data      # Features
y = data.target    # Labels

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Train a Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = gnb.predict(X_test)

# Step 5: Print the classification report
report = classification_report(y_test, y_pred, target_names=data.target_names)
print("Classification Report:\n")
print(report)
```

```
Classification Report:

              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
```



**8. Write a Python program to:**

**● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma.**

**● Print the best hyperparameters and accuracy.**

Explanation:

1. Wine Dataset: Contains 178 samples with 13 chemical features of wines categorized into 3 classes.

2. SVM with RBF Kernel: gamma is used by kernels such as RBF (not by the linear kernel); it controls how far the influence of a single training point reaches, with a large gamma meaning each point influences only its close neighborhood.

3. GridSearchCV:

- Exhaustively searches over the grid of hyperparameters C (regularization) and gamma (kernel coefficient).

- cv=5 means 5-fold cross-validation.

4. Best Hyperparameters: grid_search.best_params_ gives the combination of C and gamma with the highest mean accuracy across the 5 validation folds.

5. Test Accuracy: Evaluates performance of the tuned model on unseen test data.

```python
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Define the SVM classifier
svm = SVC(kernel='rbf')  # Using RBF kernel as gamma is relevant

# Step 4: Set up the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1]
}

# Step 5: Initialize GridSearchCV
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Step 6: Get the best hyperparameters
best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")

# Step 7: Evaluate the model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Set Accuracy: {accuracy:.2f}")
```

```
Best Hyperparameters: {'C': 100, 'gamma': 0.001}
Test Set Accuracy: 0.83
```
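One likely reason for the modest 0.83 test accuracy is that the Wine features are on very different scales (proline values are in the hundreds to thousands, while hue is around 1), and the RBF kernel is distance-based, so the largest-scale features dominate. A common remedy, sketched here rather than prescribed, is to standardize inside a `Pipeline` so that the scaler is fitted only on the training part of each cross-validation fold. The grid is the same as above; only the scaler is new, and the exact parameters it selects may differ between scikit-learn versions.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling inside the pipeline: each CV fold is standardized with its own training part
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
param_grid = {
    "svc__C": [0.1, 1, 10, 100],          # step-name prefix routes params to the SVC
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

test_acc = grid.score(X_test, y_test)
print("Best hyperparameters:", grid.best_params_)
print(f"Test set accuracy: {test_acc:.2f}")
```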


**9. Write a Python program to:**

**● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
sklearn.datasets.fetch_20newsgroups).**

**● Print the model's ROC-AUC score for its predictions.**

Explanation:

1. Dataset:

- fetch_20newsgroups provides a large collection of news articles.

- For simplicity, we selected 3 categories.

2. Text Vectorization:

- TfidfVectorizer converts raw text into numerical features based on word frequency and importance.

3. Naïve Bayes Model:

- MultinomialNB is designed for count data; in practice it also works well with fractional values such as TF-IDF weights.

4. ROC-AUC Score:

- For multi-class problems, the labels are binarized and ROC-AUC is computed with the One-vs-Rest (OvR) strategy, giving one binary AUC per class.

- average='macro' takes the unweighted mean of the per-class ROC-AUC scores.

```python
# Import required libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize
import numpy as np

# Step 1: Load a subset of the 20 Newsgroups dataset
categories = ['alt.atheism', 'comp.graphics', 'sci.space']  # Selecting 3 categories for simplicity
newsgroups = fetch_20newsgroups(subset='all', categories=categories)

X = newsgroups.data
y = newsgroups.target

# Step 2: Convert text data to TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)

# Step 3: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

# Step 4: Train a Multinomial Naïve Bayes classifier
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# Step 5: Make probability predictions for ROC-AUC calculation
y_prob = nb_model.predict_proba(X_test)

# Step 6: Binarize labels for multi-class ROC-AUC
y_test_bin = label_binarize(y_test, classes=np.arange(len(categories)))

# Step 7: Compute ROC-AUC score
roc_auc = roc_auc_score(y_test_bin, y_prob, average='macro', multi_class='ovr')
print(f"ROC-AUC Score: {roc_auc:.4f}")
```

```
ROC-AUC Score: 0.9975
```
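In the program above, `TfidfVectorizer` is fitted on all documents before the split, so the test documents contribute to the vocabulary and IDF weights, a mild form of data leakage. A sketch of the leak-free pattern puts the vectorizer and the classifier in one `make_pipeline`, so the vectorizer is fitted only on training text. Because `fetch_20newsgroups` needs a download, this sketch uses a tiny hypothetical corpus in place of the newsgroup posts (the word `zzzunseen` is a made-up token that appears only in the test set); with so few documents the score itself is not meaningful.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

# Tiny hypothetical corpus standing in for the newsgroup posts (0, 1, 2 = topics)
train_docs = ["god religion belief", "atheism religion debate",
              "image rendering graphics", "graphics card pixels",
              "nasa orbit launch", "space shuttle orbit"]
train_y = np.array([0, 0, 1, 1, 2, 2])
test_docs = ["religion and god", "rendering pixels quickly", "shuttle launch zzzunseen"]
test_y = np.array([0, 1, 2])

# The vectorizer is fitted inside the pipeline on the training documents only,
# so its vocabulary and IDF weights never see the test set.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_y)

proba = model.predict_proba(test_docs)
# roc_auc_score accepts integer labels directly with multi_class="ovr"
auc = roc_auc_score(test_y, proba, multi_class="ovr", average="macro")
print(f"ROC-AUC (OvR, macro): {auc:.4f}")
```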


**10. Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam. The emails may contain:**

**● Text with diverse vocabulary**

**● Potential class imbalance (far more legitimate emails than spam)**

**● Some incomplete or missing data**

**Explain the approach you would take to:**

**● Preprocess the data (e.g. text vectorization, handling missing data)**

**● Choose and justify an appropriate model (SVM vs. Naïve Bayes)**

**● Address class imbalance**

**● Evaluate the performance of your solution with suitable metrics**

**And explain the business impact of your solution.**

1.Preprocessing the Data

a) Handling Text Data

- Emails are unstructured text, so we need to convert them into numerical features:

1. Text Cleaning:

- Remove punctuation, numbers, and special characters.

- Convert all text to lowercase.

- Remove stop words (common words like “the”, “is” that carry little meaning).

2. Tokenization & Vectorization:

- Use TF-IDF Vectorization (TfidfVectorizer) or CountVectorizer to convert text into numerical features.

- TF-IDF is often preferred because it down-weights words that appear in many emails (and therefore carry little information) and up-weights words that are distinctive for particular emails.

b) Handling Missing Data

- Check for missing email text or labels.

- Replace missing text with an empty string or a placeholder.

- Drop rows with missing labels because they cannot be used for supervised learning.
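The preprocessing steps above can be sketched with pandas and scikit-learn. The example emails, labels and the `clean` helper (a hypothetical function written for this sketch) are assumptions; `TfidfVectorizer`'s `preprocessor` argument replaces its built-in lowercasing step with that helper.

```python
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical raw emails with missing values (None) in both columns
df = pd.DataFrame({
    "text": ["WIN a FREE prize!!! Click now", None,
             "Meeting moved to 3pm, see agenda", "Free offer: win $1000 today"],
    "label": ["spam", "ham", None, "spam"],
})

df["text"] = df["text"].fillna("")        # missing body -> empty string
df = df.dropna(subset=["label"])          # unlabeled rows cannot be used for training

def clean(text):
    """Lowercase and replace punctuation, digits and symbols with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

vectorizer = TfidfVectorizer(preprocessor=clean, stop_words="english")
X = vectorizer.fit_transform(df["text"])
print(X.shape, sorted(vectorizer.vocabulary_))
```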

2.Choosing an Appropriate Model

- Options: SVM vs. Naïve Bayes

| Model           | Pros                                                                            | Cons                                                            |
| --------------- | ------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| **Naïve Bayes** | Fast, handles high-dimensional text data well, robust to irrelevant features    | Assumes independence of words, may not capture complex patterns |
| **SVM**         | Can handle non-linear boundaries (with kernels), good for high-dimensional data | Slower to train on very large datasets, requires careful tuning |

Recommendation:

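- For text classification with high-dimensional sparse features (TF-IDF vectors), Multinomial Naïve Bayes is a strong, fast first choice.

- It works well even when the independence assumption is not strictly true, trains in a single pass over the data, and scales easily to large datasets.

- A linear SVM (e.g. scikit-learn's LinearSVC) is also well suited to sparse text and often reaches higher accuracy than Naïve Bayes; kernel SVMs become slow on very large email collections, so they are practical only when computational resources allow.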

3.Addressing Class Imbalance

- Spam detection datasets often have many more legitimate emails than spam emails, which can bias the model. Approaches to address this:

1. Resampling:

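- Oversample the minority class (spam), e.g. with SMOTE from the imbalanced-learn library.

- Undersample the majority class (ham) to balance the dataset.

2. Class Weights:

- In SVM and many other scikit-learn classifiers, set class_weight='balanced' to penalize misclassifying the minority class more. MultinomialNB has no class_weight parameter, but its fit method accepts sample_weight, and its class_prior parameter can override the learned class priors.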

3. Threshold Tuning:

- Adjust decision thresholds to improve detection of spam emails, even if it slightly increases false positives.
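A minimal sketch of the three options on a hypothetical, deliberately imbalanced toy set (8 ham, 2 spam). `compute_sample_weight` turns `'balanced'` class weights into per-email weights, since `MultinomialNB.fit` accepts `sample_weight` but the model has no `class_weight` parameter. The thresholds 0.5 and 0.3 are arbitrary; in practice the threshold would be chosen on a validation set.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical, deliberately imbalanced data: 8 ham (0) vs 2 spam (1)
texts = ["meeting at noon", "project update attached", "lunch tomorrow?",
         "quarterly report draft", "team schedule for next week",
         "minutes from the call", "invoice for march", "welcome to the team",
         "win a free prize now", "free money click here"]
y = np.array([0] * 8 + [1] * 2)
X = TfidfVectorizer().fit_transform(texts)

# Option 1: class weights, supported directly by LinearSVC
svm = LinearSVC(class_weight="balanced").fit(X, y)
print("LinearSVC predictions:", svm.predict(X))

# Option 2: per-sample weights for MultinomialNB (spam rows weigh more)
weights = compute_sample_weight(class_weight="balanced", y=y)
nb = MultinomialNB().fit(X, y, sample_weight=weights)

# Option 3: threshold tuning on the predicted spam probability
spam_prob = nb.predict_proba(X)[:, 1]
for threshold in [0.5, 0.3]:
    flagged = (spam_prob >= threshold).astype(int)
    print(f"threshold={threshold}: flagged {flagged.sum()} of {len(y)} as spam")
```

Lowering the threshold can only flag the same number of emails or more, trading precision for recall.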

4.Evaluating Performance

Metrics:

- Accuracy: Not sufficient alone because of imbalance.

- Precision (Spam Prediction Accuracy): High precision ensures legitimate emails are rarely flagged as spam.

- Recall (Spam Detection Rate): High recall ensures most spam emails are caught.

- F1-Score: Harmonic mean of precision and recall, balances the trade-off.

- ROC-AUC Score: Measures overall separability of spam vs. non-spam.

Example:

- A spam classifier with high recall but moderate precision may flag some legitimate emails, but almost no spam is missed.
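Why accuracy alone misleads: in this hypothetical test set of 95 legitimate emails and 5 spam emails, a model that never predicts spam still scores 95% accuracy while catching no spam at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical imbalanced test set: 95 legitimate (0) and 5 spam (1) emails
y_true = np.array([0] * 95 + [1] * 5)
# A useless "model" that never flags anything as spam
y_pred = np.zeros_like(y_true)

print("accuracy :", accuracy_score(y_true, y_pred))                    # 0.95
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall   :", recall_score(y_true, y_pred))                      # 0.0
print("F1       :", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```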

5.Business Impact

Implementing an effective spam classifier brings significant value:

1. Improved Productivity: Employees spend less time sorting spam emails.

2. Reduced Security Risk: Spam emails often carry phishing attempts or malware. Detecting them reduces the risk of data breaches.

3. Customer Trust: Ensures critical emails are delivered correctly while keeping spam out of inboxes.

4. Operational Efficiency: Automates email filtering, reducing the need for manual review.

Key Takeaway: A well-designed spam detection system balances accuracy, speed, and reliability, enhancing security and efficiency while maintaining user satisfaction.

Workflow Explanation:

1. Dataset:

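- Used fetch_20newsgroups as a sample dataset, with two newsgroups standing in for ham and spam. These two classes are roughly balanced, so this demo does not reproduce the imbalance of real email traffic.

- Simulated missing emails for preprocessing demonstration.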

2. Preprocessing:

- Filled missing email text with empty strings.

- Converted raw text to TF-IDF features for numeric representation.

3. Class Imbalance:

- Calculated balanced class weights. MultinomialNB does not accept class weights directly, but they become useful when switching to a classifier such as SVM.

4. Model:

- Multinomial Naïve Bayes is well suited to text classification with word-count or TF-IDF features.

5. Evaluation:

- Classification report gives precision, recall, F1-score, and support.

- ROC-AUC measures model's ability to separate classes.

Python Implementation:

```python
# Import required libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
import pandas as pd

# -----------------------------
# 1. Load or simulate the dataset
# -----------------------------
# For demonstration, we use a subset of 20 Newsgroups to simulate spam/ham
categories = ['rec.autos', 'sci.space']  # 'rec.autos' = Not Spam, 'sci.space' = Spam
emails = fetch_20newsgroups(subset='all', categories=categories)
X = emails.data
y = emails.target

# Convert to DataFrame to simulate missing data
df = pd.DataFrame({'text': X, 'label': y})
df.loc[5, 'text'] = None  # Simulate missing email
df.loc[10, 'text'] = None

# -----------------------------
# 2. Preprocess the text
# -----------------------------
# Fill missing email text with an empty string. Assigning the result back
# avoids pandas' chained-assignment warning for column.fillna(..., inplace=True).
df['text'] = df['text'].fillna('')
X = df['text'].values
y = df['label'].values

# Vectorize text using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)

# -----------------------------
# 3. Train/Test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.2, random_state=42, stratify=y
)

# -----------------------------
# 4. Handle class imbalance (optional)
# -----------------------------
# Compute balanced class weights. MultinomialNB has no class_weight parameter,
# so they are not used below; they could be passed to e.g. SVC(class_weight=...).
classes = np.unique(y_train)
class_weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, class_weights))

# -----------------------------
# 5. Train a Multinomial Naïve Bayes model
# -----------------------------
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# -----------------------------
# 6. Make predictions
# -----------------------------
y_pred = nb_model.predict(X_test)
y_prob = nb_model.predict_proba(X_test)

# -----------------------------
# 7. Evaluate the model
# -----------------------------
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=emails.target_names))

# Binary ROC-AUC: score each email by its predicted probability of class 1
roc_auc = roc_auc_score(y_test, y_prob[:, 1])
print(f"ROC-AUC Score: {roc_auc:.4f}")
```

```
Classification Report:

              precision    recall  f1-score   support

   rec.autos       0.98      0.99      0.99       198
   sci.space       0.99      0.98      0.99       198

    accuracy                           0.99       396
   macro avg       0.99      0.99      0.99       396
weighted avg       0.99      0.99      0.99       396

ROC-AUC Score: 0.9998
```
