# Lesson 2: Mastering Text Classification with Naive Bayes in Python

# Overview: Text Classification With Naive Bayes

Hello and welcome! Today, we will explore the world of text classification using the Naive Bayes algorithm, specifically in Python with the Scikit-learn library. By the end of this lesson, you will understand how Naive Bayes works, how to implement a Naive Bayes model in Python, and how to evaluate its performance. Let's get started!

## Understanding the Fundamentals of Naive Bayes

Naive Bayes is a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. It calculates the probability of one event given that another event has been observed. In text classification, the event we're interested in is a specific class label, such as spam or ham (not spam), while the observed event is the text input, which in our case is a particular SMS.

The 'naive' in Naive Bayes comes from the assumption that each feature contributes independently to the probability of a particular outcome. This assumption rarely holds in the real world (words in an SMS are often not independent), yet Naive Bayes still tends to perform well in text classification while remaining simple and fast to train.
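To make the independence assumption concrete, here is a minimal sketch that scores a two-word message by hand. The priors and per-word likelihoods are made-up numbers chosen for illustration, not values estimated from the SMS dataset.

```python
# Hypothetical probabilities, invented for illustration only
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": {"free": 0.30, "prize": 0.20},
    "ham": {"free": 0.02, "prize": 0.01},
}

message = ["free", "prize"]

# Naive assumption: multiply per-word likelihoods as if the words were independent
scores = {}
for label in priors:
    score = priors[label]
    for word in message:
        score *= likelihoods[label][word]
    scores[label] = score

# Normalize so the scores sum to 1 (the denominator in Bayes' theorem)
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

print(posteriors)
```

With these invented numbers, the spam posterior comes out at roughly 0.99, even though spam has the smaller prior, because both words are far more likely under the spam class.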

## Dataset Loading and Preparation

Before we start building our Naive Bayes model, let's load our dataset and perform the necessary preparations:

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)
```

In the above code, we load our SMS dataset and conduct a train-test split. These steps prepare our dataset for modeling. By separating our data into a training set and a test set, we ensure that our model can learn from one portion of the data (the training set) and then have its performance evaluated on unseen data (the test set). Passing `stratify=Y` keeps the proportion of spam and ham messages the same in both sets, and `random_state=42` makes the split reproducible.
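As a rough illustration of what stratification does, the sketch below splits a toy label column and compares class proportions. The `toy_X` and `toy_Y` series are hypothetical stand-ins, not the SMS dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 80 ham and 20 spam messages
toy_X = pd.Series([f"message {i}" for i in range(100)])
toy_Y = pd.Series(["ham"] * 80 + ["spam"] * 20)

toy_X_train, toy_X_test, toy_Y_train, toy_Y_test = train_test_split(
    toy_X, toy_Y, test_size=0.2, random_state=42, stratify=toy_Y
)

# With stratification, both splits keep the original 80/20 class ratio
print(toy_Y_train.value_counts(normalize=True))
print(toy_Y_test.value_counts(normalize=True))
```

Running the same check on `Y_train` and `Y_test` from the lesson is a quick way to confirm the real split is balanced the same way.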

## Text Data Transformation Into Numerical Features

Before building the Naive Bayes model, we need to prepare our data. Since machine learning algorithms operate on numeric data, we must convert our SMS text data into numerical features:

```python
# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)
```

In this code block, we use `CountVectorizer`, which performs two important tasks: it tokenizes each message (lowercasing the text and breaking it into individual words by default) and counts how often each word occurs in each message. This information turns each message into a numerical vector that our machine learning model can work with. We call `fit_transform` on the training data to learn the vocabulary, but only `transform` on the test data so that the test set is encoded with the same vocabulary. The results, `X_train_count` and `X_test_count`, are matrices of token counts.

## Building the Naive Bayes Model

Now that we've transformed our text data into numerical vectors, we can create our Naive Bayes classifier:

```python
# Initialize the MultinomialNB model
naive_bayes_model = MultinomialNB()

# Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)
```

Here, we initialize a Naive Bayes classifier using the `MultinomialNB` class from Scikit-learn. The `fit` method trains our model on the training data: it estimates how common each label (spam or ham) is and how likely each token is within each label. Once the model is trained, the `predict` method combines these estimates through Bayes' theorem to pick the most probable label for each message in our test data.
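Beyond hard labels, a fitted `MultinomialNB` can also report class probabilities through `predict_proba`. The sketch below shows this on a tiny hypothetical corpus; the texts and labels are invented and far too small to say anything about real performance.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical corpus, for illustration only
toy_texts = ["win a free prize", "free cash prize", "see you at lunch", "lunch at noon"]
toy_labels = ["spam", "spam", "ham", "ham"]

toy_vectorizer = CountVectorizer()
toy_model = MultinomialNB()
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

# Class order used for the columns of the probability output
print(toy_model.classes_)

# Posterior probabilities and the predicted label for a new message
new_counts = toy_vectorizer.transform(["free lunch prize"])
toy_proba = toy_model.predict_proba(new_counts)
print(toy_proba)
print(toy_model.predict(new_counts))
```

The columns of `predict_proba` follow the order in `classes_`, and `predict` simply returns the class with the largest probability in each row.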

## Predicting and Evaluating the Model Performance

Accuracy is a common metric for classification, calculated as the ratio of the number of correct predictions to the total number of input samples:

```python
# Calculate the accuracy of the model
from sklearn import metrics

accuracy = metrics.accuracy_score(Y_test, y_pred)

# Print the accuracy
print(f"Accuracy of Naive Bayes Classifier: {accuracy:.2f}")
```

The output will be:

```
Accuracy of Naive Bayes Classifier: 0.98
```

This indicates that our classifier has a very high accuracy rate, rarely misclassifying SMS messages in the test set. This result suggests that Naive Bayes is an effective baseline for text classification tasks, although accuracy alone can hide errors on the less frequent class.
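Accuracy can look strong even when a model misses the rarer class entirely. The sketch below uses hypothetical labels and a "classifier" that always answers ham to show why a class-specific metric such as recall is a useful companion.

```python
from sklearn import metrics

# Hypothetical labels: 9 ham, 1 spam, and predictions that are always "ham"
toy_true = ["ham"] * 9 + ["spam"]
toy_pred = ["ham"] * 10

toy_accuracy = metrics.accuracy_score(toy_true, toy_pred)
# Recall for the spam class: the share of actual spam messages that were caught
toy_recall = metrics.recall_score(toy_true, toy_pred, pos_label="spam")

print(f"Accuracy: {toy_accuracy:.2f}")
print(f"Spam recall: {toy_recall:.2f}")
```

Here the accuracy is 0.90 while the spam recall is 0.00, so the same idea can be applied to `Y_test` and `y_pred` to check how the real model treats spam specifically.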

## Lesson Summary and Practice

Well done on reaching the end of this lesson! We explored the Naive Bayes algorithm, implemented it in Python for text classification, and evaluated its performance. The Naive Bayes classifier is a powerful and fast classification tool ideal for text data, even if its assumptions largely ignore the semantics of text.

In the upcoming exercises, you will have the chance to implement a Naive Bayes classifier and gain valuable hands-on experience. Remember that practicing what you've learned is essential in your learning journey. So, get your hands dirty with our exercises and improve your problem-solving abilities and understanding of the Naive Bayes classifier. Let's go! Happy coding!

## Tuning Alpha Parameter in Naive Bayes Model

Greetings, Space Explorer! Your mission is to tune the Naive Bayes classifier used for SMS classification by adjusting the alpha parameter. Alpha is an additive smoothing parameter: it adds a small pseudo-count to every token so that tokens absent from a class's training messages do not produce zero probabilities in later computations. Setting alpha too low can lead to overfitting, because rare or unseen tokens get extreme probabilities, while setting it too high can lead to underfitting, because the smoothing swamps the observed counts. The starter code below uses an alpha value of 0, which turns smoothing off. Let's see what happens when we set alpha to 1, the default value of the parameter. Implement this change and report the accuracy of the updated model on the test set.

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# Initialize the MultinomialNB model with alpha=0
naive_bayes_model = MultinomialNB(alpha=0)

# Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)

# Calculate the accuracy of the model
accuracy = metrics.accuracy_score(Y_test, y_pred)

# Print the accuracy
print(f"Accuracy of Naive Bayes Classifier with alpha=0: {accuracy:.2f}")

```

Abacus AI ChatLLM:
To adjust the Naive Bayes classifier by changing the alpha parameter from 0 to 1, you can modify the initialization of the `MultinomialNB` model. Below is the updated code that implements this change and reports the accuracy of the updated model on the test set.

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# Initialize the MultinomialNB model with alpha=1 (default value)
naive_bayes_model = MultinomialNB(alpha=1)

# Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)

# Calculate the accuracy of the model
accuracy = metrics.accuracy_score(Y_test, y_pred)

# Print the accuracy
print(f"Accuracy of Naive Bayes Classifier with alpha=1: {accuracy:.2f}")
```

### Explanation of Changes:
- The line `naive_bayes_model = MultinomialNB(alpha=0)` has been changed to `naive_bayes_model = MultinomialNB(alpha=1)` to set the alpha parameter to its default value.

### Expected Output:
When you run this code, it will print the accuracy of the Naive Bayes classifier with the updated alpha value. The output will look something like this:

```
Accuracy of Naive Bayes Classifier with alpha=1: [accuracy_value]
```

Replace `[accuracy_value]` with the actual accuracy obtained from running the code. This will give you insight into how changing the alpha parameter affects the model's performance.
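The effect of alpha on a single token probability can be sketched with the standard additive smoothing formula, `(count + alpha) / (total + alpha * vocabulary size)`. The helper name `smoothed_likelihood` and the counts below are hypothetical and chosen only to illustrate the trend.

```python
def smoothed_likelihood(token_count, total_count, vocab_size, alpha):
    """Additive (Laplace/Lidstone) smoothing for one token within one class."""
    return (token_count + alpha) / (total_count + alpha * vocab_size)


# Hypothetical numbers: a token never seen in spam,
# 100 spam tokens in total, and a 50-word vocabulary
for alpha in [0.0, 0.1, 1.0, 10.0]:
    p = smoothed_likelihood(token_count=0, total_count=100, vocab_size=50, alpha=alpha)
    print(f"alpha={alpha}: P(token | spam) = {p:.4f}")
```

With `alpha=0.0` the unseen token gets probability zero, which would wipe out the whole product for that class. Larger values pull every token toward the same probability, which is why very large alpha values tend to blur the differences the model has learned.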

## Fill in the Blanks: Building Naive Bayes Model

Space Voyager, your task awaits! We have prepared a Naive Bayes model for text classification. There are some missing pieces marked with ____. Can you fill in the blanks to make the code work? We aim to replace the ____ with the correct syntax to create a Multinomial Naive Bayes object, then train the model with the training dataset.

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ____ 
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# Initialize the MultinomialNB model
naive_bayes_model = ____(alpha=1) 

# Fit the model on the training data
naive_bayes_model.____(X_train_count, Y_train)

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)

print(f"The model's first 10 predictions: {y_pred[:10]}")

```

Abacus AI ChatLLM:
To complete the code for the Naive Bayes model for text classification, you need to fill in the blanks with the appropriate syntax. Here’s the completed code with the missing pieces filled in:

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB  # Fill in the blank here
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# Initialize the MultinomialNB model
naive_bayes_model = MultinomialNB(alpha=1)  # Fill in the blank here

# Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)  # Fill in the blank here

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)

print(f"The model's first 10 predictions: {y_pred[:10]}")
```

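### Summary of Changes:
- **First Blank:** `MultinomialNB` is the correct class to import for the Naive Bayes model.
- **Second Blank:** The same `MultinomialNB` class is called with `alpha=1` to create the model object.
- **Third Blank:** The method to fit the model is `fit`.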

This code will now correctly initialize a Multinomial Naive Bayes model, train it with the training dataset, and make predictions on the test dataset.
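As an optional variation, the vectorizer and the classifier can be chained with Scikit-learn's `make_pipeline`, so that raw text goes in and labels come out. This is a sketch on a tiny invented corpus, not the approach used in the exercise itself.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus, for illustration only
toy_texts = ["win a free prize", "free cash prize", "see you at lunch", "lunch at noon"]
toy_labels = ["spam", "spam", "ham", "ham"]

# Chain vectorization and classification into a single estimator
toy_pipeline = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1))
toy_pipeline.fit(toy_texts, toy_labels)

# The pipeline vectorizes new text with the fitted vocabulary before predicting
toy_predictions = toy_pipeline.predict(["free prize inside", "lunch at one"])
print(toy_predictions)
```

A pipeline keeps the fit and transform steps in the right order automatically, which removes the risk of accidentally fitting the vectorizer on test data.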

## Fill in the Blanks: Predicting Using Naive Bayes Model

Stellar Navigator, it’s your turn! Please fill in the missing pieces represented as ____ in the code. You will need to transform the training and test data, fit the Naive Bayes model on the training data, and make predictions on the test data. Are you up for this adventure?

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.____(X_train)

# Transform the test data
X_test_count = count_vectorizer.____(X_test)

# Initialize the MultinomialNB model
naive_bayes_model = MultinomialNB()

# Fit the model on the training data
naive_bayes_model.____(X_train_count, Y_train)

# Make predictions on the test data
y_pred = naive_bayes_model.____(X_test_count)

# Print the first 5 predictions and actual labels for comparison
print("First 5 Predictions: ", y_pred[:5])
print("Actual Labels: ", Y_test[:5].tolist())

```

Abacus AI ChatLLM:
Absolutely, Stellar Navigator! Let's fill in the missing pieces in the code to ensure it works correctly for training and predicting with the Naive Bayes model. Here’s the completed code:

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)  # Fill in the blank here

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)  # Fill in the blank here

# Initialize the MultinomialNB model
naive_bayes_model = MultinomialNB()

# Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)  # Fill in the blank here

# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test_count)  # Fill in the blank here

# Print the first 5 predictions and actual labels for comparison
print("First 5 Predictions: ", y_pred[:5])
print("Actual Labels: ", Y_test[:5].tolist())
```

### Summary of Changes:
- **First Blank:** `fit_transform` is used to fit the `CountVectorizer` on the training data and transform it into a numerical format.
- **Second Blank:** `transform` is used to convert the test data into the same numerical format using the already fitted `CountVectorizer`.
- **Third Blank:** `fit` is the method used to train the Naive Bayes model on the transformed training data.
- **Fourth Blank:** `predict` is the method used to make predictions on the transformed test data.

This code will now correctly process the SMS dataset, train the Naive Bayes model, and output the first five predictions along with the actual labels for comparison. Happy coding!

## Visualize Naive Bayes Model Predictions

Greetings, Stellar Navigator! For this assignment, we're focusing on model initialization and training. You will find a TODO comment in the provided starter code. Fill it in to define the Naive Bayes model and train it! You'll be able to see the difference between your model's prediction and the actual results visually on a scatter plot. Let's dive in!

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# TODO: Initialize the MultinomialNB model and fit it on the training data

# Make predictions on the test data
Y_pred = naive_bayes_model.predict(X_test_count)

# Create a DataFrame with actual and predicted labels
results_df = pd.DataFrame({"Actual": Y_test, "Predicted": Y_pred})

# We now generate indices for our scatter plot for clarity
indices = range(1, 51)

# Plotting the comparison scatter plot for the first 50 messages
plt.figure(figsize=(10, 5))

# Plot actual labels
plt.scatter(indices, results_df["Actual"].values[:50], edgecolor='b', facecolors='none', label='Actual')

# Plot predicted labels
plt.scatter(indices, results_df["Predicted"].values[:50], edgecolor='none', color='r', label='Predicted', marker='x')

plt.yticks([0, 1], ['Ham', 'Spam'])
plt.ylabel('Category')
plt.xlabel('Message Number')
plt.title('Actual vs Predicted Labels for First 50 Messages')
plt.legend()
plt.show()

```

Abacus AI ChatLLM:
Greetings, Stellar Navigator! Let's fill in the TODO comment in the provided code to initialize the Naive Bayes model and fit it on the training data. Here’s the completed code:

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import datasets

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# Initialize the CountVectorizer
count_vectorizer = CountVectorizer()

# Fit and transform the training data 
X_train_count = count_vectorizer.fit_transform(X_train)

# Transform the test data
X_test_count = count_vectorizer.transform(X_test)

# TODO: Initialize the MultinomialNB model and fit it on the training data
naive_bayes_model = MultinomialNB()  # Initialize the model
naive_bayes_model.fit(X_train_count, Y_train)  # Fit the model on the training data

# Make predictions on the test data
Y_pred = naive_bayes_model.predict(X_test_count)

# Create a DataFrame with actual and predicted labels
results_df = pd.DataFrame({"Actual": Y_test, "Predicted": Y_pred})

# We now generate indices for our scatter plot for clarity
indices = range(1, 51)

# Plotting the comparison scatter plot for the first 50 messages
plt.figure(figsize=(10, 5))

# Plot actual labels
plt.scatter(indices, results_df["Actual"].values[:50], edgecolor='b', facecolors='none', label='Actual')

# Plot predicted labels
plt.scatter(indices, results_df["Predicted"].values[:50], edgecolor='none', color='r', label='Predicted', marker='x')

plt.yticks([0, 1], ['Ham', 'Spam'])
plt.ylabel('Category')
plt.xlabel('Message Number')
plt.title('Actual vs Predicted Labels for First 50 Messages')
plt.legend()
plt.show()
```

### Summary of Changes:
- **Model Initialization:** `naive_bayes_model = MultinomialNB()` initializes the Multinomial Naive Bayes model.
- **Model Fitting:** `naive_bayes_model.fit(X_train_count, Y_train)` fits the model on the training data.

### Explanation:
- After fitting the model, predictions are made on the test data, and a scatter plot is generated to visually compare the actual and predicted labels for the first 50 messages. The blue circles represent the actual labels, while the red crosses represent the predicted labels.

This code will now run successfully, allowing you to visualize the performance of your Naive Bayes model in classifying SMS messages as spam or ham. Happy coding!
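A scatter plot shows disagreements visually, and a quick numeric check can complement it. The sketch below uses hypothetical 0/1 arrays as stand-ins for the actual and predicted labels, and locates the positions where they differ.

```python
import numpy as np

# Hypothetical labels standing in for actual and predicted values (0 = ham, 1 = spam)
toy_actual = np.array([0, 0, 1, 0, 1, 0, 0, 1])
toy_predicted = np.array([0, 0, 1, 0, 0, 0, 1, 1])

# Positions where the prediction and the actual label disagree
mismatch_positions = np.flatnonzero(toy_actual != toy_predicted)

print(f"Misclassified: {len(mismatch_positions)} of {len(toy_actual)}")
print("Positions:", mismatch_positions.tolist())
```

In this toy case the two arrays disagree at positions 4 and 6, which are the points where a circle and a cross would fall on different rows of the plot.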

## Evaluate Naive Bayes Model with Confusion Matrix

Dear Space Voyager, your skills are needed once again! Using what you've learned about the Naive Bayes Model, your mission is to evaluate your model using a confusion matrix. Implement the Naive Bayes model, make predictions, and then generate a confusion matrix for the model using the test data. Plot the results of the confusion matrix for visual assessment.

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# TODO: Initialize the CountVectorizer and fit and transform the training data

# TODO: Transform the test data

# TODO: Initialize the MultinomialNB model

# TODO: Fit the model on the training data

# TODO: Make predictions on the test data

# Generate confusion matrix
confusion_matrix = metrics.confusion_matrix(Y_test, Y_pred)

# Create a DataFrame with confusion matrix
confusion_df = pd.DataFrame(confusion_matrix, index=['ham', 'spam'], columns=['ham', 'spam'])

# Print the confusion DataFrame
print(confusion_df)

# Plot the confusion matrix
plt.figure(figsize=(6,6))
sns.heatmap(confusion_df, annot=True, fmt='d', cmap='Blues')
plt.show()

```

Abacus AI ChatLLM:
Dear Space Voyager, let's complete the provided code to implement the Naive Bayes model, make predictions, and generate a confusion matrix for evaluation. Here’s the completed code with the necessary sections filled in:

```python
# Import the necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
import datasets
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
spam_dataset = datasets.load_dataset('codesignal/sms-spam-collection', split='train')
spam_dataset = pd.DataFrame(spam_dataset)

# Define X (input features) and Y (output labels)
X = spam_dataset["message"]
Y = spam_dataset["label"]

# Perform a stratified train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)

# TODO: Initialize the CountVectorizer and fit and transform the training data
count_vectorizer = CountVectorizer()  # Initialize CountVectorizer
X_train_count = count_vectorizer.fit_transform(X_train)  # Fit and transform the training data

# TODO: Transform the test data
X_test_count = count_vectorizer.transform(X_test)  # Transform the test data

# TODO: Initialize the MultinomialNB model
naive_bayes_model = MultinomialNB()  # Initialize the model

# TODO: Fit the model on the training data
naive_bayes_model.fit(X_train_count, Y_train)  # Fit the model on the training data

# TODO: Make predictions on the test data
Y_pred = naive_bayes_model.predict(X_test_count)  # Make predictions on the test data

# Generate confusion matrix
confusion_matrix = metrics.confusion_matrix(Y_test, Y_pred)

# Create a DataFrame with confusion matrix
confusion_df = pd.DataFrame(confusion_matrix, index=['ham', 'spam'], columns=['ham', 'spam'])

# Print the confusion DataFrame
print(confusion_df)

# Plot the confusion matrix
plt.figure(figsize=(6, 6))
sns.heatmap(confusion_df, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
```

### Summary of Changes:
- **CountVectorizer Initialization and Transformation:** The `CountVectorizer` is initialized, and the training data is fitted and transformed.
- **Test Data Transformation:** The test data is transformed using the fitted `CountVectorizer`.
- **Model Initialization and Fitting:** The `MultinomialNB` model is initialized and fitted on the training data.
- **Predictions:** Predictions are made on the test data.

### Explanation:
- After fitting the model and making predictions, a confusion matrix is generated using `metrics.confusion_matrix()`. This matrix is then converted into a DataFrame for better readability.
- Finally, the confusion matrix is visualized using a heatmap from the Seaborn library, providing a clear visual assessment of the model's performance.

This code will now run successfully, allowing you to evaluate the performance of your Naive Bayes model using a confusion matrix. Happy coding!
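To read a confusion matrix numerically, its four cells can be unpacked and turned into precision and recall for the spam class. The labels below are hypothetical, and passing `labels=["ham", "spam"]` is an assumption made here to pin the row and column order explicitly.

```python
from sklearn import metrics

# Hypothetical labels, for illustration only
toy_true = ["ham", "ham", "ham", "spam", "spam", "ham", "spam", "ham"]
toy_pred = ["ham", "ham", "spam", "spam", "ham", "ham", "spam", "ham"]

# Fix the label order so rows (actual) and columns (predicted) are [ham, spam]
toy_cm = metrics.confusion_matrix(toy_true, toy_pred, labels=["ham", "spam"])
tn, fp, fn, tp = toy_cm.ravel()

precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of actual spam messages, how many were caught

print(toy_cm)
print(f"Spam precision: {precision:.2f}, spam recall: {recall:.2f}")
```

In this toy example one ham message is wrongly flagged and one spam message slips through, so both precision and recall for spam are 2/3. The same unpacking works on the matrix computed from `Y_test` and `Y_pred`.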