### Movie Reviews Sentiment Analysis - Naive Bayes
#### ✅ This notebook trains a Naive Bayes model on NLTK Movie Reviews
- Uses stopword removal
- Uses word presence as binary features
- Shows most informative features for sentiment
- Prints test accuracy

#### 📌 Step 1: Imports and Setup

In [None]:

import nltk
from nltk.corpus import movie_reviews, stopwords
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

nltk.download('movie_reviews')
nltk.download('stopwords')


#### 📌 Step 2: Define Feature Extractor

In [None]:
stop_words = set(stopwords.words('english'))

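# Feature extractor: returns one dict per review, mapping each kept word to True
# (lowercased, alphabetic tokens only, stop words removed).
# NaiveBayesClassifier expects each featureset as a dict {feature_name: value}.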
def extract_features(words):
    return {w.lower(): True for w in words if w.isalpha() and w.lower() not in stop_words}
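
To make the feature format concrete, here is a minimal offline sketch of the same extractor applied to a handful of tokens. The names `demo_stop_words` and `extract_features_demo` are hypothetical, and the tiny stop-word set is an assumption standing in for NLTK's English list so the example runs without any downloads.

```python
# Hypothetical stand-in for NLTK's English stop-word list (assumption, not the real list).
demo_stop_words = {"the", "was", "a", "and", "but", "not", "is"}

def extract_features_demo(words, stop_words=demo_stop_words):
    """Map each alphabetic, non-stop-word token (lowercased) to True."""
    return {
        w.lower(): True
        for w in words
        if w.isalpha() and w.lower() not in stop_words
    }

tokens = ["The", "plot", "was", "dull", ",", "but", "the",
          "acting", "was", "not", "dull", "!"]
features = extract_features_demo(tokens)
print(features)  # {'plot': True, 'dull': True, 'acting': True}
```

Repeated words collapse into a single key, punctuation is dropped by `isalpha()`, and word order is lost. In this demo set "not" counts as a stop word, so negation disappears from the features as well.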

#### 📌 Step 3: Prepare Training and Test Data

In [None]:
# List file IDs for each sentiment label
pos_files = movie_reviews.fileids('pos')
neg_files = movie_reviews.fileids('neg')

# 80% train, 20% test (the corpus has 1,000 reviews per class)
train_pos = pos_files[:800]
test_pos = pos_files[800:]
train_neg = neg_files[:800]
test_neg = neg_files[800:]

# Build training set with (features, label) tuples
train_set = []
for fileid in train_pos:
    words = movie_reviews.words(fileid)
    train_set.append((extract_features(words), 'pos'))
for fileid in train_neg:
    words = movie_reviews.words(fileid)
    train_set.append((extract_features(words), 'neg'))

# Build test set
test_set = []
for fileid in test_pos:
    words = movie_reviews.words(fileid)
    test_set.append((extract_features(words), 'pos'))
for fileid in test_neg:
    words = movie_reviews.words(fileid)
    test_set.append((extract_features(words), 'neg'))

#### 📌 Step 4: Train Naive Bayes Classifier

In [None]:
# Train Naive Bayes Classifier
classifier = NaiveBayesClassifier.train(train_set)
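
For intuition about what training on word-presence features involves, here is a simplified pure-Python sketch of the idea. It is not NLTK's implementation: the helpers `train_nb` and `classify_nb` are hypothetical, the add-one smoothing is an assumption chosen for simplicity, and NLTK's own estimator differs in details.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_featuresets):
    """Count documents per label and, per label, documents containing each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify_nb(model, features):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        for word in features:
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothed estimate of P(word present | label) (assumption)
            p = (word_counts[label][word] + 1) / (n_docs + 2)
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data in the same (features, label) format as train_set
toy_train = [
    ({"great": True, "moving": True}, "pos"),
    ({"great": True, "funny": True}, "pos"),
    ({"boring": True, "waste": True}, "neg"),
    ({"boring": True, "dull": True}, "neg"),
]
model = train_nb(toy_train)
print(classify_nb(model, {"great": True, "story": True}))    # pos
print(classify_nb(model, {"boring": True, "funny": True}))   # neg
```

The second query mixes evidence: "funny" leans positive, but "boring" appears in every negative toy review, so the negative score wins.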

#### 📌 Step 5: Evaluate the Classifier

In [None]:
# Evaluate on test data
acc = accuracy(classifier, test_set)
print(f"Test Accuracy: {acc:.4f}")

# Show most informative words
classifier.show_most_informative_features(10)
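
The accuracy reported above is the fraction of test reviews whose predicted label matches the true label. A minimal sketch of that calculation on made-up labels follows; `accuracy_sketch` and both label lists are hypothetical and are not results from this notebook.

```python
from collections import Counter

def accuracy_sketch(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Made-up labels for illustration only
gold = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos"]

print(f"Accuracy: {accuracy_sketch(predicted, gold):.4f}")  # Accuracy: 0.6000

# Counting (gold, predicted) pairs shows which kind of mistake was made
confusion = Counter(zip(gold, predicted))
print(confusion[("pos", "neg")])  # 1 positive review predicted as negative
```

Because the test set here is balanced (200 reviews per class), accuracy is a reasonable summary. The pair counts are still worth a look, since they show whether errors fall mostly on one class.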