<a href="https://colab.research.google.com/github/thaparjeeya786/Movie-Review-Sentiment-Analysis/blob/main/Movie_Review_Mood_Detector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Movie Review Sentiment Analysis**

In [None]:
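import nltk
nltk.download('movie_reviews')  # downloads the labelled review corpus if not already present

from nltk.corpus import movie_reviews
import random
from sklearn.feature_extraction.text import CountVectorizer  # text -> word-count vectors
from sklearn.model_selection import train_test_split          # train/test split
from sklearn.naive_bayes import MultinomialNB                  # Naive Bayes classifier
from sklearn.metrics import accuracy_score                     # evaluation metric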


[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.


First, we prepare the dataset by loading all the movie reviews together with their labels (`pos` or `neg`). The corpus stores all negative reviews before all positive ones, so we shuffle the list to mix the two classes. We then separate the review text from the labels and print a quick summary: the total number of reviews, the start of one review, and its label.

In [None]:
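# Each entry is (list_of_word_tokens, label); categories() returns 'neg' and 'pos'
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# The list above holds all 'neg' reviews before all 'pos' reviews, so mix them
random.shuffle(documents)

# Join the tokens back into one string per review for CountVectorizer
texts = [" ".join(words) for words, label in documents]
labels = [label for words, label in documents]

print("Total Reviews:", len(texts))
print("Sample Review:", texts[0][:200], "...")
print("Label:", labels[0])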


Total Reviews: 2000
Label: neg
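
Because the list comprehension loops over categories first, every `neg` review comes before every `pos` review until the list is shuffled. The sketch below uses a made-up ten-item list as a stand-in for the real corpus to show that ordering and the effect of shuffling. It uses a seeded `random.Random(42)` instance (the seed value is arbitrary), which the original notebook does not: since the notebook calls `random.shuffle` without a seed, the review order, and therefore the exact train/test split and accuracy later on, can change between runs even though `train_test_split` itself is given `random_state=42`.

```python
import random

# Hypothetical stand-in for `documents`: all 'neg' entries first, as in the corpus loop
documents_demo = [(["bad", "plot"], "neg")] * 5 + [(["great", "cast"], "pos")] * 5
print([label for _, label in documents_demo][:5])  # ['neg', 'neg', 'neg', 'neg', 'neg']

# A seeded generator gives the same shuffled order on every run
rng = random.Random(42)
shuffled_demo = documents_demo[:]  # copy so the original order is kept for comparison
rng.shuffle(shuffled_demo)
print([label for _, label in shuffled_demo])
```

Re-running the cell with the same seed reproduces the same order; dropping the seed (as the notebook does) gives a new order each time.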


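Next, convert the text reviews into numbers the model can work with. `CountVectorizer` learns a vocabulary of all the words in the reviews and represents each review as a sparse vector of word counts (a "bag of words").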

In [None]:
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

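Split the data: 80% for training the model and 20% for testing its accuracy. With 2000 reviews this gives 1600 training and 400 test reviews; `random_state=42` makes the split repeatable for a given input order.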

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
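
In the notebook, `CountVectorizer` is fitted on all 2000 reviews before the split, so words that occur only in test reviews still enter the vocabulary. The effect on accuracy is usually small, but a cleaner pattern is to split the raw text first and fit the vectorizer on the training part only. The sketch below does this with scikit-learn's `make_pipeline` and also passes `stratify=` so both parts keep the same pos/neg ratio; neither is used in the original notebook, and the repeated two-sentence corpus is a hypothetical stand-in for `texts` and `labels`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the notebook's `texts` and `labels` (2000 balanced reviews)
toy_texts = ["a fun and moving film"] * 1000 + ["a dull and boring film"] * 1000
toy_labels = ["pos"] * 1000 + ["neg"] * 1000

# Split the raw text first; stratify keeps the pos/neg ratio equal in both parts
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_texts, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)
print(len(toy_X_train), len(toy_X_test))  # 1600 400

# The vectorizer inside the pipeline learns its vocabulary from training text only
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(toy_X_train, toy_y_train)
print("Test accuracy:", pipe.score(toy_X_test, toy_y_test))
```

The same pipeline object can later classify raw strings directly, for example `pipe.predict(["a fun film"])`.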


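Create a Multinomial Naive Bayes classifier and train it on the training data. During training it counts how often each word appears in positive and in negative reviews; at prediction time it combines those word frequencies with how common each class is, and picks the more likely label.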

In [None]:
model = MultinomialNB()
model.fit(X_train, y_train)
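
The training step above can be sketched from scratch to show what Multinomial Naive Bayes actually computes: a log prior per class, plus smoothed log probabilities of each word given the class (additive smoothing with `alpha=1.0`). The helper names `fit_multinomial_nb` and `predict_multinomial_nb` and the toy reviews are hypothetical; scikit-learn's own implementation handles sparse input and more options, but for these defaults it should produce the same numbers, which the last lines cross-check.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def fit_multinomial_nb(X, y, alpha=1.0):
    """Learn log P(class) and smoothed log P(word | class) from a count matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted class labels, e.g. ['neg' 'pos']
    # P(class): fraction of training documents in each class
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Total count of each word within each class, plus additive (Laplace) smoothing
    smoothed = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    # P(word | class): smoothed counts normalised over the vocabulary
    log_word_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return classes, log_prior, log_word_prob

def predict_multinomial_nb(params, X):
    """Pick the class maximising log P(class) + sum(count * log P(word | class))."""
    classes, log_prior, log_word_prob = params
    scores = np.asarray(X, dtype=float) @ log_word_prob.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Made-up toy reviews (illustrative only)
toy_train = ["great acting great story", "wonderful film", "boring plot", "terrible boring film"]
toy_train_labels = ["pos", "pos", "neg", "neg"]
toy_new = ["great film", "boring story"]

toy_vec = CountVectorizer()
X_toy_train = toy_vec.fit_transform(toy_train).toarray()
X_toy_new = toy_vec.transform(toy_new).toarray()

params = fit_multinomial_nb(X_toy_train, toy_train_labels)
print(predict_multinomial_nb(params, X_toy_new))  # ['pos' 'neg']

# Cross-check against scikit-learn's MultinomialNB (default alpha=1.0)
sk_nb = MultinomialNB().fit(X_toy_train, toy_train_labels)
print(sk_nb.predict(X_toy_new))
print(np.allclose(params[2], sk_nb.feature_log_prob_))  # True
```

Working in log space turns the product of many small word probabilities into a sum, which avoids numerical underflow on long reviews.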


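Evaluate the trained model on the 20% of reviews held out for testing, which it did not see during training, and print the accuracy (the fraction of test reviews it labels correctly).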

In [None]:
y_pred = model.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))


Model Accuracy: 0.8125
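
Accuracy is simply correct predictions divided by total predictions. The test set holds 20% of 2000 = 400 reviews, so an accuracy of 0.8125 corresponds to 0.8125 × 400 = 325 correctly classified reviews. The sketch below computes accuracy by hand on five made-up labels and checks it against `accuracy_score`; it also adds a confusion matrix (not used in the original notebook), which shows whether the mistakes are mostly positives called negative or the reverse.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for five reviews (illustrative only)
y_true = ["pos", "neg", "pos", "neg", "pos"]
y_hat = ["pos", "neg", "neg", "neg", "pos"]

correct = sum(t == p for t, p in zip(y_true, y_hat))
print(correct / len(y_true))          # 0.8
print(accuracy_score(y_true, y_hat))  # 0.8

# Rows = true class, columns = predicted class, in the order given by `labels`
print(confusion_matrix(y_true, y_hat, labels=["neg", "pos"]))
# [[2 0]
#  [1 2]]
```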


Next, I test the model in a more realistic setting with a function called `predict_review`. It takes any new review, converts its words into counts with the already-fitted vectorizer (`transform`, not `fit_transform`, so the training vocabulary is reused), and asks the trained model for a prediction. It then prints whether the review is Positive or Negative, showing how the trained system can classify the sentiment of new text in a single call. Words that never appeared in the training reviews are ignored by the vectorizer.

In [None]:
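def predict_review(review_text):
    # Reuse the fitted vectorizer: transform() maps the text onto the training vocabulary
    review_vector = vectorizer.transform([review_text])
    # predict() returns an array of labels; take the single prediction
    prediction = model.predict(review_vector)[0]
    mood = "Positive" if prediction == "pos" else "Negative"
    print(f"Review: {review_text}\nPrediction: {mood}")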

predict_review("This movie was amazing and full of great performances.")
predict_review("Bohot boring movie thi, pura time waste.")

Review: This movie was amazing and full of great performances.
Prediction: Positive 
Review: Bohot boring movie thi, pura time waste.
Prediction: Negative 
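
Because `transform` drops words outside the learned vocabulary, a mixed Hinglish review is judged only on the words the vectorizer already knows. In the notebook, Hindi words such as "bohot" are probably absent from the English `movie_reviews` vocabulary, so the second prediction likely rests on words like "boring" and "waste". The sketch below illustrates this on a tiny made-up training set (a hypothetical stand-in for the real corpus), using `build_analyzer` to see which tokens are counted and `predict_proba` to see the class probabilities. A review with no known words becomes an all-zero vector, leaving only the class priors to decide.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set (assumption: stands in for the real corpus)
demo_texts = [
    "a boring movie and a waste of time",
    "an amazing movie with great performances",
    "boring and slow, total waste",
    "great story, amazing acting",
]
demo_labels = ["neg", "pos", "neg", "pos"]

demo_vec = CountVectorizer()
demo_model = MultinomialNB().fit(demo_vec.fit_transform(demo_texts), demo_labels)

review = "Bohot boring movie thi, pura time waste."
tokens = demo_vec.build_analyzer()(review)  # same lowercasing/tokenising as transform()
known = [t for t in tokens if t in demo_vec.vocabulary_]
unknown = [t for t in tokens if t not in demo_vec.vocabulary_]
print("Counted:", known)    # ['boring', 'movie', 'time', 'waste']
print("Ignored:", unknown)  # ['bohot', 'thi', 'pura']

vector = demo_vec.transform([review])
print(demo_model.classes_, demo_model.predict_proba(vector).round(3))
print("Prediction:", demo_model.predict(vector)[0])

# No known words: the vector is all zeros, so only the class priors remain
empty = demo_vec.transform(["Bohot bekaar thi"])
print("Known words:", empty.nnz)  # 0
```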


## **In Short:**


### **Task**
My task is to explain the movie review sentiment analysis project in Hinglish (Hindi and English), step by step, for someone with no prior knowledge. I also explain how manipulating the input data in real time would affect the output.

### **Introduction**

**Subtask:**

This project reads movie reviews and tells us whether each review is good (positive) or bad (negative). It is a simple program that learns which words tend to appear in positive reviews and which in negative ones, and uses that to judge the feeling behind a new review.

**Reasoning:**

I need to explain the project's purpose in simple Hinglish, as if talking to a non-technical person. I will explain that the project teaches a computer to read movie reviews and decide whether they are positive or negative.