Movie Review Analysis

An analysis of the movie_reviews corpus bundled with nltk. I will probably add some buzzwords here later on.


What is in this repo


  • An implementation of nltk.NaiveBayesClassifier trained against 5000 movie reviews. Implemented in nltkNB.ipynb
  • Using sklearn (implemented in scikitlearnNB.ipynb)
    • Naive Bayes
      • MultinomialNB
      • BernoulliNB
    • Linear Model
      • LogisticRegression
      • SGDClassifier
    • SVM
      • SVC
      • LinearSVC
      • NuSVC

  • Implemented a voting system to choose the best out of all the learning methods. Implemented in voting_process.ipynb
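
The voting notebook is not reproduced in this README, so the following is only a minimal sketch of one common way such a scheme can work, assuming each trained classifier casts one vote per review and the majority label wins. The function name majority_vote and the agreement ratio it returns are illustrative assumptions, not taken from voting_process.ipynb.

```python
from collections import Counter


def majority_vote(predictions):
    """Return (winning_label, agreement) for one review.

    `predictions` holds one predicted label per classifier,
    e.g. ["pos", "neg", "pos"]. `agreement` is the fraction of
    classifiers that voted for the winning label.
    """
    if not predictions:
        raise ValueError("need at least one prediction to vote on")
    counts = Counter(predictions)
    # most_common(1) gives the label with the highest count; on a tie
    # the label seen first in `predictions` wins.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)
```

With two labels, using an odd number of classifiers avoids ties. For example, majority_vote(["pos", "neg", "pos"]) picks "pos" with two of three votes.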

Accuracy achieved


Classifier                     Accuracy achieved
nltk.NaiveBayesClassifier      73.0%

ScikitLearn implementations
BernoulliNB                    72.0%
MultinomialNB                  76.0%
LogisticRegression             74.0%
SGDClassifier                  69.0%
SVC                            48.0%
LinearSVC                      74.0%
NuSVC                          74.0%
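
The percentages above are presumably the share of held-out reviews that each classifier labelled correctly. A minimal sketch of that arithmetic, assuming plain classification accuracy; the helper name accuracy_percent is hypothetical and does not appear in the notebooks.

```python
def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

Three correct labels out of four test reviews would be reported as 75.0.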



The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.

After that, run:

$ conda update conda
$ conda install scikit-learn nltk

Downloading the dataset


The dataset used in this package is bundled along with the nltk package.

Start your Python interpreter:

>>> import nltk

NOTE: You can check system specific installation instructions from the official nltk website
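
Depending on how nltk was installed, the corpus data itself may not be on disk yet, in which case loading the corpora raises a LookupError. A minimal sketch for fetching them with nltk.download, assuming the two corpora this project needs are movie_reviews and stopwords (the ones imported in this README); the helper name fetch_corpora is hypothetical.

```python
def fetch_corpora(names=("movie_reviews", "stopwords")):
    """Download each named nltk corpus and report whether it succeeded.

    Returns a dict mapping corpus name to the boolean result of
    nltk.download. Needs network access the first time it runs.
    """
    import nltk  # imported here so the module loads even before nltk is installed

    return {name: nltk.download(name, quiet=True) for name in names}
```

Calling fetch_corpora() once from the interpreter should be enough; nltk keeps the data in its local nltk_data directory afterwards.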

Check that everything works so far by starting your interpreter again and running these imports:

>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn

If these imports work for you, then you are good to go!

Running it


  1. Clone the repo
$ git clone
$ cd movieReviewsAnalysis
## run the ipython server
$ ipython notebook
  2. Run the notebooks in this order:
    1. nltkNB.ipynb
    2. scikitlearnNB.ipynb
    3. voting_process.ipynb
  3. Hack away!



"So what, Well this is pretty basic!"

Yes, it is, but hey, we all start somewhere, right?

Psst. I am working on a spam filtering system. You know, the kind where you paste in an email and it tells you whether it is spam or not.

You can follow me on twitter @tasdikrahman to keep tabs on it.

Legal stuff


Hacked together by Tasdik Rahman under the MIT License

You can find a copy of the License at








