
Movie Review Analysis

An analysis of the movie_reviews data set included in the nltk corpus collection. I will probably add some buzzwords here later on.


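Index:

  • What is in this repo
  • Accuracy achieved
  • Requirements
  • Downloading the dataset
  • Running it
  • So
  • Legal stuff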

What is in this repo

[Back to top]

  • An implementation of nltk.NaiveBayesClassifier trained against 5000 movie reviews. Implemented in nltkNB.ipynb
  • Using sklearn (implemented in scikitlearnNB.ipynb)
    • Naive Bayes
      • MultinomialNB
      • BernoulliNB
    • Linear Model
      • LogisticRegression
      • SGDClassifier
    • SVM
      • SVC
      • LinearSVC
      • NuSVC

  • Implemented a voting system to choose the best out of all the learning methods. Implemented in voting_process.ipynb
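
The voting idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the code in voting_process.ipynb: it assumes a simple majority vote, and the `VoteClassifier` class, the callable interface, and the three toy "classifiers" are hypothetical stand-ins for trained models.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote.

    Each classifier is assumed to be a callable that maps a feature
    dictionary to a label such as "pos" or "neg" (a hypothetical
    interface chosen for this sketch).
    """

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self._classifiers = classifiers

    def _votes(self, features):
        # Ask every classifier for its label and tally the answers.
        return Counter(clf(features) for clf in self._classifiers)

    def classify(self, features):
        # most_common(1) returns [(label, count)] for the top label.
        return self._votes(features).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agree with the winning label.
        votes = self._votes(features)
        return votes.most_common(1)[0][1] / len(self._classifiers)


# Three toy "classifiers" standing in for trained models.
def likes_great(features):
    return "pos" if features.get("great") else "neg"


def dislikes_boring(features):
    return "neg" if features.get("boring") else "pos"


def always_negative(features):
    return "neg"


voter = VoteClassifier(likes_great, dislikes_boring, always_negative)
review = {"great": True, "boring": False}
print(voter.classify(review), voter.confidence(review))
```

Using an odd number of classifiers avoids ties in a two-label problem, which is one reason a vote over several models can be more stable than any single one.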

Accuracy achieved

[Back to top]

| Classifier | Accuracy achieved |
|---|---|
| nltk.NaiveBayesClassifier | 73.0% |
| ScikitLearn implementations | |
| BernoulliNB | 72.0% |
| MultinomialNB | 76.0% |
| LogisticRegression | 74.0% |
| SGDClassifier | 69.0% |
| SVC | 48.0% |
| LinearSVC | 74.0% |
| NuSVC | 74.0% |
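
The README does not spell out how these figures were measured. A minimal sketch, assuming accuracy here means the share of held-out reviews whose predicted label matches the true label (the same quantity that `nltk.classify.accuracy` and `sklearn.metrics.accuracy_score` report as a fraction), could look like this. The `accuracy` helper and the four example labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for a four-review test set.
actual = ["pos", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
print(f"{accuracy(predicted, actual):.1f}%")
```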

Requirements

[Back to top]

The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.

After that, run:

$ conda update conda
$ conda install scikit-learn nltk

Downloading the dataset

[Back to top]

The dataset used in this package is part of nltk's downloadable corpus collection, so it has to be fetched once with the nltk downloader.

Start your Python interpreter and run:

>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('movie_reviews') 

NOTE: System-specific installation instructions are available on the official nltk website.

Check that everything is working so far by starting your interpreter again and running these imports:

>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn
>>> 

If these imports work, you are good to go!
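
The same check can be scripted. The sketch below uses only the standard library to report which packages cannot be found. The `missing_packages` helper is a hypothetical name, and the check only covers the packages themselves, not whether the stopwords and movie_reviews corpora have been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(["nltk", "sklearn"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```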


Running it

[Back to top]

  1. Clone the repo and start the notebook server
$ git clone https://github.com/prodicus/movieReviewsAnalysis
$ cd movieReviewsAnalysis
## run the ipython server
## (on newer installations this command is `jupyter notebook`)
$ ipython notebook

  2. Run the notebooks in this order
    1. nltkNB.ipynb
    2. scikitlearnNB.ipynb
    3. voting_process.ipynb

  3. Hack away!


So

[Back to top]

"So what? Well, this is pretty basic!"

Yes, it is, but hey, we all start somewhere, right?

Psst. I am working on a spam filtering system. You know, the kind where you paste in an email and it tells you whether it is spam or not.

You can follow me on Twitter @tasdikrahman to keep tabs on it.


Legal stuff

[Back to top]

Hacked together by Tasdik Rahman under the MIT License

You can find a copy of the License at http://prodicus.mit-license.org/
