Sentiment Classification of Movie Reviews

Last updated: 2022-May-21

A neural network model that classifies whether a movie review is positive or negative. The reviews are written in English and obtained from IMDB.

Please contact me at if you have any questions.

The repository is publicly available at

The stable release with the PDF report is in

This repository has been updated to more recent versions of PyTorch and other libraries. To replicate the results reported in the PDF documentation, please check out tag v1.4.


  • Assuming you are on Ubuntu 22.04 with Python 3.10.4, open a terminal
  • ./ (Install packages) or ./ (To install packages globally with sudo access)
  • cd data && ./ && cd .. (Download and Prepare dataset)
  • ./ 0 GTX1070 (Train model on GPU 0, a GTX1070)
  • ./run_test (Test model on CPU)

Requirements and Environment:

  • Ubuntu 22.04
  • Python 3.10.4 (default on Ubuntu 22.04)

Python libraries required (requirements.txt):

  • torch (PyTorch)
  • h5py
  • numpy
  • scikit-learn
  • scipy
  • nltk
  • pydot
  • matplotlib

The Python libraries above can be installed with pip3 install -r requirements.txt.

Python libraries setup:

If you are running a UNIX-based machine, you can run the shell script ./ to install all the required Python libraries, assuming pip is already installed. If you prefer to install them manually, please refer to the list of libraries above.
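
After installation, a quick check that the libraries are importable can save a failed training run later. The snippet below is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the import names are assumed from the library list above (note that scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names for the libraries listed in requirements.txt.
# scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = [
    "torch", "h5py", "numpy", "sklearn",
    "scipy", "nltk", "pydot", "matplotlib",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If any module is reported missing, reinstalling from requirements.txt is the likely fix.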


  • The IMDB Large Movie Review dataset is obtained from
  • Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
  • If you are setting up for the first time:
    • Download and process the dataset from scratch by running the script in the data folder
    • i.e., execute the following command: cd data && ./
    • The data will be automatically downloaded and preprocessed into training, validation, and test sets
    • If you do not have the Java Development Kit (JDK) installed, please install it by running this command:
      • sudo apt update && sudo apt install default-jdk
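
For orientation, review files in the raw aclImdb download are named <id>_<rating>.txt, where labeled reviews carry a rating of 7 or higher (positive) or 4 or lower (negative), and unlabeled reviews carry a rating of 0. The sketch below shows how such a path could be decoded. It is an illustration only: the function name parse_review_path is hypothetical, and this is not necessarily how the repository's preprocessing script works.

```python
from pathlib import Path


def parse_review_path(path):
    """Split an aclImdb review path into (review_id, rating, label).

    label is "positive", "negative", or None when the review is unlabeled.
    """
    review_id, rating = Path(path).stem.split("_")
    rating = int(rating)
    if rating >= 7:
        label = "positive"
    elif 1 <= rating <= 4:
        label = "negative"
    else:
        label = None  # rating 0 marks the unsupervised (unlabeled) reviews
    return int(review_id), rating, label


if __name__ == "__main__":
    print(parse_review_path("data/aclImdb/train/unsup/74_0.txt"))
```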

To train the model

  • Execute ./ <GPU_Number> <GPU_Name> , e.g., $> ./ 0 TITANX
  • Make sure NVIDIA CUDA is installed so that the model can be trained on a GPU.
  • For more details on the training arguments, refer to the sample shell script
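
Before launching training, it can help to confirm that PyTorch actually sees the GPU index you plan to pass. The following is a minimal sketch under that assumption: describe_gpu is a hypothetical helper, not part of the repository, and it uses only the standard torch.cuda queries.

```python
def describe_gpu(gpu_index=0):
    """Return a short status string for the requested GPU index."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed."

    if not torch.cuda.is_available():
        return "CUDA is not available to PyTorch."

    count = torch.cuda.device_count()
    if gpu_index >= count:
        return f"GPU {gpu_index} not found; {count} device(s) visible."

    return f"GPU {gpu_index}: {torch.cuda.get_device_name(gpu_index)}"


if __name__ == "__main__":
    print(describe_gpu(0))
```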

Pre-trained word embeddings

  • To train your own word embeddings from the provided unsupervised movie reviews, execute ./run_word2vec
  • To download pre-trained word embeddings such as GloVe, execute cd word2vec/vectors && ./
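
GloVe vectors are distributed as plain text, one token per line followed by its space-separated vector components. As a rough illustration of that format, the sketch below parses such lines into a dictionary. The function name parse_glove_lines and the sample values are hypothetical, and the repository's own loader may differ.

```python
def parse_glove_lines(lines):
    """Parse GloVe-style text lines into a {word: [float, ...]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


if __name__ == "__main__":
    # Made-up three-dimensional vectors, for illustration only.
    sample = ["good 0.1 -0.2 0.3", "bad -0.4 0.5 0.6"]
    print(parse_glove_lines(sample))
```

For a real vectors file, pass an open file handle, since iterating over it yields one line at a time.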

To test the model

  • For quick testing, execute ./
  • There are three arguments for testing:
    • -v : vocab path, the path to the vocabulary files saved during training
    • -m : model path, the path to the best model saved during training
    • -ts : test path, the path to the text file containing the movie review to be evaluated
  • Sample command: python -v saved_model/vocab_v50000.pkl -m saved_model/best_model_weights.h5 -ts data/aclImdb/train/unsup/74_0.txt
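
The three flags above could be declared with argparse roughly as follows. This is a sketch of the interface only: the destination names (vocab_path, model_path, test_path) and the help strings are assumptions, and the actual test script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the three testing arguments described above."""
    parser = argparse.ArgumentParser(
        description="Classify a single movie review as positive or negative."
    )
    parser.add_argument("-v", dest="vocab_path", required=True,
                        help="path to the vocabulary file saved during training")
    parser.add_argument("-m", dest="model_path", required=True,
                        help="path to the best model saved during training")
    parser.add_argument("-ts", dest="test_path", required=True,
                        help="path to the text file containing the review")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "-v", "saved_model/vocab_v50000.pkl",
        "-m", "saved_model/best_model_weights.h5",
        "-ts", "data/aclImdb/train/unsup/74_0.txt",
    ])
    print(args.vocab_path, args.model_path, args.test_path)
```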

To run web demo

  • For quick testing: python3
  • For production (using pm2): pm2 start --name sentiment_classifier --interpreter python3
  • Visit the web UI at