SAIMDB - Sentiment Analysis with IMDb Dataset


Figure: Mindmap of the SAIMDB project

Overview

Humans naturally depend on the opinions and experiences of others, so people tend to read reviews of a product before trying it themselves. However, finding reliable information among the multitude of reviews on the internet is tedious.

Sentiment analysis extracts useful information from a review by classifying it as expressing positive or negative sentiment. In this Sentiment Analysis with IMDb Movie Reviews project, we apply Machine Learning and Deep Learning approaches, measure the accuracy of each model, and identify the best-performing algorithm for sentiment analysis.

IMDb Reviews is a large dataset for binary sentiment classification, consisting of 50,000 highly polar English reviews split evenly into 25,000 for training and 25,000 for testing. The dataset also contains additional unlabelled data. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie.
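The labelling and per-movie rules above can be expressed as two small helpers. This is a minimal sketch, not code from the repository: the names `score_to_label` and `cap_per_movie` and the sample reviews are hypothetical. Scores of 5 or 6 receive no label, since they fall into neither class.

```python
from collections import Counter
from typing import Optional


def score_to_label(score: int) -> Optional[int]:
    """Map a 1-10 rating to 0 (Negative, <= 4), 1 (Positive, >= 7) or None (5-6)."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None  # neutral scores are excluded from the labelled data


def cap_per_movie(reviews, limit=30):
    """Keep at most `limit` (movie_id, text) pairs per movie, preserving order."""
    kept_per_movie = Counter()
    kept = []
    for movie_id, text in reviews:
        if kept_per_movie[movie_id] < limit:
            kept_per_movie[movie_id] += 1
            kept.append((movie_id, text))
    return kept


# Hypothetical (score, text) pairs used only for illustration.
reviews = [(9, "A masterpiece."), (2, "Dull and far too long."), (5, "It was fine.")]
labelled = []
for score, text in reviews:
    label = score_to_label(score)
    if label is not None:
        labelled.append((text, label))
print(labelled)  # prints: [('A masterpiece.', 1), ('Dull and far too long.', 0)]
```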

After a text-processing phase that converts the reviews into vectors (word vectorisation), we implement multiple algorithms and evaluate their accuracy. The ML algorithms are traditional methods covering Supervised Learning (KNN, Naive Bayes, Decision Tree, Random Forest, SVM, Logistic Regression) and Unsupervised Learning (K-Means Clustering), whereas the DL algorithms use multilayer artificial neural networks and are expected to perform better. The model classifies each movie review as Positive (1) or Negative (0).
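To make the pipeline concrete, here is a sketch of one of the listed supervised methods, multinomial Naive Bayes, written in plain Python rather than with the scikit-learn implementation the project uses. A simple bag-of-words count representation stands in for the project's word vectorisation, and the tokenizer, class name, and toy reviews are hypothetical. Labels follow the README's convention of Positive (1) and Negative (0), and accuracy is the fraction of correct predictions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # Smoothed estimate of log P(word | class c).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_texts = ["a wonderful, moving film", "brilliant acting and a great story",
               "boring plot and terrible acting", "a dull, awful waste of time"]
train_labels = [1, 1, 0, 0]
model = MultinomialNB().fit(train_texts, train_labels)
preds = [model.predict(t) for t in ["great film", "terrible and boring"]]
print(preds, accuracy([1, 0], preds))  # prints: [1, 0] 1.0
```

Working in log space avoids numerical underflow when a long review multiplies many small word probabilities together.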

For further research, see the materials in Machine Learning Research.

Directories

materials: Everything required for the project
src: Source code, implemented with Scikit-learn and Keras (TensorFlow)
- src/supervised: Machine Learning & Deep Learning algorithms
- src/unsupervised: Clustering algorithm + self-supervised pre-trained Word2Vec + Mini-batch KMeans
report: LaTeX report + slide presentation
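The unsupervised approach in src/unsupervised combines word embeddings with Mini-batch KMeans. The sketch below is not the repository's code: it replaces pre-trained Word2Vec vectors with tiny hypothetical 2-D embeddings, represents each review as the mean of its word vectors (an assumed document representation), and implements the mini-batch k-means update in plain Python, where each centre moves toward sampled points with a per-centre learning rate of 1 / (points assigned so far).

```python
import random


def average_vector(tokens, embeddings, dim):
    """Represent a document as the mean of the embeddings of its known words."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the centre closest to `point`."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size, n_iter, seed=0):
    """Mini-batch k-means: move centres toward random mini-batches of points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = [points[rng.randrange(len(points))] for _ in range(batch_size)]
        assignments = [nearest(x, centers) for x in batch]  # assign before updating
        for x, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


# Hypothetical 2-D "embeddings" standing in for pre-trained Word2Vec vectors.
embeddings = {
    "great": [0.9, 0.1], "wonderful": [0.8, 0.2], "love": [1.0, 0.0],
    "awful": [0.1, 0.9], "boring": [0.2, 0.8], "hate": [0.0, 1.0],
}
docs = [["great", "wonderful"], ["love", "great"], ["awful", "boring"], ["hate", "boring"]]
points = [average_vector(d, embeddings, 2) for d in docs]
centers = mini_batch_kmeans(points, k=2, batch_size=4, n_iter=50)
clusters = [nearest(p, centers) for p in points]
print(clusters)
```

Cluster indices are arbitrary, so evaluating this approach against Positive (1) / Negative (0) labels requires mapping each cluster to a class, for example by the majority label of its members.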

Collaborators

Name	Student ID	Email
Le Duc Anh Tuan	20204929	tuan.lda204929@sis.hust.edu.vn
Nguyen Van Thanh Tung	20190090	tung.nvt190090@sis.hust.edu.vn
Hoang Long Vu	20204897	vu.hl204897@sis.hust.edu.vn
Nguyen Huu Tuan Duy	20204907	duy.nht204907@sis.hust.edu.vn
Hoang Gia Nguyen	20204889	nguyen.hg204889@sis.hust.edu.vn

Repository: tuanlda78202/SAIMDB