Movie Review Classification with Word Embeddings (Feb 2018)

Classified movie reviews as positive or negative using GloVe word embeddings and machine learning techniques.

Description

The Large Movie Review Dataset (http://ai.stanford.edu/~amaas/data/sentiment/) contains 50,000 reviews from IMDB, with equal numbers of positive and negative reviews. It is split evenly into a training set of 25,000 labeled reviews and a test set of 25,000 labeled reviews. The sentiment classification task is to predict the polarity (positive or negative) of a given review.
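As a rough sketch of reading this data, assume the archive has been extracted to a local `aclImdb/` directory that follows the layout of the official download: one review per `.txt` file under `pos/` and `neg/` subfolders of each split. The helper name `load_imdb_split` is hypothetical and is not taken from the notebook.

```python
from pathlib import Path


def load_imdb_split(root, split):
    """Read one split ("train" or "test") of the extracted aclImdb tree.

    Assumes <root>/<split>/pos/*.txt and <root>/<split>/neg/*.txt, one review
    per file. Any other folder (such as the unlabeled train/unsup) is ignored.
    Labels are 0 for negative and 1 for positive.
    """
    texts, labels = [], []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(root) / split / label_name
        # Sorting keeps the order reproducible across file systems.
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

On the full archive, `load_imdb_split("aclImdb", "train")` should return 25,000 reviews, 12,500 per class.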

Steps

Tokenize the reviews with the spaCy library.
Download the GloVe embedding vectors from https://nlp.stanford.edu/projects/glove/.
Read the 300-dimensional GloVe embeddings into a dictionary.
Represent each review by the average of its word embeddings, ignoring stopwords.
Fit an XGBoost classifier to these features. Report the training and test errors.
For comparison, fit XGBoost to a one-hot (binary bag-of-words) representation of the reviews. Report the training and test errors.
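The GloVe files are plain text, with one token per line followed by its vector components separated by spaces. A minimal loader sketch follows. `load_glove` is a hypothetical helper name, and the example file name is only one of the available downloads (for instance `glove.6B.300d.txt` from `glove.6B.zip`). Taking the last `dim` fields as the vector is a defensive choice that also tolerates tokens containing spaces.

```python
import numpy as np


def load_glove(path, dim=300):
    """Read a GloVe text file into a {word: float32 vector} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) <= dim:
                continue  # skip lines without a token plus `dim` numbers
            # Everything before the last `dim` fields is the token itself.
            word = " ".join(parts[:-dim])
            embeddings[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return embeddings
```

For example, `glove = load_glove("glove.6B.300d.txt")` maps each vocabulary word to a 300-dimensional vector.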
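The averaged review features could be computed roughly as follows. The sketch assumes the review has already been tokenized (for example with spaCy) and that `stopwords` is any set of words to skip. The helper name and the zero-vector fallback for reviews with no known words are illustrative choices, not details confirmed by the notebook.

```python
import numpy as np


def average_embedding(tokens, embeddings, stopwords, dim=300):
    """Mean GloVe vector over the non-stopword tokens that have an embedding.

    Returns a zero vector when no token qualifies, so every review still
    yields a fixed-length feature row.
    """
    vectors = [embeddings[t] for t in tokens
               if t not in stopwords and t in embeddings]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0).astype(np.float32)
```

With spaCy, one option is to tokenize with `nlp = spacy.blank("en")` and use `from spacy.lang.en.stop_words import STOP_WORDS` as the stopword set. If you use the uncased `glove.6B` vectors, lowercase the tokens first (for example with `token.lower_`).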
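For the bag-of-words baseline, a one-hot representation records which vocabulary words appear in each review. The sketch below builds the vocabulary from the training reviews only. It caps the vocabulary at a hypothetical `max_features` most frequent tokens so that the dense matrix stays a manageable size.

```python
from collections import Counter

import numpy as np


def build_vocab(token_lists, max_features=5000):
    """Map the `max_features` most frequent training tokens to column indices."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_features))}


def one_hot_bow(token_lists, vocab):
    """Binary bag-of-words matrix: X[i, j] is 1 if vocab word j occurs in review i."""
    X = np.zeros((len(token_lists), len(vocab)), dtype=np.uint8)
    for i, tokens in enumerate(token_lists):
        for t in tokens:
            j = vocab.get(t)
            if j is not None:  # words outside the training vocabulary are dropped
                X[i, j] = 1
    return X
```

Encode the test reviews with the training vocabulary so that both matrices share the same columns. scikit-learn's `CountVectorizer(binary=True, max_features=...)` produces an equivalent matrix in sparse form, which scales better to the full vocabulary.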

Repository: ytian22/Movie-Review-Classification