A deep learning and NLP project that uses a word2vec model to create word embeddings and feeds them to an LSTM neural network to predict the score awarded to a student's essay.
- Pandas
- NumPy
- Natural Language Toolkit (NLTK)
- Gensim 3.5
- Scikit-learn
- Keras 2.2.2
- TensorFlow 1.9
- Linear Regressor
- Gradient Boosting Regressor
- Support Vector Regressor
- Long Short-Term Memory (LSTM)
- Mean squared error
- Variance
- Cohen's Kappa score
The word2vec model generates the word embeddings on which every model relies to predict the scores. The essays in the essay sets are first cleaned (stopwords, punctuation, special symbols, etc. are removed) and split into lists of words, which are then fed to the word2vec model to generate the embeddings. After that, each regression model is trained on these embeddings and evaluated using mean squared error, variance, and Cohen's kappa score.
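The cleaning and feature-building steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in essay_scorer.py: the stopword set, the toy embedding table, and the function names clean_essay and essay_vector are all hypothetical, and averaging word vectors into one essay vector is an assumed (though common) way to turn embeddings into model input. In the actual project the stopwords come from NLTK and the embeddings from a trained Gensim word2vec model.

```python
import re

# Hypothetical, tiny stopword set for illustration only; the project uses NLTK's list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in", "it"}


def clean_essay(text):
    """Replace non-letters with spaces, lowercase, and drop stopwords."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    return [w for w in letters_only.lower().split() if w not in STOPWORDS]


def essay_vector(tokens, embeddings, dim):
    """Average the vectors of all tokens that have an embedding.

    Assumption: one fixed-length vector per essay, built by averaging.
    Returns a zero vector when no token is in the embedding table.
    """
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary words
        known += 1
        total = [t + v for t, v in zip(total, vec)]
    if known == 0:
        return total
    return [t / known for t in total]


# Toy stand-in for a trained word2vec lookup table (word -> vector).
toy_embeddings = {"essay": [1.0, 0.0], "good": [0.0, 1.0]}

tokens = clean_essay("The essay is good!")
features = essay_vector(tokens, toy_embeddings, dim=2)
print(tokens, features)
```

The resulting fixed-length vectors are what a regressor or an LSTM input layer would consume; the vector size here is 2 only to keep the example readable.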
- Install all the packages listed above using the following commands:
pip install keras
pip install numpy
pip install pandas
pip install scikit-learn
pip install gensim
pip install nltk
- Run the essay_scorer.py file in a Jupyter notebook.
- Have a cup of coffee and wait for the models to generate their respective mean squared errors, variances, and Cohen's kappa scores.
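For readers unfamiliar with the reported metrics, here is a small self-contained sketch of two of them, mean squared error and Cohen's kappa. It is written in plain Python for clarity and is not taken from essay_scorer.py; the function names and sample scores are hypothetical. It assumes the unweighted form of kappa and that predictions are rounded to whole scores before agreement is measured. Scikit-learn's cohen_kappa_score also accepts a weights argument, and the source does not say which variant the project uses.

```python
from collections import Counter


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical human scores and model predictions for four essays.
human = [1, 2, 3, 3]
predicted = [1.2, 2.1, 2.6, 2.4]
rounded = [round(p) for p in predicted]  # kappa compares discrete scores

mse = mean_squared_error(human, predicted)
kappa = cohen_kappa(human, rounded)
print(mse, kappa)
```

A lower mean squared error and a kappa closer to 1 both indicate predictions that agree more closely with the human-assigned scores.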
Of all the models that were trained and tested, the LSTM outperformed the others and is therefore used to generate scores for the final dataset.