This app uses Machine Learning NLP/topic modeling/document similarity techniques to group OMSCS CS-6460 Fall 2018 students by interests based on their essays/writing assignments.
With a few clicks, you will see a ranking of whose work is most similar to yours.
The objective is to help you find people with similar interests who are working on the same topics you are, to facilitate team formation and collaboration. After all, learning is better together. Have fun!
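To give a feel for what "ranking by similarity" means, here is a minimal sketch. It is not the app's actual pipeline: it swaps in plain word counts and cosine similarity instead of the trained models, and the student names and essay snippets are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target, essays):
    """Return (student, score) pairs, most similar to `target` first."""
    vectors = {name: Counter(tokenize(text)) for name, text in essays.items()}
    scores = [
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical essays, made up for illustration only.
essays = {
    "alice": "adaptive tutoring systems for programming education",
    "bob": "intelligent tutoring systems for programming courses",
    "carol": "virtual reality field trips for history classes",
}

for name, score in rank_similar("alice", essays):
    print(f"{name}: {score:.2f}")
```

In this toy data, "bob" shares more words with "alice" than "carol" does, so he ranks first.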
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
I built this on macOS using Python 3.7.0. Check your Python version by running python -V. If you have an earlier version of Python installed, I suggest upgrading to the latest version.
On top of that, you will need pip for installing Python packages. pip is already installed if you are using a Python build downloaded from python.org. Just make sure to upgrade pip.
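If you prefer to check the version from Python itself, a small sketch like the following works. The 3.7 minimum is an assumption based on the version I built with, not a tested lower bound.

```python
import sys

# Assumption: 3.7 is used as the minimum because the project was built on 3.7.0.
MIN_VERSION = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if python_is_recent_enough():
    print("Python version looks fine:", sys.version.split()[0])
else:
    print("Consider upgrading Python; found", sys.version.split()[0])
```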
Clone this repository locally and install all requirements by running the following in a terminal:
git clone git@github.com:ucals/bettertogether.git
cd bettertogether
pip install -r requirements.txt
To run it properly, you will have to download all PDFs of the students' assignments. To do so, go to Canvas, click Account -> Settings, scroll to the bottom of the page, and click + New Access Token. Copy the new token.
You will also have to get a free TextRazor API key. After creating a free account, you will be redirected to a success page containing your API key. Copy that as well.
Edit your ~/.bash_profile and add the following lines:
export CANVAS_API_KEY="your new Canvas token"
export TEXTRAZOR_API_KEY="your new TextRazor API key"
Replace "your new Canvas token" with the token you got from Canvas, and "your new TextRazor API key" with the API key you got from TextRazor. Reload your profile by running source ~/.bash_profile from a terminal.
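To confirm that both variables are visible to Python after reloading your profile, you can run a quick check like this sketch. The helper name missing_env_vars is hypothetical and not part of the repository.

```python
import os

# The two variable names come from the export lines above.
REQUIRED_VARS = ("CANVAS_API_KEY", "TEXTRAZOR_API_KEY")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Both API keys are set.")
```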
Edit pytest.ini and set download_all_assignments = True. Now, to download all PDFs, just run:
pytest -k TestPreProcess
This process will take some time. When it finishes, all PDFs of the students' assignments will be in the pdfs/ folder. Finally, I recommend setting download_all_assignments = False again in pytest.ini.
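To verify that the download produced something, you can count the PDFs in the folder. This is only a sketch: count_pdfs is a hypothetical helper, and it searches subfolders too because I am not assuming a particular layout inside pdfs/.

```python
from pathlib import Path


def count_pdfs(folder="pdfs"):
    """Count .pdf files under `folder` (recursively); 0 if the folder is missing."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


print(f"Found {count_pdfs()} PDF file(s) in pdfs/")
```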
All main tests are located in test_api.py. To run them:
pytest -k TestApi
To run the web server locally, run:
python main.py
You can access it by browsing to http://localhost:8080.
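If you would rather check from a script that the server is answering, here is a minimal sketch using only the standard library. The function name server_is_up and the 3-second timeout are my own choices for illustration; the URL is the one given above.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:8080", timeout=3):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar problems.
        return False


print("Server reachable:", server_is_up())
```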
I have included the trained models in the repository, as it takes ~10 minutes to train them. If you want to train them yourself, install Jupyter Notebook and run tutorial.ipynb three times. Before each run, set the assignment variable in the 2nd cell to "Assignment 2", "Assignment 3", and "Assignment 4", respectively. This procedure will generate the following files:
doc2vec_model_assignment_2
doc2vec_model_assignment_3
doc2vec_model_assignment_4
They will be located in the models/ folder.
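After training, you can check that all three files are in place with a sketch like this one. The naming rule in model_filename is inferred from the three file names listed above, and both helper names are hypothetical.

```python
from pathlib import Path

ASSIGNMENTS = ("Assignment 2", "Assignment 3", "Assignment 4")


def model_filename(assignment):
    """Map 'Assignment 2' to 'doc2vec_model_assignment_2' (inferred naming rule)."""
    return "doc2vec_model_" + assignment.lower().replace(" ", "_")


def missing_models(folder="models", assignments=ASSIGNMENTS):
    """Return the expected model file names that are not present in `folder`."""
    root = Path(folder)
    names = [model_filename(a) for a in assignments]
    return [name for name in names if not (root / name).exists()]


missing = missing_models()
print("Missing model files:", ", ".join(missing) if missing else "none")
```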
This project was built on top of the following libraries and APIs:
- Gensim - Topic Modeling for Humans
- Canvas API
- TextRazor API
- Textract
- Bottle Python Web Framework
- JsonPickle
I'm Carlos Souza, and I did this side project as part of the CS-6460 Education Technology course in the Master of Science in Computer Science program at the Georgia Institute of Technology. You can reach me at souza@gatech.edu or carlos@udacity.com.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Quoc Le & Tomas Mikolov, thanks for this fantastic article!
- RaRe Technologies, thanks for this great tutorial!
- Skipgram, thanks for this amazing tutorial!