Each step is described, along with its code, in the notebooks.
Used the Python library PRAW to extract posts from Reddit.
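The following is a minimal sketch of the extraction step. It assumes PRAW's `praw.Reddit` client and a registered Reddit app, with credentials that you supply yourself. The subreddit name, the `hot` listing, the limit, and the set of fields kept are illustrative assumptions and may not match what the notebooks use. `submission_to_record` and `fetch_records` are hypothetical helper names.

```python
def submission_to_record(submission):
    """Keep the fields used later for analysis and classification (an assumed selection)."""
    return {
        "id": submission.id,
        "title": submission.title,
        "body": submission.selftext,
        "url": submission.url,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "flair": submission.link_flair_text,  # None when the post has no flair
    }


def fetch_records(subreddit_name, client_id, client_secret, user_agent, limit=100):
    """Download up to `limit` hot posts from a subreddit with PRAW."""
    import praw  # imported lazily so submission_to_record works without PRAW installed

    reddit = praw.Reddit(client_id=client_id,
                         client_secret=client_secret,
                         user_agent=user_agent)
    return [submission_to_record(s)
            for s in reddit.subreddit(subreddit_name).hot(limit=limit)]
```

Keeping the record conversion separate from the network call makes that conversion easy to check offline with stand-in objects.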
Analysed the data with graphs, scatter plots, and correlation, using the matplotlib library.
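Below is a minimal sketch of the correlation and scatter-plot analysis. The Pearson coefficient is computed by hand here to show what it measures. The notebooks do not state which columns were compared, so a pair such as post score against comment count is only an assumed example. `pearson` and `scatter` are hypothetical helpers. Note that pandas' `DataFrame.corr()` computes the same Pearson coefficient by default.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def scatter(xs, ys, xlabel, ylabel):
    """Return a matplotlib figure plotting ys against xs, titled with their correlation."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=10)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(f"{ylabel} vs {xlabel} (r = {pearson(xs, ys):.2f})")
    return fig
```

For example, `scatter(scores, comment_counts, "score", "comments").savefig("score_vs_comments.png")` would save the plot, given two lists taken from the extracted posts.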
- Preprocessed the data: Removed stopwords and stemmed the remaining words.
- Dividing into training and test sets: Divided the dataset into a training set and a test set using the standard 70:30 split.
- Testing across classifiers: Tested three classifiers (Naive Bayes, SVM, and Logistic Regression) and checked the accuracy of each.
- Saving the model: Saved the model with the highest accuracy in a .sav file for later prediction.
- Model testing: Takes a post URL from the user and returns the predicted and actual flairs. The saved model is called to produce the predicted flair.
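The preprocessing step can be sketched as follows, assuming simple regex tokenization and NLTK's Porter stemmer. The short stopword set is an illustrative placeholder; a fuller list such as NLTK's English stopwords corpus may be what was actually used. `preprocess` and `porter_stem` are hypothetical names. The stemmer is passed in as a callable, so NLTK is only required when no stemmer is given.

```python
import re

# Illustrative stopword subset (assumption); swap in a full list in practice.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
})


def porter_stem():
    """Return NLTK's Porter stemmer as a callable (requires nltk)."""
    from nltk.stem import PorterStemmer
    return PorterStemmer().stem


def preprocess(text, stopwords=STOPWORDS, stem=None):
    """Lowercase, tokenize, drop stopwords, and stem each remaining token."""
    if stem is None:
        stem = porter_stem()
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in stopwords]
```

Because text vectorizers expect strings, `" ".join(preprocess(title))` turns a post title into a cleaned string ready for feature extraction.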
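A 70:30 split can be sketched with the standard library alone. The fixed seed is an assumption that keeps the split reproducible. `split_dataset` is a hypothetical name. scikit-learn's `train_test_split(..., test_size=0.3)` performs the same kind of shuffled split.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) at `train_fraction`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: global random state untouched
    cut = round(len(shuffled) * train_fraction)  # round() avoids float truncation
    return shuffled[:cut], shuffled[cut:]
```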
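The classifier comparison might look like the following sketch. It assumes scikit-learn, TF-IDF features, `LinearSVC` as the SVM, and `max_iter=1000` for logistic regression. None of these choices is confirmed by the notebooks, and `compare_classifiers` is a hypothetical helper. Each candidate is wrapped in a pipeline with its own vectorizer, so the returned model can be fed raw text directly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Fit three text classifiers; return (best_name, best_model, accuracies)."""
    candidates = {
        "naive_bayes": MultinomialNB(),
        "svm": LinearSVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    accuracies, fitted = {}, {}
    for name, classifier in candidates.items():
        model = make_pipeline(TfidfVectorizer(), classifier)
        model.fit(train_texts, train_labels)
        accuracies[name] = accuracy_score(test_labels, model.predict(test_texts))
        fitted[name] = model
    best = max(accuracies, key=accuracies.get)  # first one wins on ties
    return best, fitted[best], accuracies
```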
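A `.sav` file is usually just a pickled object. The sketch below assumes the standard `pickle` module; `joblib.dump`/`joblib.load` are a common alternative for scikit-learn models. `save_model` and `load_model` are hypothetical names.

```python
import pickle


def save_model(model, path):
    """Serialize a fitted model to `path` (for example "flair_model.sav")."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a model written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```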
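The URL-based test can be sketched as follows. This assumes `reddit` is a `praw.Reddit` instance (whose `submission(url=...)` method fetches a post by URL) and `model` is a fitted text pipeline with a scikit-learn style `predict()`. Feeding the model title plus body text is an assumption and should match whatever text the model was trained on. `predict_for_url` is a hypothetical name.

```python
def predict_for_url(url, reddit, model):
    """Return (predicted_flair, actual_flair) for the Reddit post at `url`."""
    submission = reddit.submission(url=url)
    # Assumption: the model was trained on the post title followed by its body.
    text = f"{submission.title} {submission.selftext}"
    predicted = model.predict([text])[0]
    actual = submission.link_flair_text  # None if the post has no flair
    return predicted, actual
```

Interactively, this would be called as `predict_for_url(input("Reddit post URL: "), reddit, model)`.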
The model reads the URLs in the file line by line and predicts a flair for each.
- The predictions are stored in a JSON file, with each URL as a key and its predicted flair as the value.
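The batch step can be sketched as follows. It assumes a text file with one URL per line and a `predict` callable that maps a URL to a flair string, for example a hypothetical wrapper around the saved model. `predict_urls_to_json` is a hypothetical name, and blank lines are skipped.

```python
import json


def predict_urls_to_json(urls_path, json_path, predict):
    """Read one URL per line, predict each flair, and write {url: flair} as JSON."""
    predictions = {}
    with open(urls_path, encoding="utf-8") as fh:
        for line in fh:
            url = line.strip()
            if url:  # skip blank lines
                predictions[url] = predict(url)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(predictions, fh, indent=2)
    return predictions
```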