Comparison and analysis of several algorithms for detecting spam in SMS messages, together with an exploratory analysis of the data set.
Data set:
- SMS Spam Collection Dataset
- The files contain one message per line. Each line consists of two columns: v1 contains the label (ham or spam) and v2 contains the raw text.
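A first look at the data can be sketched with the standard library alone. This is a minimal sketch, assuming a CSV file whose header row names the columns v1 and v2. The helper names `load_messages` and `summarize` are hypothetical, and the sample rows are invented for illustration, not taken from the data set. For the real file, you would pass `open(path, newline="", encoding=...)` instead of the in-memory sample; the correct encoding is an assumption you may need to adjust.

```python
import csv
import io
from collections import Counter


def load_messages(stream):
    """Read (label, text) pairs from a CSV with columns v1 (label) and v2 (raw text)."""
    reader = csv.DictReader(stream)
    return [(row["v1"], row["v2"]) for row in reader]


def summarize(messages):
    """Return class counts and the average message length (in characters) per label."""
    counts = Counter(label for label, _ in messages)
    avg_len = {
        label: sum(len(text) for lbl, text in messages if lbl == label) / n
        for label, n in counts.items()
    }
    return counts, avg_len


# Hypothetical in-memory sample in the v1/v2 layout (invented messages).
sample = io.StringIO(
    "v1,v2\n"
    "ham,See you at home tonight\n"
    'spam,"Congratulations, you won a prize! Reply WIN"\n'
    "ham,Running late\n"
)
messages = load_messages(sample)
counts, avg_len = summarize(messages)
print(counts)  # Counter({'ham': 2, 'spam': 1})
print(avg_len)
```

Class balance and message length are cheap to compute and show whether plain accuracy will be a misleading metric when the labels are skewed.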
Implemented methods:
- Naive Bayes spam filtering
- K-Nearest Neighbors algorithm
- Decision Tree learning
- Support Vector Machine (SVM)
- Random Forest
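To illustrate the first method in the list, here is a from-scratch sketch of the textbook multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the project's implementation. The class name `NaiveBayesSpamFilter`, the regex tokenizer, and the choice to skip words unseen in training are assumptions made for illustration. A message is assigned the label $c$ that maximizes $\log P(c) + \sum_{w} \log \frac{\text{count}(w, c) + 1}{\sum_{w'} \text{count}(w', c) + |V|}$, where $V$ is the training vocabulary.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical sketch)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.word_counts = defaultdict(Counter)      # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, text, label):
        # Unnormalized: log P(label) + sum of log P(word | label) over the message tokens.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # assumption: ignore words never seen in training
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text):
        # Pick the label with the highest (log) posterior score.
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Tiny invented training set, for illustration only.
texts = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash",
    "Are we still meeting for lunch?",
    "Call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesSpamFilter().fit(texts, labels)
print(model.predict("Claim your free cash prize"))  # prints "spam"
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied. The add-one term keeps a word that never appeared with one label from forcing that label's probability to zero.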
Further reading and tools:
- Naive Bayes spam filtering
- Naive Bayes classifier
- Text Classification and Naïve Bayes
- Naive Bayes - Stanford NLP - Professor Dan Jurafsky
- WordCloud
- Seaborn