The Internet has made information accessible to the masses, but it has also become a hotspot of misinformation and fake news. Fake news does more harm when it is not correctly identified and tagged. The severity of its effects can be judged from the fact that riots and killings have been attributed to fake news.
Fake news can even sway people's opinions and affiliations - a fact that political parties have used (and still use) to make people vote in their favour.
As such, it has become necessary to separate real news from fake news. Doing this manually is not feasible, given the huge amount of information that is churned out on the internet every minute.
In this project, I tried to classify fake news using two algorithms: the Naive Bayes classifier and the PassiveAggressive classifier.
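A minimal sketch of how the two classifiers might be set up with scikit-learn. The TF-IDF vectorizer, the hyperparameters, the `make_models` helper, and the toy headlines are all assumptions made for illustration; they are not the project's actual code or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real dataset: made-up headlines, illustration only.
titles = [
    "Government announces new budget for public schools",
    "Local council approves funding for road repairs",
    "Central bank keeps interest rates unchanged this quarter",
    "Aliens secretly control the world banks, insider reveals",
    "Miracle pill cures all diseases overnight, doctors stunned",
    "Celebrity replaced by clone, shocking proof inside",
]
labels = ["REAL", "REAL", "REAL", "FAKE", "FAKE", "FAKE"]


def make_models():
    """Hypothetical helper: one text-to-label pipeline per classifier."""
    return {
        # Assumption: TF-IDF features feed both models.
        "Naive Bayes": make_pipeline(
            TfidfVectorizer(stop_words="english"),
            MultinomialNB(),
        ),
        "PassiveAggressive": make_pipeline(
            TfidfVectorizer(stop_words="english"),
            PassiveAggressiveClassifier(max_iter=1000, random_state=0),
        ),
    }


models = make_models()
predictions = {}
for name, model in models.items():
    model.fit(titles, labels)  # learn the vocabulary and the classifier together
    predictions[name] = list(
        model.predict(["Shocking proof that doctors hide miracle cure"])
    )
    print(name, predictions[name])
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary is learned only from the data passed to `fit`, which avoids leaking test-set words into training once a real train/test split is used.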
The dataset can be downloaded from here.
The features in the dataset are:
The first method was to apply the models to just the titles. This gave fairly good results, with the Naive Bayes accuracy at 92.9% and the PassiveAggressive classifier's accuracy at 91.2%.
Naive Bayes
PassiveAggressive
Then I applied the models to the text only. The results improved a lot, and the PassiveAggressive classifier performed better.
Naive Bayes
PassiveAggressive
Finally, I applied the models to title+text. The results didn't improve much over the previous attempt, but there was an improvement.
Naive Bayes
PassiveAggressive
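The three experiments above (titles only, text only, title+text) can be sketched as a single loop. This is only an illustration under stated assumptions: the field names `title` and `text`, the helpers `build_inputs` and `compare`, the 75/25 split, and the toy rows are hypothetical, and the accuracies it prints on toy data say nothing about the figures reported in this post.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

MODES = ("title", "text", "title+text")


def build_inputs(rows, mode):
    """Turn each row into the string the model sees for a given experiment."""
    if mode == "title":
        return [r["title"] for r in rows]
    if mode == "text":
        return [r["text"] for r in rows]
    if mode == "title+text":
        return [r["title"] + " " + r["text"] for r in rows]
    raise ValueError(f"unknown mode: {mode}")


def compare(rows, labels, test_size=0.25, seed=0):
    """Held-out accuracy for every (input mode, classifier) pair."""
    results = {}
    for mode in MODES:
        docs = build_inputs(rows, mode)
        X_train, X_test, y_train, y_test = train_test_split(
            docs, labels, test_size=test_size, random_state=seed, stratify=labels
        )
        classifiers = (
            ("Naive Bayes", MultinomialNB()),
            ("PassiveAggressive", PassiveAggressiveClassifier(random_state=seed)),
        )
        for name, clf in classifiers:
            model = make_pipeline(TfidfVectorizer(), clf)
            model.fit(X_train, y_train)
            results[(mode, name)] = accuracy_score(y_test, model.predict(X_test))
    return results


# Made-up rows, illustration only (assumed fields: "title" and "text").
rows = [
    {"title": "Council approves road repairs", "text": "The city council voted to fund repairs."},
    {"title": "School budget announced", "text": "The ministry published the education budget."},
    {"title": "Rates left unchanged", "text": "The central bank kept interest rates steady."},
    {"title": "Rail line reopens", "text": "Train services resumed after maintenance work."},
    {"title": "Aliens run the banks", "text": "An insider reveals shocking secret alien control."},
    {"title": "Miracle pill cures all", "text": "Doctors stunned by shocking overnight miracle cure."},
    {"title": "Star replaced by clone", "text": "Shocking proof the celebrity is a secret clone."},
    {"title": "Moon is a hologram", "text": "Secret files reveal the shocking moon hoax."},
]
labels = ["REAL"] * 4 + ["FAKE"] * 4

results = compare(rows, labels)
for (mode, name), acc in results.items():
    print(f"{mode:12s} {name:18s} accuracy={acc:.2f}")
```

Using the same `random_state` for every split keeps the test rows identical across the three input modes, so any difference in accuracy comes from the input fields rather than from a different split.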