# Machine Learning Based News Classification with fastText Embeddings

Dataset Link - https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset

![Election](enhanced_flowchart.png)

In [2]:
import pandas as pd

In [3]:
df_fake = pd.read_csv("Fake.csv")
df_true = pd.read_csv("True.csv")

In [4]:
df_fake

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"
...,...,...,...,...
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016"
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016"
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016"
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016"


In [5]:
df_fake['label'] =  'Fake'
df_fake.head()

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",Fake
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",Fake
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",Fake
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",Fake
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",Fake


In [6]:
df_true['label'] = 'True'
df_true.head()

Unnamed: 0,title,text,subject,date,label
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017",True
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017",True
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017",True
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017",True
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017",True


In [7]:
# Stack both frames with the public API; DataFrame._append is a private pandas method.
df = pd.concat([df_fake, df_true], ignore_index=True)
df

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",Fake
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",Fake
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",Fake
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",Fake
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",Fake
...,...,...,...,...,...
44893,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017",True
44894,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017",True
44895,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017",True
44896,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017",True


In [8]:
df.drop(['date'],axis = 1,inplace = True)

In [9]:
df.columns = ['Title','Text','Subject','Label']

In [10]:
df.head(1)

Unnamed: 0,Title,Text,Subject,Label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,Fake


In [11]:
df['Label'].value_counts()

Label
Fake    23481
True    21417
Name: count, dtype: int64

In [12]:
df.isnull().sum()

Title      0
Text       0
Subject    0
Label      0
dtype: int64

In [13]:
cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]
df.head(1)

Unnamed: 0,Label,Title,Text,Subject
0,Fake,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News


In [14]:
df['Label'] = '__label__' + df['Label'].astype(str)

In [15]:
df.head()

Unnamed: 0,Label,Title,Text,Subject
0,__label__Fake,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News
1,__label__Fake,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News
2,__label__Fake,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News
3,__label__Fake,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News
4,__label__Fake,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News


In [16]:
df['News_Description'] = df['Label'] + ' ' + df['Title'] + ' ' + df['Text'] + ' ' + df['Subject']
df.head(1)

Unnamed: 0,Label,Title,Text,Subject,News_Description
0,__label__Fake,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,__label__Fake Donald Trump Sends Out Embarras...


In [17]:
df['News_Description'][0]

'__label__Fake  Donald Trump Sends Out Embarrassing New Year’s Eve Message; This is Disturbing Donald Trump just couldn t wish all Americans a Happy New Year and leave it at that. Instead, he had to give a shout out to his enemies, haters and  the very dishonest fake news media.  The former reality show star had just one job to do and he couldn t do it. As our Country rapidly grows stronger and smarter, I want to wish all of my friends, supporters, enemies, haters, and even the very dishonest Fake News Media, a Happy and Healthy New Year,  President Angry Pants tweeted.  2018 will be a great year for America! As our Country rapidly grows stronger and smarter, I want to wish all of my friends, supporters, enemies, haters, and even the very dishonest Fake News Media, a Happy and Healthy New Year. 2018 will be a great year for America!  Donald J. Trump (@realDonaldTrump) December 31, 2017Trump s tweet went down about as welll as you d expect.What kind of president sends a New Year s greet

In [18]:
df['News_Description'][1]

'__label__Fake  Drunk Bragging Trump Staffer Started Russian Collusion Investigation House Intelligence Committee Chairman Devin Nunes is going to have a bad day. He s been under the assumption, like many of us, that the Christopher Steele-dossier was what prompted the Russia investigation so he s been lashing out at the Department of Justice and the FBI in order to protect Trump. As it happens, the dossier is not what started the investigation, according to documents obtained by the New York Times.Former Trump campaign adviser George Papadopoulos was drunk in a wine bar when he revealed knowledge of Russian opposition research on Hillary Clinton.On top of that, Papadopoulos wasn t just a covfefe boy for Trump, as his administration has alleged. He had a much larger role, but none so damning as being a drunken fool in a wine bar. Coffee boys  don t help to arrange a New York meeting between Trump and President Abdel Fattah el-Sisi of Egypt two months before the election. It was known b

In [19]:
df['News_Description'][2]

'__label__Fake  Sheriff David Clarke Becomes An Internet Joke For Threatening To Poke People ‘In The Eye’ On Friday, it was revealed that former Milwaukee Sheriff David Clarke, who was being considered for Homeland Security Secretary in Donald Trump s administration, has an email scandal of his own.In January, there was a brief run-in on a plane between Clarke and fellow passenger Dan Black, who he later had detained by the police for no reason whatsoever, except that maybe his feelings were hurt. Clarke messaged the police to stop Black after he deplaned, and now, a search warrant has been executed by the FBI to see the exchanges.Clarke is calling it fake news even though copies of the search warrant are on the Internet. I am UNINTIMIDATED by lib media attempts to smear and discredit me with their FAKE NEWS reports designed to silence me,  the former sheriff tweeted.  I will continue to poke them in the eye with a sharp stick and bitch slap these scum bags til they get it. I have been

In [20]:
df['News_Description'][3]

'__label__Fake  Trump Is So Obsessed He Even Has Obama’s Name Coded Into His Website (IMAGES) On Christmas day, Donald Trump announced that he would  be back to work  the following day, but he is golfing for the fourth day in a row. The former reality show star blasted former President Barack Obama for playing golf and now Trump is on track to outpace the number of golf games his predecessor played.Updated my tracker of Trump s appearances at Trump properties.71 rounds of golf including today s. At this pace, he ll pass Obama s first-term total by July 24 next year. https://t.co/Fg7VacxRtJ pic.twitter.com/5gEMcjQTbH  Philip Bump (@pbump) December 29, 2017 That makes what a Washington Post reporter discovered on Trump s website really weird, but everything about this administration is bizarre AF. The coding contained a reference to Obama and golf:  Unlike Obama, we are working to fix the problem   and not on the golf course.  However, the coding wasn t done correctly.The website of Dona

In [21]:
import re

In [22]:
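def preprocess_text(text):
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r'\s+', ' ', text)
    # Strip every character that is neither a word character nor whitespace.
    text = re.sub(r'[^\w\s]', '', text)
    # NOTE: the four patterns below target "@" tokens, parenthesised text, dates
    # with a comma, and arithmetic symbols. Punctuation is already gone at this
    # point, so none of them can match; they are kept as in the original run.
    text = re.sub(r'\S*@\S*\s?', '', text)
    text = re.sub(r'\(.*?\)', '', text)
    text = re.sub(r'\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}\b', '', text)
    text = re.sub(r'[+\-*/=]', '', text)
    return text.strip().lower()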
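
If the targeted removals (parenthesised text, `@` tokens, dates) are meant to take effect, they have to run before punctuation is stripped. The following is a minimal sketch of that ordering; `preprocess_text_reordered` and `MONTHS` are hypothetical names, and this variant was not used to produce any output shown in this notebook.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

def preprocess_text_reordered(text):
    """Hypothetical variant: targeted removals first, punctuation last."""
    # Drop parenthesised asides such as "(IMAGES)" while the brackets still exist.
    text = re.sub(r'\(.*?\)', ' ', text)
    # Drop tokens containing "@" (handles, e-mail addresses).
    text = re.sub(r'\S*@\S*', ' ', text)
    # Drop dates like "December 31, 2017" while the comma is still present.
    text = re.sub(rf'\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b', ' ', text)
    # Remove remaining punctuation; "_" is a word character, so "__label__" survives.
    text = re.sub(r'[^\w\s]', '', text)
    # Collapse whitespace last, then lowercase.
    return re.sub(r'\s+', ' ', text).strip().lower()
```

Swapping this in would change the cleaned text, so the scores reported later would need to be recomputed.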

In [23]:
df['News_Description'] = df['News_Description'].map(preprocess_text)
df.head()

Unnamed: 0,Label,Title,Text,Subject,News_Description
0,__label__Fake,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,__label__fake donald trump sends out embarrass...
1,__label__Fake,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,__label__fake drunk bragging trump staffer sta...
2,__label__Fake,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,__label__fake sheriff david clarke becomes an ...
3,__label__Fake,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,__label__fake trump is so obsessed he even has...
4,__label__Fake,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,__label__fake pope francis just called out don...


In [24]:
df['News_Description'][1]

'__label__fake drunk bragging trump staffer started russian collusion investigation house intelligence committee chairman devin nunes is going to have a bad day he s been under the assumption like many of us that the christopher steeledossier was what prompted the russia investigation so he s been lashing out at the department of justice and the fbi in order to protect trump as it happens the dossier is not what started the investigation according to documents obtained by the new york timesformer trump campaign adviser george papadopoulos was drunk in a wine bar when he revealed knowledge of russian opposition research on hillary clintonon top of that papadopoulos wasn t just a covfefe boy for trump as his administration has alleged he had a much larger role but none so damning as being a drunken fool in a wine bar coffee boys don t help to arrange a new york meeting between trump and president abdel fattah elsisi of egypt two months before the election it was known before that the for

In [25]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size = 0.2, random_state=42)

In [26]:
train.shape, test.shape

((35918, 5), (8980, 5))

In [27]:
train.to_csv("news_train.csv", columns=["News_Description"], index=False, header=False)
test.to_csv("news_test.csv", columns=["News_Description"], index=False, header=False)
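
fastText expects every line of a supervised training file to begin with a `__label__` token. Because `to_csv` may wrap a field in quotes, a quick sanity check on the written files can be useful. The sketch below uses only the standard library; `label_counts` is a hypothetical helper, not part of fastText or pandas.

```python
from collections import Counter

def label_counts(path, prefix="__label__"):
    """Count the leading label of each line; lines without one are tallied under None."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # The first whitespace-separated token should be the label.
            first = line.split(maxsplit=1)[0] if line.strip() else None
            has_label = first is not None and first.startswith(prefix)
            counts[first if has_label else None] += 1
    return counts
```

Calling `label_counts("news_train.csv")` should return only the two label keys; any count under `None` points to blank or quoted lines.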

# fastText Supervised Model

In [29]:
!pip install fasttext



In [30]:
import fasttext

In [120]:
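# Train a supervised classifier with fastText's default hyperparameters.
fast_text_model = fasttext.train_supervised(input="news_train.csv")
# test() returns (number of samples, precision@1, recall@1).
fast_text_model.test("news_test.csv")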

Read 15M words
Number of words:  206925
Number of labels: 2
Progress: 100.0% words/sec/thread: 3419313 lr:  0.000000 avg.loss:  0.014094 ETA:   0h 0m 0s
Progress: 100.0% words/sec/thread: 3419393 lr: -0.000052 avg.loss:  0.014094 ETA:   0h 0m 0s


(8980, 0.998218262806236, 0.998218262806236)
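
The tuple above is (number of test samples, precision at 1, recall at 1). With exactly one label per article, precision and recall at 1 coincide and equal the accuracy. The sketch below labels the fields and derives an error count; `summarize_test_result` is a hypothetical helper, and the error count assumes single-label data.

```python
def summarize_test_result(result):
    """Turn fastText's (N, precision, recall) tuple into a labelled dict."""
    n, precision, recall = result
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "samples": n,
        "precision@1": precision,
        "recall@1": recall,
        "f1@1": f1,
        # Only meaningful when every sample carries exactly one label.
        "misclassified": round(n * (1 - precision)),
    }

summary = summarize_test_result((8980, 0.998218262806236, 0.998218262806236))
print(summary)
```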

In [122]:
fast_text_model.predict("shocking-revelation-uncovers-hidden-agenda-top-officials-exposed-a-recently-released-documentary-has-revealed-that-several-high-profile-government-officials-have-been-acting-in-secret-collusion-with-foreign-powers-the-documentary-claims-that-the-officials-were-involved-in-a-secret-scheme-to-influence-major-national-decisions-the-exposé-details-how-they-used-their-influence-to-manipulate-public-opinion-and-alter-key-policies-to-benefit-their-hidden-agendas-across-multiple-administrations-the-documentary-alleges-that-this-conspiracy-spans-over-a-decade-and-involves-some-of-the-most-respected-names-in-politics-and-business-the-documentary-uses-alleged-intercepted-communications-and-confidential-testimonies-to-support-its-claims-the-revelations-are-prompting-calls-for-an-independent-investigation-and-have-ignited-a-firestorm-of-debate-about-the-integrity-of-the-political-system-the-documentary-is-sparked-controversy-within-political-circles-and-raises-serious-questions-about-the-transparency-and-accountability-of-public-officials-while-some-dismiss-the-claims-as-sensationalist-others-are-demanding-action-and-reform-the-ongoing-debate-is-likely-to-impact-public-perception-and-policy-decisions-in-the-foreseeable-future")

(('__label__fake',), array([1.00001001]))
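
fastText splits input on whitespace, so the hyphen-joined string above most likely reaches the model as one long out-of-vocabulary token rather than as individual words. A minimal sketch of normalising a raw query so it resembles the training lines follows; `normalize_query` is a hypothetical helper and was not used for the prediction shown above.

```python
import re

def normalize_query(text):
    """Hypothetical helper: make a raw query look like the cleaned training text."""
    # Treat hyphens as word separators so a slug becomes separate tokens.
    text = text.replace("-", " ")
    # Mirror the notebook's cleaning: drop punctuation, collapse whitespace, lowercase.
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()

# Intended use (assumes the trained model from the cell above):
# fast_text_model.predict(normalize_query(raw_text))
```

Collapsing whitespace also removes newlines, which `predict` rejects in a single input string.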

# fastText Embedding on Classification Algorithms

In [34]:
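# Note: News_Description still begins with the __label__ token and ends with the
# Subject field, so these features contain the target label itself.
X = df['News_Description']
y = df['Label']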
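
Since `News_Description` was built for fastText's supervised format, each string starts with its own `__label__` token. One way to keep the target out of the features is to strip that token before embedding. This is a sketch only; `strip_label_token` is a hypothetical helper, and applying it would change the accuracies reported below.

```python
def strip_label_token(line, prefix="__label__"):
    """Remove leading fastText label tokens so the target is not part of the features."""
    tokens = line.split()
    # Labels sit at the start of the line; drop them until real text begins.
    while tokens and tokens[0].startswith(prefix):
        tokens.pop(0)
    return " ".join(tokens)

# Intended use (assumes the DataFrame from the cells above):
# X = df['News_Description'].map(strip_label_token)
```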

In [35]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state=42)

In [36]:
X_train.shape

(35918,)

In [37]:
X_test.shape

(8980,)

In [38]:
import numpy as np

def get_fasttext_embeddings(X, model):
    """Represent each text as the mean of its fastText word vectors."""
    embeddings = []
    for text in X:
        words = text.split()
        word_embeddings = [model.get_word_vector(word) for word in words]
        if word_embeddings:
            text_embedding = np.mean(word_embeddings, axis=0)
        else:
            # Empty text: fall back to a zero vector of the model's dimension.
            text_embedding = np.zeros((model.get_dimension(),))
        embeddings.append(text_embedding)
    return np.array(embeddings)

In [40]:
with open('data.txt', 'w') as f:
    for text in X_train:
        f.write(text + '\n')

In [41]:
model = fasttext.train_unsupervised('data.txt', model='skipgram')

Read 15M words
Number of words:  47950
Number of labels: 2
Progress: 100.0% words/sec/thread:   61700 lr:  0.000000 avg.loss:  1.886242 ETA:   0h 0m 0s


In [42]:
import numpy as np

In [43]:
X_train = get_fasttext_embeddings(X_train, model)
X_test = get_fasttext_embeddings(X_test, model)

In [44]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

In [45]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

In [46]:
import matplotlib.pyplot as plt

In [47]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

In [48]:
classifiers = {
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier()
}

In [49]:
for name, clf in classifiers.items():
    print(f"Training {name}...")
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    print(f"\n{name} Evaluation:")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))

Training KNN...

KNN Evaluation:
Accuracy: 0.9667037861915367
               precision    recall  f1-score   support

__label__Fake       0.97      0.96      0.97      4733
__label__True       0.96      0.97      0.96      4247

     accuracy                           0.97      8980
    macro avg       0.97      0.97      0.97      8980
 weighted avg       0.97      0.97      0.97      8980

Training Logistic Regression...

Logistic Regression Evaluation:
Accuracy: 0.9806236080178173
               precision    recall  f1-score   support

__label__Fake       0.98      0.98      0.98      4733
__label__True       0.98      0.98      0.98      4247

     accuracy                           0.98      8980
    macro avg       0.98      0.98      0.98      8980
 weighted avg       0.98      0.98      0.98      8980

Training SVM...

SVM Evaluation:
Accuracy: 0.9854120267260579
               precision    recall  f1-score   support

__label__Fake       0.99      0.99      0.99      4733
__lab

In [104]:
compare = pd.DataFrame({'Model': ['KNN','Logistic Regression','SVM', 'Random Forest'], 
                        'Accuracy': [0.9667037861915367*100, 0.9806236080178173*100, 0.9854120267260579*100, 0.9751670378619154*100]})
compare['Accuracy'] = compare['Accuracy'].round(2)
compare.sort_values(by='Accuracy', ascending=False)

Unnamed: 0,Model,Accuracy
2,SVM,98.54
1,Logistic Regression,98.06
3,Random Forest,97.52
0,KNN,96.67


In [106]:
def predict_class(text, model, classifiers):
    """Embed `text` as the mean of its fastText word vectors and classify it with every model."""
    words = text.split()
    word_embeddings = [model.get_word_vector(word) for word in words]
    if word_embeddings:
        text_embedding = np.mean(word_embeddings, axis=0)
    else:
        text_embedding = np.zeros((model.get_dimension(),))

    predictions = {}
    for name, clf in classifiers.items():
        prediction = clf.predict([text_embedding])
        predictions[name] = prediction[0]
    
    return predictions

manual_text = input("Enter the text for prediction: ")
predictions = predict_class(manual_text, model, classifiers)
print("\nPredictions:")
for name, prediction in predictions.items():
    print(f"{name}: {prediction}")

Enter the text for prediction:  turkeys erdogan says us jerusalem decision tramples on law athens reuters  turkish president tayyip erdogan said on thursday that us president donald trump s unfortunate decision to recognize jerusalem as the capital of israel was trampling on international laws  erdogan speaking in athens after talks with prime minister alexis tsipras also said turkey wanted to see a lasting solution on the island of cyprus but said greek cypriots were avoiding talks worldnews



Predictions:
KNN: __label__True
Logistic Regression: __label__True
SVM: __label__True
Random Forest: __label__True
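
The four predictions can be reduced to a single decision with a simple majority vote. The sketch below assumes a dictionary shaped like the one returned by `predict_class`; `majority_vote` is a hypothetical helper, and a tie resolves to whichever label was seen first.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label and its share of the votes."""
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)

example = {
    "KNN": "__label__True",
    "Logistic Regression": "__label__True",
    "SVM": "__label__True",
    "Random Forest": "__label__True",
}
print(majority_vote(example))
```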


## **Contact Information**
*Please contact us for additional inquiries and collaboration opportunities.*

#### **Email**

mdssohail1018@gmail.com

#### **Github**
(**tsohail12**)

### **Thank you for your time and consideration!!!**