<a href="https://colab.research.google.com/github/vsrmule/Fake-News-Detection/blob/main/Fake_News_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**Data Science Task 1**

**Task**: Detecting Fake News

**Objective**: Classify news articles as real or fake in Python using a PassiveAggressiveClassifier.

**Tasks**:
1. Read and explore the textual dataset.
2. Build a machine learning model with TfidfVectorizer and PassiveAggressiveClassifier.
3. Create a confusion matrix to evaluate the model's performance.
4. Measure the model's accuracy.

**Steps**:
1. Import necessary libraries
2. Read and explore the dataset
3. Build a model using PassiveAggressiveClassifier
4. Evaluate the model's accuracy
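
As a rough preview of steps 3 and 4, the vectorizer and classifier can be chained into a single scikit-learn pipeline. This is only a sketch: the four-sentence corpus and the names `toy_texts`, `toy_labels` and `preview_model` are made up for illustration and are not part of `news.csv`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the real dataset
toy_texts = [
    "senate passes budget after long debate",
    "aliens endorse candidate in secret moon meeting",
    "central bank keeps lending rates steady",
    "miracle fruit cures every disease overnight",
]
toy_labels = ["REAL", "FAKE", "REAL", "FAKE"]

# TF-IDF features feed straight into the classifier
preview_model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)
preview_model.fit(toy_texts, toy_labels)

preview_pred = preview_model.predict(["central bank keeps rates steady"])
print(preview_pred)
```

A pipeline keeps the fitted vocabulary and the classifier together, so new text can be passed to `predict` without a separate `transform` call.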









**Import necessary libraries**

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import itertools
import seaborn as sns
import matplotlib.pyplot as plt

**Read and explore the dataset**

In [None]:
news_data= pd.read_csv("/content/news.csv")
news_data.head(10)


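
`pd.read_csv` raises `FileNotFoundError` when `news.csv` has not been uploaded to the Colab session. A minimal sketch of a guard, assuming a hypothetical helper named `load_news` and the `text` and `label` columns that the later cells rely on:

```python
from pathlib import Path

import pandas as pd


def load_news(path):
    """Hypothetical helper: read the CSV and fail early with a clearer message."""
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(
            f"{csv_path} not found; upload news.csv to the session first"
        )
    frame = pd.read_csv(csv_path)
    # The later cells use these two columns, so check for them up front.
    missing = {"text", "label"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return frame
```

With the file in place, `news_data = load_news("/content/news.csv")` would behave like the original read.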

In [None]:
news_data.info()

In [None]:
news_data.shape

In [None]:
news_data["label"].value_counts()
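
Raw counts are easier to compare as proportions, since a heavily skewed label column would make plain accuracy misleading. A small sketch on a made-up label column (`toy_label_column` is hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for news_data["label"]
toy_label_column = pd.Series(["FAKE", "REAL", "REAL", "FAKE", "REAL"])

# normalize=True returns the share of each class instead of the count
class_share = toy_label_column.value_counts(normalize=True)
print(class_share)
```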

In [None]:
labels= news_data.label
labels.head(5)

**1st Model**

**3- Build the model**

In [None]:
#First, we split the dataset into train & test samples:
x_train, x_test, y_train, y_test= train_test_split(news_data["text"], labels, test_size= 0.4, random_state= 7)
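
The split above is purely random. One optional variation, not used in this notebook, is to pass `stratify` so that both parts keep the same FAKE/REAL ratio. A sketch on made-up data (all `toy_` and `s_`/`l_` names are hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical balanced toy data: 5 FAKE and 5 REAL items
toy_texts = pd.Series([f"article {i}" for i in range(10)])
toy_labels = pd.Series(["FAKE"] * 5 + ["REAL"] * 5)

# stratify keeps the class ratio identical in the train and test parts
s_train, s_test, l_train, l_test = train_test_split(
    toy_texts, toy_labels, test_size=0.4, random_state=7, stratify=toy_labels
)
print(l_test.value_counts())
```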

In [None]:
#Then we’ll initialize TfidfVectorizer with English stop words
vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
tfidf_train=vectorizer.fit_transform(x_train)
tfidf_test=vectorizer.transform(x_test)

In [None]:
#Create a PassiveAggressiveClassifier
passive=PassiveAggressiveClassifier(max_iter=50)
passive.fit(tfidf_train,y_train)

y_pred=passive.predict(tfidf_test)

**4- Evaluate the model's accuracy**

In [None]:
#Create a confusion matrix
matrix= confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
matrix

In [None]:
#Visualize the confusion matrix

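#fmt='d' shows whole counts; rows are true labels, columns are predicted labels
sns.heatmap(matrix, annot=True, fmt='d', cmap='magma', xticklabels=['FAKE','REAL'], yticklabels=['FAKE','REAL'])  # 'magma' is just an example, any other colormap works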
plt.show()

In [None]:
#Calculate the model's accuracy
Accuracy=accuracy_score(y_test,y_pred)
Accuracy*100
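
Accuracy is the share of correct predictions, which is also the diagonal of the confusion matrix divided by its total. A short worked check on made-up labels (`toy_true` and `toy_pred` are hypothetical):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels: 3 of 5 predictions are correct
toy_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
toy_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

# Rows are true labels, columns are predicted labels, both ordered FAKE, REAL
toy_matrix = confusion_matrix(toy_true, toy_pred, labels=["FAKE", "REAL"])

# Correct predictions sit on the diagonal
accuracy_from_matrix = np.trace(toy_matrix) / toy_matrix.sum()
print(toy_matrix)
print(accuracy_from_matrix, accuracy_score(toy_true, toy_pred))
```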

In [None]:
#The model's accuracy is 93.21%; now print precision, recall and F1 for each class
Report= classification_report(y_test, y_pred)
print(Report)
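
If the per-class numbers are needed as values rather than printed text, `classification_report` can return a dictionary. A sketch on made-up labels (`toy_true`, `toy_pred` and `report_dict` are hypothetical names):

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels
toy_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
toy_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

# output_dict=True returns nested dicts keyed by class label
report_dict = classification_report(toy_true, toy_pred, output_dict=True)

fake_precision = report_dict["FAKE"]["precision"]  # correct FAKE / predicted FAKE
fake_recall = report_dict["FAKE"]["recall"]        # correct FAKE / actual FAKE
print(fake_precision, fake_recall, report_dict["accuracy"])
```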

In [None]:
#Now let’s test this model.
#To test our trained model, I’ll first write down the title of any news item found on Google News to see if our
#model predicts that the news is real or not:
news_headline_1 = "Trump takes on Cruz, but lightly"

data = vectorizer.transform([news_headline_1]).toarray()
print(passive.predict(data))

In [None]:
news_headline_2 = "Cow dung can cure Corona Virus"
data = vectorizer.transform([news_headline_2]).toarray()
print(passive.predict(data))

In [None]:
news_headline_3 = "Doubt Congress will get ‘even 40 seats’ in LS polls, says Mamata"
data = vectorizer.transform([news_headline_3]).toarray()
print(passive.predict(data))
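
The three headline checks repeat the same two lines, so they could be folded into one helper. The sketch below uses a hypothetical function `predict_headlines` and a tiny made-up training set; it also skips `.toarray()`, since `predict` accepts the sparse matrix directly. Note that the notebook's model is trained on full article text, so predictions for short headlines should be read with caution.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy training data, not the real news.csv
toy_texts = [
    "government publishes annual budget report",
    "scientists confirm dragons live under the city",
    "court schedules hearing for next month",
    "drinking paint makes you invisible doctors say",
]
toy_labels = ["REAL", "FAKE", "REAL", "FAKE"]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_classifier = PassiveAggressiveClassifier(max_iter=50, random_state=0)
toy_classifier.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)


def predict_headlines(headlines, fitted_vectorizer, fitted_classifier):
    """Hypothetical helper: map each headline to its predicted label."""
    features = fitted_vectorizer.transform(headlines)  # stays sparse
    predictions = fitted_classifier.predict(features)
    return dict(zip(headlines, predictions))


results = predict_headlines(
    ["court publishes budget report", "dragons make doctors invisible"],
    toy_vectorizer,
    toy_classifier,
)
print(results)
```

In the notebook itself the call would take the fitted `vectorizer` and `passive` objects in place of the toy ones.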

**2nd Model to Increase Accuracy**

**3- Build the model**


In [None]:
#First, we split the dataset into train & test samples:
x_train, x_test, y_train, y_test= train_test_split(news_data["text"], labels, test_size= 0.2, random_state= 7)

In [None]:
#Then we’ll initialize TfidfVectorizer with English stop words
vectorizer=TfidfVectorizer(stop_words='english', max_df=0.9)
## fit and transform train set, transform test set
tfidf_train=vectorizer.fit_transform(x_train)
tfidf_test=vectorizer.transform(x_test)


In [None]:
#Create a PassiveAggressiveClassifier
passive=PassiveAggressiveClassifier(max_iter=50)
passive.fit(tfidf_train,y_train)

y_pred=passive.predict(tfidf_test)


**4- Evaluate the model's accuracy**



In [None]:
#Create a confusion matrix
matrix= confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
matrix


In [None]:
#Visualize the confusion matrix
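#fmt='d' shows whole counts; rows are true labels, columns are predicted labels
sns.heatmap(matrix, annot=True, fmt='d', cmap='viridis', xticklabels=['FAKE','REAL'], yticklabels=['FAKE','REAL'])  # 'viridis' is just an example, any other colormap works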
plt.show()



In [None]:
#Calculate the model's accuracy
Accuracy=accuracy_score(y_test,y_pred)
Accuracy*100

In [None]:
#The model's accuracy is 93.21%
Report= classification_report(y_test, y_pred)
print(Report)


In [None]:
#Now let’s test this model.
#To test our trained model, I’ll first write down the title of any news item found on Google News to see if our
#model predicts that the news is real or not:
news_headline_1 = "Trump takes on Cruz, but lightly"
data = vectorizer.transform([news_headline_1]).toarray()
print(passive.predict(data))

In [None]:
#Now I’m going to write a random fake news headline to see if the model predicts the news is fake or not:
news_headline_2 = "Cow dung can cure Corona Virus"
data = vectorizer.transform([news_headline_2]).toarray()
print(passive.predict(data))

In [None]:
news_headline_3 = "Doubt Congress will get ‘even 40 seats’ in LS polls, says Mamata"
data = vectorizer.transform([news_headline_3]).toarray()
print(passive.predict(data))


**3rd Model to Further Increase Accuracy**

**3- Build the model**




In [None]:
#First, we split the dataset into train & test samples:
x_train,x_test,y_train,y_test=train_test_split(news_data['text'], labels, test_size=0.3, random_state=6)


In [None]:
vectorizer=TfidfVectorizer(stop_words='english', max_df=0.9)
## fit and transform train set, transform test set
tfidf_train=vectorizer.fit_transform(x_train)
tfidf_test=vectorizer.transform(x_test)


In [None]:
#Create a PassiveAggressiveClassifier
passive=PassiveAggressiveClassifier(max_iter=50)
passive.fit(tfidf_train,y_train)

y_pred=passive.predict(tfidf_test)

**4- Evaluate the model's accuracy**

In [None]:
#Create a confusion matrix
matrix= confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
matrix

In [None]:
#Visualize the confusion matrix
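#fmt='d' shows whole counts; rows are true labels, columns are predicted labels
sns.heatmap(matrix, annot=True, fmt='d', cmap='viridis', xticklabels=['FAKE','REAL'], yticklabels=['FAKE','REAL'])  # 'viridis' is just an example, any other colormap works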
plt.show()

In [None]:

#Calculate the model's accuracy
Accuracy=accuracy_score(y_test,y_pred)
Accuracy*100


In [None]:

#Print precision, recall and F1 for each class
Report= classification_report(y_test, y_pred)
print(Report)


In [None]:
#Now let’s test this model.
#To test our trained model, I’ll first write down the title of any news item found on Google News to see if our
#model predicts that the news is real or not:
news_headline_1 = "Trump takes on Cruz, but lightly"

data = vectorizer.transform([news_headline_1]).toarray()
print(passive.predict(data))


In [None]:
#Now I’m going to write a random fake news headline to see if the model predicts the news is fake or not:
news_headline_2 = "Cow dung can cure Corona Virus"
data = vectorizer.transform([news_headline_2]).toarray()
print(passive.predict(data))

In [None]:
news_headline_3 = "Doubt Congress will get ‘even 40 seats’ in LS polls, says Mamata"
data = vectorizer.transform([news_headline_3]).toarray()
print(passive.predict(data))

**4th Model to Increase Accuracy**

**3- Build the model**

In [None]:
#First, we split the dataset into train & test samples:
x_train,x_test,y_train,y_test=train_test_split(news_data['text'], labels, test_size=0.2, random_state=10)

In [None]:
vectorizer=TfidfVectorizer(stop_words='english', max_df=0.9)
## fit and transform train set, transform test set
tfidf_train=vectorizer.fit_transform(x_train)
tfidf_test=vectorizer.transform(x_test)

In [None]:
#Create a PassiveAggressiveClassifier
passive=PassiveAggressiveClassifier(max_iter=50)
passive.fit(tfidf_train,y_train)

y_pred=passive.predict(tfidf_test)



**4- Evaluate the model's accuracy**

In [None]:
#Create a confusion matrix
matrix= confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
matrix

In [None]:
#Visualize the confusion matrix
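#fmt='d' shows whole counts; rows are true labels, columns are predicted labels
sns.heatmap(matrix, annot=True, fmt='d', cmap='viridis', xticklabels=['FAKE','REAL'], yticklabels=['FAKE','REAL'])  # 'viridis' is just an example, any other colormap works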
plt.show()

In [None]:
#Calculate the model's accuracy
Accuracy=accuracy_score(y_test,y_pred)
Accuracy*100


In [None]:
#The model's accuracy is 94.55%; now print precision, recall and F1 for each class
Report= classification_report(y_test, y_pred)
print(Report)

In [None]:
#Now let’s test this model.
#To test our trained model, I’ll first write down the title of any news item found on Google News to see if our
#model predicts that the news is real or not:
news_headline_1 = "Trump takes on Cruz, but lightly"

data = vectorizer.transform([news_headline_1]).toarray()
print(passive.predict(data))

In [None]:
#Now I’m going to write a random fake news headline to see if the model predicts the news is fake or not:
news_headline_2 = "Cow dung can cure Corona Virus"
data = vectorizer.transform([news_headline_2]).toarray()
print(passive.predict(data))


In [None]:
news_headline_3 = "Doubt Congress will get ‘even 40 seats’ in LS polls, says Mamata"
data = vectorizer.transform([news_headline_3]).toarray()
print(passive.predict(data))
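
The four runs above change `test_size`, `random_state` and `max_df` between models, so part of the difference in accuracy may come from which rows land in the test set rather than from a better model. One way to get a steadier estimate is k-fold cross-validation, which is not used in this notebook. A minimal sketch on made-up data (`toy_texts`, `toy_labels` and `cv_pipeline` are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 6 REAL-style and 6 FAKE-style sentences
toy_texts = [f"official report number {i} on budget" for i in range(6)] + [
    f"shocking secret number {i} they hide" for i in range(6)
]
toy_labels = ["REAL"] * 6 + ["FAKE"] * 6

# The vectorizer is refit inside every fold, so no test text leaks into training
cv_pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.9),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)

# Three stratified folds; each score is the accuracy on one held-out fold
scores = cross_val_score(cv_pipeline, toy_texts, toy_labels, cv=3)
print(scores, scores.mean())
```

On the real data, the mean and spread of `scores` would show whether a change such as a different `max_df` helps beyond split-to-split variation.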