<a href="https://colab.research.google.com/github/vanshika2424agr/FAKE-NEWS-DTECTOR-AND-GENERATOR2/blob/main/Copy_of_fake_news_detector_project1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

"""
📰 FAKE NEWS GENERATOR & DETECTOR USING GENERATIVE AI (GPT-2) & NLP (BERT)
-----------------------------------------------------------------------------
 WHAT IS FAKE NEWS?
 --> Fake news refers to false or misleading information presented as news, often intended to manipulate public opinion, deceive readers, or drive engagement through sensationalism.

 💡 Why this project?
Fake news is a big problem today. People are often misled by false stories shared on social media. With this tool, we can:
- See how easily fake news can be created
- Try to detect fake news using AI
- Understand how powerful and risky AI can be when misused

📌 Project Overview:
This project combines the power of *Generative AI* and *Natural Language Processing*
to create a dual-function tool:
1. A Fake News Generator using **GPT-2**.
2. A Fake News Detector using **BERT** for classification.



📦 Key Components:
- **GPT-2** (by OpenAI): Autoregressive language model used here to generate fake news articles from user prompts.
- **BERT** (by Google): Transformer-based encoder model with a two-class sequence-classification head, used to classify an input news article or statement as real or fake once that head has been fine-tuned on labeled data.
- **Gradio UI**: Simple web interface to allow users to interactively generate and test news samples.

👨‍🏫 Ideal For:
- Students or researchers exploring NLP, transformers, or misinformation.
- Demos or educational tools on media literacy and AI-generated text.
- Foundations for more robust fake news detection systems.

🚧 Challenges We Faced:
1. Fake and real news often look **very similar**
2. Hard to find good **labeled datasets**
3. AI can also **generate very believable fake content**
4. Models can be **inaccurate** if not trained properly

✅ How We Tried to Solve Them:
- Used powerful pre-trained models (GPT-2 and BERT)
- Showed a softmax confidence score with each prediction (it reflects how certain the model is, not how accurate it is)
- Suggested future improvements, chiefly fine-tuning BERT on a labeled real/fake news dataset

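🧠 Note:
- The generator (GPT-2) creates fluent outputs, but they are purely synthetic and not fact-checked.
- The detector's classification head is randomly initialized (see the `Some weights of BertForSequenceClassification were not initialized` warning in the output below), so its Fake/Real labels and confidence scores are effectively arbitrary until the model is fine-tuned.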

"""

!pip install -q transformers torch gradio
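`generate_fake_news` (defined in the next cell) samples with `temperature=0.7`, `top_k=50`, `top_p=0.95` and `no_repeat_ngram_size=2`. To show what those settings do, here is a simplified pure-Python sketch that applies them to a toy next-token distribution, assuming whole words as stand-ins for GPT-2's subword token IDs and made-up logit values. It illustrates the ideas only; it is not the `transformers` implementation, which works on logit tensors inside `generate()`. The function names are hypothetical.

```python
import math
import random

def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Apply temperature, top-k and top-p (nucleus) filtering to a dict of
    token -> logit and return the renormalized probabilities that survive."""
    # 1. Temperature: divide the logits; values below 1.0 sharpen the distribution.
    scaled = {tok: z / temperature for tok, z in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(z - m) for tok, z in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda pair: pair[1], reverse=True)
    # 2. Top-k: keep only the k most likely tokens, then renormalize.
    probs = probs[:top_k]
    norm = sum(p for _, p in probs)
    probs = [(tok, p / norm) for tok, p in probs]
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

def banned_next_tokens(tokens, n=2):
    """Tokens that would complete an n-gram already present in `tokens`,
    i.e. the rule behind no_repeat_ngram_size=n."""
    if len(tokens) < n:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # The last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# Toy next-token logits (made-up values for illustration only).
toy_logits = {"officials": 3.0, "scientists": 2.5, "aliens": 1.0, "banana": -2.0}
dist = filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.95)
print(dist)  # → officials ≈ 0.67, scientists ≈ 0.33 ('banana' cut by top_k, 'aliens' by top_p)
print(random.choices(list(dist), weights=list(dist.values()))[0])
print(banned_next_tokens(["the", "aliens", "landed", "near", "the"]))  # → {'aliens'}
```

Lowering `temperature` or `top_p` concentrates probability on the most likely continuations, while raising them lets rarer (and often less coherent) words through; with `temperature=1.5` in the example above, "aliens" survives the filter.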

# ✅ Import necessary libraries
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import gradio as gr

# ✅ Detect and set device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ✅ Load GPT-2 for Fake News Generation
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

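# ✅ Load BERT for Fake News Detection (binary classification: Fake vs Real)
# ⚠️ bert-base-uncased ships without a classification head, so a new 2-way head
#    is randomly initialized here; fine-tune it before trusting its predictions.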
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0: Fake, 1: Real
).to(device)

# ✅ Function to generate fake news text using GPT-2
def generate_fake_news(prompt):
    # Calling the tokenizer directly also returns an attention mask
    inputs = gpt2_tokenizer(prompt, return_tensors="pt").to(device)
    outputs = gpt2_model.generate(
        **inputs,
        max_length=200,                # Total length in tokens, prompt included
        num_return_sequences=1,        # Number of outputs to return
        no_repeat_ngram_size=2,        # Never repeat any 2-token sequence
        do_sample=True,                # Sample instead of greedy decoding
        temperature=0.7,               # Values < 1.0 make sampling less random
        top_k=50,                      # Sample only from the 50 most likely tokens
        top_p=0.95,                    # ...restricted to the top 95% probability mass
        pad_token_id=gpt2_tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )
    # early_stopping was dropped: it only affects beam search (num_beams > 1)
    generated_text = gpt2_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

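# ✅ Function to classify news as Fake or Real using BERT
def detect_news(text):
    # Truncate to BERT's 512-token limit; a single input needs no padding
    inputs = bert_tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():                                  # Inference only: skip gradient tracking
        outputs = bert_model(**inputs)
    logits = outputs.logits                                # Shape (1, 2): one raw score per class
    probs = torch.softmax(logits, dim=1)                   # Convert scores to probabilities
    predicted_class = torch.argmax(probs, dim=1).item()    # Index of the more likely class
    confidence = probs[0][predicted_class].item()          # Always >= 0.5 with two classes
    label = "🟥 Fake News" if predicted_class == 0 else "🟩 Real News"  # 0: Fake, 1: Real
    return f"{label} (Confidence: {confidence:.2f})"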

# ✅ Build Gradio User Interface
with gr.Blocks() as demo:
    gr.Markdown("## 📰 Fake News Generator & Detector (GPT-2 + BERT)")

    # ➕ Tab for generating fake news
    with gr.Tab("🛠 Generate Fake News"):
        with gr.Row():
            input_text = gr.Textbox(
                label="Enter a News Headline or Prompt",
                placeholder="e.g. A mysterious object was spotted in the sky...",
                lines=2
            )
        generate_btn = gr.Button("Generate")
        output_text = gr.Textbox(label="Generated News Article")
        generate_btn.click(generate_fake_news, inputs=input_text, outputs=output_text)

    # ➕ Tab for detecting real or fake news
    with gr.Tab("🔍 Detect Fake or Real"):
        with gr.Row():
            detect_input = gr.Textbox(
                label="Enter a News Article or Statement",
                placeholder="Paste a paragraph to detect if it's fake or real...",
                lines=5
            )
        detect_btn = gr.Button("Detect")
        detect_output = gr.Textbox(label="Detection Result")
        detect_btn.click(detect_news, inputs=detect_input, outputs=detect_output)

# ✅ Launch the Gradio app
demo.launch()





Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://fe30d53f5b22ed1495.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
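As the `Some weights of BertForSequenceClassification were not initialized` warning in the output above indicates, the detector's classification head is untrained, so its labels should be checked against ground truth before they are trusted. A minimal sketch of such a check follows, assuming a hypothetical list of hand-labeled `(text, label)` pairs and any callable that returns a string in the same format as `detect_news` (for example `"🟥 Fake News (Confidence: 0.73)"`). `parse_label`, `evaluate_detector` and `always_fake` are illustrative names, not part of the notebook.

```python
import re
from collections import Counter

def parse_label(result):
    """Extract ('Fake' | 'Real', confidence) from a detect_news-style string."""
    match = re.search(r"(Fake|Real) News \(Confidence: ([0-9.]+)\)", result)
    if match is None:
        raise ValueError(f"Unrecognized detector output: {result!r}")
    return match.group(1), float(match.group(2))

def evaluate_detector(detector, examples):
    """Return accuracy and a (gold, predicted) confusion count over labeled pairs."""
    confusion = Counter()
    correct = 0
    for text, gold in examples:
        predicted, _ = parse_label(detector(text))
        confusion[(gold, predicted)] += 1
        correct += predicted == gold
    return correct / len(examples), confusion

# Stand-in detector for illustration: always answers "Fake".
def always_fake(text):
    return "🟥 Fake News (Confidence: 0.51)"

# Hypothetical toy examples; a real check needs a held-out labeled dataset.
examples = [
    ("Scientists confirm the moon is made of cheese.", "Fake"),
    ("The city council approved the new budget on Tuesday.", "Real"),
    ("Local bakery wins regional bread award.", "Real"),
    ("Drinking seawater cures all diseases, doctors say.", "Fake"),
]
accuracy, confusion = evaluate_detector(always_fake, examples)
print(f"accuracy = {accuracy:.2f}")  # → accuracy = 0.50
print(dict(confusion))
```

In the notebook itself, `detect_news` can be passed directly as `detector`. On a balanced set, a detector that scores no better than a constant guess such as `always_fake` is not yet extracting any signal from the text.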


