# Text Summarization

## Types
There are two main types of summarization:
1. Extractive Summarization – Selects key sentences directly from the text.
2. Abstractive Summarization – Generates new sentences while keeping the original meaning.

## Methods

Method|Best For|Pros|Cons
---|---|---|---
Extractive (spaCy)|Quick sentence selection|Fast, lightweight, no training needed|Less flexible, no paraphrasing
Abstractive (Hugging Face BART/T5)|Human-like summaries|More natural, deep learning-based|Slower, requires a large model

## 1. Extractive summarization with spaCy

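Selects the highest-scoring sentences and returns them verbatim. The function below scores each sentence by summing the normalized frequencies of its non-stop-words.

**Pros:** Fast; requires no training.\
**Cons:** Only selects existing sentences; it cannot rephrase or condense them.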

In [1]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from heapq import nlargest

# Load the model once rather than on every call.
# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


def extractive_summary(text, num_sentences=5):
    """Return the `num_sentences` highest-scoring sentences of `text`."""
    doc = nlp(text)

    # Count alphabetic, non-stop-word tokens (keys keep their original case)
    word_frequencies = {}
    for word in doc:
        if word.text.lower() not in STOP_WORDS and word.text.isalpha():
            word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

    # Nothing to score (empty text, or stop words and punctuation only)
    if not word_frequencies:
        return ""

    # Normalize so the most frequent word has weight 1.0
    max_frequency = max(word_frequencies.values())
    for word in word_frequencies:
        word_frequencies[word] /= max_frequency

    # Score each sentence as the sum of its words' weights
    sentence_scores = {}
    for sent in doc.sents:
        for word in sent:
            if word.text in word_frequencies:
                sentence_scores[sent] = (
                    sentence_scores.get(sent, 0) + word_frequencies[word.text]
                )

    # Take the top sentences; they come back ordered by score, not by position
    summarized_sentences = nlargest(
        num_sentences, sentence_scores, key=sentence_scores.get
    )
    return " ".join(sent.text for sent in summarized_sentences)

In [3]:
sample_long_text = """
    Artificial Intelligence (AI) has rapidly evolved over the past few decades, transforming industries, reshaping economies, and revolutionizing human interactions with technology. The journey of AI began in the mid-20th century when pioneers like Alan Turing and John McCarthy laid the theoretical foundations for machine intelligence. Early AI systems focused on rule-based approaches and expert systems, which, although powerful for specific tasks, lacked the adaptability of modern machine learning models.
    With the rise of deep learning in the 2010s, AI took a significant leap forward. Neural networks, inspired by the structure of the human brain, enabled machines to recognize speech, translate languages, and even generate realistic images. Companies like Google, OpenAI, and Tesla leveraged deep learning to create state-of-the-art AI applications. Self-driving cars, natural language processing (NLP), and recommendation algorithms became mainstream.
    Despite these advancements, AI still faces ethical and technical challenges. Bias in AI models, data privacy concerns, and the impact of automation on employment are widely debated topics. Researchers continue to develop responsible AI frameworks to ensure fairness, transparency, and accountability in AI-driven decision-making.
    Looking ahead, AI is expected to become even more integrated into daily life. Innovations in healthcare, education, and robotics promise to enhance human capabilities while raising important ethical considerations. As AI progresses, balancing innovation with ethical responsibility will be crucial for shaping a future where artificial intelligence benefits all of humanity.
"""
# Collapse newlines and indentation into single spaces so that
# paragraphs are not fused together ("models.With the rise ...")
sample_long_text = " ".join(sample_long_text.split())

# Generate a summary of the sample text
summary = extractive_summary(sample_long_text, num_sentences=3)

# Print the summary
print(summary)

Researchers continue to develop responsible AI frameworks to ensure fairness, transparency, and accountability in AI-driven decision-making. Early AI systems focused on rule-based approaches and expert systems, which, although powerful for specific tasks, lacked the adaptability of modern machine learning models. The journey of AI began in the mid-20th century when pioneers like Alan Turing and John McCarthy laid the theoretical foundations for machine intelligence.
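
The scoring idea used above can also be sketched without any dependencies. The block below is a minimal illustration, not the spaCy implementation: it assumes a naive regex sentence splitter and a tiny hand-picked stop-word list, and the names `frequency_summary`, `split_sentences`, and `words` are made up for this example. It differs from the spaCy version in two deliberate ways: words are lowercased before counting, and the selected sentences are returned in their original order.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; spaCy's STOP_WORDS is far larger)
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "it", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased alphabetic tokens, so 'AI' and 'ai' share one count."""
    return re.findall(r"[a-z]+", sentence.lower())


def frequency_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in words(s) if w not in STOP_WORDS)
    if not counts:
        return ""

    # Normalize so the most frequent word has weight 1.0
    top = max(counts.values())
    weights = {w: c / top for w, c in counts.items()}

    # Sentence score = sum of the weights of its words (stop words weigh 0)
    scores = [sum(weights.get(w, 0.0) for w in words(s)) for s in sentences]

    # Pick the best sentence indices, then sort them to restore document order
    best = nlargest(num_sentences, range(len(sentences)), key=scores.__getitem__)
    return " ".join(sentences[i] for i in sorted(best))


demo = (
    "Cats sleep most of the day. "
    "Dogs bark. "
    "Cats and dogs are popular pets, and cats are quiet pets."
)
print(frequency_summary(demo, num_sentences=2))
```

On this toy input the first and third sentences are selected and printed in that order. Because the score is a plain sum, longer sentences tend to win; dividing each score by the sentence's word count is one possible adjustment.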


## 2. Abstractive summarization with Hugging Face Transformers

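Uses a pre-trained sequence-to-sequence model such as BART or T5 to generate new text rather than copy sentences. The example below runs on `sshleifer/distilbart-cnn-12-6`, a distilled BART checkpoint.

**Pros:** More natural and human-like.\
**Cons:** Requires downloading a large pre-trained model, and inference is slower.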

In [6]:
from transformers import pipeline

# Name the model explicitly; without `model=`, the pipeline falls back to a
# default checkpoint and logs a warning that this is not recommended in production
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Generate a summary; max_length and min_length are counted in tokens
summary = summarizer(sample_long_text, max_length=100, min_length=20, do_sample=False)

# Print the summary
print(summary[0]["summary_text"])

Device set to use cpu


 The journey of AI began in the mid-20th century when pioneers like Alan Turing and John McCarthy laid the theoretical foundations for machine intelligence . Early AI systems focused on rule-based approaches and expert systems . With the rise of deep learning in the 2010s, AI took a significant leap forward .
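
The summary printed above reuses source sentences almost word for word, so an "abstractive" model does not always reword much. One rough way to check this is to measure how many of the summary's word n-grams never occur in the source. The sketch below assumes simple regex tokenization and trigrams; `ngrams` and `novel_ngram_ratio` are hypothetical helper names for illustration, not part of spaCy or Transformers.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    # Keep hyphenated words such as "rule-based" as single tokens
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source, summary, n=3):
    """Fraction of the summary's n-grams that never occur in the source.

    0.0 means every n-gram was copied; values near 1.0 mean heavy rewording.
    """
    summary_grams = ngrams(summary, n)
    if not summary_grams:
        return 0.0
    return len(summary_grams - ngrams(source, n)) / len(summary_grams)


source = "Early AI systems focused on rule-based approaches and expert systems."
copied = "Early AI systems focused on rule-based approaches ."
reworded = "The first AI programs relied on hand-written rules."

print(novel_ngram_ratio(source, copied))    # 0.0: every trigram appears in the source
print(novel_ngram_ratio(source, reworded))  # 1.0: no trigram appears in the source
```

Passing `sample_long_text` and `summary[0]["summary_text"]` to `novel_ngram_ratio` gives the same measurement for the model output; the exact value depends on the tokenization and on `n`.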
