In [2]:
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

In [3]:
# Step 1: Define the URL of the article you want to summarize
# You can change this URL to any news article you like.
article_url = "https://apnews.com/article/trump-promises-greenland-canada-gaza-headlines-30c646fefc30bf028577b1b65f955eb4"

In [4]:
# Step 2: Fetch the HTML content from the URL
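try:
    # A timeout keeps the cell from hanging indefinitely on an unresponsive server.
    response = requests.get(article_url, timeout=10)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    html_content = response.text
except requests.exceptions.RequestException as e:
    # Stop this cell with a message; in a notebook, exit() can shut down the kernel.
    raise SystemExit(f"Error fetching the URL: {e}")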

In [5]:
# Step 3: Parse the HTML and extract the main text
# We use BeautifulSoup to find every paragraph (<p>) tag and join their text with spaces.
# This collects all <p> tags on the page, which can include captions, navigation, and
# footer text, so the selection may need adjusting for a given site's layout.
soup = BeautifulSoup(html_content, 'html.parser')
paragraphs = soup.find_all('p')
long_text = ' '.join([p.get_text() for p in paragraphs])
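
Because every `<p>` tag is collected, short non-article fragments can end up in `long_text`. The following is a minimal sketch of one way to filter them before joining. The helper name `clean_paragraphs` and the `min_words` threshold are assumptions for illustration and are not part of the original notebook.

```python
import re


def clean_paragraphs(texts, min_words=5):
    """Normalize whitespace and drop very short paragraphs.

    `min_words` is a hypothetical threshold chosen for illustration.
    """
    cleaned = []
    for text in texts:
        # Collapse runs of whitespace (newlines, tabs, repeated spaces) to one space.
        normalized = re.sub(r"\s+", " ", text).strip()
        # Keep only paragraphs long enough to look like body text.
        if len(normalized.split()) >= min_words:
            cleaned.append(normalized)
    return cleaned


# Sample strings standing in for the text of each <p> tag.
sample = [
    "Share",
    "  The   first paragraph of the\narticle has several words. ",
    "Advertisement",
    "A second paragraph also carries real content here.",
]
filtered = clean_paragraphs(sample)
print(" ".join(filtered))
```

In the notebook this could be applied as `long_text = ' '.join(clean_paragraphs(p.get_text() for p in paragraphs))`. A word-count cutoff is a blunt heuristic and may also drop short but legitimate paragraphs, so the threshold would need tuning per site.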

In [6]:
# Check if any text was extracted
if not long_text.strip():
    # Stop this cell with a message; in a notebook, exit() can shut down the kernel.
    raise SystemExit("Could not find any paragraph text on the page. Exiting.")

In [7]:
# Step 4: Create and use the summarization pipeline
try:
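    # Initialize the summarization model.
    # No model name is passed, so transformers falls back to its default checkpoint
    # (see the warning in the output below); pass model=... to pin one explicitly.
    summarizer = pipeline("summarization")

    # Generate the summary.
    # truncation=True cuts the input at the model's maximum input length,
    # so any text beyond that point is not reflected in the summary.
    summary = summarizer(long_text, max_length=250, min_length=40, do_sample=False, truncation=True)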

    # Step 5: Print the final summary
    print("Original Article URL:", article_url)
    print("\n------------------\n")
    print("Generated Summary:")
    print(summary[0]['summary_text'])

except Exception as e:
    print(f"An error occurred during summarization: {e}")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Original Article URL: https://apnews.com/article/trump-promises-greenland-canada-gaza-headlines-30c646fefc30bf028577b1b65f955eb4

------------------

Generated Summary:
 President Donald Trump loves to toss out startling ideas aimed at dropping jaws, commandeering headlines and bolstering his political brand . Never in modern times has a president offered so many off-the-cuff statements with such a potential for wide, even global, impact . His sometimes implausible notions may become reality, or — through repetition — no longer sound so outlandish .
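
With `truncation=True`, only the start of a long article reaches the model. One possible alternative, sketched below, is to split the text into chunks, summarize each chunk, and join the partial summaries. The helper names `chunk_words` and `summarize_in_chunks`, the `max_words=400` default, and the stand-in `fake_summarizer` are all assumptions for illustration. Word count is only a rough proxy for the model's token limit.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_in_chunks(text, summarize_fn, max_words=400):
    """Summarize each chunk with `summarize_fn` and join the partial summaries."""
    parts = [summarize_fn(chunk).strip() for chunk in chunk_words(text, max_words)]
    return " ".join(part for part in parts if part)


# Stand-in summarizer so the sketch runs without downloading a model:
# it simply keeps the first three words of each chunk.
def fake_summarizer(chunk):
    return " ".join(chunk.split()[:3])


demo_text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(demo_text, max_words=4)
combined = summarize_in_chunks(demo_text, fake_summarizer, max_words=4)
print(chunks)
print(combined)
```

To use this with the notebook's pipeline, `summarize_fn` could be a small wrapper such as `lambda c: summarizer(c, max_length=250, min_length=40, do_sample=False, truncation=True)[0]['summary_text']`. Joined chunk summaries tend to be longer and less cohesive than a single-pass summary, so this trades polish for coverage of the full article.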
