# Summarize Google News

## Introduction
Summarizes the stories on the Google News aggregator in an easy-to-read format.

In [14]:
#import packages
import requests
import os
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
import json

In [24]:
# Load environment variables.
# Here we load the OpenAI API key from a .env file instead of hardcoding it in the code.
# This requires keeping the key in the .env file.

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')
MODEL = "gpt-4o-mini"    # The OpenAI model to use
GN = "https://news.google.com"

# Check the key
if not api_key:
    print("No API key was found, please check that .env file exists.")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them")
else:
    print("API key found, and looks good so far!")

API key found, and looks good so far!


In [25]:
# Create object of OpenAI
openai = OpenAI()

## Step 1: Find Relevant Story Links

In [4]:
# A class to represent a Webpage
# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [27]:
link_system_prompt = "You are provided with a list of links found on the Google News home page. \
You are able to decide which of the links point to a news story so that they can be used for summarizing today's headlines. \
Also provide the type of each news story, such as political, entertainment, business, sports etc.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "political", "url": "https://full.url/goes/here/about"},
        {"type": "business", "url": "https://another.full.url/careers"}
    ]
}
"""

In [28]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the Google news website {website.url} - "
    user_prompt += "please decide which of these point to actual news stories, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links etc.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
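
Since the prompt warns that some links may be relative, one option is to resolve them against the page URL before they reach the model. The sketch below is an assumption and not part of the original notebook: `absolutize_links` is a hypothetical helper built on the standard library's `urllib.parse.urljoin`, and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse

def absolutize_links(base_url, links):
    """Resolve hrefs against base_url, keep http(s) URLs only, drop duplicates.

    Hypothetical helper; the original notebook passes raw hrefs to the model.
    """
    resolved = []
    seen = set()
    for href in links:
        full = urljoin(base_url, href)
        if urlparse(full).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar non-web links
        if full not in seen:
            seen.add(full)
            resolved.append(full)
    return resolved

# Illustrative input only; these are not real Google News paths.
example = absolutize_links(
    "https://news.google.com",
    ["./articles/abc", "/topics/xyz", "mailto:someone@example.com", "https://example.com/story"],
)
```

If adopted, the result could replace `website.links` in the `"\n".join(...)` call above, which would also spare the model from having to guess the full URL.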

In [30]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
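
`get_links` trusts the model to return exactly the JSON shape shown in the system prompt. A defensive variant might validate the reply before the links are fetched. The following is a minimal sketch under that assumption; `parse_story_links` is a hypothetical helper, and the rule of keeping only `https://` URLs is a choice made here, not something the notebook specifies.

```python
import json

def parse_story_links(raw_json):
    """Parse the model's JSON reply and keep only well-formed link entries.

    Hypothetical helper: returns {"links": [...]} with entries that have a
    string "url" starting with https://; anything else is dropped.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if isinstance(url, str) and url.startswith("https://"):
            # Default the story type when the model omits it
            valid.append({"type": str(entry.get("type", "unknown")), "url": url})
    return {"links": valid}
```

The return value has the same shape as the output of `get_links`, so it could stand in for the plain `json.loads(result)` call.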

## Step 2: Generate a summary from all the links

In [44]:
def get_all_news_content(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    #print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [45]:
system_prompt = "You are an assistant that analyzes the contents of several news stories linked from the Google News website \
and creates a short summary of today's headlines and a news brief for someone who is in a hurry and wants a cursory look. Respond in markdown. \
Include details such as the news type (political, business etc.) and the publication date, if you have the information."

In [47]:
def get_news_user_prompt(url):
    user_prompt = "You are looking at the Google News website.\n"
    user_prompt += "Here are the contents of its landing page and other linked pages of news stories; use this information to build a summary of news stories in markdown.\n"
    user_prompt += get_all_news_content(url)
    user_prompt = user_prompt[:10_000] # Truncate if more than 10,000 characters
    return user_prompt
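
The hard cut at 10,000 characters keeps the prompt small, but because the landing page comes first, a long landing page can push every linked story out of the prompt. One possible alternative is to give each section an equal share of the budget. This is a sketch under stated assumptions: `build_bounded_content` and its `(label, text)` input format are hypothetical and do not appear in the original notebook.

```python
def build_bounded_content(sections, total_limit=10_000):
    """Join (label, text) sections, capping each at an equal share of total_limit.

    Hypothetical helper: an equal split is the simplest policy; a real
    notebook might prefer to weight the landing page differently.
    """
    if not sections:
        return ""
    per_section = max(total_limit // len(sections), 1)
    parts = []
    for label, text in sections:
        chunk = f"{label}\n{text}"
        parts.append(chunk[:per_section])  # trim each section independently
    # Separators add a few characters, so apply a final overall cap as well
    return "\n\n".join(parts)[:total_limit]

# Illustrative input: a long landing page followed by one story
demo = build_bounded_content(
    [("Landing page:", "a" * 50), ("political", "b" * 50)],
    total_limit=40,
)
```

With this approach the later sections survive truncation, at the cost of seeing less of each individual page.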

In [48]:
def create_news_summary(url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_news_user_prompt(url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [49]:
create_news_summary(GN)

# Today's News Summary (January 26, 2025)

## Headlines Overview:
### **Political**
- **Republic Day 2025 Celebrations**: PM Modi marked the occasion as a "memorable morning" at Kartavya Path, showcasing India's military strength and cultural heritage with a parade featuring the Pralay missile and significant tableaux. (The Economic Times, 1 hour ago)
- **BJP’s Response to Terrorism**: The BJP applauded the extradition of Tahawwur Rana, the mastermind behind the 26/11 Mumbai attacks, as a testament to the Modi government's strong will against terrorism. (NDTV, 20 hours ago)
- **Kejriwal Critiques BJP**: Delhi CM Arvind Kejriwal alleged that the BJP waived debts of select individuals, calling the upcoming Delhi elections a contest between different ideologies. (Hindustan Times, 5 hours ago)

### **International**
- **Middle East Conflicts**: Amid a fragile ceasefire, Hamas released four Israeli hostages in exchange for 200 Palestinian prisoners, while Gazan officials reported many people stranded at an Israeli barrier. (The Indian Express, 1 hour ago)

### **Technology & Innovation**
- **First Train to Kashmir**: The Vande Bharat Express successfully completed its first trial run to Kashmir, marking a significant milestone after 126 years. (Hindu, 13 hours ago)

### **Business**
- **Budget 2025 Expectations**: As the Budget announcement approaches, expectations are on rise for income tax relief and enhanced spending on infrastructure, specifically for railways and highways. (Mint, 40 minutes ago)

## News Brief:
- **India celebrates Republic Day** with a heartfelt display of military and cultural pride, while political tensions rise with Kejriwal’s allegations against the BJP and the extradition of a key terrorist figure. On the international front, Middle Eastern hostages are being exchanged amidst ongoing conflict, and technological advancements are celebrated with a new train line to Kashmir. Economic forecasts point towards a hopeful Budget 2025 that could bring relief to many citizens.

### Publications:
- **The Times of India**
- **NDTV**
- **The Hindu**
- **Hindustan Times**
- **Mint**
  
### Publication Dates:
- Information gathered from articles published **within the past few hours to 20 hours ago**, primarily on **January 26, 2025**.