Building a News Articles Summarizer

In [6]:
import requests
from newspaper import Article

# Set up custom headers to simulate a real browser, which can help bypass restrictions.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0'
}

# URL of the article to fetch
article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

# Create a session to maintain parameters across requests
session = requests.Session()

try:
    # Send a GET request to the URL with custom headers to simulate a browser
    response = session.get(article_url, headers=headers, timeout=10)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
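        # Create an Article object and parse the HTML we already fetched,
        # so the page is not downloaded a second time without the custom headers
        article = Article(article_url)
        article.download(input_html=response.text)
        article.parse()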

        # Print the title and text of the article
        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        # If the request was unsuccessful, report the status code to aid debugging
        print(f"Failed to fetch article at {article_url} (status code {response.status_code})")
except Exception as e:
    # Catch any exceptions and print an error message
    print(f"Error occurred while fetching article at {article_url}: {e}")


Title: Meta claims its new AI supercomputer will set records
Text: Ryan Daws is a senior editor at TechForge Media with over a decade of experience in crafting compelling narratives and making complex topics accessible. His articles and interviews with industry leaders have earned him recognition as a key influencer by organisations like Onalytica. Under his leadership, publications have been praised by analyst firms such as Forrester for their excellence and performance. Connect with him on X (@gadget_ry) or Mastodon (@gadgetry@techhub.social)

Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (NLP) and computer vision models.

RSC is set to be fully built in mid-2022. Meta says that it will be the fastest in the world once complete and 

In [18]:
from langchain.schema import HumanMessage

# We get the article data from the scraping part
article_title = article.title
article_text = article.text

# Prepare template for the prompt
template = """You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

====================
Title: {article_title}

{article_text}
====================

Write a summary of the previous article.
"""

# Format the template with the article's title and text extracted above
prompt = template.format(article_title=article_title, article_text=article_text)

# Create a HumanMessage object with the formatted prompt
messages = [HumanMessage(content=prompt)]
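
The formatting step above can be wrapped in a small helper that also guards against two common scraping problems: an empty article body and a body too long to send comfortably in one prompt. This is only a sketch. The name `build_prompt` and the `max_chars` limit are assumptions for illustration and are not part of the original notebook or of LangChain.

```python
def build_prompt(template: str, title: str, text: str, max_chars: int = 12000) -> str:
    """Fill a summary template, trimming very long article text.

    Hypothetical helper: the 12000-character default is an arbitrary
    illustrative limit, not a documented model constraint.
    """
    text = text.strip()
    if not text:
        # An empty body usually means the scrape or parse step failed
        raise ValueError("article text is empty; the scrape may have failed")
    if len(text) > max_chars:
        # Cut at the last whitespace before the limit and mark the truncation
        text = text[:max_chars].rsplit(None, 1)[0] + " [...]"
    # str.format only parses the template, so braces inside the article
    # text itself are inserted literally and do not raise errors
    return template.format(article_title=title.strip(), article_text=text)


demo_template = "Title: {article_title}\n\n{article_text}"
print(build_prompt(demo_template, " Demo ", "word " * 10, max_chars=12))
```

With the notebook's variables, the call would be `build_prompt(template, article_title, article_text)`.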

In [20]:
from langchain.chat_models import ChatOpenAI

# Load the model (ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable)
chat = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Generate summary based on the prompt
summary = chat(messages)

# Print the generated summary
print(summary.content)

Meta has announced the development of its AI Research SuperCluster (RSC), which it claims will be the fastest AI supercomputer in the world upon completion in mid-2022. Currently, researchers are already utilizing RSC for training large natural language processing (NLP) and computer vision models. The supercomputer is designed to handle models with trillions of parameters and is expected to be 20 times faster than Meta's existing V100-based clusters. RSC will significantly reduce training times for large models, completing tasks in three weeks instead of nine.

Meta envisions RSC facilitating advancements in AI systems, such as real-time voice translation for collaborative projects and contributing to the development of technologies for the metaverse. The supercomputer incorporates enhanced security and privacy controls, allowing Meta to use real-world data from its platforms for training, which is crucial for tasks like identifying harmful content. Meta asserts that RSC represents a s

In [24]:
# Prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.

Here's the article you need to summarize.

====================
Title: {article_title}

{article_text}
====================

Now, provide a summarized version of the article in a bulleted list format.
"""

# Format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# Generate summary
summary = chat([HumanMessage(content=prompt)])
print(summary.content)


- **Title**: Meta claims its new AI supercomputer will set records
- **Author**: Ryan Daws, senior editor at TechForge Media
- **Overview**: Meta (formerly Facebook) has introduced the AI Research SuperCluster (RSC), claiming it will be the fastest AI supercomputer in the world.
- **Current Status**: 
  - RSC is not yet fully complete but is already being used for training large natural language processing (NLP) and computer vision models.
  - Expected to be fully operational by mid-2022.
- **Performance Goals**:
  - Aims to train models with trillions of parameters.
  - Projected to be 20x faster than current V100-based clusters.
  - Estimated to be 9x faster with NVIDIA Collective Communication Library (NCCL) and 3x faster for large-scale NLP workflows.
  - Training time for models with tens of billions of parameters reduced from nine weeks to three weeks.
- **Applications**:
  - Intended to support real-time voice translations for multilingual collaboration in research and AR gaming

In [21]:
# Prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.

Here's the article you need to summarize.

====================
Title: {article_title}

{article_text}
====================

Now, provide a summarized version of the article in a bulleted list format, in French.
"""

# Format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# Generate summary
summary = chat([HumanMessage(content=prompt)])
print(summary.content)

- **Titre** : Meta annonce que son nouvel superordinateur IA établira des records.
- **Superordinateur** : Appelé AI Research SuperCluster (RSC), il est en cours de construction et vise à être le plus rapide au monde.
- **Utilisation actuelle** : Les chercheurs de Meta l'utilisent déjà pour former des modèles de traitement du langage naturel (NLP) et de vision par ordinateur.
- **Achèvement prévu** : RSC devrait être entièrement construit d'ici mi-2022.
- **Capacités** : Destiné à entraîner des modèles avec des trillions de paramètres.
- **Applications envisagées** : Permettre des traductions vocales en temps réel pour des groupes parlant différentes langues, facilitant la collaboration sur des projets de recherche ou des jeux en réalité augmentée.
- **Performances** : RSC sera 20 fois plus rapide que les clusters V100 actuels de Meta, 9 fois plus rapide pour la bibliothèque de communication collective NVIDIA (NCCL) et 3 fois plus rapide pour les flux de travail NLP à grande échelle.
-
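
The summary prompts in this notebook share the same skeleton and differ only in the final instruction (plain summary, bulleted list, bulleted list in French). One way to avoid repeating the template is a single function with options for format and language. The sketch below is an assumption about how that could look: `BASE_TEMPLATE` and `make_summary_prompt` are hypothetical names, and the wording is a merged variant of the prompts above rather than an exact copy.

```python
BASE_TEMPLATE = """You are an advanced AI assistant that summarizes online articles.

Here's the article you need to summarize.

====================
Title: {article_title}

{article_text}
====================

{instruction}
"""


def make_summary_prompt(article_title, article_text, bulleted=False, language=None):
    """Build a summary prompt; hypothetical helper, not a LangChain API."""
    instruction = "Write a summary of the previous article"
    if bulleted:
        instruction += " in a bulleted list format"
    if language:
        # Any language name the model understands, e.g. "French"
        instruction += f", in {language}"
    instruction += "."
    return BASE_TEMPLATE.format(
        article_title=article_title,
        article_text=article_text,
        instruction=instruction,
    )


# Example with placeholder values instead of a scraped article
print(make_summary_prompt("Demo title", "Demo body.", bulleted=True, language="French"))
```

The returned string can be passed to the model the same way as before, for example `chat([HumanMessage(content=make_summary_prompt(article_title, article_text, bulleted=True))])`.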