
A single-purpose desktop app that lets the user enter a keyword and scrapes the web for relevant news articles. The user can also request summaries of the articles via the GPT-3.5 Turbo API.


News Breeze: A Scraping and Summarizing app

Video Demo

LinkedIn

App Screenshot

Table of Contents

  • Description
  • Built With
  • Project Structure
  • Features
  • Getting Started
  • Usage
  • License
  • Acknowledgements

Description

The News Scraping and Summarization App is a simple Python application that allows users to scrape news articles based on a keyword and optionally summarize them. The app provides a user-friendly interface built with DearPyGui and uses Hydra for configuration. Once the user provides a keyword and selects the number of news articles required (max 5), the app runs a Scrapy spider that collects the relevant articles through the NewsData.io API. The app then uses the OpenAI API to call the GPT-3.5 Turbo model to summarize the collected news.
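The scraping step can be pictured with the minimal sketch below. It is not the project's actual spider: the NEWSDATA_API_KEY environment variable, the spider arguments, and the response fields that are kept are assumptions for illustration only.

# A minimal sketch of a Scrapy spider querying the NewsData.io latest-news
# endpoint for a keyword. Not the project's actual spider; the API key
# handling and the fields kept from each result are assumptions.
import json
import os

import scrapy


class NewsSpider(scrapy.Spider):
    name = "news"

    def __init__(self, keyword="technology", max_articles=5, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.keyword = keyword
        self.max_articles = int(max_articles)

    def start_requests(self):
        # NewsData.io expects the API key and the search keyword as query parameters.
        url = (
            "https://newsdata.io/api/1/news"
            f"?apikey={os.environ['NEWSDATA_API_KEY']}&q={self.keyword}&language=en"
        )
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        payload = json.loads(response.text)
        # Keep only the first few results and the fields the GUI later displays.
        for article in payload.get("results", [])[: self.max_articles]:
            yield {
                "title": article.get("title"),
                "creator": article.get("creator"),
                "pubDate": article.get("pubDate"),
                "content": article.get("content"),
            }

A spider like this could be run on its own with scrapy runspider news_spider.py -a keyword=python -O data/raw_news.json; in the app, the spider is started from the GUI instead.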


Built With

  • Scrapy
  • Hydra
  • spaCy
  • OpenAI
  • DearPyGui

Project Structure

  • Clean_News: Folder containing the clean module for preprocessing the news with spaCy.
  • conf: Hydra configurations are stored here in a config.yaml file (an illustrative entry-point sketch follows this list).
  • data: Raw, cleaned, and summarised news are stored in this folder as JSON files.
  • images: Project images such as the logo and robot images are stored here.
  • Scrapy_Project: This folder contains the Scrapy code, along with the spider that crawls news from the NewsData.io API.
  • summarise: This folder contains the summarise module with the OpenAI API code that generates summaries of the news.
  • output: Hydra outputs are stored here.
  • project.py: Main project code.
  • test_project.py: Unit tests for the code.
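To illustrate how the conf folder and project.py fit together, a Hydra-driven entry point might look like the sketch below. The config keys shown (scraping.max_articles, paths.data_dir) are illustrative assumptions, not the actual schema of conf/config.yaml.

# A minimal sketch of a Hydra entry point; the config keys are assumptions.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra loads conf/config.yaml, applies any command-line overrides,
    # and writes its run logs under the configured output directory.
    print(OmegaConf.to_yaml(cfg))
    max_articles = cfg.scraping.max_articles   # e.g. the 5-article cap used by the GUI
    data_dir = cfg.paths.data_dir              # e.g. where the JSON files are written
    ...


if __name__ == "__main__":
    main()

If the config used keys like these, values could also be overridden at launch, e.g. pipenv run python3 project.py scraping.max_articles=3.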


Features

  • Scraping news articles based on a user-provided keyword.
  • Specifying the number of news articles to scrape.
  • Summarizing scraped news articles using GPT 3.5 Turbo from OpenAI.
  • Real-time feedback on the scraping progress (a minimal GUI sketch follows this list).
  • Customizable configuration using Hydra.
  • Logging of information such as errors and warnings.
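The features above are exposed through a small DearPyGui window. The sketch below shows roughly what such a window looks like; the widget layout, tags, and callback are illustrative and not the app's actual GUI code.

# A minimal DearPyGui sketch: a keyword field, an article-count slider
# capped at 5, a "Get News" button, and a status line for feedback.
# Illustrative only; the real app wires the callback to the Scrapy spider.
import dearpygui.dearpygui as dpg


def on_get_news():
    keyword = dpg.get_value("keyword_input")
    count = dpg.get_value("article_count")
    dpg.set_value("status_text", f"Scraping {count} article(s) for '{keyword}'...")


dpg.create_context()

with dpg.window(label="News Breeze", width=500, height=300):
    dpg.add_input_text(label="Keyword", tag="keyword_input")
    dpg.add_slider_int(label="Articles", tag="article_count",
                       default_value=3, min_value=1, max_value=5)
    dpg.add_button(label="Get News", callback=on_get_news)
    dpg.add_text("", tag="status_text")

dpg.create_viewport(title="News Breeze", width=520, height=340)
dpg.setup_dearpygui()
dpg.show_viewport()
dpg.start_dearpygui()
dpg.destroy_context()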


Getting Started

Follow these steps to run the project on your local machine.

Prerequisites

Before running the application, make sure you have the following installed:

  • Python (3.7 or higher)
  • Pip (Python package manager)
  • Pipenv

Packages Requirements

The project relies on the pipenv package manager, and a requirements.txt is provided in the root folder. You can install all necessary packages with the following commands:

$ cd project
$ pipenv install -r requirements.txt 

Alternatively, you can do the following:

# make sure to have pipenv installed
$ pip install pipenv

# set up venv and install dependencies with pipenv
$ pipenv sync

# run the application
$ pipenv run python3 project.py

Testing and Development Dependencies

To set up a development environment for this project with pipenv:

make dev

To run the unit tests with pytest:

make tested

Or, if you don't have build-essential installed:

# install development dependencies
pipenv install --dev

# perform unittest
pipenv run pytest


Usage

To get started, simply run python3 project.py. This launches the DearPyGui app. Once the app is running, provide one or more keywords, select the total number of news articles you want (currently capped at 5), and click the "Get News" button. This starts the Scrapy spider, which scrapes the relevant articles and dumps the data into a JSON file. The app then reads that file, extracts the title, author, and published date of each article, and displays them in the GUI.

From there you have the option of summarising the news. With how busy our lives are, no one has the time to read through thousands of words on a single topic, so why not have GPT summarise the news in a few lines for us? That is exactly what the app does. Once you check the summarise box and click the "Summarise" button, the cleaner module takes over: it preprocesses the data with spaCy and passes it to the summariser module, which calls OpenAI's GPT-3.5 API to condense each article's content into a few lines. And voilà, in a few seconds you have a short, quick summary of the news articles.
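As a rough sketch of that clean-and-summarise step (assumptions: the openai>=1.0 client, the en_core_web_sm spaCy model, and an illustrative prompt; the project's actual clean and summarise modules may differ):

# A minimal sketch of the clean-then-summarise flow: spaCy normalises the
# article text and the OpenAI API condenses it into a few lines.
# Illustrative only; prompt wording and preprocessing are assumptions.
import spacy
from openai import OpenAI

nlp = spacy.load("en_core_web_sm")   # python -m spacy download en_core_web_sm
client = OpenAI()                    # reads OPENAI_API_KEY from the environment


def clean(text: str) -> str:
    # Keep plain sentence text with normalised whitespace.
    doc = nlp(text)
    return " ".join(sent.text.strip() for sent in doc.sents)


def summarise(article_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarise news articles in a few lines."},
            {"role": "user", "content": clean(article_text)},
        ],
    )
    return response.choices[0].message.content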

License

This project is licensed under the MIT License. Please review the license file for more information.

Acknowledgements

Thank you to David Malan and his entire team for making the Harvard CS50P course accessible to everyone who wants to learn, and for teaching it in such an astounding way.
