ChatWeb can crawl web pages, read PDF, DOCX, and TXT files, extract the main content, and then answer your questions about that content or summarize its key points.
Updated Jun 25, 2024 - Python
Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
A publishing platform for modern newspapers.
Source-based news in short. Winner @MumbaiHackathon 2018.
Generate and deliver a daily newspaper to you or your reMarkable tablet.
This repository provides usage examples for the Python module Newspaper3k.
DANeS is an open-source E-newspaper dataset created through a collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn).
Scrape article metadata from major media outlets' websites, including NYT, WaPo, and WSJ. Built on top of the Newspaper Python library (https://github.com/codelucas/newspaper).
A bot that sends daily PDF download links for The Hindu newspaper, Vision IAS, Next IAS, and Insights IAS.
A collection of Bangla newspaper and blog crawlers. Can be used to mine Bangla text data for Natural Language Processing tasks.
A Python project (with NLP integration) that denoises news articles by stripping out images and advertisements, leaving a basic, hassle-free article. It provides a 'smart view' for web views on mobile devices, with heading, keywords, and text. Powered by newspaper3k.
Implements what this video describes: https://www.youtube.com/watch?v=BVizDqOfins