
A news scraping bot for telegram written in python.

xlanor/matilda

Matilda

Matilda is a Telegram bot written in Python 3 that scrapes news articles. I wrote it to get a better understanding of Python. This bot is purely for educational purposes.

Matilda is still in development. I only have time to work on it on weekends, so progress may be a little slow.

Supported Sites

  • Straits Times
  • ChannelNewsAsia
  • TodayOnline (Beta)

Licensing

Matilda is licensed under the Affero General Public License Version 3.

Sample

A sample version of this bot is currently running on Telegram, under @matilda_jk_bot.

Sample Gif

Credits

  • Thanks to LFlare for giving me the idea, and letting me take a look at his source code when I was stuck.
  • Python-Telegram-Bot for making a wonderful wrapper, and having an excellent community who are willing to devote time to assist others.
  • Sumy for building a wonderful python-based text summarizer.
  • BeautifulSoup4 for an easy-to-use web scraper.
  • PhantomJS for scraping JS-based sites.

Contact

You can open an issue here to contact me regarding bugs.

Commands

  • /cmd (Full command list)
  • /aboutme (About Matilda)
  • /supported (Supported sites)
  • /mode (Switches Matilda between Full and Truncated modes)
  • /new (Latest 5 articles from ST/Today/CNA)
  • /rand (5 random articles from ST/Today/CNA)
  • /search (Searches for ST/Today/CNA articles)
  • /today (Scrapes a Today article)
  • /cna (Scrapes a CNA article)
  • /st (Scrapes a Straits Times article)
  • /cna_search (Searches for CNA articles)
  • /cna_new (Latest 5 CNA articles)
  • /st_search (Searches for ST articles)
  • /st_new (Latest 5 ST articles)
  • /st_rand (5 random Straits Times articles)
  • /cna_rand (5 random CNA articles)
  • /subscribe (Subscribes to updates; chats are subscribed by default)
  • /unsub (Unsubscribes from updates)
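
As a rough illustration of how a message such as "/st_search budget" could be split into a command and its argument before being routed, here is a minimal sketch. The function name parse_command is hypothetical and this is not Matilda's actual code; in practice python-telegram-bot handles command parsing for you.

```python
def parse_command(text):
    """Split a chat message into (command, argument).

    Returns (None, text) when the message is not a command.
    This helper is hypothetical and only illustrates the idea.
    """
    parts = text.strip().split(maxsplit=1)
    if not parts or not parts[0].startswith("/"):
        return None, text.strip()
    # In group chats Telegram appends '@botname' to a command; drop it.
    command = parts[0].split("@", 1)[0].lower()
    argument = parts[1] if len(parts) > 1 else ""
    return command, argument
```

For example, parse_command("/new@matilda_jk_bot") yields the command "/new" with an empty argument, so the same handler can serve private and group chats.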

Admin Commands

  • /mega (Sends a message to all chats that the bot has previously been used in. To use, add your user id to tokens.py)
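
The admin check behind a command like /mega might look roughly like the sketch below. The names ADMIN_IDS, broadcast and send are assumptions made for illustration, and the user id is a placeholder; the real bot would read the admin list from its token file and send through the Telegram API.

```python
# Hypothetical admin list; a placeholder id, not a real user.
ADMIN_IDS = [123456789]


def broadcast(sender_id, message, chat_ids, send, admin_ids=ADMIN_IDS):
    """Send `message` to every chat if the sender is an admin.

    `send` is any callable taking (chat_id, message). Returns the
    number of chats the message was delivered to.
    """
    if sender_id not in admin_ids:
        return 0
    delivered = 0
    for chat_id in chat_ids:
        try:
            send(chat_id, message)
            delivered += 1
        except Exception:
            # One failing chat (for example, one that blocked the bot)
            # should not stop the rest of the broadcast.
            continue
    return delivered
```

Passing the send function in as an argument keeps the check easy to try out without a live bot.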

How does Matilda work?

If you have the article URL, you can simply run /st or /cna.
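
If you are unsure which command matches a given link, the choice can be made from the URL's domain. The sketch below is only an illustration: the domain table and the function name command_for_url are assumptions, not part of Matilda.

```python
from urllib.parse import urlparse

# Assumed domains for the supported sites, mapped to the matching command.
SITE_DOMAINS = {
    "straitstimes.com": "/st",
    "channelnewsasia.com": "/cna",
    "todayonline.com": "/today",
}


def command_for_url(url):
    """Return the command that handles `url`, or None if unsupported."""
    host = (urlparse(url).hostname or "").lower()
    for domain, command in SITE_DOMAINS.items():
        # Accept the bare domain and any subdomain such as 'www.'.
        if host == domain or host.endswith("." + domain):
            return command
    return None
```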

If not, you can use the search feature either to search for specific keywords that appear in the article title or to get the latest 5 articles.

Only 5 articles are returned at a time because the sample version of this bot does not run on a very powerful server, and I do not wish to overload it.
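
The title search and the 5-article cap can be sketched as follows. This is a simplified in-memory version under stated assumptions: articles are plain dictionaries with hypothetical "title" and "published" keys, every keyword must appear in the title, and the real bot would query its MySQL database instead.

```python
MAX_RESULTS = 5  # cap taken from the text above


def latest(articles, limit=MAX_RESULTS):
    """Return the newest `limit` articles, newest first."""
    return sorted(articles, key=lambda a: a["published"], reverse=True)[:limit]


def search_titles(articles, keywords, limit=MAX_RESULTS):
    """Return up to `limit` articles whose title contains every keyword.

    Matching is case-insensitive. Requiring all keywords (rather than
    any) is an assumption made for this sketch.
    """
    wanted = [k.lower() for k in keywords.split()]
    matches = [
        a for a in articles
        if all(k in a["title"].lower() for k in wanted)
    ]
    return latest(matches, limit)
```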

From there, you can use the inline buttons generated by the bot to read the article from the comfort of your Telegram chat.

Usage

Install the following Python libraries:

  • python-telegram-bot
  • Beautiful Soup 4
  • Requests
  • Python String Utilities
  • dateutil
  • PyMySQL
  • Sumy
  • Selenium
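
To confirm that the libraries above are installed before starting the bot, a small check like the following may help. The import names in REQUIRED_MODULES are my assumption of how each library is imported; adjust them if your installed versions differ.

```python
import importlib.util

# Assumed import names for the libraries listed above.
REQUIRED_MODULES = [
    "telegram",      # python-telegram-bot
    "bs4",           # Beautiful Soup 4
    "requests",      # Requests
    "string_utils",  # Python String Utilities
    "dateutil",      # dateutil
    "pymysql",       # PyMySQL
    "sumy",          # Sumy
    "selenium",      # Selenium
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules() returns an empty list when everything is in place, or the names that still need installing.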

Download PhantomJS and place it in the same directory. This is required for TodayOnline.

Set up a MySQL database (see exampledb).

Run the scripts found in the Matilda-tools folder. More information is available there. This will enable you to grab new articles as they come out.

Update token.py with your bot's API token, MySQL information, and the list of user IDs for admins.
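
As a rough guide to the kind of values this file holds, here is a sketch. Every variable name and value below is hypothetical; check the token file shipped with the repository for the names Matilda actually expects.

```python
# Sketch of the settings file; all names and values are placeholders.

# API token issued for your bot by Telegram.
BOT_TOKEN = "<your-bot-api-token>"

# Connection details for the MySQL database.
MYSQL = {
    "host": "localhost",
    "user": "<your-mysql-user>",
    "password": "<your-mysql-password>",
    "database": "<your-database-name>",
}

# Telegram user ids allowed to run admin commands such as /mega.
ADMIN_IDS = [123456789]
```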

Start your bot with:

python3 matilda.py

If you are running Matilda on Linux, you may also want to use this command to ensure that Matilda keeps running after you exit the terminal:

sudo nohup python3 matilda.py > /home/matilda-live/error.log 2>&1 &
