tomg404/Zeit-Scraper

Zeit-Scraper

Description

This project downloads every new article from zeit.de as XML, scrapes it, and writes the extracted data to a CSV table. It was inspired by a talk by David Kriesel at 33c3. The project runs on a Raspberry Pi Zero via a scheduled cron job.
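The pipeline described above (take an article's XML, extract a few fields, append them to a CSV table) could look roughly like the sketch below. It is an illustration only, not the project's actual code: the XML layout in `SAMPLE_XML`, its element and attribute names, and the helpers `extract_row` and `append_row` are all assumptions made for this example. Only the column names come from the output format shown in this README.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical article XML; the real zeit.de markup may look quite different.
SAMPLE_XML = """<article>
  <head>
    <author>Max Mustermann</author>
    <attribute name="genre">Kommentar</attribute>
    <attribute name="ressort">Sport</attribute>
    <attribute name="sub_ressort">Fussball</attribute>
    <attribute name="edited">Yes</attribute>
  </head>
</article>"""

# Column names taken from the "Output format" section of this README.
FIELDS = ["author", "genre", "ressort", "sub_ressort", "edited"]


def extract_row(xml_text):
    """Pull the CSV columns out of one article document (assumed layout)."""
    root = ET.fromstring(xml_text)
    row = {name: "" for name in FIELDS}
    row["author"] = root.findtext("head/author", default="")
    for attr in root.iterfind("head/attribute"):
        name = attr.get("name")
        if name in row:
            row[name] = (attr.text or "").strip()
    return row


def append_row(out, row, write_header=False):
    """Write one row to an open text file (or any file-like object)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(row)


buffer = io.StringIO()  # stands in for the CSV file on disk
append_row(buffer, extract_row(SAMPLE_XML), write_header=True)
csv_text = buffer.getvalue()
```

In a scheduled run, `buffer` would be replaced by the CSV file opened in append mode with `newline=""`, and the header would be written only when the file is first created.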

Installation

  1. Install the requirements from requirements.txt:
     pip install -r requirements.txt
  2. OPTIONAL: Edit the config.ini file to use PushNotifier. For more info, see pushnotifier.de.
  3. Execute the run.py file (use run.py -e to enable PushNotifier).
  4. Have fun with your data!
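As a rough illustration of how an `-e` switch and a config.ini file can work together, here is a minimal sketch using only the standard library. It is not taken from run.py: the long option name, the `[pushnotifier]` section, and the `username` and `api_token` keys are hypothetical, and the actual PushNotifier call is left out.

```python
import argparse
import configparser

# Hypothetical config.ini contents; the real file's sections and keys may differ.
SAMPLE_CONFIG = """
[pushnotifier]
username = example_user
api_token = example_token
"""


def parse_args(argv=None):
    """Parse the command line; -e turns notifications on."""
    parser = argparse.ArgumentParser(description="Scrape new zeit.de articles")
    # -e is the flag named in the installation steps; the long name is an assumption.
    parser.add_argument("-e", "--enable-notifications", action="store_true",
                        help="send a PushNotifier message after the run")
    return parser.parse_args(argv)


def load_notifier_settings(config_text):
    """Read the (assumed) pushnotifier section into a plain dict."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    return dict(config["pushnotifier"])


args = parse_args(["-e"])  # equivalent to running: run.py -e
settings = load_notifier_settings(SAMPLE_CONFIG) if args.enable_notifications else {}
```

Keeping notifications behind an opt-in flag means the scraper still runs unattended from cron when config.ini has not been edited.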

Output format

| author | genre | ressort | sub_ressort | edited | ... |
| --- | --- | --- | --- | --- | --- |
| Max Mustermann | Kommentar | Sport | Fussball | Yes | ... |

Sample charts

These charts were made with matplotlib; the source code is in visualization. The samples are one pie chart and two bar charts.
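A chart of this kind can be produced from the CSV output in a few lines. The sketch below is an assumption-laden example rather than the code in visualization: the sample rows are made up, and `count_by_column` and `save_bar_chart` are hypothetical helper names. It counts articles per ressort and can draw the result as a bar chart.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the README's output format (only the listed columns).
SAMPLE_CSV = """author,genre,ressort,sub_ressort,edited
Max Mustermann,Kommentar,Sport,Fussball,Yes
Erika Musterfrau,Nachricht,Politik,Ausland,No
Max Mustermann,Nachricht,Sport,Fussball,No
"""


def count_by_column(csv_file, column):
    """Count how often each value occurs in one CSV column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


def save_bar_chart(counts, title, path):
    """Draw the counts as a bar chart and save it as an image file."""
    # Imported here so the counting step works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed on a headless Pi
    import matplotlib.pyplot as plt

    labels, values = zip(*counts.most_common())
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("articles")
    fig.savefig(path)
    plt.close(fig)


ressort_counts = count_by_column(io.StringIO(SAMPLE_CSV), "ressort")
```

Calling `save_bar_chart(ressort_counts, "Articles per ressort", "ressort.png")` would then write the chart to a PNG file; with real data, the `io.StringIO` object is replaced by the opened CSV file.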

Future updates

  • Visualization of the scraped data
    • on a webpage
    • with chart.js

About

Little Data Mining Project
