benfoxall/scrape

Recent News

This repository scrapes the Hacker News front page into hacker-news.html and commits each change, so git log and git show can be used to access the history of the page.

See git scraping and Flat Data for more information about the approach.
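As a minimal sketch of reading that history, the hypothetical function page_at below prints the page as it was at a given time. It assumes you are inside a clone of this repository, with TARGET defaulting to hacker-news.html. It uses git rev-list --before to find the newest commit that touched the file with a commit date no later than the given date.

```sh
# Hypothetical helper (not part of this repo): print $TARGET as of a given date.
# Assumes it runs inside a clone of the repository.
page_at() {
    target="${TARGET:-hacker-news.html}"
    # Newest commit touching the file whose committer date is not after "$1"
    commit=$(git rev-list -1 --before="$1" HEAD -- "$target")
    if [ -z "$commit" ]; then
        echo "no snapshot of $target before $1" >&2
        return 1
    fi
    # Print the file contents stored in that commit
    git show "$commit:$target"
}
```

For example, `page_at "2021-06-01 12:00" > old.html`. Note that --before compares against the committer date, which for automated scrapes should roughly match the time each scrape ran.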

Updating the data

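export TARGET="hacker-news.html"

# -f: fail on HTTP errors; -sS: hide the progress meter but still report errors
curl -sSf https://news.ycombinator.com > "$TARGET"
git add "$TARGET"
# Commit only if the page changed (a plain `git commit` exits non-zero when nothing is staged)
git diff --cached --quiet || git commit -m ":robot: scraped to $TARGET"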

These steps run automatically in the GitHub Actions workflow .github/workflows/scrape.yml.
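The update steps can also be wrapped in a function that is safer to run unattended. This is a sketch, not the contents of scrape.yml: the function name scrape, its arguments, and the .part temporary file are assumptions. It downloads to a temporary file first, so a failed request leaves the previous snapshot untouched, and it commits only when the content actually changed.

```sh
# Hypothetical helper: fetch URL into a target file and commit only on change.
# Usage: scrape URL [TARGET]   (TARGET defaults to hacker-news.html)
scrape() {
    url="$1"
    target="${2:-hacker-news.html}"
    part="$target.part"
    # Download to a temporary file so a failed fetch cannot clobber the last snapshot
    if ! curl -sSf "$url" -o "$part"; then
        rm -f "$part"
        echo "fetch of $url failed; keeping previous $target" >&2
        return 1
    fi
    mv "$part" "$target"
    git add "$target"
    # git diff --cached --quiet exits 0 when nothing new is staged for this file
    if git diff --cached --quiet -- "$target"; then
        echo "no change in $target"
        return 0
    fi
    git commit -q -m ":robot: scraped to $target"
}
```

For example, `scrape https://news.ycombinator.com hacker-news.html`. Because unchanged pages produce no commit, the history only records real changes.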

Extracting file history

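# List each commit that touched $TARGET as "<hash> <unix timestamp>".
# tformat: ends every entry with a newline; with format: the last (oldest) entry has none, so `read` would skip it.
git log --pretty=tformat:"%H %at" -- "$TARGET" | while read -r commit timestr
do
    # Save the file as it was at that commit
    git show "$commit:$TARGET" > "tmp_${timestr}_${commit}.html"
done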
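If you extract the history repeatedly as new scrapes arrive, a variant that writes into a directory and skips versions it has already written avoids rewriting every file on each run. In this sketch the function name extract_history and the OUT_DIR variable (default snapshots) are hypothetical. It keeps the <timestamp>_<commit>.html naming from the loop above, minus the tmp_ prefix.

```sh
# Hypothetical variant: incrementally extract every version of $TARGET into $OUT_DIR.
extract_history() {
    target="${TARGET:-hacker-news.html}"
    out="${OUT_DIR:-snapshots}"
    mkdir -p "$out"
    # tformat: terminates each entry with a newline, so the oldest commit is read too
    git log --pretty=tformat:"%H %at" -- "$target" | while read -r commit timestr
    do
        file="$out/${timestr}_${commit}.html"
        # Already extracted on an earlier run: skip it
        if [ -e "$file" ]; then
            continue
        fi
        git show "$commit:$target" > "$file"
    done
}
```

Since each filename contains the commit hash, an existing file always holds the same content, so skipping it is safe.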