
danhively/scrape

 
 


Recent News

This repository saves the Hacker News (HN) front page to hacker-news.html and commits each snapshot, so git log and git show can be used to browse the history of changes.

See git scraping & Flat Data for more info about the approach.

Updating the data

export TARGET="hacker-news.html"

# Fetch the front page, then stage and commit the new snapshot
curl https://news.ycombinator.com > "$TARGET"
git add "$TARGET"
git commit -m ":robot: scraped to $TARGET"

These commands are run automatically by the GitHub Actions workflow .github/workflows/scrape.yml.
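When the page has not changed between runs, a plain git commit exits with an error because nothing is staged. The sketch below shows one way to commit only when the staged snapshot differs from the last commit. The function name commit_if_changed is hypothetical and is not part of this repository or its workflow.

```shell
# Hypothetical helper (not from this repository): stage a file and commit it
# only if the staged content differs from the last commit.
commit_if_changed() {
    target=$1
    git add -- "$target"
    # `git diff --cached --quiet` exits 0 when nothing is staged for this path.
    if git diff --cached --quiet -- "$target"; then
        echo "no changes in $target"
    else
        git commit --quiet -m ":robot: scraped to $target"
        echo "committed $target"
    fi
}
```

After the curl step, this would be called as commit_if_changed "$TARGET" in place of the separate git add and git commit commands.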

Extracting file history

# tformat: ends every record with a newline, so `read` also sees the oldest commit
git log --pretty=tformat:"%H %at" -- "$TARGET" | while read -r commit timestr
do
    git show "$commit:$TARGET" > "tmp_${timestr}_${commit}.html"
done
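The same loop can be wrapped in a function that writes snapshots into a separate directory and names them by a readable commit date instead of a Unix timestamp. This is a sketch under assumptions: the function name extract_history and the output directory argument are hypothetical, and the path is taken to be relative to the repository root, with the function run from there. It uses tformat: so that the last log record ends with a newline and is not skipped by read.

```shell
# Hypothetical helper (not from this repository): write every committed
# version of a file into a directory, one snapshot per commit.
extract_history() {
    target=$1
    outdir=$2
    mkdir -p "$outdir"
    # %H is the commit hash; %ad is the author date, formatted by --date.
    git log --date=format:%Y%m%dT%H%M%S --pretty=tformat:"%H %ad" -- "$target" |
    while read -r commit stamp
    do
        git show "$commit:$target" > "$outdir/${stamp}_${commit}.html"
    done
}
```

For example, extract_history "$TARGET" snapshots would fill a snapshots directory with one HTML file per commit that touched the scraped page.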

About

Git Scraping Hacker News


Languages

  • HTML 75.4%
  • JavaScript 21.5%
  • CSS 3.1%