
trevorsibanda/hs-scraper


Project Title

Real Estate Web Scraper in Haskell

Description

A simple concurrent real estate web scraper written in Haskell. It uses the Haskell Conduit Downloader library to make HTTP requests and the Scalpel library to extract information from HTML pages. Two implementations of the cache and data lake exist: an in-memory one built on STM, and a PostgreSQL one built on postgresql-simple. The scraper is concurrent, with each worker started in a separate thread.
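To illustrate why STM suits the in-memory cache, here is a minimal sketch of a shared "seen URLs" set that concurrent workers could use to avoid fetching the same page twice. It assumes URLs are plain Strings, and every name in it (`SeenCache`, `newSeenCache`, `claimUrl`, `seenCount`) is hypothetical rather than part of this project's code. The file has no `main` and can be loaded into GHCi with `:load`.

```haskell
-- Hypothetical sketch of an STM-backed cache of visited URLs.
-- None of these names come from the project itself.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)
import qualified Data.Set as Set

type Url = String

-- | Shared set of URLs that some worker has already claimed.
newtype SeenCache = SeenCache (TVar (Set.Set Url))

newSeenCache :: IO SeenCache
newSeenCache = SeenCache <$> newTVarIO Set.empty

-- | Atomically claim a URL. Returns True only for the first caller:
-- the membership check and the insert run in one transaction, so two
-- workers can never both claim the same URL.
claimUrl :: SeenCache -> Url -> STM Bool
claimUrl (SeenCache var) url = do
  seen <- readTVar var
  if Set.member url seen
    then pure False
    else do
      writeTVar var (Set.insert url seen)
      pure True

-- | Number of distinct URLs claimed so far.
seenCount :: SeenCache -> IO Int
seenCount (SeenCache var) = Set.size <$> readTVarIO var
```

A worker would call `atomically (claimUrl cache url)` before downloading and skip the URL when the result is `False`. Doing the check and the insert inside a single transaction is what removes the race a separate "read, then write" would have.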

Currently, the scraper can only scrape the PropertyBook ZW website, but it can easily be extended to scrape other websites.
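One common way to make a scraper easy to extend is to describe each site as a value and dispatch on the URL. The sketch below shows that pattern under stated assumptions; it is not how this project is actually structured. `Listing`, `SiteScraper`, `scraperFor`, `exampleSite` and its URL are all hypothetical, and the extraction functions are stubs standing in for real Scalpel scrapers. The file has no `main` and can be loaded into GHCi with `:load`.

```haskell
-- Hypothetical sketch: one record per supported site, so adding a site
-- means adding one value rather than changing the crawler loop.
import Data.List (find, isPrefixOf)

-- | A scraped property listing (fields are illustrative only).
data Listing = Listing { listingTitle :: String, listingPrice :: Maybe Int }
  deriving (Show, Eq)

data SiteScraper = SiteScraper
  { siteName        :: String
  , siteRoot        :: String               -- URL prefix this scraper handles
  , extractListings :: String -> [Listing]  -- page body in, listings out
  }

-- | Pick the first scraper whose root URL is a prefix of the given URL.
scraperFor :: [SiteScraper] -> String -> Maybe SiteScraper
scraperFor scrapers url = find (\s -> siteRoot s `isPrefixOf` url) scrapers

propertyBook :: SiteScraper
propertyBook = SiteScraper
  { siteName = "PropertyBook ZW"
  , siteRoot = "https://www.propertybook.co.zw/"
  , extractListings = const []  -- stub: real HTML extraction would go here
  }

-- | A made-up second site. Its stub extractor treats each line of the
-- input as a listing title, standing in for real HTML parsing.
exampleSite :: SiteScraper
exampleSite = SiteScraper
  { siteName = "Example Realty"
  , siteRoot = "https://realty.example.com/"
  , extractListings = map (\t -> Listing t Nothing) . lines
  }

-- | Registering a new site only requires adding it to this list.
registry :: [SiteScraper]
registry = [propertyBook, exampleSite]
```

With this layout, `scraperFor registry url` returns `Nothing` for URLs outside every registered site, which also gives the crawler a natural way to ignore off-site links.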

Getting Started

Dependencies

Executing program

  • Still a work in progress.
  • If you're early here, you may need to read the code to see how to use it.
$ cabal repl
λ> import Crawler.Simple
λ> pbc <- newPropertyBookInMemoryCrawler
λ>
λ> runCrawler pbc "https://www.propertybook.co.zw/"
λ>
λ> stopCrawler pbc
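The session above starts a crawl with `runCrawler` and later ends it with `stopCrawler`. As a rough illustration of how such a start/stop lifecycle can work when each worker runs in its own thread, here is a small sketch using `forkIO`, an STM `TQueue` of pending URLs and `killThread`. It is an assumption-laden sketch, not the project's implementation: `Pool`, `startPool`, `submit`, `awaitResults` and `stopPool` are hypothetical names, and the "processing" step is whatever action the caller passes in. The file has no `main` and can be loaded into GHCi with `:load`.

```haskell
-- Hypothetical sketch of a worker pool with a start/stop lifecycle.
-- This is not the project's runCrawler/stopCrawler.
import Control.Concurrent (ThreadId, forkIO, killThread)
import Control.Concurrent.STM
  ( TQueue, TVar, atomically, check, modifyTVar', newTQueueIO, newTVarIO
  , readTQueue, readTVar, writeTQueue )
import Control.Monad (forever, replicateM)

type Url = String

data Pool = Pool
  { poolQueue   :: TQueue Url   -- URLs waiting to be processed
  , poolResults :: TVar [Url]   -- URLs that some worker has finished
  , poolThreads :: [ThreadId]   -- one thread per worker
  }

-- | Fork n workers. Each blocks on the queue, runs the supplied action
-- on the next URL, then records that URL as done.
startPool :: Int -> (Url -> IO ()) -> IO Pool
startPool n process = do
  queue   <- newTQueueIO
  results <- newTVarIO []
  tids <- replicateM n $ forkIO $ forever $ do
    url <- atomically (readTQueue queue)
    process url
    atomically (modifyTVar' results (url :))
  pure (Pool queue results tids)

-- | Enqueue a URL for whichever worker is free first.
submit :: Pool -> Url -> IO ()
submit pool url = atomically (writeTQueue (poolQueue pool) url)

-- | Block (via STM retry) until at least k URLs are done, then return them.
awaitResults :: Pool -> Int -> IO [Url]
awaitResults pool k = atomically $ do
  done <- readTVar (poolResults pool)
  check (length done >= k)
  pure done

-- | Stop every worker thread.
stopPool :: Pool -> IO ()
stopPool pool = mapM_ killThread (poolThreads pool)
```

Because `killThread` interrupts a worker wherever it happens to be, a real crawler would likely want a more graceful shutdown (for example a "stop" flag checked between URLs) so that in-progress writes to the cache or database are not cut off.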

Help

All help is welcome. If you have any suggestions, please open an issue or a pull request.

Authors

Contributors' names and contact info

Trevor Sibanda (@trevorsibanda)

Version History

  • 0.0.1
    • Initial Release

License

This project is licensed under the GNU GPLv3 License

About

A Haskell scraper
