Real Estate Web Scraper in Haskell
A simple concurrent real estate web scraper written in Haskell. It uses the Haskell Conduit Downloader library to make HTTP requests and the Scalpel library to extract information from HTML pages. The cache and datalake each have two implementations: an in-memory one built on STM and a PostgreSQL one built on Postgres Simple. The scraper is concurrent, with each worker running in a separate thread.
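The in-memory cache and the one-thread-per-worker design described above can be illustrated with a small sketch. It assumes the cache's job is to remember which URLs have already been claimed, so that concurrent workers never process the same page twice. `UrlCache`, `newStmUrlCache` and `crawlConcurrently` are hypothetical names, not the project's API, and the Postgres backend (which would implement the same record of operations) is not shown. Only `base`, `containers` and `stm` are needed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, readTVarIO, writeTVar)
import qualified Data.Set as Set

-- Hypothetical cache interface: an STM backend and a Postgres backend
-- could each provide their own implementation of these operations.
data UrlCache = UrlCache
  { markSeen :: String -> IO Bool -- True if the URL had not been seen before
  , seenUrls :: IO [String]       -- all URLs recorded so far, sorted
  }

-- In-memory implementation: a single TVar holding a Set of URLs.
-- The check-and-insert runs in one STM transaction, so two threads
-- can never both claim the same URL.
newStmUrlCache :: IO UrlCache
newStmUrlCache = do
  var <- newTVarIO Set.empty
  let mark url = atomically $ do
        seen <- readTVar var
        if Set.member url seen
          then pure False
          else do
            writeTVar var (Set.insert url seen)
            pure True
  pure UrlCache
    { markSeen = mark
    , seenUrls = Set.toList <$> readTVarIO var
    }

-- Start one thread per URL. Each thread claims its URL through the cache,
-- so duplicates are processed at most once. Returns the URLs this call
-- actually claimed, in input order. (No exception handling in this sketch.)
crawlConcurrently :: UrlCache -> [String] -> IO [String]
crawlConcurrently cache urls = do
  dones <- mapM spawn urls
  results <- mapM takeMVar dones
  pure [u | Just u <- results]
  where
    spawn url = do
      done <- newEmptyMVar
      _ <- forkIO $ do
        fresh <- markSeen cache url
        -- A real worker would download and scrape the page here.
        putMVar done (if fresh then Just url else Nothing)
      pure done
```

Keeping the cache behind a record of functions means the crawling code does not need to know which backend it is talking to.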
Currently the scraper can only scrape the PropertyBook ZW website. The scraper can easily be extended to scrape other websites.
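How the project is meant to be extended is not spelled out here, so the following is only one plausible design, stated as an assumption: each target website is described by a value holding its site-specific rules, while the link-handling logic stays shared. `Site`, `propertyBook` and `classifyLinks` are hypothetical, and the `/property/` listing path is an invented pattern rather than PropertyBook's real URL scheme.

```haskell
import Data.List (isInfixOf, isPrefixOf, nub, partition)

-- Hypothetical description of one target website. Supporting another
-- site would mean writing another value of this type.
data Site = Site
  { siteName    :: String
  , siteSeedUrl :: String
  , isInScope   :: String -> Bool -- should this link be followed at all?
  , isListing   :: String -> Bool -- does this URL point at a property listing?
  }

-- Illustrative PropertyBook ZW definition. The seed URL comes from the
-- README; the listing path pattern is an assumption.
propertyBook :: Site
propertyBook = Site
  { siteName    = "PropertyBook ZW"
  , siteSeedUrl = "https://www.propertybook.co.zw/"
  , isInScope   = ("https://www.propertybook.co.zw/" `isPrefixOf`)
  , isListing   = ("/property/" `isInfixOf`) -- hypothetical pattern
  }

-- Site-independent step: keep in-scope links, drop duplicates, then split
-- them into (listing pages to scrape, other pages to crawl further).
classifyLinks :: Site -> [String] -> ([String], [String])
classifyLinks site links = partition (isListing site) inScope
  where
    inScope = nub (filter (isInScope site) links)
```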
- Should work anywhere you can use cabal
- Haskell
- Cabal
- Postgres
- Postgres Simple
- Haskell Conduit Downloader
- Scalpel
- STM
- UnliftIO
- Still a work in progress
- Documentation is sparse, so you may need to read the code to see how to use it.
$ cabal repl
λ> import Crawler.Simple
λ> pbc <- newPropertyBookInMemoryCrawler
λ>
λ> runCrawler pbc "https://www.propertybook.co.zw/"
λ>
λ> stopCrawler pbc
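The session above creates a crawler handle, starts it on a seed URL and later stops it. One way such a start/stop lifecycle could be built on `forkIO` and STM is sketched below. This is not the project's implementation of `runCrawler` or `stopCrawler`; `CrawlerSketch`, `newCrawlerSketch`, `startCrawling`, `stopCrawling` and `visitedPages` are hypothetical names, and the actual downloading and scraping are left out.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar, readTVarIO, retry, writeTVar)
import Control.Exception (finally)

-- Hypothetical crawler handle: a work queue, a stop flag, the pages
-- handled so far, and an MVar the worker fills when it exits.
data CrawlerSketch = CrawlerSketch
  { queue    :: TVar [String]
  , stopFlag :: TVar Bool
  , visited  :: TVar [String]
  , finished :: MVar ()
  }

newCrawlerSketch :: IO CrawlerSketch
newCrawlerSketch =
  CrawlerSketch <$> newTVarIO [] <*> newTVarIO False <*> newTVarIO [] <*> newEmptyMVar

-- Next URL to process, or Nothing once a stop has been requested.
-- While the queue is empty, 'retry' blocks until the queue or the flag changes.
nextJob :: CrawlerSketch -> STM (Maybe String)
nextJob c = do
  stop <- readTVar (stopFlag c)
  if stop
    then pure Nothing
    else do
      q <- readTVar (queue c)
      case q of
        []       -> retry
        (u : us) -> do
          writeTVar (queue c) us
          pure (Just u)

-- Enqueue the seed URL and start a single background worker thread.
startCrawling :: CrawlerSketch -> String -> IO ()
startCrawling c seed = do
  atomically $ modifyTVar' (queue c) (++ [seed])
  _ <- forkIO (loop `finally` putMVar (finished c) ())
  pure ()
  where
    loop = do
      job <- atomically (nextJob c)
      case job of
        Nothing  -> pure ()
        Just url -> do
          -- A real worker would download and scrape url here and
          -- enqueue any new links it finds.
          atomically $ modifyTVar' (visited c) (url :)
          loop

-- Request a stop and wait until the worker thread has exited.
stopCrawling :: CrawlerSketch -> IO ()
stopCrawling c = do
  atomically $ writeTVar (stopFlag c) True
  takeMVar (finished c)

-- Pages handled so far, most recent first.
visitedPages :: CrawlerSketch -> IO [String]
visitedPages c = readTVarIO (visited c)
```

Because the idle worker blocks in `retry` on a transaction that also reads the stop flag, setting the flag wakes it immediately, and `stopCrawling` returns only after the thread has finished.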
All help is welcome. If you have any suggestions, please open an issue or a pull request.
Contributors' names and contact info
Trevor Sibanda (@trevorsibanda)
- 0.0.1
- Initial Release
This project is licensed under the GNU GPLv3 License