Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store the results in Elasticsearch. Use this configuration as a starting point for your own crawler.
Updated Sep 4, 2019 - Java
Simple crawl framework for a focused web crawler in Java.
Easily crawl news portals or blog sites using Storm Crawler.
Domain Discovery for the Sparkler Crawl Environment
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.