🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
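The rule check such a service performs can be sketched in a few lines: parse the `Disallow` directives for a matching `User-agent` group, then test a path prefix against them. This is a minimal illustration with hypothetical class and method names, not the API of the listed project; a production parser would also handle `Allow`, wildcards, and longest-match precedence.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal robots.txt rule check (illustrative sketch, not the project's API):
// collect Disallow rules for the given agent (or "*"), then prefix-match a path.
public class RobotsCheck {
    static List<String> parseDisallows(String robotsTxt, String agent) {
        List<String> rules = new ArrayList<>();
        boolean inGroup = false;
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            if (line.toLowerCase().startsWith("user-agent:")) {
                String ua = line.substring("user-agent:".length()).trim();
                inGroup = ua.equals("*") || ua.equalsIgnoreCase(agent);
            } else if (inGroup && line.toLowerCase().startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) rules.add(path); // empty Disallow means "allow all"
            }
        }
        return rules;
    }

    static boolean isAllowed(String robotsTxt, String agent, String path) {
        for (String rule : parseDisallows(robotsTxt, agent)) {
            if (path.startsWith(rule)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\nDisallow: /tmp\n";
        System.out.println(isAllowed(robots, "mybot", "/private/data")); // false
        System.out.println(isAllowed(robots, "mybot", "/index.html"));   // true
    }
}
```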
Crawler engine with HTTP and proxy support, JS–Java interoperability, MQ task consumption, and dynamic crawler script execution; supports distributed deployment.
Mercator-scheme rate-limiting/scheduling component of the whirlpool project; handles crawler priority and politeness.
A high-performance distributed web crawling framework based on Spring Boot. It provides rich APIs to customize business logic and is easily embedded into your system.
Java website crawler - a library for analyzing and testing websites.
A miniature Java search engine using the Rapid Automatic Keyword Extraction (RAKE) framework and HashMaps.
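The HashMap-based RAKE scoring that description refers to can be sketched briefly: split text into candidate phrases at stopwords and punctuation, score each word as degree/frequency, and score a phrase as the sum of its word scores. The class name and the tiny stopword list below are assumptions for illustration; real implementations use a full stopword corpus.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// RAKE-style keyword scoring with plain HashMaps (illustrative sketch).
public class Rake {
    static final Set<String> STOP = Set.of("a", "the", "of", "and", "is", "in");

    static Map<String, Double> score(String text) {
        // 1. Split into candidate phrases at stopwords and non-letter characters.
        List<List<String>> phrases = new ArrayList<>();
        List<String> cur = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.isEmpty() || STOP.contains(w)) {
                if (!cur.isEmpty()) { phrases.add(cur); cur = new ArrayList<>(); }
            } else cur.add(w);
        }
        if (!cur.isEmpty()) phrases.add(cur);

        // 2. Word score = degree (sum of co-occurring phrase lengths) / frequency.
        Map<String, Integer> freq = new HashMap<>(), degree = new HashMap<>();
        for (List<String> p : phrases)
            for (String w : p) {
                freq.merge(w, 1, Integer::sum);
                degree.merge(w, p.size(), Integer::sum);
            }

        // 3. Phrase score = sum of its word scores.
        Map<String, Double> scores = new HashMap<>();
        for (List<String> p : phrases) {
            double s = 0;
            for (String w : p) s += (double) degree.get(w) / freq.get(w);
            scores.put(String.join(" ", p), s);
        }
        return scores;
    }
}
```

Longer multi-word phrases naturally score higher, which is RAKE's core heuristic for surfacing keyphrases without a training corpus.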
Small web crawler developed in Java and Spring Boot.
A search engine implementing the PageRank, term frequency, and inverse document frequency algorithms. The data is provided by a web crawler that uses DFS and BFS to crawl all pages.
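The BFS crawl order mentioned above can be sketched without real HTTP fetches by walking an in-memory link graph with a FIFO queue; the class name and graph data here are hypothetical, not from the listed project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Breadth-first crawl over an in-memory link graph (illustrative sketch).
// A real crawler would fetch each page over HTTP and extract its links.
public class BfsCrawl {
    static List<String> crawl(Map<String, List<String>> links, String seed) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty()) {
            String page = queue.poll();        // FIFO poll gives breadth-first order
            visited.add(page);
            for (String out : links.getOrDefault(page, List.of()))
                if (seen.add(out))             // add() is false if already seen
                    queue.add(out);
        }
        return visited;
    }
}
```

Swapping the `ArrayDeque` for a stack (`push`/`pop`) turns the same loop into the DFS variant.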