web-crawler

Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and storing it in various data repositories such as search engines.

java search-engine crawler flexible web-crawler crawlers filesystem-crawler collector-http collector-fs

Updated Oct 13, 2024
Java

ScaleUnlimited / flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

crawler spider web-crawler crawling flink web-crawling

Updated Apr 8, 2019
Java

commoncrawl / nutch

Common Crawl fork of Apache Nutch

java big-data hadoop web-crawler commoncrawl

Updated Oct 6, 2024
Java

rzo1 / crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

java crawler spider web-crawler crawler4j web-spider

Updated Jun 18, 2024
Java

rajagopal28 / simple-web-crawler

Java-based web crawler that uses pool-based multithreading, a simple Swing UI, and jsoup for nested web crawling

Updated Oct 10, 2022
Java

apache / nutch-webapp

Apache Nutch is an extensible and scalable web crawler

java hadoop web-crawler nutch crawling apache

Updated Jul 7, 2023
Java

jiup / gospy

🕷 a flexible web crawler framework

framework spider web-crawler

Updated Nov 7, 2017
Java

xujiahaha / price-monitoring-system

microservices spring-boot rabbitmq web-crawler

Updated Aug 22, 2017
Java

vladimanaev / web-spider

Web crawler that supports fully rendered page crawling using HtmlUnit

crawler web-crawler web-scraper web-scraping web-crawling htmlunit web-spider webpage-scraper

Updated Dec 15, 2017
Java

Frog-Front / web-crawler

A library for crawling websites and harvesting the URLs of embedded links and images

java bot spider web-crawler webcrawler

Updated Sep 1, 2022
Java

pkgodara / WebCrawler

Java web crawler program that collects all links or downloads images from websites, with Google or Bing search options.

java web-crawler google-search

Updated Apr 17, 2017
Java

kenych / java-web-crawler

A pretty basic example of web page crawling in Java. It is not a production-ready crawler and was written for test purposes only.

web-crawler concurrency java8

Updated Sep 1, 2022
Java

giahuy2201 / manga-dl

🚀 Get your favorite manga from Kissmanga in 📖 EPUB/PDF format

java docker cli pdf downloader web-crawler epub command-line-tool kissmanga kissmanga-downloader kissmanga-scraper

Updated Dec 21, 2021
Java

Here are 87 public repositories matching this topic...

ssssssss-team / spider-flow

apache / nutch

apache / incubator-stormcrawler

VIDA-NYU / ache

USCDataScience / sparkler

commoncrawl / news-crawl

crawler-commons / crawler-commons

Norconex / crawlers

ScaleUnlimited / flink-crawler

commoncrawl / nutch

rzo1 / crawler4j

rajagopal28 / simple-web-crawler

apache / nutch-webapp

jiup / gospy

xujiahaha / price-monitoring-system

vladimanaev / web-spider

Frog-Front / web-crawler

pkgodara / WebCrawler

kenych / java-web-crawler

giahuy2201 / manga-dl
