davidldennison
Stars

🕸️ Crawlers and Scrapers

87 repositories

Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.

Python 232 15 Updated Jun 10, 2024

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

Go 95 2 Updated Mar 13, 2025

A standalone version of the readability lib

JavaScript 9,534 633 Updated Mar 3, 2025

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

JavaScript 4,386 167 Updated Jan 1, 2025

Convert between HTML, Markdown, and plain text from the command line.

JavaScript 14 Updated Oct 29, 2024

Python scraper based on AI

Python 18,602 1,574 Updated Mar 13, 2025

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 30,921 2,628 Updated Mar 13, 2025

A web crawler and scraper for Rust

Rust 1,559 128 Updated Mar 13, 2025

✋ URL to JSON! Fetch webpage content into structured text using crawlers or AI at your command.

Go 5 Updated Mar 7, 2025

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Python 3,412 643 Updated Oct 29, 2024

Flexible Node.js AI-assisted crawler library

TypeScript 1,668 104 Updated Mar 11, 2025

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

JavaScript 380 61 Updated Dec 30, 2022

A simple, easy-to-use command-line web crawler.

Python 346 69 Updated Aug 8, 2024

Wgit enables you to crawl and extract the data you want from the web

Ruby 14 3 Updated Oct 30, 2024

State-of-the-art web crawler 🔱

HTML 125 26 Updated Mar 13, 2025

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

TypeScript 23 8 Updated Feb 23, 2025

A set of reusable Java components that implement functionality common to any web crawler

Java 243 79 Updated Dec 16, 2024

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.

Kotlin 815 122 Updated Mar 13, 2025

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object fr…

Python 39 17 Updated Mar 5, 2025

🐍 Python module for converting complex JSON to HTML Table representation

Python 277 86 Updated Jun 27, 2024

Web scraper that can create an offline readable version of a website

Go 203 41 Updated Mar 6, 2025

HTML to Markdown converter

Go 250 19 Updated Feb 28, 2025

A Chrome DevTools Protocol driver for web automation and scraping.

Go 5,750 369 Updated Dec 7, 2024

A simple yaml-based xpath crawler framework for easy tracking site updates. https://zhupeng.github.io/

Python 22 6 Updated Mar 1, 2024

An npm package that takes a client-side rendering approach to extracting product data from e-commerce websites, using undetectable puppeteer-cluster with pagination support

JavaScript 8 Updated Aug 29, 2024

Jupyter Notebook 14 Updated Aug 21, 2024

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on d…

JavaScript 628 38 Updated Dec 10, 2024

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

Python 3,245 576 Updated Feb 19, 2025

CrimeFlare is a tool for bypassing websites protected by the CloudFlare WAF. With this tool, you can easily see the real IP of websites that have been protected by CloudFlare. The resulting infor…

PHP 1,437 182 Updated Sep 1, 2023

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Python 3,939 348 Updated Aug 7, 2024