David Dennison davidldennison

🎯

Focusing

🎯 SEO Mastermind | Content Marketing Wiz | Noob Developer 🚀 Combining SEO expertise, dev tools, and innovation to thrive at the crossroads of marketing and tech!

279 followers · 4.5k following

Stars

🕸️ Crawlers and Scrapers

87 repositories

serpapi / lego-ai-parser

Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.

Python 232 15 Updated Jun 10, 2024

tech-engine / goscrapy

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

Go 95 2 Updated Mar 13, 2025

mozilla / readability

A standalone version of the readability lib

JavaScript 9,534 633 Updated Mar 3, 2025

danburzo / percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

JavaScript 4,386 167 Updated Jan 1, 2025

danburzo / trimd

Convert between HTML, Markdown, and plain text from the command line.

JavaScript 14 Updated Oct 29, 2024

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

Python 18,602 1,574 Updated Mar 13, 2025

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 30,921 2,628 Updated Mar 13, 2025

spider-rs / spider

A web crawler and scraper for Rust

Rust 1,559 128 Updated Mar 13, 2025

RealAlexandreAI / sticky-hand

✋ URL to JSON! Fetch webpage content into structured text using crawlers or AI at your command.

Go 5 Updated Mar 7, 2025

Gerapy / Gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Python 3,412 643 Updated Oct 29, 2024

coder-hxl / x-crawl

Flexible Node.js AI-assisted crawler library

TypeScript 1,668 104 Updated Mar 11, 2025

brendonboshell / supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

JavaScript 380 61 Updated Dec 30, 2022

rivermont / spidy

The simple, easy to use command line web crawler.

Python 346 69 Updated Aug 8, 2024

michaeltelford / wgit

Wgit enables you to crawl and extract the data you want from the web

Ruby 14 3 Updated Oct 30, 2024

internetarchive / Zeno

State-of-the-art web crawler 🔱

HTML 125 26 Updated Mar 13, 2025

omkarcloud / botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

TypeScript 23 8 Updated Feb 23, 2025

crawler-commons / crawler-commons

A set of reusable Java components that implement functionality common to any web crawler

Java 243 79 Updated Dec 16, 2024

platonai / PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.

Kotlin 815 122 Updated Mar 13, 2025

googleapis / python-documentai-toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object fr…

Python 39 17 Updated Mar 5, 2025

softvar / json2html

🐍 Python module for converting complex JSON to HTML Table representation

Python 277 86 Updated Jun 27, 2024

cornelk / goscrape

Web scraper that can create an offline readable version of a website

Go 203 41 Updated Mar 6, 2025

suntong / html2md

HTML to Markdown converter

Go 250 19 Updated Feb 28, 2025

go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.

Go 5,750 369 Updated Dec 7, 2024

ZhuPeng / trackupdates

A simple yaml-based xpath crawler framework for easy tracking site updates. https://zhupeng.github.io/

Python 22 6 Updated Mar 1, 2024

shinevue / ecommerce-scrapper-extension

A npm package for Client-side rendering approach to extract product data from ecommerce websites using undetectable puppeteer-cluster with pagination support

JavaScript 8 Updated Aug 29, 2024

shinevue / Web-Scraping-using-Selenium

Jupyter Notebook 14 Updated Aug 21, 2024

rebrowser / rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on d…

JavaScript 628 38 Updated Dec 10, 2024

my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

Python 3,245 576 Updated Feb 19, 2025

zidansec / CloudPeler

CrimeFlare is a useful tool for bypassing websites protected by CloudFlare WAF, with this tool you can easily see the real IP of websites that have been protected by CloudFlare. The resulting infor…

PHP 1,437 182 Updated Sep 1, 2023

rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 3,939 348 Updated Aug 7, 2024
