David Dennison SEO
- Las Vegas, NV
- https://searchriot.com
- in/davidldennison
- @searchriot
- @Search_Riot
🕸️ Crawlers and Scrapers
Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
A standalone version of the readability lib
A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.
Convert between HTML, Markdown, and plain text from the command line.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
✋ URL to JSON! Fetch webpage content into structured text using crawlers or AI at your command.
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Flexible Node.js AI-assisted crawler library
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
The simple, easy-to-use command-line web crawler.
Wgit enables you to crawl and extract the data you want from the web.
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
A set of reusable Java components that implement functionality common to any web crawler
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object fr…
🐍 Python module for converting complex JSON to an HTML table representation
Web scraper that can create an offline readable version of a website
A Chrome DevTools Protocol driver for web automation and scraping.
A simple YAML-based XPath crawler framework for easily tracking site updates. https://zhupeng.github.io/
An npm package that extracts product data from e-commerce websites using a client-side rendering approach and an undetectable puppeteer-cluster, with pagination support
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on d…
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
CrimeFlare is a tool for bypassing Cloudflare WAF protection on websites. With it, you can easily see the real IP of websites that are protected by Cloudflare. The resulting infor…
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.