ai-scraping

Here are 25 public repositories matching this topic...

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

markdown crawler data scraper ai html-to-markdown web-crawler scraping webscraping rag llm ai-scraping

Updated Jun 13, 2025
TypeScript
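Firecrawl's core idea of turning HTML pages into LLM-ready Markdown can be illustrated with a minimal sketch built on Python's standard-library `html.parser`. This is not Firecrawl's API or implementation. The `MarkdownConverter` class and `html_to_markdown` function are hypothetical names, and the sketch handles only headings, paragraphs, links, and list items while dropping `<script>`/`<style>` content.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Hypothetical converter for a small HTML subset (not Firecrawl's code)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.href: str | None = None
        self.skip = 0  # nesting depth inside <script>/<style>; their text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            # <h2> becomes "## ", and so on
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if not self.skip:
            # Collapse runs of whitespace but keep single spaces between words
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    conv = MarkdownConverter()
    conv.feed(html)
    conv.close()
    # Trim each line, then cap consecutive blank lines at one
    lines = [line.strip() for line in "".join(conv.parts).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    page = ("<h1>Title</h1><p>Hello <a href='https://example.com'>link</a></p>"
            "<ul><li>One</li><li>Two</li></ul><script>x=1</script>")
    print(html_to_markdown(page))
```

Production tools also have to render JavaScript, strip navigation and ads, and handle tables and images, which is why dedicated services exist; the sketch only shows the tag-to-Markdown mapping.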

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

markdown crawler ai html-to-markdown web-crawler scraping web-scraping rag automated-scraper scraping-python web-crawlers llm ai-scraping

Updated Jun 13, 2025
Python

D4Vinci / Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library that makes web scraping as easy and effortless as it should be!

Updated May 31, 2025
Python

itsOwen / CyberScraper-2077

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

scraper web-scraper openai webscraping gemini-api llm llm-scraper ai-scraping

Updated Jun 13, 2025
Python

raznem / parsera

Lightweight library for scraping websites with LLMs

python opensource ai scraping data-extraction webscraping playwright llm ai-scraping

Updated Jun 2, 2025
Python

devflowinc / firecrawl-simple

➖ Stripped-down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.

search markdown crawler data scraper ai html-to-markdown web-crawler scraping embeddings webscraping rag llm ai-scraping

Updated May 23, 2025
TypeScript

mendableai / firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

markdown data ai examples html-to-markdown templates web-crawler scrapers rag llm ai-scraping

Updated Jun 2, 2025
Jupyter Notebook

ArchiveBox / abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

cli chrome downloader curl headless scraping crawling http-client youtube-dl wget cli-tool puppeteer internet-archiving playwright archivebox yt-dlp gallery-dl ai-scraping

Updated Dec 26, 2024
JavaScript

WeebDataHoarder / go-away

[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

security mirror http-proxy ai-scraping

Updated Jun 9, 2025
Go
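Rule-based filtering of the kind go-away describes can be sketched as an ordered list of user-agent rules in which the first match decides the action. This is a hypothetical Python illustration, not go-away's rule format or code (go-away itself is written in Go); the `Rule` class, the `evaluate` function, the action names, and the patterns are all assumptions. GPTBot, CCBot, ClaudeBot, and Bytespider are published user-agent tokens of real crawlers, but the User-Agent header is trivially spoofed, so matching on it alone is weak protection.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: if the pattern matches the User-Agent, apply the action."""
    name: str
    user_agent: re.Pattern
    action: str  # assumed actions: "allow", "deny", or "challenge"


# Order matters: rules are checked top to bottom and the first match wins.
RULES = [
    Rule("allow-uptime", re.compile(r"UptimeRobot"), "allow"),
    Rule("known-ai-crawlers",
         re.compile(r"GPTBot|CCBot|ClaudeBot|Bytespider", re.I), "deny"),
    Rule("headless-or-scripted",
         re.compile(r"HeadlessChrome|python-requests|curl/", re.I), "challenge"),
]


def evaluate(user_agent: str, rules: list[Rule] = RULES,
             default: str = "allow") -> tuple[str, str]:
    """Return (action, rule name) for the first matching rule, else the default."""
    for rule in rules:
        if rule.user_agent.search(user_agent):
            return rule.action, rule.name
    return default, "default"


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; GPTBot/1.0)", "curl/8.0",
               "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"):
        print(ua, "->", evaluate(ua))
```

A first-match list keeps the policy easy to audit: an explicit allow rule placed above a broad deny rule exempts a known-good client without touching the deny pattern.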

kaymen99 / ai-web-scraper

AI web scraper built with Crawl4AI for extracting structured lead data from websites.

scraper web-scraper web-scraping ai-agents lead-generation data-scraper llms ai-scraping crawl4ai

Updated Feb 13, 2025
Python

spider-rs / web-crawling-guides

How-to guides on web crawling and scraping

crawler scraper html-to-markdown web-scraping agents ai-agents ai-scraping llm-webcrawler clean-markdown fast-webcrawler

Updated Apr 26, 2025

spider-rs / spider-clients

Python, JavaScript, and Rust libraries for the Spider Cloud API.

crawler scraper ai spider html-to-markdown web-scraping ai-agents supabase ai-scraping llm-webcrawler

Updated Jun 8, 2025
Python

any4ai / AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

data html-to-markdown scraping webscraper crawl scrape serp rag aitools ai-scraping

Updated Jun 9, 2025
TypeScript

L1shed / Turbo

Fastest and cheapest distributed residential proxy network.

iaas web-scraping forward-proxy payment-gateway collaborate passive-income depin distributed-network bandwidth-sharing ai-scraping

Updated Jun 13, 2025
Go

drisskhattabi6 / AI-Scraper

AI Scraper: scrape and extract data from websites in any format (CSV, JSON, HTML...) using Selenium or Crawl4AI for fetching, Ollama or the SambaNova API as the LLM, and Streamlit for a chatbot UI

Updated May 22, 2025
Python

nathabonfim59 / md-fetch

A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures by using headless browsers. Perfect for AI/LLM applications.

golang scraper htmltomarkdown ai-scraping

Updated Feb 2, 2025
Go

Chakszzz / NB-Scraper

All scraper resources available here! Give us stars 🌟

scraper facebook-scraper scrape-websites ai-scraping nb-scraper nb-script

Updated Jun 14, 2025
TypeScript

GitRectify / scrapegraph-ai

ScrapeGraphAI is a Python-based web-scraping framework that pairs large-language-model reasoning with a graph-style pipeline engine to turn websites (or local XML/HTML/JSON/Markdown files) into structured data with just a handful of lines of code.

markdown crawler ai html-to-markdown web-crawler scraping web-scraping rag automated-scraper scraping-python web-crawlers llm ai-scraping

Updated Jun 5, 2025
Python

luminati-io / llama-3-web-scraping

Use LLaMA 3 and Python to extract structured data from websites like Amazon, leveraging LLM-powered parsing for resilient, AI-driven web scraping.

python web-scraping data-collection python-scraper llama-3 ai-scraping llm-scraping

Updated Apr 28, 2025

vonuyvicoo / crava

AI-powered web scraper using JavaScript/TypeScript.

webscraping llm ai-scraping

Updated Jun 13, 2025
TypeScript
