The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
Updated May 28, 2024 - TypeScript
Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless modes, with proxy rotation.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl, search and extract with a single API.
Turn any webpage into structured data using LLMs
A modern search engine API for anime, movies/TV shows, books, light novels, manga, etc.
LinkedIn automation bot supporting every possible kind of scraping! Valid as of 2022 and used by Linvo.io.
🕵️‍♂️ LinkedIn profile scraper that returns structured profile data as JSON.
Metadata scraper with support for oEmbed, Twitter Cards and Open Graph Protocol for Node.js ⚡
Node.js library that provides high-level APIs for obtaining information on various entertainment media, such as books, movies, comic books, anime, manga, and so on.
A simple browser/client-side web scraper.
estela, an elastic web scraping cluster 🕸
Node.js web scraper. Contains a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM.