web-crawling

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making it convenient to access and use the scraped information.

typescript web-scraping json-parsing web-crawling google-news data-scraping google-news-scraper web-data-extraction web-automation keyword-search gnews news-scraping gnews-api article-extraction gnews-scraper

Updated Aug 19, 2023
TypeScript

lekhmanrus / real-shot-pdf

RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.

Updated Mar 1, 2024
TypeScript

miroshnikov / scrapyteer

Web crawling & scraping framework for Node.js on top of headless Chrome browser

scraper spider web-crawler headless scraping crawling web-scraping scrapy scrape scraping-websites web-crawling scrapy-crawler crawling-framework crawer spider-framework crawling-sites crawling-tool web-scraping-nodejs

Updated Mar 3, 2024
TypeScript

SpeedyShot / capture

An easy-to-use library for the SpeedyShot Capture service.

pdf screenshots capture pdf-generation web-crawling

Updated Jul 8, 2024
TypeScript

breck7 / measurementscrawlers

Crawlers for extracting measurements from the web for Scroll datasets

scrapers web-crawling

Updated May 18, 2024
TypeScript

omkarcloud / botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Updated Jul 8, 2024
TypeScript

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Jul 10, 2024
TypeScript

Here are 11 public repositories matching this topic...

supergillis / crawler-ts

lewisakura / spiderboi

mstephen19 / apify-global-store

ayakashi-io / ayakashi

dstark5 / gnews-scraper

lekhmanrus / real-shot-pdf

miroshnikov / scrapyteer

SpeedyShot / capture

breck7 / measurementscrawlers

omkarcloud / botasaurus-starter

apify / crawlee