web-data-extraction

Here are 25 public repositories matching this topic...

MohamedHmini / iww

AI based web-wrapper for web-content-extraction

python data-mining library ai information-extraction web-scraping web-mining web-content-extractor web-data-extraction

Updated Feb 6, 2023
Python

neurons-me / this.url

The this.url class fetches and parses URL data and returns an object with structured information, which can be kept in a database or other storage and used by machine learning algorithms.

web-scraping url-parsing metadata-extraction web-data-extraction neurons-me-ecosystem structured-url-data machine-learning-urls data-driven-web-analysis intelligent-link-processing ai-ready-url-processing

Updated Feb 1, 2025
JavaScript
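As a rough illustration of turning a URL into a structured record, here is a minimal Python sketch using only the standard library `urllib.parse`. The function name `url_to_record` and its output fields are hypothetical; this is not the this.url API (which is JavaScript), and it skips the fetching step entirely.

```python
from urllib.parse import urlsplit, parse_qs


def url_to_record(url: str) -> dict:
    """Split a URL into a flat, structured record (hypothetical schema)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,            # lower-cased, port stripped
        "port": parts.port,                # None when no explicit port is given
        "path_segments": [s for s in parts.path.split("/") if s],
        "query": parse_qs(parts.query),    # each key maps to a list of values
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(url_to_record("https://Example.com:8080/docs/api?lang=en&tag=a&tag=b#intro"))
```

A flat dictionary like this is easy to serialize to JSON or insert as a database row.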

lightfeed / lightfeed-extract

Use LLMs to robustly extract structured data from HTML and markdown

Updated May 14, 2025
TypeScript

luminati-io / java-web-scraping

Quick guide, with a code example, on how to use Java for web scraping

java maven scraping-websites web-data-extraction

Updated Dec 18, 2024

DemonMartin / scrappey-wrapper

An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass & solver)

web-scraping data-extraction web-data-extraction scraping-framework scraping-tool cloudflare-bypass web-scraping-solution cloudflare-solver api-scraping scraping-solution website-data-extraction scraping-library cloudflare-anti-bot scraping-service data-scraping-tool website-scraping-tool turnstile-solver

Updated Jan 10, 2024
JavaScript

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, which makes the scraped information convenient to access and use.

typescript web-scraping json-parsing web-crawling google-news data-scraping google-news-scraper web-data-extraction web-automation keyword-search gnews news-scraping gnews-api article-extraction gnews-scraper

Updated Aug 19, 2023
TypeScript

jjonescz / awe

AI-based web extractor

deep-learning information-extraction web-scraping web-data-extraction structured-web-data

Updated Feb 25, 2023
Python

Boomslet / Web_Crawler

Open-source web crawler

python url html open-source website opensource links web-crawler urls free data-extraction webcrawler web-crawling web-data-extraction urllib web-crawler-python

Updated Jul 21, 2018
Python

wbsg-uni-mannheim / WDCFramework

Java framework used by the Web Data Commons project to extract Microdata, Microformats and RDFa data, Web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.

schema-org json-ld microdata web-data-extraction

Updated Dec 13, 2022
Java
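WDCFramework itself is a Java framework working at Common Crawl scale. As a much smaller illustration of one format named in its tags (json-ld), here is a Python sketch that pulls JSON-LD blocks out of a single HTML page with the standard library `html.parser`. It covers JSON-LD only, not Microdata, Microformats or RDFa, and the names `JsonLdExtractor` and `extract_jsonld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Skipping malformed blocks rather than raising is a design choice suited to crawl data, where broken markup is common.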

kaizenplatform / FacebookInsightsConnector

The Tableau Web Data Connector for the Facebook Insights API

facebook tableau facebook-insights web-data-extraction

Updated Jun 26, 2017
JavaScript

lekhmanrus / real-shot-pdf

RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.

Updated Mar 1, 2024
TypeScript

oxpath / oxpath

OXPath from Oxford

scraper web ajax web-data-extraction

Updated May 20, 2022
Java

lightfeed / lightfeed

Lightfeed SDK to search and filter web data

Updated May 12, 2025
Python

wbsg-uni-mannheim / schemaorg-tables

This repository contains the code and data download links to reproduce the building process of the 2021 Schema.org Table Corpus.

schema-org web-data-extraction web-tables

Updated May 12, 2021
Python

hoxhaeris / get_muitiple

Get and process multiple resources from the web, using asyncio (aiohttp) to fetch the data and multiprocessing/multithreading to process it.

python3 web-scraping asyncio web-data-extraction

Updated Mar 4, 2021
Python
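The two-stage pattern described for get_muitiple (concurrent async fetching, then pooled processing) can be sketched with the standard library alone. This is not that repository's code: `fetch` is a hypothetical stand-in for an aiohttp request that only sleeps, so the sketch runs without network access, and `process` is a placeholder for real parsing work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET (e.g. via aiohttp); no network I/O."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


def process(body: str) -> int:
    """Placeholder processing step for one response: return the body length."""
    return len(body)


async def fetch_and_process(urls: list[str], workers: int = 4) -> list[int]:
    # Stage 1: fetch all resources concurrently on the event loop.
    bodies = await asyncio.gather(*(fetch(u) for u in urls))
    # Stage 2: run processing in a worker pool so it does not block the loop.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, process, b) for b in bodies)
        )
    return list(results)  # gather preserves input order


if __name__ == "__main__":
    print(asyncio.run(fetch_and_process(["https://a.example", "https://b.example"])))
```

For CPU-heavy processing, `concurrent.futures.ProcessPoolExecutor` can replace the thread pool, provided `process` stays a module-level, picklable function.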

ranajahanzaib / wdx

A web data extraction library written in Go.

scraper mongodb nextjs web-data-extraction go-scraper

Updated Apr 16, 2025
Go

wbsg-uni-mannheim / wdc-page

This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.

web-data-extraction