web-scraping

Here are 2,817 public repositories matching this topic...

palewire / reuters-jobs

A bot that posts job openings at Reuters News

python bot twitter-bot news jobs journalism web-scraping mastodon-bot

Updated Nov 16, 2024
Python

D4Vinci / Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

Updated Nov 16, 2024
Python

lorae / roundup

Web scraper which aggregates pre-print academic economics papers from 20+ sources; presents titles, abstracts, authors and hyperlinks on an online dashboard. Auto-updates daily.

selenium economics microeconomics requests web-scraping beautifulsoup macroeconomics beautifulsoup4 html-scraping github-actions streamlit streamlit-dashboard streamlit-webapp api-scraping

Updated Nov 16, 2024
Python

sbmagar13 / sharesansar_datascrape

Scrapes daily NEPSE (Nepal Stock Exchange) share price data from Sharesansar with Python. Collects all daily floor sheets from the Sharesansar site.

python web-scraping nepal nepse sharesansar nepse-data nepal-share-market

Updated Nov 16, 2024
Python

mrzzy / providence

Personal Finance Data Pipeline & Dashboard

automation sql dashboard azure superset pandas data-visualization data-engineering web-scraping dbt data-pipeline prefect duckdb

Updated Nov 16, 2024
Python

Thiraput01 / Courspora-notifier

A Discord bot that sends notifications about new courses from Courspora

bot docker discord-bot http-requests web-scraping discord-js selenium-python google-kubernetes-engine gcp-project

Updated Nov 16, 2024
Python

palewire / fed-dot-plot-scraper

Extracting the "dot plot" economic projections posted online by the Federal Open Market Committee

python scraper news journalism data-journalism web-scraping federal-reserve economic-data macroeconomics monetary-policy fomc

Updated Nov 16, 2024
Python

RaresCode / RaresTestHub

Tests scraped job details from company websites against peviitor.ro

python test-cases automation data-validation selenium assertions pytest requests web-scraping allure-report html-report bs4 api-testing automation-testing pytest-xdist parallel-testing github-actions allure-pytest peviitor-automation-testing

Updated Nov 16, 2024
Python

seleniumbase / SeleniumBase

📊 Blazing fast Python framework for web crawling, scraping, testing, and reporting. Supports pytest. Stealth abilities: UC Mode and CDP Mode.

Updated Nov 16, 2024
Python

flairNLP / fundus

A very simple news crawler with a funny name

python nlp rss sitemap crawler scraper corpus text-extraction web-scraping news-crawler commoncrawl web-corpus news-scraping cc-news

Updated Nov 15, 2024
Python

adbar / trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Updated Nov 15, 2024
Python

steel-dev / steel-selenium-starter

Starter project for using Steel with Python SDK and Selenium.

python selenium web-scraping browser-automation

Updated Nov 15, 2024
Python

PhilaController / gun-violence-dashboard-data

Python toolkit for preprocessing data for the City Controller's Gun Violence Dashboard

philadelphia python3 web-scraping python-toolkit gun-violence preprocessing-data

Updated Nov 15, 2024
Python

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python

Updated Nov 15, 2024
Python

apify / crawlee-python

Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright

Updated Nov 15, 2024
Python

34j / cached-historical-data-fetcher

Python utility for fetching any historical data using caching. Suitable for news, posts, weather, etc.

fetch python http scraper cache realtime scraping update pandas aiohttp web-scraping asyncio lz4 historical-data hacktoberfest tqdm pagenation joblib

Updated Nov 15, 2024
Python

tinyfish-io / agentql

AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing as UI changes and work across similar sites. Users can define structured data output, making AgentQL versatile for developers and data scientists.

python automation scraping web-scraping web-scrapping playwright web-scraping-python

Updated Nov 15, 2024
Python

Siddhi-Naik18 / News-Scrapper

This project uses requests and BeautifulSoup to scrape articles from Google News in categories like Sports, Entertainment, Health, Technology, Business, and Law. Selenium is used to extract detailed content for sentiment analysis, categorizing articles as Positive, Negative, or Neutral with TextBlob. For Entertainment news, Llama is used to generat

python news sentiment web-scraping beautifulsoup google-news textblob streamlit

Updated Nov 15, 2024
Python

pim97 / scrappey-wrapper-python

An API wrapper for Scrappey.com written in Python (Cloudflare and DataDome bypass & solver)

captcha shape web-scraping data-extraction akamai captcha-solver incapsula queue-it scraping-framework datadome scraping-tool cloudflare-bypass web-scraping-solution scraping-library cloudflare-anti-bot scraping-service web-data-extration anti-bot-api perimetex

Updated Nov 15, 2024
Python

jaebradley / basketball_reference_web_scraper

NBA Stats API via Basketball Reference

python nba web-scraper web-scraping basketball-reference

Updated Nov 15, 2024
Python
