web-scraping

Here are 2,817 public repositories matching this topic...

SeleniumBase

Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

  • Updated Nov 15, 2024
  • Python

AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries self-heal as the UI changes and work across similar sites. Users can define structured data output, making AgentQL versatile for developers and data scientists.

  • Updated Nov 15, 2024
  • Python

This project uses requests and BeautifulSoup to scrape articles from Google News in categories like Sports, Entertainment, Health, Technology, Business, and Law. Selenium is used to extract detailed content for sentiment analysis, categorizing articles as Positive, Negative, or Neutral with TextBlob. For Entertainment news, Llama is used to generat

  • Updated Nov 15, 2024
  • Python
