Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Updated Jul 17, 2024 - Python
A minimal yet powerful crawler for extracting all the internal, external, and fuzzable links from a website.
A simple Git URL parser.
A type to represent, query, and manipulate a Uniform Resource Identifier.
A website URL scraper built with Python.
Extract information from URLs inside shell scripts
Web scraping | Website cloner
Check whether the URLs contained in a Markdown file are down.
A command-line URL parser written in Python.
Simple URL builder
WebBriefs is an intelligent webpage summarizer API that extracts and condenses content into concise, readable Markdown. Perfect for quickly getting the gist of any website.
Crawl websites and extract meaningful information from HTML and site content
Bot to generate useful links to increase the ranking of products sold on Amazon
ImageSpace is a Python application that downloads images from web pages, filters out certain types of images, and stores the valid images in a SQLite database. It utilizes the FastAPI framework for providing an API endpoint to process web pages and extract images.
A Python library that resolves a URL to its IP address and country.
A real spider at work scraping a website.
Parsing and analyzing an Addiction club training history