Saves webpages locally to your hard disk with images, CSS, JS, and links as is.
Updated Dec 12, 2023 - Python
Heuristic-based boilerplate removal tool
Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.
Fast indexed Python HTML parser that builds a DOM node tree and provides common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
Unofficial REST API for ctengg.amu.ac.in
Python webscraping module for NCAA Basketball Stats
Python script to download results for a whole class/branch from a provided attendance Excel file.
An easy way to parse HTML and build XPath expressions
Lightweight HTML/XML parser for quick and dirty web scraping.
This project converts your YouTube watch history HTML file from Google Takeout into a CSV file that universalscrobbler.com can use to scrobble manually in bulk.
A script to parse the saved Humble Bundle library HTML
Python script to extract college names from the UGC (India) website.
A work-in-progress Discord bot based on the widely popular Touhou series by ZUN.
Pure Python 3 alternative to the stdlib xml.etree with HTML support
A Python library for loading data from various formats into PostgreSQL databases.