Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
Updated Jul 13, 2025 - Python
Free database schema discovery and comprehension tool
Web version of ytmdl. Allows downloading songs with embedded metadata from various sources such as iTunes, Gaana, and Last.fm.
Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
Content ExtRactor and MINEr
Fast, cross-platform Node.js access to ExifTool
Utility to download and extract document metadata from an organization. This technique can be used to identify: domains, usernames, software/version numbers and naming conventions.
ExifLooter finds geolocation data in all image URLs and directories, and also integrates with OpenStreetMap.
Android application for analyzing installed apps
PhotoStructure for Servers
A collection of tools for forensic analysis
Adult Media Manager is the ultimate media manager for your adult movies and videos. Organize your content for Kodi, Plex, and other media centers.
📷 EXIF metadata viewing tool
Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
Digital forensic analysis tool that provides a user-friendly interface for investigating disk images.
A Laravel package to fetch Open Graph data of a website.
Fast and robust date extraction from web pages, with Python or on the command-line
🏷️ A JavaScript library for scraping/parsing metadata from a web page.
This package implements a complete SpyWare.
LazyOwn RedTeam/APT Framework is the first RedTeam Framework with an AI-powered C&C, featuring rootkits to conceal campaigns, undetectable malleable implants compatible with Windows/Linux/Mac OSX, and self-configuring backdoors. With its Web interface and powerful Console Client, it is the best combination for your RedTeam/APT campaigns.