Automated tool for tracking and archiving GitHub Marketplace models data. The scraper runs every day and maintains a historical record of changes in the marketplace.
- Automated data collection from GitHub Marketplace
- JSON output with detailed model information
- Historical tracking of changes
- Cached requests to respect API limits (see the sketch after this list)
- Rich console output for local debugging
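
As a rough illustration of the cached-requests feature, a time-based file cache around `requests` could look like the sketch below. The function, file layout, and defaults here are hypothetical, not the script's actual implementation:

```python
import hashlib
import json
import time
from pathlib import Path

import requests


def cached_get(url: str, cache_dir: str = ".cache", timeout: int = 3600) -> dict:
    """Hypothetical sketch: return a cached JSON response if it is still fresh."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Key each URL to a stable filename
    key = hashlib.sha256(url.encode()).hexdigest()
    path = cache / f"{key}.json"

    # Serve from cache while the file is younger than the timeout
    if path.exists() and time.time() - path.stat().st_mtime < timeout:
        return json.loads(path.read_text())

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    path.write_text(response.text)
    return response.json()
```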
The script uses PEP 723 inline script dependencies, which work seamlessly with `uv run`:
```bash
# Install uv if you haven't already
pip install uv

# Run the script directly - uv will handle dependencies
uv run script.py
```
The script includes this magic comment header:
```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "click>=8.1.7",
#     "rich>=13.7.0",
#     "pydantic>=2.5.0",
#     "requests>=2.31.0"
# ]
# ///
```
`uv` will automatically create a temporary virtual environment with these dependencies installed. This process takes just a few milliseconds once the `uv` cache has been populated.

If you prefer not to use `uv`, you can install dependencies traditionally:
```bash
# Install dependencies
pip install -r requirements.txt

# Run scraper with default settings
python script.py

# Run with specific model family filter
python script.py -m DeepSeek

# Save output to JSON
python script.py -f json -o models.json

# Enable debug logging
python script.py -d
```
The script accepts the following command-line options:

- `-o, --output`: Output JSON file path
- `-m, --model-family`: Filter by model family
- `-f, --format`: Output format (table/json)
- `-d, --debug`: Enable debug logging
- `--cache-dir`: Cache directory path (default: `.cache`)
- `--cache-timeout`: Cache timeout in seconds
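
With `click` (one of the inline dependencies), the options above could plausibly be declared as follows. The function name, defaults, and body are assumptions for illustration, not the script's actual code:

```python
import click


@click.command()
@click.option("-o", "--output", type=click.Path(), help="Output JSON file path")
@click.option("-m", "--model-family", help="Filter by model family")
@click.option(
    "-f", "--format", "output_format",
    type=click.Choice(["table", "json"]), default="table",
    help="Output format",
)
@click.option("-d", "--debug", is_flag=True, help="Enable debug logging")
@click.option("--cache-dir", default=".cache", help="Cache directory path")
@click.option("--cache-timeout", default=3600, type=int, help="Cache timeout in seconds")
def main(output, model_family, output_format, debug, cache_dir, cache_timeout):
    """Scrape GitHub Marketplace models data."""
    # Real scraping, caching, and output logic would live here.
    ...


if __name__ == "__main__":
    main()
```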
The scraped data is stored in `models.json` and includes:
- Model name and ID
- Registry information
- Task type and model family
- Licensing information
- Token limits
- Supported languages and modalities
- And more (a rough schema sketch follows below)
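
Because `pydantic` is among the dependencies, each record plausibly corresponds to a model along these lines. The field names and types below are guesses inferred from the list above, not the actual schema:

```python
from pydantic import BaseModel


class MarketplaceModel(BaseModel):
    """Illustrative schema inferred from the field list above (hypothetical)."""

    name: str
    id: str
    registry: str
    task: str
    model_family: str
    license: str
    max_input_tokens: int | None = None
    max_output_tokens: int | None = None
    supported_languages: list[str] = []
    supported_modalities: list[str] = []
```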
This repository uses GitHub Actions to automatically run the scraper once per day. The workflow:
- Runs the scraper
- Checks for changes in the data
- Commits and pushes updates if changes are found (sketched below)
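
The commit-if-changed logic in such a workflow typically boils down to a few shell commands. This is a hypothetical sketch of that step, not the repository's actual workflow file:

```bash
# Regenerate the data file (paths and flags assumed from the usage examples above)
uv run script.py -f json -o models.json

# Commit and push only if the tracked data actually changed
# (a real CI job would also configure the git user first)
if ! git diff --quiet models.json; then
    git add models.json
    git commit -m "Update marketplace models data"
    git push
fi
```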
This project was inspired by and builds upon the work of several excellent resources:
- Simon Willison - For his pioneering work on GitHub scraping tools and workflows, particularly the openai-models scraper
- OpenWebUI community, specifically @theonehong's models scraper - For insights into marketplace data structure
- Anthropic Claude 3.5 - For assistance in code development and optimization
This project is licensed under the MIT License.