v0.2.3 - Add parallel `postprocess` (#173)
🌟 Summary
v0.2.3 makes the mkdocs Ultralytics plugin faster and more robust 🚀 by adding optional parallel HTML postprocessing and a smarter, thread-safe cache for GitHub author data.
📊 Key Changes
- ⚙️ Parallel `postprocess_site` execution
  - New arguments: `use_processes: bool = True` and `workers: int | None = None`.
  - HTML files can now be processed in parallel using:
    - multiple processes (`ProcessPoolExecutor`), or
    - multiple threads (`ThreadPoolExecutor`), depending on `use_processes`.
  - Automatically picks an appropriate worker count based on the number of CPU cores (or uses a user-specified `workers` value).
- 🧱 Shared worker state for process pools
  - Introduces `_WORKER_STATE` plus the helpers `_set_worker_state` and `_process_file` to avoid repeatedly sending large read-only data (like config and Git metadata) to each task.
  - Reduces overhead and improves performance when using process-based parallelism.
- 📈 Improved progress handling & logging
  - Uses a single global `TQDM` progress bar for all workers.
  - Enables logging for the single-worker (sequential) path; disables per-task logging in parallel pools to keep tasks safe and pickle-friendly.
  - Prints a clear console message stating how many HTML files will be processed, with how many workers, and in which mode (thread or process).
- 🧵 Thread-safe, cached GitHub author lookups
  - Adds a global, shared cache: `_AUTHOR_CACHE`, `_AUTHOR_CACHE_MTIME`, and `_CACHE_LOCK` in `plugin/utils.py`.
  - `get_github_username_from_email`:
    - Wraps cache access and updates in a lock to avoid data races when running in parallel.
    - Avoids duplicate GitHub REST API calls and redundant avatar URL resolution.
  - `get_github_usernames_from_file`:
    - Loads `mkdocs_github_authors.yaml` only once per process.
    - Tracks the file's modification time so it can reload if the file changes.
    - Only writes the YAML file back when the cache actually changed, reducing I/O contention.
- 🛡 More robust GitHub lookup behavior
  - Safely handles:
    - Empty email strings (logs a warning when verbose).
    - GitHub noreply emails, by deriving the username directly and resolving the avatar URL once.
  - Ensures cache writes are consistent and guarded by a lock.
- 🧪 Minor structural optimizations
  - Simplified markdown index (`md_index`) creation in `postprocess_site` using a dictionary comprehension.
  - Centralized common parameters into `task_kwargs` for cleaner worker submission logic.
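The parallel execution and shared-worker-state changes can be illustrated with Python's standard `concurrent.futures` module. This is a minimal sketch, not the plugin's code: `postprocess_files`, `_init_worker`, `_STATE`, and `process_one` are hypothetical names, and the default worker-count formula is an assumption. It shows how an executor `initializer` can install large read-only data once per worker process instead of pickling it with every task, and how a `use_processes` flag can choose between a process pool and a thread pool.

```python
from __future__ import annotations

import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed

# Hypothetical per-worker state (in the spirit of _WORKER_STATE).
_STATE: dict = {}


def _init_worker(state: dict) -> None:
    """Install read-only shared data (e.g. config, Git metadata) in the current process."""
    _STATE.clear()
    _STATE.update(state)


def process_one(path: str) -> str:
    """Hypothetical per-file task: reads shared data from _STATE, not from its arguments."""
    return _STATE["prefix"] + path


def postprocess_files(
    paths: list[str], state: dict, use_processes: bool = True, workers: int | None = None
) -> list[str]:
    """Process `paths` sequentially or in a pool; results keep the input order."""
    if workers is None:
        # Assumed heuristic: one worker per CPU core, never more workers than files.
        workers = max(1, min(os.cpu_count() or 1, len(paths)))
    _init_worker(state)  # serves the sequential path and thread pools (shared memory)
    if workers == 1:
        return [process_one(p) for p in paths]
    if use_processes:
        # Each worker process receives `state` once at startup instead of once per task.
        pool = ProcessPoolExecutor(max_workers=workers, initializer=_init_worker, initargs=(state,))
    else:
        # Threads already see this module's _STATE, so no initializer is needed.
        pool = ThreadPoolExecutor(max_workers=workers)
    results: list[str] = [""] * len(paths)
    with pool:
        futures = {pool.submit(process_one, p): i for i, p in enumerate(paths)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

With `use_processes=True`, call a function like this from under an `if __name__ == "__main__":` guard, because the spawn start method (the default on Windows and macOS) re-imports the main module in every worker process.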
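GitHub noreply addresses take the form `ID+USERNAME@users.noreply.github.com` or, for older accounts, `USERNAME@users.noreply.github.com`, so the username can be read straight from the address without an API call. A minimal sketch of that idea, together with the empty-email guard, follows; `username_from_noreply` is a hypothetical name, and avatar URL resolution is left out because these notes do not say how the plugin performs it.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

NOREPLY_DOMAIN = "@users.noreply.github.com"


def username_from_noreply(email: str, verbose: bool = False) -> str | None:
    """Return the GitHub username encoded in a noreply address, or None otherwise."""
    email = email.strip()
    if not email:
        if verbose:
            logger.warning("Empty email string, skipping GitHub author lookup")
        return None
    if not email.lower().endswith(NOREPLY_DOMAIN):
        return None  # ordinary addresses would need a (cached) GitHub API lookup instead
    local_part = email[: -len(NOREPLY_DOMAIN)]
    # "ID+USERNAME" (newer accounts) or plain "USERNAME" (older accounts).
    return local_part.split("+", 1)[-1] or None
```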
🎯 Purpose & Impact
- 🚀 Faster documentation builds
  - Parallel processing of HTML files can significantly speed up mkdocs builds, especially for large sites with many pages.
  - Reduced overhead for Git metadata and author resolution in process pools.
- 🤝 More stable parallel execution
  - Thread-safe caching and careful locking prevent race conditions and YAML file corruption when running with multiple workers.
  - Lower risk of hitting GitHub rate limits, since identical API calls are no longer repeated.
- 📉 Lower I/O and API usage
  - Only writes `mkdocs_github_authors.yaml` when needed.
  - Caches avatar URLs and usernames, so repeated builds or pages with the same authors don't re-query GitHub.
- 🔧 Flexible configuration for different environments
  - `workers` lets you tune performance for CI, local development, or resource-constrained machines.
  - `use_processes` lets you choose between process-based isolation (often faster for CPU-heavy tasks) and threads (lighter-weight, easier debugging).
- 📚 Better user experience
  - Clearer progress output during `postprocess_site`.
  - More reliable author attribution and avatars in generated docs, with fewer intermittent failures under load.
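The author-cache behavior described in these notes (a lock around cache access, a file load that is repeated only when the file's modification time changes, and a write-back only when the cache actually changed) can be sketched as follows. This is a hypothetical illustration, not the plugin's `get_github_usernames_from_file`: `lookup_usernames`, `_load_if_changed`, and the `resolve` callback are made-up names, and JSON stands in for YAML so the sketch needs only the standard library. A `threading.Lock` coordinates threads within one process only, which matches the per-process cache described above.

```python
from __future__ import annotations

import json
import os
import threading
from typing import Callable, Iterable

# Hypothetical module-level cache, in the spirit of _AUTHOR_CACHE, _AUTHOR_CACHE_MTIME and _CACHE_LOCK.
_CACHE: dict[str, str | None] = {}
_CACHE_MTIME: float | None = None
_LOCK = threading.Lock()


def _load_if_changed(path: str) -> None:
    """Reload the cache file only if its mtime changed since the last load. Caller holds _LOCK."""
    global _CACHE_MTIME
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return  # no file yet: keep whatever is already in memory
    if mtime != _CACHE_MTIME:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        _CACHE.clear()
        _CACHE.update(data)
        _CACHE_MTIME = mtime


def lookup_usernames(
    emails: Iterable[str], path: str, resolve: Callable[[str], str | None]
) -> dict[str, str | None]:
    """Map emails to usernames, calling `resolve` (e.g. an API request) only on cache misses."""
    global _CACHE_MTIME
    result: dict[str, str | None] = {}
    with _LOCK:  # one thread at a time reads or mutates the shared cache
        _load_if_changed(path)
        changed = False
        for email in emails:
            if email not in _CACHE:
                _CACHE[email] = resolve(email)
                changed = True
            result[email] = _CACHE[email]
        if changed:
            # Write back only when new entries were added, then remember our own mtime
            # so the next call does not needlessly reload the file we just wrote.
            with open(path, "w", encoding="utf-8") as f:
                json.dump(_CACHE, f, indent=2, sort_keys=True)
            _CACHE_MTIME = os.path.getmtime(path)
    return result
```

Holding the lock while `resolve` runs serializes lookups, but it also guarantees that two threads never request the same email twice.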
In short: v0.2.3 focuses on performance, scalability, and robustness for documentation postprocessing, particularly when running mkdocs with parallel workers ⚡📚.
What's Changed
- Add parallel `postprocess` by @glenn-jocher in #173
Full Changelog: v0.2.2...v0.2.3