OCR system for digitizing Donn Draeger's martial arts research materials. Features dual-engine OCR (Tesseract and EasyOCR), Japanese text processing with romanization and translation, layout analysis, and academic-quality HTML reconstruction. Specialized for mixed English/Japanese documents with embedded diagrams.
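The description does not say how the two engines' outputs are combined. One common approach, sketched here purely as an assumption, is to keep whichever engine reports the higher confidence for each text region. The function name `merge_engines`, the region ids, and the `(text, confidence)` input shape are hypothetical and not taken from the repository. Confidences are assumed to be already normalized to the same 0 to 1 scale.

```python
from typing import Dict, Tuple

# (recognized text, confidence in [0, 1]); hypothetical shape, not the repo's API
Candidate = Tuple[str, float]


def merge_engines(
    tesseract: Dict[str, Candidate],
    easyocr: Dict[str, Candidate],
) -> Dict[str, Tuple[str, float, str]]:
    """Pick, per region id, the candidate with the higher confidence.

    Returns region id -> (text, confidence, winning engine name).
    Regions seen by only one engine are kept as-is.
    On a tie, the Tesseract candidate wins because it is listed first.
    """
    merged: Dict[str, Tuple[str, float, str]] = {}
    for region in sorted(set(tesseract) | set(easyocr)):
        candidates = []
        if region in tesseract:
            text, conf = tesseract[region]
            candidates.append((conf, "tesseract", text))
        if region in easyocr:
            text, conf = easyocr[region]
            candidates.append((conf, "easyocr", text))
        conf, engine, text = max(candidates, key=lambda c: c[0])
        merged[region] = (text, conf, engine)
    return merged


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    tess = {"r1": ("Judo", 0.90), "r2": ("bujutsu", 0.40)}
    easy = {"r2": ("武術", 0.85), "r3": ("柔道", 0.70)}
    for region, result in merge_engines(tess, easy).items():
        print(region, result)
```

Keeping the winning engine's name alongside each result makes it easy to audit later which engine handled the English and which the Japanese regions.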
Updated Oct 12, 2025 - Python