v1.2.0 — installable package + multilingual chapter detection

@virgiliojr94 released this on 17 Jun at 22:15.

book-to-skill v1.2.0 turns the project into an installable Python package and makes chapter detection genuinely multilingual.

Highlights

📦 Installable package + CLI

book-to-skill is now a proper Python package, book_to_skill: pip install it, run either the book-to-skill console script or python -m book_to_skill, and pull in only the extractors you need via extras (epub, pdf, docx, rtf, technical, all). The base install stays dependency-free by falling back to the standard library, and python3 scripts/extract.py still works unchanged, so existing skill flows keep running.
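The "dependency-free base install with stdlib fallbacks" pattern can be sketched roughly as follows. This is a minimal illustration, not book-to-skill's actual code: optional_import, list_epub_documents, and EPUB_BACKEND are hypothetical names, and treating ebooklib as the library behind the epub extra is an assumption. The fallback relies on the fact that an EPUB is a ZIP archive, so the standard library's zipfile can read it when no extra is installed.

```python
import importlib
import io
import zipfile
from types import ModuleType
from typing import List, Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import an optional dependency, returning None if its extra is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def list_epub_documents(data: bytes) -> List[str]:
    """List (X)HTML content files in an EPUB using only the standard library.

    An EPUB is a ZIP archive, so zipfile is enough for a basic fallback.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(
            n for n in zf.namelist() if n.endswith((".xhtml", ".html", ".htm"))
        )


# Hypothetical backend selection: "ebooklib" is only assumed to be what an
# "epub" extra might provide; the real package's dependencies may differ.
EPUB_BACKEND = "ebooklib" if optional_import("ebooklib") else "stdlib-zipfile"
```

Extras use pip's standard bracket syntax, so from a source checkout something like pip install ".[epub,pdf]" would add only those two extractors' dependencies on top of the base install.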

🌍 Multilingual chapter detection

  • Markdown / AsciiDoc ATX headings (#, ==) detected when no numeric "Chapter N" is present.
  • setext / reStructuredText underline headings (Title over === / ---), guarded against thematic breaks, table borders, and YAML front matter.
  • French, German, Italian, Dutch chapter words (Chapitre, Kapitel, Capitolo, Hoofdstuk) and umlaut titles (Überblick).
  • Full-width Arabic digits in CJK headings (第１章), common in Japanese typesetting.
  • Multilingual table-of-contents detection (CN, JP, FR, DE, IT, NL).
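As a rough sketch of how these rules could fit together (the regexes and function names below are hypothetical, not the project's implementation), numbered chapter words and 第N章 headings take priority, and ATX/AsciiDoc and setext headings are used only as a fallback. The setext check skips leading YAML front matter, ignores underlines preceded by a blank line (thematic breaks), and rejects title lines that look like table rows.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration; the project's real regexes may differ.
CHAPTER_WORD = re.compile(
    r"^\s*(?:Chapter|Chapitre|Kapitel|Capitolo|Hoofdstuk)\s+(\d+)\b",
    re.IGNORECASE,
)
# 第1章 / 第１章: ASCII or full-width (U+FF10-U+FF19) digits between 第 and 章.
CJK_CHAPTER = re.compile(r"^\s*第\s*([0-9０-９]+)\s*章")
# Markdown ATX (# .. ######) or AsciiDoc (= .. ======) heading markers.
ATX_HEADING = re.compile(r"^(#{1,6}|={1,6})[ \t]+(\S.*?)[ \t]*$")
# A setext/rST underline: a run of 3+ '=' or 3+ '-' and nothing else.
UNDERLINE = re.compile(r"^(?:={3,}|-{3,})[ \t]*$")


def skip_front_matter(lines: List[str]) -> int:
    """Return the index of the first line after a leading YAML front-matter block."""
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() in ("---", "..."):
                return i + 1
    return 0


def numbered_chapters(lines: List[str]) -> List[Tuple[int, int]]:
    """(line index, chapter number) for 'Chapter N'-style and 第N章 headings."""
    found = []
    for i, line in enumerate(lines):
        m = CHAPTER_WORD.match(line) or CJK_CHAPTER.match(line)
        if m:
            # int() accepts full-width digits too: int("１２") == 12.
            found.append((i, int(m.group(1))))
    return found


def heading_titles(lines: List[str]) -> List[Tuple[int, str]]:
    """(line index, title) for ATX and guarded setext headings."""
    found = []
    for i in range(skip_front_matter(lines), len(lines)):
        line = lines[i]
        m = ATX_HEADING.match(line)
        if m:
            found.append((i, m.group(2)))
            continue
        # Setext: a title line directly followed by an underline line.
        if i + 1 < len(lines) and UNDERLINE.match(lines[i + 1]):
            title = line.strip()
            # Guards: an empty title means the '---' is a thematic break;
            # pipes or a leading '+' suggest a table row or border.
            is_table = "|" in title or title.startswith("+")
            if title and not is_table and not UNDERLINE.match(line):
                found.append((i, title))
    return found


def detect_chapters(text: str) -> List[str]:
    lines = text.splitlines()
    numbered = numbered_chapters(lines)
    if numbered:
        return [lines[i].strip() for i, _ in numbered]
    # Fall back to structural headings only when no numeric chapters exist.
    return [title for _, title in heading_titles(lines)]
```

Python's int() accepts any Unicode decimal digit, so full-width chapter numbers need no separate conversion step once the regex has captured them.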

🔎 Diagnosable extraction

Unexpected parser errors are now logged to stderr (extractor name and exception type) instead of being silently swallowed, while the fallback chain still moves on to the next extractor. Corrupt files and encoding problems are now visible.
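A minimal sketch of a fallback chain that reports failures instead of hiding them. The function names, the log message format, and the choice to skip ImportError quietly (treating a missing optional extra as expected rather than as a parser error) are all assumptions for illustration, not the project's actual behaviour.

```python
import sys
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[bytes], str]


def log_to_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def extract_with_fallback(
    data: bytes,
    extractors: Sequence[Tuple[str, Extractor]],
    log: Callable[[str], None] = log_to_stderr,
) -> Optional[str]:
    """Try each extractor in order; log unexpected failures and keep going."""
    for name, extractor in extractors:
        try:
            text = extractor(data)
        except ImportError:
            # Assumption: a missing optional extra is expected, so skip quietly.
            continue
        except Exception as exc:  # broad on purpose: any parser bug falls through
            # Report which extractor failed and the exception type, then continue.
            log(f"[extract] {name} failed: {type(exc).__name__}")
            continue
        if text:
            return text
    return None
```

Passing log as a parameter keeps stderr as the default while letting tests collect the messages in a list.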

🔒 Security & CI

Added CodeQL analysis, a Bandit gate that fails on HIGH-severity findings, a Zizmor audit of the GitHub Actions workflows, and grouped Dependabot updates. The test matrix now spans Python 3.9–3.13.

Thanks

Community contributions from @Marcelluxx, @dex0shubham, @RandMelville, @addy790, @yukaina, @wuji-labs, and everyone filing multilingual edge cases. 💖 Sponsor the project

Full changelog: https://github.com/virgiliojr94/book-to-skill/blob/master/CHANGELOG.md