feat: opt-out paragraph-reflow auto-format via markdown-it-py #21
Merged
Adopts markdown-it-py as the wrapper's first runtime dependency and enables default-on paragraph reflow before every gh-post body forward. The reflow collapses soft-break newlines inside top-level CommonMark paragraphs to single spaces and preserves every other region byte-identical to input: code fences (both backtick and tilde, including fence-char content-collision cases), lists, blockquotes, tables, headings, HTML blocks (including HTML comments), link reference definitions, and hard breaks (trailing two-space or unescaped backslash).

Opt-out per invocation via --no-format on every body-bearing subcommand (issue/pr create/edit/comment, comment-edit, reply-inline) and per process via GH_POST_NO_FORMAT env. The flag wins over env when both are set; invalid env values exit non-zero before any body is read.

New gh_post.markdown adapter exposes iter_prose_paragraphs, iter_non_prose_spans (complement-of-paragraphs construction so link refdefs and any future block type are handled automatically), and reflow_paragraphs. detect_hardwrap is refactored to consume the adapter, removing the in-file fence and structural regexes. validate_body gains a keyword-only format_mode kwarg (default False) so existing direct callers and tests keep their current behavior.

Test suite: 232 passing (188 original + 44 new covering adapter, reflow byte-identity for every preserved block type, opt-out flag and env wiring, env value validation, and CommonMark behavior-change regression cases for lazy continuation, overlong ordered-list marker, hash-without-space, pipe-without-delimiter).

Closes #12
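The complement-of-paragraphs construction mentioned above can be illustrated with a small standard-library sketch. It is not the adapter's actual code: the function name `complement_spans`, the half-open zero-based `(start, end)` line-span representation, and the `total_lines` parameter are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Span = Tuple[int, int]  # hypothetical: half-open, zero-based line range


def complement_spans(paragraph_spans: Iterable[Span], total_lines: int) -> Iterator[Span]:
    """Yield every line range that is NOT covered by a paragraph span.

    Assumption: paragraph spans come from some parser and do not need to
    be pre-sorted. Anything the parser did not report as a top-level
    paragraph (fences, lists, refdefs, blank lines, ...) falls into the
    complement without being named explicitly.
    """
    cursor = 0
    for start, end in sorted(paragraph_spans):
        if start > cursor:
            # Gap between the previous paragraph and this one.
            yield (cursor, start)
        cursor = max(cursor, end)
    if cursor < total_lines:
        # Tail after the last paragraph.
        yield (cursor, total_lines)
```

Because the non-prose regions are derived by subtraction, a block type the sketch has never heard of still lands in a non-prose span, which is the property the commit message attributes to this construction.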
Two corrections to the auto-format path surfaced by post-implement
review:
- reflow_paragraphs("para\n\n") returned "para\n", silently dropping
trailing blank lines. The previous algorithm used "\n".join on the
tail non-prose span, which collapsed an empty trailing line into
the body's existing terminator. Rewrites the function to walk
source lines one at a time, emitting each line plus its terminator
uniformly, then strips the final newline only when the input
itself had none.
- GH_POST_NO_FORMAT validation in post.py and comment_edit.py ran
after read_body, so an invalid env value with --body-stdin blocked
on stdin before the error could surface, and with --body-file
read the file before reporting the env error. Moves the env
resolution to the start of cmd_post / cmd_comment_edit so the
invalid-env contract ("exit non-zero before any body is read")
actually holds. reply-inline already resolved env before reading
stdin and is unchanged.
Adds six regression tests: four for trailing-blank-line byte-
identity (single trailing blank, double trailing blank, no trailing
newline, blank-only body), two for env-validation-before-stdin-block
across post and comment-edit.
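A minimal sketch of the line-walking approach described in the first correction, under stated assumptions: the name `reflow_lines_sketch` is hypothetical, paragraph spans are passed in as half-open zero-based line ranges rather than computed by a parser, input uses LF terminators only, and hard breaks are ignored for brevity. It is meant to show why emitting each line with its own terminator keeps trailing blank lines, not to reproduce the real `reflow_paragraphs`.

```python
from typing import List, Tuple


def reflow_lines_sketch(text: str, paragraph_spans: List[Tuple[int, int]]) -> str:
    """Join soft breaks inside paragraph spans; keep all other lines as-is."""
    # split("\n") yields one entry per line plus a final entry holding
    # whatever follows the last "\n" (empty when the input ends with one).
    parts = text.split("\n")
    last = len(parts) - 1

    # Lines whose terminator becomes a single space: every line of a
    # paragraph span except the span's last line.
    soft_break_lines = {
        i for start, end in paragraph_spans for i in range(start, end - 1)
    }

    out = []
    for i, part in enumerate(parts):
        if i == last:
            # No terminator to emit: the input had none after this piece.
            out.append(part)
        elif i in soft_break_lines:
            out.append(part + " ")
        else:
            out.append(part + "\n")
    return "".join(out)
```

Each source line contributes exactly one terminator to the output, so an empty trailing line cannot be folded into the preceding line's newline the way a `"\n".join` over spans can fold it.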
Pull request overview
This PR introduces a Markdown-aware auto-formatting step (default-on) that reflows soft line breaks inside top-level CommonMark paragraphs before bodies are sent to gh, with opt-out via --no-format and GH_POST_NO_FORMAT. It also refactors the existing hardwrap detector to use markdown-it-py tokenization rather than bespoke regex heuristics, improving CommonMark conformance while preserving byte-identical behavior outside reflowed paragraphs.
Changes:
- Add `markdown-it-py` as the first runtime dependency and introduce a Markdown adapter (`iter_prose_paragraphs`, `iter_non_prose_spans`, `reflow_paragraphs`) to drive both validation and formatting.
- Implement shared opt-out resolution + formatting application (`--no-format` flag precedence over `GH_POST_NO_FORMAT`, env validation before any body read).
- Update all body-bearing entry points to apply reflow by default and adjust validator behavior via `validate_body(..., format_mode=...)` tripwire semantics; expand tests and document behavior in README.
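The opt-out precedence summarized above can be sketched as follows. This is an illustration, not the contents of `gh_post/_format.py`: the function name `resolve_no_format_sketch`, the `InvalidEnvValue` exception, and the accepted value set (only `true` and `false`, which are the two values the PR's tests mention) are assumptions. Whether the real helper still validates the env value when the flag is passed is not stated in the PR; this sketch does.

```python
from typing import Mapping, Optional

ENV_VAR = "GH_POST_NO_FORMAT"


class InvalidEnvValue(ValueError):
    """Hypothetical error type for an unrecognized env value."""


def resolve_no_format_sketch(flag_set: bool, env: Mapping[str, str]) -> bool:
    """Return True when formatting should be skipped.

    Intended to be called before any body is read, so a bad env value
    surfaces without blocking on stdin or touching a body file.
    """
    raw = env.get(ENV_VAR)
    env_value: Optional[bool] = None
    if raw is not None:
        if raw == "true":
            env_value = True
        elif raw == "false":
            env_value = False
        else:
            # Assumption: anything else is rejected outright.
            raise InvalidEnvValue(f"{ENV_VAR}={raw!r} is not 'true' or 'false'")

    if flag_set:
        # --no-format wins over a conflicting env value.
        return True
    return bool(env_value)
```

A caller would invoke this at the top of the entry point with `os.environ` and exit non-zero on `InvalidEnvValue`, before reaching the code that reads `--body-stdin` or `--body-file`.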
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `pyproject.toml` | Adds `markdown-it-py>=3.0,<4` runtime dependency. |
| `gh_post/markdown.py` | New adapter over markdown-it-py providing top-level paragraph spans and paragraph reflow with byte-preservation outside paragraphs. |
| `gh_post/_format.py` | New shared opt-out/env parsing and formatting application helpers. |
| `gh_post/validators.py` | Refactors `detect_hardwrap` to consume adapter paragraphs; adds `format_mode` tripwire behavior to `validate_body`. |
| `gh_post/cli.py` | Registers `--no-format` for issue/pr subcommands so it is consumed by the wrapper parser. |
| `gh_post/subcommands/post.py` | Resolves opt-out before reading bodies, applies formatting, and validates in rejector vs tripwire mode depending on opt-out. |
| `gh_post/subcommands/comment_edit.py` | Adds `--no-format`, resolves opt-out before body read, applies formatting before validation. |
| `gh_post/subcommands/reply_inline.py` | Adds `--no-format`; applies formatting per JSONL entry prior to validation to keep “validated == sent”. |
| `gh_post/__init__.py` | Updates flat re-exports: removes regex exports and exposes new markdown adapter API. |
| `README.md` | Documents auto-format scope, preservation guarantees, and opt-out precedence/behavior. |
| `test_gh_post.py` | Adds extensive coverage for adapter behavior, preservation invariants, opt-out wiring, and tripwire semantics. |
This was referenced May 20, 2026
Summary
Adopt `markdown-it-py` as the wrapper's first runtime dependency and add a default-on paragraph-reflow auto-format step before every body is forwarded to `gh`. The reflow collapses soft-break newlines inside top-level CommonMark paragraphs to single spaces and preserves every other region of the body byte-identical to the input, so column-wrapped and over-fragmented bodies render cleanly on GitHub without the author having to hand-tune line breaks per post. Opt-out is supported per-invocation via `--no-format` on every body-bearing subcommand and per-process via `GH_POST_NO_FORMAT`.

Closes #12
Changes
- `pyproject.toml`: add `markdown-it-py>=3.0,<4` to `[project].dependencies`.
- `gh_post/markdown.py` (new): adapter over `markdown-it-py` exposing `iter_prose_paragraphs`, `iter_non_prose_spans`, and `reflow_paragraphs`. `iter_non_prose_spans` is implemented as the complement of yielded top-level paragraph spans, so link reference definitions, duplicate references, and any future block type are handled without enumerating the parser's token vocabulary.
- `gh_post/_format.py` (new): opt-out resolution helper (`resolve_no_format`, `apply_format`, `emit_env_error`). The `--no-format` flag wins over `GH_POST_NO_FORMAT`; invalid env values raise before any body read.
- `gh_post/validators.py`: drop the bespoke `_FENCE_OPEN_RE` / `_FENCE_CLOSE_RE` / `_STRUCTURAL_RE` regexes and refactor `detect_hardwrap` to consume the adapter. `validate_body` gains a keyword-only `format_mode` parameter that defaults to `False` (rejector mode) so existing direct callers and tests are unaffected; format-on entry paths pass `format_mode=True` to switch the hardwrap detector to a silent tripwire.
- `gh_post/__init__.py`: drop the regex re-exports, add the new adapter symbols to the flat-package surface.
- `gh_post/cli.py`: register `--no-format` on the shared `issue|pr` argparse parser before `parse_known_args`, so the flag is consumed by the wrapper and never leaks to `gh`.
- `gh_post/subcommands/post.py`, `comment_edit.py`, `reply_inline.py`: register `--no-format` on each local parser, resolve the opt-out before any body read, and apply `reflow_paragraphs` between body read and `validate_body`. `reply-inline` applies reflow per JSONL line inside `_parse_reply_entries` so the body that gets POSTed matches the body that the validator approved.
- `test_gh_post.py`: 50 new tests; three existing rejection tests acquired `--no-format` so their assertion semantics survive the default change, with new sibling tests covering the default-on path.
- `README.md`: new "Auto-format" section documenting scope, preservation guarantees, opt-out flag, and env precedence.

Impact
- Every body-bearing subcommand (`issue create|edit|comment`, `pr create|edit|comment`, `comment-edit`, `reply-inline`) reflows paragraphs by default. The body that GitHub stores is therefore the reflowed body, not the raw input, unless `--no-format` / env opt-out is set.
- `detect_hardwrap` under `--no-format` is now CommonMark-conformant in three classes where the pre-feature regex detector diverged: lazy continuations after lists/blockquotes are now part of the nested paragraph (no longer over-eagerly flagged as top-level prose); link reference definitions are non-prose (no longer treated as paragraph content); over-broad regex matches like `#not heading`, `1234567890. item`, and `| not a table` are correctly treated as prose (no longer wrongly exempted from the hardwrap lanes). All three classes are covered by dedicated regression tests.
- `markdown-it-py` 3.x pulls in `mdurl` only at the Python 3.11+ floor (no `typing_extensions`, no optional `linkify-it-py` or plugins); both are pure-Python wheels, so the `uv tool install .` posture is unchanged in shape.

Test plan
- `uv run pytest`: 238 passed (188 original + 50 new).
- `uv run ruff check`: clean.
- `uv run ruff format --check`: clean.
- `uv run pyright`: 0 errors.

Block-preservation tests assert `reflow_paragraphs(input) == input` byte-identical for each preserved shape: backtick fence, tilde fence containing literal backticks (the mandatory fence-char collision regression), bullet list both tight and loose, ordered list, blockquote with nested multi-line paragraph (Strategy A: nested paragraphs are not reflowed), ATX and setext headings, HTML comment block, HTML `<div>` block with nested markdown, GFM table, horizontal rule, indented code block, paragraph ending with trailing-two-space hard break, paragraph ending with unescaped backslash hard break, link reference definitions including duplicates.

Positive reflow tests cover: multi-line paragraph collapse, two consecutive paragraphs reflowed independently with the blank line preserved between them, paragraph immediately followed by a fenced code block with no blank line, paragraph immediately preceded by a list, and the over-fragmentation shape that motivated the feature (a paragraph with line breaks on prepositions and intra-clause commas).
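The two hard-break shapes named in the block-preservation tests (trailing two spaces, unescaped trailing backslash) can be told apart from soft breaks with a line-level check. The sketch below is an assumption-laden illustration: `ends_with_hard_break` is a hypothetical helper, it looks at a single line without its terminator, and it does not account for context such as code spans, which a real parser would handle.

```python
def ends_with_hard_break(line: str) -> bool:
    """Report whether a paragraph line (without its newline) ends in a hard break.

    Covers the two CommonMark shapes mentioned in the test plan:
    two or more trailing spaces, or a trailing backslash that is not
    itself escaped by a preceding backslash.
    """
    if line.endswith("  "):
        return True
    # Count the run of backslashes at the end of the line. An odd count
    # means the final backslash is unescaped; an even count means the
    # backslashes pair up as escaped literals and the break stays soft.
    trailing_backslashes = len(line) - len(line.rstrip("\\"))
    return trailing_backslashes % 2 == 1
```

A reflow step built on this check would leave the newline after such a line untouched and only replace the newline with a space when the check returns `False`.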
Opt-out wiring tests cover: `--no-format` on every subcommand sends the raw body byte-for-byte; `GH_POST_NO_FORMAT=true` without the flag has the same effect; `GH_POST_NO_FORMAT=false` keeps format-on; a garbage value exits non-zero before any body read (including against `--body-stdin`, which would otherwise block); the flag wins over a conflicting env value; `--no-format` is consumed by the wrapper and does not appear in the argv forwarded to `gh`.

Tripwire tests cover: `validate_body(clean_body, format_mode=True)` is silent; `validate_body(hardwrap_body, format_mode=True)` emits a stderr diagnostic but returns `[]` (no rejection).

The CommonMark-conformance regression tests for the three behavior-change classes named in Impact are individually pinned.
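The rejector-versus-tripwire split exercised by the tripwire tests can be sketched as below. Everything here is hypothetical scaffolding: `validate_body_sketch` is not the real `validate_body`, `detect_hardwrap_stub` is a deliberately crude placeholder rather than the adapter-based detector, and the `stream` parameter exists only so the diagnostic can be captured in a test.

```python
import sys
from typing import List, Optional, TextIO


def detect_hardwrap_stub(body: str) -> List[str]:
    """Placeholder detector, NOT the real one.

    Flags any non-blank line that is directly followed by another
    non-blank line, which is enough to drive the two modes below.
    """
    lines = body.split("\n")
    return [
        f"line {i + 1}: possible hard wrap"
        for i in range(len(lines) - 1)
        if lines[i].strip() and lines[i + 1].strip()
    ]


def validate_body_sketch(
    body: str, *, format_mode: bool = False, stream: Optional[TextIO] = None
) -> List[str]:
    """Return rejection messages; in format mode, only warn on stderr."""
    findings = detect_hardwrap_stub(body)
    if format_mode:
        # Tripwire: the body was already reflowed, so a finding here
        # signals a formatter bug rather than an author mistake. Report
        # it but do not reject the post.
        out = stream if stream is not None else sys.stderr
        for finding in findings:
            print(f"tripwire: {finding}", file=out)
        return []
    # Rejector (default): findings are returned to the caller as errors.
    return findings
```

Keeping `format_mode` keyword-only with a default of `False` matches the stated goal that existing direct callers see no behavior change.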
Post-implement `codex review` surfaced two P2 correctness findings, both fixed in the second commit: `reflow_paragraphs("para\n\n")` was dropping the trailing blank line (algorithm rewritten to walk source lines uniformly), and `GH_POST_NO_FORMAT` validation ran after the body read so an invalid env value with `--body-stdin` blocked on stdin (validation moved to the start of each entry point). Both fixes have dedicated regression tests.

Discovery contract status
The plan's `Inconclusive / Deferred items` section lists two deferrals, both honored verbatim in this PR:

- `mdit-py-plugins`-based extras (footnote, deflist, anchors, frontmatter): deferred. GitHub bodies do not use these extensions in the wrapper's validation path; adding them would grow the configuration surface without current benefit.

No new Inconclusive items surfaced during implementation. The two pre-PR review findings were implementation-level bugs against the documented contract, not premise gaps.
Notes
The plan that produced this PR (with the pre-implementation review trail and derivations for the three core invariants — format idempotency, GFM HTML rendering equivalence, tripwire correctness) is preserved locally; the public contract surface is the README's Auto-format section plus the regression tests committed here.