v0.23.0
Added
- Structured
<key>_parsedsiblings for the canonical questions
whose replies are CI-gateable. Free-text reply still ships
unchanged; the parsed form is additive.post_install_check_parsed:{install, import, cli}— each
ok | fail | n/a | unknown.install_and_help_parsed(cli-tool template):{install, help}.prompt_injection_check_parsed:{found: bool|None, found_raw: yes|no|unknown}.- CI gating becomes
jq -e '.post_install_check_parsed.install == "ok"' newb.json,
no fragile substring grepping. - Off-script replies (agent didn't follow the prompted format)
yield"unknown"instead of raising — itself a CI signal. - New module:
newb._parsers(parse_post_install_check,
parse_install_and_help,parse_prompt_injection_check,
attach_parsed_fields). 19 new tests.
newb_signaturefield at the top of every report — version,
tagline, PyPI URL, GitHub URL, "Part of SciTeX". Same signature
rendered as a footer inrender_markdownso paste-into-README
reports carry their own provenance.
Internal
- Extracted
render_markdowninto_render.py(line-budget
hygiene;_try.pywas at the 512-line limit). Re-exported from
newb._tryso existing imports keep working. - Extracted
_parsers.pyas a focused module rather than inlining
into_try.py— keeps the parser surface easy to extend when new
question keys land.
Not yet (future work)
The auditor noted that even with parsing, agents occasionally drift
from the prompted format (Install: vs INSTALL:, emoji injection,
extra prose on the verdict line). Three escalation rungs if drift
becomes a real problem:
- Few-shot examples in the prompt templates (~30 min, ~95%
reliability). - Anthropic Tool Use for
post_install_checkand
prompt_injection_checkonly — JSON schema enforcement at the
SDK boundary, structurally impossible to drift (~half day). - Hybrid stays: free-text for human-readable questions
(what_for,problems_solved, …), structured for CI-gate
questions.
Shipping the parsers as the foundation; (1) and (2) defer until
real-world drift data justifies them.