Release v0.23.0 · ywatanabe1989/newb

Added

Structured <key>_parsed siblings for the canonical questions
whose replies are CI-gateable. Free-text reply still ships
unchanged; the parsed form is additive.
- post_install_check_parsed: {install, import, cli} — each
  ok | fail | n/a | unknown.
- install_and_help_parsed (cli-tool template): {install, help}.
- prompt_injection_check_parsed: {found: bool|None, found_raw: yes|no|unknown}.
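- CI gating becomes a single jq expression,
  jq -e '.post_install_check_parsed.install == "ok"' newb.json,
  with no fragile substring grepping (jq -e exits non-zero when
  the expression evaluates to false or null).
- Off-script replies (the agent didn't follow the prompted format)
  yield "unknown" instead of raising; an "unknown" is itself a
  CI signal.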
- New module: newb._parsers (parse_post_install_check,
  parse_install_and_help, parse_prompt_injection_check,
  attach_parsed_fields). 19 new tests.
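
The parse-or-"unknown" behaviour described above can be sketched roughly as follows. This is not the code in newb._parsers: the reply format (one "FIELD: verdict" line per field), the regular expression, and the function names ending in _sketch are assumptions made for illustration only.

```python
import re

# Verdict values and field names come from the release notes.
VERDICTS = {"ok", "fail", "n/a", "unknown"}
FIELDS = ("install", "import", "cli")

# ASSUMPTION: the agent replies with one "<FIELD>: <verdict>" line per field.
# The real prompted format may differ.
_LINE = re.compile(
    r"^[ \t]*(install|import|cli)[ \t]*:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_post_install_check_sketch(reply):
    """Return {install, import, cli}; anything unrecognized stays "unknown"."""
    parsed = {field: "unknown" for field in FIELDS}
    for field, verdict in _LINE.findall(reply or ""):
        verdict = verdict.lower().rstrip(".,;")
        if verdict in VERDICTS:
            parsed[field.lower()] = verdict
    return parsed


def attach_parsed_sketch(report):
    """Add a <key>_parsed sibling; the free-text reply is left unchanged."""
    out = dict(report)
    if "post_install_check" in out:
        out["post_install_check_parsed"] = parse_post_install_check_sketch(
            out["post_install_check"]
        )
    return out


def install_gate_passes(report):
    """Python equivalent of the jq gate on the install verdict."""
    return report.get("post_install_check_parsed", {}).get("install") == "ok"
```

Under these assumptions, a reply that drifts from the format (extra prose, an emoji in place of the verdict) never raises; the affected field simply stays "unknown", so the gate fails closed.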

newb_signature field at the top of every report, carrying the
version, tagline, PyPI URL, GitHub URL, and "Part of SciTeX". The
same signature is rendered as a footer in render_markdown so that
reports pasted into a README carry their own provenance.
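
As a rough illustration of how one signature can feed both the report and the Markdown footer, consider the sketch below. The key names inside the signature, the footer layout, and both helper functions are hypothetical; only the list of fields and the "Part of SciTeX" string come from the notes above.

```python
def build_signature_sketch(version, tagline, pypi_url, github_url):
    """Assemble a signature dict. Key names here are assumptions."""
    return {
        "version": version,
        "tagline": tagline,
        "pypi_url": pypi_url,
        "github_url": github_url,
        "part_of": "Part of SciTeX",
    }


def render_footer_sketch(signature):
    """Render the same signature as a Markdown footer (layout is assumed)."""
    return (
        "---\n"
        f"newb v{signature['version']}: {signature['tagline']}\n"
        f"{signature['pypi_url']} | {signature['github_url']} | "
        f"{signature['part_of']}\n"
    )


# Placeholder values; the real tagline and URLs are not given in these notes.
signature = build_signature_sketch(
    "0.23.0", "<tagline>", "<pypi-url>", "<github-url>"
)
report = {"newb_signature": signature}  # signature sits at the top of the report
footer = render_footer_sketch(report["newb_signature"])
```

Building the footer from the stored signature, rather than from separate constants, keeps the JSON report and the pasted Markdown from disagreeing about the version.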

Internal

Extracted render_markdown into _render.py for line-budget
hygiene (_try.py was at the 512-line limit). It is re-exported
from newb._try so existing imports keep working.

Extracted _parsers.py as a focused module rather than inlining
it into _try.py, which keeps the parser surface easy to extend
when new question keys land.

Not yet (future work)

The auditor noted that even with parsing, agents occasionally drift
from the prompted format (Install: vs INSTALL:, emoji injection,
extra prose on the verdict line). Three escalation rungs if drift
becomes a real problem:

1. Few-shot examples in the prompt templates (~30 min, ~95%
   reliability).
2. Anthropic Tool Use for post_install_check and
   prompt_injection_check only: JSON schema enforcement at the
   SDK boundary, structurally impossible to drift (~half day).
3. Hybrid stays: free-text for human-readable questions
   (what_for, problems_solved, …), structured for CI-gate
   questions.

The parsers ship now as the foundation; (1) and (2) are deferred
until real-world drift data justifies them.
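
For a sense of what the tool-use rung might look like, here is a minimal sketch of a tool definition for post_install_check. The dict follows the name / description / input_schema shape used for tool definitions in the Anthropic Messages API; the tool name, the description text, and the local conforms() helper are hypothetical, and no SDK call is made here.

```python
VERDICTS = ["ok", "fail", "n/a", "unknown"]
FIELDS = ("install", "import", "cli")

# Hypothetical tool definition; the name and description are illustrative.
POST_INSTALL_CHECK_TOOL = {
    "name": "report_post_install_check",
    "description": "Report the outcome of the post-install check.",
    "input_schema": {
        "type": "object",
        "properties": {
            field: {"type": "string", "enum": VERDICTS} for field in FIELDS
        },
        "required": list(FIELDS),
        "additionalProperties": False,
    },
}


def conforms(tool_input, tool=POST_INSTALL_CHECK_TOOL):
    """Local check that a tool input has exactly the required keys and
    that each value is one of the allowed verdicts."""
    schema = tool["input_schema"]
    if set(tool_input) != set(schema["required"]):
        return False
    return all(
        tool_input[key] in spec["enum"]
        for key, spec in schema["properties"].items()
    )
```

With a definition like this, the enum restricts each verdict to the same four values the parsers already emit, so a structured reply could populate post_install_check_parsed directly.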
