feat: OWASP WSTG methodology alignment, TUI live status & thinking blocks #328

0xhis wants to merge 41 commits into usestrix:main
Conversation
Greptile Summary

This PR delivers three categories of changes: (1) OWASP WSTG methodology alignment across all scan-mode prompts and the root coordinator, with structured XML tags and phased testing instructions; (2) TUI live-status fixes — thinking blocks now render from tracer metadata, span sanitization prevents Rich crashes, and granular LLM lifecycle status messages are piped to the status bar; and (3) a handful of code-quality fixes (GLM-5 closing-tag normalization and a compressed message format for inter-agent messages). Key concerns found during review are detailed in the comments below.
Confidence Score: 3/5
Important Files Changed
strix/interface/tui.py (outdated)

```python
if getattr(msg_renderable, "plain", True):
    renderables.append(msg_renderable)
```
The `getattr(msg_renderable, "plain", True)` check appears unnecessary since `AgentMessageRenderer.render_simple()` always returns a `Text` object (which doesn't have a `plain` attribute). This will always default to `True`, making the check redundant.
```suggestion
msg_renderable = AgentMessageRenderer.render_simple(content)
renderables.append(msg_renderable)
```
Pull request overview
This PR updates Strix’s prompting and scan-mode “skills” to follow OWASP WSTG-aligned phases/domains, and improves the TUI’s real-time UX by adding agent “system message” status updates and persisting/rendering LLM thinking blocks via chat message metadata.
Changes:
- Align root-agent coordination and scan modes (quick/standard/deep) with OWASP WSTG categories/phases, including an “attacker perspective verification” wrap-up step.
- Add live agent status “system messages” during key runtime stages (sandbox setup, LLM wait/stream, tool execution) and surface them in the TUI.
- Persist LLM `thinking_blocks` via tracer chat message metadata and render them even when the assistant message content is empty/tool-only.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| strix/tools/web_search/web_search_actions.py | Reformats the web-search system prompt into structured sections for consistent security-focused answers. |
| strix/telemetry/tracer.py | Adds agent system_message support and a dedicated updater for live UI status text. |
| strix/skills/scan_modes/standard.md | Reworks standard mode into WSTG-mapped phases and adds attacker-perspective verification. |
| strix/skills/scan_modes/quick.md | Reworks quick mode into WSTG-mapped phases with explicit constraints and validation guidance. |
| strix/skills/scan_modes/deep.md | Reworks deep mode into WSTG-mapped phases with chaining and attacker-perspective verification. |
| strix/skills/coordination/root_agent.md | Updates delegation strategy to enforce WSTG-domain naming/scoping for subagents. |
| strix/llm/llm.py | Emits tracer system messages for “waiting” vs “generating” during streaming lifecycle. |
| strix/llm/dedupe.py | Reformats dedupe system prompt into structured sections and clarifies output rules. |
| strix/interface/tui.py | Displays agent system_message in the running status area and renders thinking blocks from chat metadata. |
| strix/agents/base_agent.py | Adds event-loop yield points after UI updates and attaches thinking_blocks to tracer chat metadata. |
strix/telemetry/tracer.py (outdated)

```python
if error_message:
    self.agents[agent_id]["error_message"] = error_message
if system_message:
```
`update_agent_status()` only sets `system_message` when it is truthy (`if system_message:`), which makes it impossible to clear a previously-set system message via this API (e.g., by passing an empty string). Consider checking `system_message is not None` (and similarly for `error_message` if desired) so callers can explicitly clear the field when appropriate.
```suggestion
if error_message is not None:
    self.agents[agent_id]["error_message"] = error_message
if system_message is not None:
```
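The `is not None` pattern can be illustrated in isolation (a minimal sketch; the real `Tracer` and its agent records are more involved than shown here):

```python
from __future__ import annotations

from typing import Any


class Tracer:
    """Minimal sketch of the suggested pattern; the real Tracer tracks far more state."""

    def __init__(self) -> None:
        self.agents: dict[str, dict[str, Any]] = {"a1": {}}

    def update_agent_status(
        self,
        agent_id: str,
        error_message: str | None = None,
        system_message: str | None = None,
    ) -> None:
        # `is not None` lets callers clear a field by passing "" while an
        # omitted argument (None) leaves the stored value untouched.
        if error_message is not None:
            self.agents[agent_id]["error_message"] = error_message
        if system_message is not None:
            self.agents[agent_id]["system_message"] = system_message


tracer = Tracer()
tracer.update_agent_status("a1", system_message="waiting for LLM")
tracer.update_agent_status("a1", system_message="")  # explicit clear now works
```

With the truthiness check, the second call would silently leave the stale "waiting for LLM" status in place.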
```
2. Assess overall security posture
3. Compile executive summary with prioritized recommendations
4. Invoke finish tool with final report
3. **Attacker Perspective Verification**: Pause and explicitly consider: "If I were a real-world attacker, where else would I look? What edge cases, forgotten endpoints, or chained exploits have been overlooked?" 
```
Line has trailing whitespace at the end, which will be caught by the trailing-whitespace pre-commit hook and fail CI. Please remove the extra space after the closing quote.
```suggestion
3. **Attacker Perspective Verification**: Pause and explicitly consider: "If I were a real-world attacker, where else would I look? What edge cases, forgotten endpoints, or chained exploits have been overlooked?"
```
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
```python
thinking_blocks = getattr(final_response, "thinking_blocks", None)
self.state.add_message("assistant", final_response.content, thinking_blocks=thinking_blocks)
if tracer:
```
`thinking_blocks` are now stored directly on `AgentState.messages` (via `add_message(..., thinking_blocks=...)`). Those message dicts are later forwarded to the LLM provider as-is in `LLM._prepare_messages()`/`_build_completion_args()`, which risks breaking provider requests because chat message objects typically only support keys like `role` and `content` (unknown keys may be rejected). Consider keeping `thinking_blocks` out of `AgentState.messages` (store separately), or sanitize/strip non-provider fields (e.g., drop `thinking_blocks`) before calling `acompletion()` and before passing messages into `MemoryCompressor`.
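One way to implement the strip-before-send option is a small sanitizer applied just before the completion call (a sketch; the allow-list of provider keys is an assumption, not the actual Strix code):

```python
from typing import Any

# Keys that chat providers generally accept on a message dict; the exact
# allow-list is an assumption and would need to match the providers in use.
PROVIDER_KEYS = {"role", "content", "name", "tool_calls", "tool_call_id"}


def sanitize_for_provider(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Drop Strix-internal fields such as thinking_blocks before acompletion().
    return [{k: v for k, v in m.items() if k in PROVIDER_KEYS} for m in messages]


history = [
    {"role": "assistant", "content": "done", "thinking_blocks": [{"thinking": "..."}]},
]
clean = sanitize_for_provider(history)
```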
```python
if "thinking_blocks" in metadata and metadata["thinking_blocks"]:
    for block in metadata["thinking_blocks"]:
        thought = block.get("thinking", "")
        if thought:
            text = Text()
            text.append("🧠 ")
            text.append("Thinking", style="bold #a855f7")
            text.append("\n ")
            indented_thought = "\n ".join(thought.split("\n"))
            text.append(indented_thought, style="italic dim")
            renderables.append(Static(text, classes="tool-call thinking-tool completed"))
```
The thinking-block UI rendering here duplicates the existing `ThinkRenderer` implementation (`strix/interface/tool_components/thinking_renderer.py`) and hard-codes the CSS class string. To avoid divergence (styling/formatting changes in one place but not the other), consider reusing the renderer/helper that already formats "🧠 Thinking" blocks, or centralizing this formatting in a shared function.
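Centralizing the layout could be as small as one shared helper that both call sites wrap (hypothetical name; this is not an existing Strix function, and the real version would return a styled Rich `Text` rather than a plain string):

```python
def format_thinking(thought: str) -> str:
    """Lay out a thinking block as '🧠 Thinking' followed by the indented thought.

    Sketch only: the real helper would build a rich.text.Text with the
    bold/italic styles, so styling changes happen in exactly one place.
    """
    indented = "\n  ".join(thought.split("\n"))
    return f"🧠 Thinking\n  {indented}"
```

Both `ThinkRenderer` and the chat-metadata path could then delegate here and apply their own widget wrapping on top.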
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
strix/skills/scan_modes/quick.md (outdated)

```
- Extensive fuzzing—use targeted payloads only
</constraints>

<instructions>
```
The `<instructions>` tag is opened twice without closing the first one. Line 6 opens an `<instructions>` tag, and then line 50 opens another `<instructions>` tag before the first one is closed. This creates improperly nested XML tags. The constraints section (lines 41-48) should either be inside the first `<instructions>` block, or the first `<instructions>` block should be closed before the `<constraints>` section starts.
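A well-formed layout for the second option might look like this (a sketch only; the actual section contents of `quick.md` are elided):

```xml
<instructions>
  <!-- initial WSTG-phase testing steps -->
</instructions>

<constraints>
  <!-- scope limits, e.g. targeted payloads only, no extensive fuzzing -->
</constraints>

<instructions>
  <!-- wrap-up / reporting steps -->
</instructions>
```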
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
strix/agents/state.py (outdated)

```diff
@@ -47,8 +47,8 @@ def add_message(
     self, role: str, content: Any, thinking_blocks: list[dict[str, Any]] | None = None
 ) -> None:
     message = {"role": role, "content": content}
-    if thinking_blocks:
-        message["thinking_blocks"] = thinking_blocks
+    # We do not store thinking_blocks in AgentState.messages to prevent API schema errors
+    # when passing these messages back to the LLM provider. They are retained in Tracer metadata.
     self.messages.append(message)
```
`thinking_blocks` is now unused in `AgentState.add_message`, but Ruff has ARG (unused arguments) enabled in this repo, so this will likely fail lint/CI. Since the parameter is intentionally kept for API compatibility, add an explicit suppression (e.g., `# noqa: ARG002` on the `def add_message` line) or otherwise reference the argument in a no-op way to satisfy the linter without storing it in `self.messages`.
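A sketch of the suppression approach (Ruff's ARG002 rule covers unused method arguments; the class is trimmed to the relevant method, so this is not the full `AgentState`):

```python
from __future__ import annotations

from typing import Any


class AgentState:
    def __init__(self) -> None:
        self.messages: list[dict[str, Any]] = []

    def add_message(
        self,
        role: str,
        content: Any,
        thinking_blocks: list[dict[str, Any]] | None = None,  # noqa: ARG002
    ) -> None:
        # thinking_blocks intentionally not stored here; they are retained
        # in tracer metadata for UI display instead.
        self.messages.append({"role": role, "content": content})


state = AgentState()
state.add_message("assistant", "hi", thinking_blocks=[{"thinking": "x"}])
```

Callers keep passing `thinking_blocks` unchanged, and the linter no longer flags the parameter.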
```python
if role == "user":
    return UserMessageRenderer.render_simple(content)
```
**Empty user `content` bypasses `None` guard**

Before this change the function started with:

```python
if not content:
    return None
```

That check ran before the `role` branch, so user messages with empty content returned `None` safely.

Now the user branch fires *first* and immediately calls `UserMessageRenderer.render_simple(content)` without verifying that `content` is non-empty. If a user-role message arrives with `content == ""` (e.g. a synthetic message injected by `process_tool_invocations` before its content is set, or any future code path that appends an empty user turn), `render_simple` is called with an empty string and likely returns a blank widget entry in the chat log instead of `None`.

The assistant branch keeps the guard (`if not content and not renderables: return None`), so the asymmetry is inconsistent. A minimal fix:
```suggestion
if role == "user":
    if not content:
        return None
    return UserMessageRenderer.render_simple(content)
```
strix/agents/base_agent.py (outdated)

```diff
 thinking_blocks = getattr(final_response, "thinking_blocks", None)
-self.state.add_message("assistant", final_response.content, thinking_blocks=thinking_blocks)
+self.state.add_message("assistant", final_response.content)
```
**Thinking blocks stripped from conversation history — may break multi-turn extended thinking**

`thinking_blocks` are no longer stored in `self.state.messages`; they only live in the tracer for UI display. This means the conversation history passed to `_prepare_messages` → LiteLLM no longer carries them.

For Anthropic Claude models with extended thinking enabled, the API requires that when an assistant turn contained thinking blocks, those blocks must be included in the content list of that turn in subsequent API calls (as `{"type": "thinking", "thinking": "..."}` objects). Omitting them from the replay history causes a validation error on the *next* turn.

Although `litellm.drop_params = True` would silently drop a top-level `thinking_blocks` key from a message dict, the correct representation is to embed them *inside* the `content` list — which requires explicit handling in `_prepare_messages`. This PR removes the only place where thinking blocks were associated with the message record, making it impossible for `_prepare_messages` to ever reconstruct the proper multi-turn format.

If extended thinking is actively used (`self._supports_reasoning()` returns `True`), this will trigger Anthropic API errors on conversations longer than one turn. Consider storing thinking blocks in `state.messages` alongside `content`, and having `_prepare_messages` merge them into the `content` list when building messages for Anthropic models.
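The suggested merge step could be sketched as a pre-flight transform applied when building provider messages (hypothetical function; the key names follow the Anthropic extended-thinking message shape, but this is not the actual Strix implementation):

```python
from __future__ import annotations

from typing import Any


def merge_thinking_into_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Embed stored thinking blocks into each assistant turn's content list.

    Sketch of a _prepare_messages helper: assistant turns that carry
    thinking_blocks are rewritten so the blocks appear as typed content
    items ahead of the text, as extended thinking expects on replay.
    """
    prepared: list[dict[str, Any]] = []
    for msg in messages:
        blocks = msg.get("thinking_blocks")
        if msg.get("role") == "assistant" and blocks:
            content: list[dict[str, Any]] = [
                {"type": "thinking", "thinking": b.get("thinking", ""), "signature": b.get("signature", "")}
                for b in blocks
            ]
            content.append({"type": "text", "text": msg["content"]})
            prepared.append({"role": "assistant", "content": content})
        else:
            prepared.append({"role": msg["role"], "content": msg["content"]})
    return prepared


history = [
    {"role": "user", "content": "scan the target"},
    {"role": "assistant", "content": "ok", "thinking_blocks": [{"thinking": "plan", "signature": "sig"}]},
]
prepared = merge_thinking_into_content(history)
```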
Tested and working so good

Thank you!
Conflicts: strix/telemetry/tracer.py
Force-pushed 52468cc to e7e03e0
…hering
- Restructures Phase 1 into explicit subagent delegation rules
- Root agent no longer runs recon/crawling/code analysis directly
- Adds black-box, white-box, and combined mode subagent templates
- Renames Phase 2 section to reflect dependency on gathered context
- Extract .renderable from ThinkRenderer.render() in tui.py for consistency
- Remove dead thinking_blocks parameter from add_message() in state.py
- Pass tracer into _stream() instead of importing in hot path in llm.py
- Add overflow indicator (+N more) when truncating tool displays in base_agent.py
…eation
- Add SKILLS ARE MANDATORY rule to Critical Rules section
- Update BLACK-BOX examples to include skills= in every agent creation
- Update WHITE-BOX examples to include skills= in every agent creation
- Add Skill Assignment Triggers section with 15 scenario→skill mappings
- Add warning that agents without skills lack vulnerability methodology

Fixes regression where subagents were spawning without vulnerability skills loaded, causing shallow testing (no SQLi, XSS, etc.)
…cker perspective constraints
…t guard and prompt cleanup
…g model context limit
Add regex patterns to normalize <function>name> and <parameter>key> into proper <function=name> and <parameter=key> format before parsing.
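The described normalization might look like the following (a sketch; the exact patterns and function name are assumptions, not the PR's code):

```python
import re

# GLM sometimes emits malformed tags like <function>name> instead of
# <function=name>; rewrite them before the tool-call parser runs.
_FUNC_RE = re.compile(r"<function>([\w.-]+)>")
_PARAM_RE = re.compile(r"<parameter>([\w.-]+)>")


def normalize_glm_tags(raw: str) -> str:
    raw = _FUNC_RE.sub(r"<function=\1>", raw)
    return _PARAM_RE.sub(r"<parameter=\1>", raw)
```

Already well-formed tags such as `<function=name>` contain no stray `>` after the tag name, so they pass through unchanged.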
Summary
This PR primarily aligns the prompts with OWASP WSTG guidelines and restructures them to follow modern prompt engineering best practices (drawing from Google and Anthropic guidelines).
What's Changed
- Added an attacker-perspective verification step to `deep` and `standard` modes, forcing agents to review the attack surface from an advanced attacker's perspective before concluding.
- Fixed `thinking_blocks` persistence so they no longer vanish from the chat history, and fixed an event-loop blocking issue so dynamic system statuses render correctly in real time.