docs: polish README — TVD walkthrough, customization, conversation-based ISC, FAQ #83

Merged: wuyoscar merged 1 commit into main from update/2026-04-12 on Apr 12, 2026

@wuyoscar
Owner

Summary

  • Add TVD Walkthrough Example with real guard.py (LlamaGuard transformer API), validator.py (Pydantic v2), and test_case.json
  • Add TVD Customization: Method 1 (numerical constraint) and Method 2 (few-shot anchor injection) with corrected toxic-bert multi-label scoring (top_k=None, index by category name)
  • Add Conversation-Based ISC section with web_llms.png
  • Add FAQ entry: TVD vs traditional jailbreak attacks (academic tone, three-part breakdown)
  • Simplify validator: drop category: Literal[...], add ConfigDict(extra="ignore")
  • Sync README_zh.md with all four new sections and Chinese FAQ entry
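
The "index by category name" correction above can be illustrated with a minimal sketch. It assumes the classifier is called through a Hugging Face `text-classification` pipeline with `top_k=None`, which returns one `{"label", "score"}` entry per label. The helper name `scores_by_label` and the sample scores are hypothetical and are not taken from the README; no model is loaded here.

```python
from typing import Dict, List, Union

LabelScore = Dict[str, Union[str, float]]


def scores_by_label(output: Union[List[LabelScore], List[List[LabelScore]]]) -> Dict[str, float]:
    """Turn pipeline output into a {label: score} mapping.

    Depending on the input type and library version, the pipeline may return a
    flat list of dicts or a list containing one such list, so both shapes are
    accepted here.
    """
    if output and isinstance(output[0], list):
        output = output[0]  # unwrap the single-input batch
    return {str(item["label"]): float(item["score"]) for item in output}


# Made-up scores in the shape returned with top_k=None (illustrative only).
sample_output = [[
    {"label": "toxic", "score": 0.91},
    {"label": "insult", "score": 0.12},
    {"label": "obscene", "score": 0.08},
    {"label": "identity_hate", "score": 0.02},
    {"label": "severe_toxic", "score": 0.01},
    {"label": "threat", "score": 0.01},
]]

scores = scores_by_label(sample_output)
# Look up a category by name instead of by position: the order of entries
# follows the scores, not a fixed label order.
insult_score = scores["insult"]
```

Indexing by name keeps the lookup stable when the entry order changes between inputs, which a positional index such as `output[0][1]` would not.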

Test plan

  • Verify all code blocks in Walkthrough Example are syntactically correct
  • Verify web_llms.png renders in both READMEs
  • Verify Chinese and English FAQs are structurally aligned
  • Spot-check internal links (templates/README.md, experiment/)
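
The first item in the test plan could be partly automated. The sketch below is one possible approach, not the check the author ran: it extracts fenced Python blocks from a Markdown string and parses each with `ast.parse`. The function name `check_python_blocks` and the sample text are hypothetical, and the check covers syntax only, not imports or runtime behavior.

```python
import ast
import re
from typing import List, Tuple

FENCE = "`" * 3  # built programmatically so this example can sit inside a fence

# Match a fenced block tagged python/py; capture everything up to the closing fence.
BLOCK_RE = re.compile(
    r"^" + FENCE + r"(?:python|py)[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def check_python_blocks(markdown: str) -> List[Tuple[int, str]]:
    """Return (block_index, error_message) for every block that fails to parse."""
    failures = []
    for index, match in enumerate(BLOCK_RE.finditer(markdown), start=1):
        try:
            ast.parse(match.group(1))
        except SyntaxError as exc:
            failures.append((index, exc.msg))
    return failures


# Illustrative input: the first block parses, the second does not.
sample = (
    FENCE + "python\nx = 1\n" + FENCE + "\n\n"
    + FENCE + "python\ndef broken(:\n    pass\n" + FENCE + "\n"
)
failures = check_python_blocks(sample)
for index, message in failures:
    print(f"block {index}: {message}")
```

To apply it to the READMEs, read each file as text and pass the contents to `check_python_blocks`; an empty result means every tagged Python block parsed.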

…ation-based ISC, FAQ

- Add full TVD walkthrough example with guard.py (LlamaGuard transformer),
  validator.py (Pydantic v2), and test_case.json
- Add TVD customization Method 1 (numerical constraint) and Method 2
  (few-shot anchor injection) with corrected toxic-bert multi-label scoring
- Add Conversation-Based ISC section with web_llms.png
- Add FAQ entry comparing TVD to traditional jailbreak attacks (academic tone)
- Simplify validator: remove category Literal field, add ConfigDict(extra="ignore")
- Sync README_zh.md with all four new sections and Chinese FAQ entry

@wuyoscar merged commit 24443ba into main on Apr 12, 2026