@@ -116,10 +116,12 @@ categories:
 ```

 **Configuration Inheritance:**
+
 - `pii_enabled`: If not specified, inherits from global PII model configuration (enabled if `pii_model` is configured)
 - `pii_threshold`: If not specified, inherits from `classifier.pii_model.threshold`

 **Threshold Guidelines by Category:**
+
 - **Critical categories** (healthcare, finance, legal): 0.9-0.95 - Strict detection, fewer false positives
 - **Customer-facing** (support, sales): 0.75-0.85 - Balanced detection
 - **Internal tools** (code, testing): 0.5-0.65 - Relaxed to reduce false positives
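
The inheritance rules in this hunk can be sketched roughly as follows. This is only an illustration, not the project's implementation: the function name `resolve_pii_settings` and the plain-dictionary layout are assumptions chosen to mirror the `classifier.pii_model.threshold` path named in the documentation.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_pii_settings(
    category: Mapping[str, Any],
    config: Mapping[str, Any],
) -> Tuple[bool, Optional[float]]:
    """Resolve (pii_enabled, pii_threshold) for one category.

    Hypothetical helper: assumes the global config is a nested mapping
    shaped like {"classifier": {"pii_model": {"threshold": ...}}}.
    """
    pii_model = (config.get("classifier") or {}).get("pii_model")

    # pii_enabled: a category-level value wins; otherwise PII detection is
    # enabled only if a global pii_model is configured.
    enabled = category.get("pii_enabled")
    if enabled is None:
        enabled = bool(pii_model)

    # pii_threshold: a category-level value wins; otherwise fall back to
    # classifier.pii_model.threshold when it exists.
    threshold = category.get("pii_threshold")
    if threshold is None and pii_model:
        threshold = pii_model.get("threshold")

    return bool(enabled), threshold
```

With a global threshold of 0.7, a category that sets only `pii_threshold: 0.9` would resolve to enabled with 0.9, while a category that sets nothing would resolve to enabled with 0.7.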
@@ -220,24 +222,28 @@ pii_requests_masked_total 15
 Different categories have different PII sensitivity requirements:

 **Critical Categories (Healthcare, Finance, Legal):**
+
 - Threshold: `0.9-0.95`
 - Rationale: High precision required; false positives on medical/financial terms are costly
 - Example PII: SSN, Credit Cards, Medical Records
 - Risk if too low: Too many false positives disrupt workflows

 **Customer-Facing Categories (Support, Sales):**
+
 - Threshold: `0.75-0.85`
 - Rationale: Balance between catching PII and avoiding false positives
 - Example PII: Email, Phone, Names, Addresses
 - Risk if too low: Moderate false positive rate

 **Internal Tools (Code Generation, Development):**
+
 - Threshold: `0.5-0.65`
 - Rationale: Code/technical content often triggers false positives; lower threshold needed
 - Example PII: Variable names, test data that looks like PII
 - Risk if too high: May still flag harmless code artifacts

 **Public Content (Documentation, Marketing):**
+
 - Threshold: `0.6-0.75`
 - Rationale: Broader detection before publication; acceptable to review more false positives
 - Example PII: Author names, example emails, placeholder data
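
As a rough aid for reviewing configurations against the ranges listed above, the recommendations could be encoded as a lookup table. The tier keys and the `check_threshold` helper below are hypothetical names introduced for illustration only; the numeric ranges are copied from the documentation.

```python
from typing import Dict, Tuple

# Recommended threshold ranges per category tier, as listed in the docs.
# The tier keys are illustrative labels, not configuration identifiers.
RECOMMENDED_RANGES: Dict[str, Tuple[float, float]] = {
    "critical": (0.9, 0.95),          # healthcare, finance, legal
    "customer_facing": (0.75, 0.85),  # support, sales
    "internal_tools": (0.5, 0.65),    # code generation, development
    "public_content": (0.6, 0.75),    # documentation, marketing
}


def check_threshold(tier: str, threshold: float) -> str:
    """Report whether a threshold is below, within, or above the
    recommended range for the given tier."""
    low, high = RECOMMENDED_RANGES[tier]
    if threshold < low:
        return "below"
    if threshold > high:
        return "above"
    return "within"
```

A check like this only flags values outside the documented guidance; whether a deviation is acceptable still depends on the category's actual content.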