Description
The issue:
We are experiencing failures in prompt validation for very basic messages such as "Hi" or "How are you" when using the GibberishText validator from Guardrails, across multiple configurations (both remote and local inference).
Reproduction Steps:
--- With remote inference disabled:
guardrails configure --enable-metrics --disable-remote-inferencing --token ${GUARDRAILS_API_KEY}
--- With local model installed and used:
guardrails hub install hub://guardrails/gibberish_text --install-local-models
Python validator setup:
# Inside our validator factory: instantiate GibberishText with a noop
# failure action and the locally installed model.
if name == "Gibberish Text":
    validator_instances.append(
        cls(threshold=threshold, on_fail="noop", use_local=True)
    )
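For context, here is a minimal, self-contained sketch of how we exercise the validator (assuming the hub install above has completed; the Guard/GibberishText usage follows the Guardrails hub documentation):

# Minimal repro sketch: run a plain greeting through GibberishText.
# Assumes hub://guardrails/gibberish_text is installed as shown above.
from guardrails import Guard
from guardrails.hub import GibberishText

guard = Guard().use(
    GibberishText(threshold=0.5, on_fail="noop", use_local=True)
)

outcome = guard.validate("Hi")
print(outcome.validation_passed)  # we see False here, even for a plain greeting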
--- Also tested with remote inference enabled:
guardrails configure --enable-metrics --enable-remote-inferencing --token ${GUARDRAILS_API_KEY}
Observed Behavior:
- Prompts like "Hi" and "How are you" are incorrectly flagged as gibberish by the GibberishText validator.
- Adjusting the threshold does not mitigate the issue (see the sweep sketch after this list).
- The misclassification occurs consistently under both local and remote inference modes.
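To illustrate the threshold point, a sweep sketch (the threshold values are arbitrary; the rest of the setup matches the repro sketch above):

# Threshold sweep sketch: no value in this range stops "Hi" from being flagged.
from guardrails import Guard
from guardrails.hub import GibberishText

for threshold in (0.1, 0.3, 0.5, 0.7, 0.9):
    guard = Guard().use(
        GibberishText(threshold=threshold, on_fail="noop", use_local=True)
    )
    outcome = guard.validate("Hi")
    print(threshold, outcome.validation_passed)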
Expected Behavior:
- Such simple greetings should not be considered gibberish.
- Threshold tuning, or falling back to the local model, should improve reliability on basic conversational input.
Additional Context:
Another issue we observed concerns the interpretation of quoted terms. For example:
"messages": [
{
"content": "How do I issue compensation through 'Advocate' tool",
"role": "user"
}]
When 'Advocate' is quoted, the validator correctly treats it as a tool name. Without the quotes, "Advocate" is misclassified as a profession, which produces a false positive and causes validation to fail.
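The quoting effect is easy to isolate by validating both variants of the same prompt. A sketch (we are not certain that GibberishText itself performs the entity classification; the sketch only shows how we compared the two inputs through the same guard):

# Comparison sketch: identical prompt with and without quotes around "Advocate".
from guardrails import Guard
from guardrails.hub import GibberishText

guard = Guard().use(GibberishText(threshold=0.5, on_fail="noop", use_local=True))

for prompt in (
    "How do I issue compensation through 'Advocate' tool",
    "How do I issue compensation through Advocate tool",
):
    outcome = guard.validate(prompt)
    print(repr(prompt), "->", outcome.validation_passed)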
Request:
- Clarify what underlying logic or model behavior causes simple prompts to be flagged.
- Investigate the impact of quoting on entity recognition or semantic parsing.
- Recommend configurations or fixes.
Additional Question on Local Model Behavior:
When installing the GibberishText validator locally via:
guardrails hub install hub://guardrails/gibberish_text --install-local-models
the specific model that gets used appears to be:
"madhurjindal/autonlp-Gibberish-Detector-492513457"
We’d like to understand:
- Why is this specific model chosen by default when using --install-local-models?
- Are there alternative models that can be used for gibberish detection via Guardrails?
- Can we override the default model if it produces inconsistent results on simple prompts (e.g., “Hi”)?
A clarification on model selection logic and possible configuration options would be very helpful.
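While investigating, we also probed the underlying model directly. A sketch using the standard Hugging Face transformers pipeline (the labels and scores come from the model itself; we make no assumption about how Guardrails post-processes them):

# Probe sketch: query madhurjindal/autonlp-Gibberish-Detector-492513457 directly.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="madhurjindal/autonlp-Gibberish-Detector-492513457",
)
print(classifier("Hi"))           # list of {'label': ..., 'score': ...} dicts
print(classifier("How are you"))

Comparing these raw scores against the validator's decisions would show whether the misclassification originates in the model or in the thresholding logic around it.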