A tool description returned by an MCP server may contain prompt injection strings. Tool descriptions are inserted into the final prompt that the client LLM processes.
Note that the rug-pull attack is a variant where the MCP server injects the prompt injection strings only after a few uses or on some other environment trigger. Thus, it is not detected on first load, when the user reviews the tools.
Recommendation
Freeze the tool list and repeat any kind of validation when the tool list changes.
Run prompt injection detection services on the tool list to detect attempts at injecting prompts through the description.
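A minimal sketch of the second recommendation, assuming the client can see each tool's name and description before they reach the prompt. `ToolInfo`, `InjectionDetector`, `toyDetector`, and `scanTools` are hypothetical names for illustration, and the regex is a toy stand-in for a real detection service, not a defense in itself. A real service call would likely be asynchronous; it is kept synchronous here for brevity.

```typescript
// Hypothetical shape of a tool entry; only the fields used here.
interface ToolInfo {
  name: string;
  description: string;
}

// A detector returns true when the text looks like an injection attempt.
// Assumption: in practice this would wrap a real detection service.
type InjectionDetector = (text: string) => boolean;

// Toy stand-in: flags a few illustrative phrases only.
const toyDetector: InjectionDetector = (text) =>
  /ignore (all )?previous instructions|do not tell the user|<important>/i.test(text);

// Returns the names of the tools whose description the detector flags.
function scanTools(tools: ToolInfo[], detect: InjectionDetector): string[] {
  return tools.filter((tool) => detect(tool.description)).map((tool) => tool.name);
}
```

Flagged tools could be hidden from the model or surfaced to the user for review; which of the two is appropriate is a product decision this sketch does not make.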
We don't validate tool descriptions aside from ensuring they fit the model's schema constraints (e.g. not too long for 4o).
I'm also not sure how useful validating tool descriptions is when tool responses are non-deterministic and a much better place to mount any kind of prompt-injection attack.
Additionally, the tools exposed by an MCP server may be non-deterministic and change over time. In fact, that is ideal behavior for, say, a browser tool, which might expose only an "open browser" tool initially and hold back the tools for interacting with that browser until it's open. (No sense eating up the context window unnecessarily.)
Once the tool descriptions have been validated, you would want to store a hash and redo the validation whenever they change. Otherwise, a malicious MCP server can dynamically change its tool descriptions to shadow other tools or alter their intent.
You could think of having a "lock" icon that lets the user convey the intent to freeze the tool list. At that point, it makes sense to notify the user when something changes.
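The hash-and-revalidate idea could be sketched roughly as follows. This is an illustration under assumptions, not how any particular client implements it: `ToolInfo`, `fingerprintTools`, and `needsRevalidation` are hypothetical names, and only the name and description are hashed (a fuller version would probably cover the input schema as well).

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a tool entry; only the fields that get hashed.
interface ToolInfo {
  name: string;
  description: string;
}

// Fingerprint the tool list. Tools are sorted by name first so that a
// reordering alone does not change the hash.
function fingerprintTools(tools: ToolInfo[]): string {
  const canonical = [...tools]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((tool) => [tool.name, tool.description]);
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}

// True when the current list no longer matches the hash stored at approval time.
function needsRevalidation(approvedHash: string, current: ToolInfo[]): boolean {
  return fingerprintTools(current) !== approvedHash;
}
```

Because a tool list can change legitimately (the browser case above), a mismatch here is better treated as a trigger to re-run validation and notify the user than as proof of an attack.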
Recommendation
This technique is implemented in genaiscript. https://microsoft.github.io/genaiscript/blog/mcp-tool-validation/