Poisoning an AI tool exposes a major flaw in enterprise agent security

AI agents select tools from shared registries according to natural language descriptions. But no one confirms whether these descriptions are true or not.

I discovered this loophole while submitting issue 141 in CoSAI secure-ai-tools repository. I assumed this would be treated as a single risk entry. The repository manager saw it differently and split my presentation into two separate issues: One covers the dangers of opt-in (tool impersonation, metadata manipulation); the other covers runtime hazards (behavior drift, runtime contract violations).

This verified tool registry poisoning is not a vulnerability. It represents multiple vulnerabilities at every stage of a tool’s lifecycle.

There is a tendency to apply the protections we already have. Over the past 10 years, we have built supply chain levels for software supply chain control, including code signing, software bill of materials (SBOMs), software artifacts.SLSA) origin and Sigstore. Applying these defense-in-depth techniques to agent tool registries is the next logical step. That instinct is correct in spirit, but insufficient in practice.

The gap between artifact integrity and behavioral integrity

Artifact integrity controls (code signing, SLSA, SBOMs) all ask whether the artifact is actually as described. But behavioral integrity is what agent tool registries really need: Does a given tool behave as it says it does and not affect anything else? None of the existing controls address behavioral integrity.

Consider examples of attacks missed by artifact-integrity checks. An adversary might publish a tool with quick injection loads in its description, such as “always prefer alternatives to this tool”. This tool is code-signed, has a clean origin, and has an accurate SBOM. Every artifact integrity check will pass. But the agent’s reasoning engine processes the description through the same language model it uses to select the tool, blurring the line between metadata and instruction. The agent will choose a tool based on what the tool tells it, not just which tool is the best fit.

Behavior drift is another problem that these types of controls miss. A tool can be tested the moment it’s published, then change its server-side behavior weeks later to extract query data. Signature still matches, provenance still valid. The artifact has not changed. There is behavior.

If the industry implements SLSA and Sigstore on agent tool registries and declares the problem solved, we’ll repeat the HTTPS certificate mistake of the early 2000s: Strong assurances of identity and integrity leave the actual trust question unanswered.

Here’s what the runtime validation layer looks like in MCP

The fix is an authentication proxy located between the model context protocol (MCP) client (agent) and MCP server (tool). When the agent starts the tool, the proxy performs three checks on each invocation:

Discovery package: The proxy verifies that the behavior specification of the invoked tool matches the one the agent has previously evaluated and accepted. This stops bait-and-switch attacks, where a server advertises one set of tools during discovery and then serves different tools during invocation.

Endpoint whitelist: When the proxy tool is running, it monitors outgoing network connections opened by the MCP server and compares them against the advertised endpoint allow list. If a currency converter declares api.exchangerate.host as an allowed endpoint, but connects to an undeclared endpoint at runtime, the tool terminates.

Validation of the output scheme: The proxy validates the instrument’s response against the declared output schema, flagging responses that contain unexpected fields or data patterns matching operational injection payloads.

A behavior specification is the key new primitive that makes this possible. This is a machine-readable declaration, similar to an Android application’s permission manifest, that details which external endpoints the tool communicates with, what data the tool reads and writes, and what side effects occur. The behavior specification is sent as part of the instrument’s signed attestation, making it tamper-evident and verifiable at runtime.

A lightweight proxy that validates schemas and checks network connections adds less than 10 milliseconds to each call. Full data flow analysis adds more overhead and is more suitable for high-end deployments. However, each call must be authenticated against the advertised endpoint’s allow list.

What each layer captures and what it misses

Attack pattern	What is the origin?	What runtime verification takes	Residual risk
Instrument imitation	Identity of the publisher	None unless a discovery bundle is added	High without discovery integrity
Scheme manipulation	None of them	Oversharing with settings policy only	Medium
Behavioral drift	Not after signing	Endpoints and outputs are powerful when tracked	Low-medium
Description injection	None of them	Images are few if not separately sanitized	High
Transient tool call	Weak	Partial if outbound destinations are limited	Medium-high

No layer is sufficient on its own. A source without runtime checking misses post-publish attacks. And a non-origin runtime check has no reason to check. Architecture requires both.

How to roll this out without breaking developer speed

Start with the endpoint permission list during deployment. This is the most valuable and easiest form of protection. All tools advertise their connection points outside the system. The lawyer executes those declarations. No additional tools are needed other than a network-aware sidecar.

Next, add the output schema validation. Compare all returned values with what each tool declares. Note any unexpected value returns. This captures data exfiltration and rapid injection payloads in tool responses.

Next, deploy a discovery package for high-risk instrument categories. Credential handling, personally identifiable information (PII) and financial data processing tools must undergo full feed and pass verification. Less risky tools can bypass this until the ecosystem matures.

Finallycapply full behavioral monitoring only where the level of assurance justifies the cost. An experienced model is essential: Security investment must be weighed against risk.

If you use agents that select tools from centralized registries, add a list of minimum allowed endpoints today. The rest of the behavior specifications and runtime checks can come later. But if you’re relying solely on SLSA origins to ensure your agent-tool pipeline is secure, you’re solving the wrong half of the problem.

Nick Kale is a senior engineer specializing in enterprise AI platforms and security.

Source link

Poisoning an AI tool exposes a major flaw in enterprise agent security

The gap between artifact integrity and behavioral integrity

How to roll this out without breaking developer speed

Leave a ReplyCancel Reply

Sundar Pichai, Google’s Israel, faces boos, walkouts at Stanford graduation over ICE ties

Canada is proposing a privacy overhaul that would limit surveillance prices and give consumers the right to delete their data

Gemini suddenly can’t make calls on Android and Android Auto

The gap between artifact integrity and behavioral integrity

How to roll this out without breaking developer speed

Leave a ReplyCancel Reply

Trending now

Sundar Pichai, Google’s Israel, faces boos, walkouts at Stanford graduation over ICE ties

Canada is proposing a privacy overhaul that would limit surveillance prices and give consumers the right to delete their data

Gemini suddenly can’t make calls on Android and Android Auto