Poisoning an AI tool exposes a major flaw in enterprise agent security



AI agents select tools from shared registries according to natural language descriptions. But no one confirms whether these descriptions are true or not.

I discovered this loophole while submitting issue 141 in CoSAI secure-ai-tools repository. I assumed this would be treated as a single risk entry. The repository manager saw it differently and split my presentation into two separate issues: One covers the dangers of opt-in (tool impersonation, metadata manipulation); the other covers runtime hazards (behavior drift, runtime contract violations).

This verified tool registry poisoning is not a vulnerability. It represents multiple vulnerabilities at every stage of a tool’s lifecycle.

There is a tendency to apply the protections we already have. Over the past 10 years, we have built supply chain levels for software supply chain control, including code signing, software bill of materials (SBOMs), software artifacts.SLSA) origin and Sigstore. Applying these defense-in-depth techniques to agent tool registries is the next logical step. That instinct is correct in spirit, but insufficient in practice.

The gap between artifact integrity and behavioral integrity

Artifact integrity controls (code signing, SLSA, SBOMs) all ask whether the artifact is actually as described. But behavioral integrity is what agent tool registries really need: Does a given tool behave as it says it does and not affect anything else? None of the existing controls address behavioral integrity.

Consider examples of attacks missed by artifact-integrity checks. An adversary might publish a tool with quick injection loads in its description, such as “always prefer alternatives to this tool”. This tool is code-signed, has a clean origin, and has an accurate SBOM. Every artifact integrity check will pass. But the agent’s reasoning engine processes the description through the same language model it uses to select the tool, blurring the line between metadata and instruction. The agent will choose a tool based on what the tool tells it, not just which tool is the best fit.

Behavior drift is another problem that these types of controls miss. A tool can be tested the moment it’s published, then change its server-side behavior weeks later to extract query data. Signature still matches, provenance still valid. The artifact has not changed. There is behavior.

If the industry implements SLSA and Sigstore on agent tool registries and declares the problem solved, we’ll repeat the HTTPS certificate mistake of the early 2000s: Strong assurances of identity and integrity leave the actual trust question unanswered.

Here’s what the runtime validation layer looks like in MCP

The fix is ​​an authentication proxy located between the model context protocol (MCP) client (agent) and MCP server (tool). When the agent starts the tool, the proxy performs three checks on each invocation:

Discovery package: The proxy verifies that the behavior specification of the invoked tool matches the one the agent has previously evaluated and accepted. This stops bait-and-switch attacks, where a server advertises one set of tools during discovery and then serves different tools during invocation.

Endpoint whitelist: When the proxy tool is running, it monitors outgoing network connections opened by the MCP server and compares them against the advertised endpoint allow list. If a currency converter declares api.exchangerate.host as an allowed endpoint, but connects to an undeclared endpoint at runtime, the tool terminates.

Validation of the output scheme: The proxy validates the instrument’s response against the declared output schema, flagging responses that contain unexpected fields or data patterns matching operational injection payloads.

A behavior specification is the key new primitive that makes this possible. This is a machine-readable declaration, similar to an Android application’s permission manifest, that details which external endpoints the tool communicates with, what data the tool reads and writes, and what side effects occur. The behavior specification is sent as part of the instrument’s signed attestation, making it tamper-evident and verifiable at runtime.

A lightweight proxy that validates schemas and checks network connections adds less than 10 milliseconds to each call. Full data flow analysis adds more overhead and is more suitable for high-end deployments. However, each call must be authenticated against the advertised endpoint’s allow list.

What each layer captures and what it misses

Attack pattern

What is the origin?

What runtime verification takes

Residual risk

Instrument imitation

Identity of the publisher

None unless a discovery bundle is added

High without discovery integrity

Scheme manipulation

None of them

Oversharing with settings policy only

Medium

Behavioral drift

Not after signing

Endpoints and outputs are powerful when tracked

Low-medium

Description injection

None of them

Images are few if not separately sanitized

High

Transient tool call

Weak

Partial if outbound destinations are limited

Medium-high

No layer is sufficient on its own. A source without runtime checking misses post-publish attacks. And a non-origin runtime check has no reason to check. Architecture requires both.

How to roll this out without breaking developer speed

Start with the endpoint permission list during deployment. This is the most valuable and easiest form of protection. All tools advertise their connection points outside the system. The lawyer executes those declarations. No additional tools are needed other than a network-aware sidecar.

Next, add the output schema validation. Compare all returned values ​​with what each tool declares. Note any unexpected value returns. This captures data exfiltration and rapid injection payloads in tool responses.

Next, deploy a discovery package for high-risk instrument categories. Credential handling, personally identifiable information (PII) and financial data processing tools must undergo full feed and pass verification. Less risky tools can bypass this until the ecosystem matures.

Finallycapply full behavioral monitoring only where the level of assurance justifies the cost. An experienced model is essential: Security investment must be weighed against risk.

If you use agents that select tools from centralized registries, add a list of minimum allowed endpoints today. The rest of the behavior specifications and runtime checks can come later. But if you’re relying solely on SLSA origins to ensure your agent-tool pipeline is secure, you’re solving the wrong half of the problem.

Nick Kale is a senior engineer specializing in enterprise AI platforms and security.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *