Anthropic’s browser agent was hijacked 31.5% of the time before safeguards were turned on



The highest prompt injection number any frontier lab published this spring comes from Anthropic. Point a red team at its newest model in the browser and the attacker hijacked it 31.5% of the time before safeguards were turned on. OpenAI, Google, and Meta never gave security leaders a comparable figure to set next to it. That number looks like a liability. In this comparison, it is the opposite. It is the only solid piece of ground.

All four frontier labs shipped a prompt injection disclosure, and no two of them are comparable. Anthropic put 244 pages and four agent surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the topic out of the model card and into a separate safety framework. Meta ships no closed model card at all. The Cross-Vendor Prompt Injection Disclosure Grid below shows what each lab tests, what each measures, and the four places where a side-by-side comparison breaks down.

Prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or a tool output. One planted line can delete records or trigger actions that no one approved, and these cards are the buyer’s only first-party evidence.

There’s no industry standard for measuring any of this, and that’s the root of the problem. Carter Rees, vice president of AI at Reputation, told VentureBeat that prompt injection breaks the assumption every legacy tool is built on. "An innocuous phrase like “ignore previous instructions” can carry as much destructive payload as a buffer overflow, but shares nothing in common with known malware signatures." Without a common signature to scan for, each lab built its own benchmark, and the results do not match.

Adam Meyers, senior vice president of Counter Adversary Operations at CrowdStrike, has said the exposure is already the buyer’s to control. "As you deploy AI, it increases your attack surface, so now you have to be able to protect those AI models from adversary abuse or data poisoning or prompt injection." CrowdStrike’s own frontline data shows that the threat side isn’t standing still. In its 2026 Financial Services Threat Landscape Report, released in May, the company reported that adversaries are using AI to compress the time from initial access to impact faster than legacy defenses can respond.

Anthropic measured four surfaces. The numbers vary by an order of magnitude depending on which one you read.

The Opus 4.8 card does what the others don’t: it breaks prompt injection out by surface, and the spread is the story.

Put the model in a coding environment and an adaptive attacker built on Gray Swan’s Shade tool landed 7.03% of single attempts with thinking on. Safeguards pulled that down to 2.09%.

Move the same attack class to the browser behind Claude in Chrome and Claude Cowork and the floor gives way. Anthropic set professional red teamers against 129 web environments held out from training and printed every result in Table 5.2.2.4.A on page 81 of the system card. The per-attempt rate is the fraction of all injection attempts that succeed across the 129 environments, at 10 attempts each. The harsher per-scenario cut is the share of environments where at least one attempt succeeded.
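The difference between the two cuts is easy to see with toy data. The following is a minimal sketch, not Anthropic's evaluation code: the function names, the environment names, and the outcomes are all hypothetical, and only the two definitions (fraction of all attempts that succeed, share of environments with at least one success) come from the text above.

```python
from typing import Dict, List

def per_attempt_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of all injection attempts that succeeded, pooled over environments."""
    total = sum(len(attempts) for attempts in results.values())
    hits = sum(sum(attempts) for attempts in results.values())
    return hits / total if total else 0.0

def per_scenario_rate(results: Dict[str, List[bool]]) -> float:
    """Share of environments where at least one attempt succeeded."""
    if not results:
        return 0.0
    breached = sum(1 for attempts in results.values() if any(attempts))
    return breached / len(results)

# Hypothetical toy data: 4 environments, 10 attempts each (True = agent hijacked).
results = {
    "env_a": [True] + [False] * 9,       # 1 success
    "env_b": [False] * 10,               # 0 successes
    "env_c": [True] * 3 + [False] * 7,   # 3 successes
    "env_d": [False] * 10,               # 0 successes
}

print(f"per-attempt:  {per_attempt_rate(results):.1%}")   # 10.0%
print(f"per-scenario: {per_scenario_rate(results):.1%}")  # 50.0%
```

The same raw outcomes give 10% on one cut and 50% on the other, which is why a buyer needs to know which cut a vendor is quoting.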

Read the per-attempt column without safeguards and with thinking on, and the raw rate drops with each generation, from 50.7% for Sonnet 4.6 to 31.5% for Opus 4.8. The lowest figure in the table, 5.9%, belongs to Mythos Preview, which no one has bought yet. Turn on the safeguards and Opus 4.8 drops to 0.5%. Turn off thinking and it drops to zero across all 129 environments.

OpenAI measured one surface, against attacks it already knew.

The GPT-5.5 card, published April 23 and updated April 24, handles prompt injection in a single section on robustness to known attacks against connectors. OpenAI reports this as a robustness score, the inverse of an attack success rate, so higher is better. GPT-5.5 came in at 0.963, down from 0.998 for GPT-5.4-thinking. That single figure is all the card offers.

Anthropic tested four surfaces against an adaptive attacker that rewrote its approach based on what the model did, then launched a week-long bug bounty where red teams tried to break the model live. When the coding results came back worse than Opus 4.7, the card said so.

Put 0.963 next to 31.5% and they look like they belong on the same scoreboard. They don’t. One is a robustness score against known attacks on one surface. The other is a per-attempt success rate across 129 browser environments against adaptive attackers working in real time.
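A short sketch shows why flipping the units does not fix the mismatch. It assumes the robustness score is simply one minus the attack success rate, which is how the article describes it; the `Measurement` type, the `attacker` labels, and the `comparable` rule are hypothetical names introduced for illustration, not any vendor's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    vendor: str
    surface: str               # e.g. "browser", "connectors"
    attacker: str              # "adaptive" or "known" (illustrative labels)
    attack_success_rate: float

def from_robustness(vendor: str, surface: str, attacker: str, score: float) -> Measurement:
    """Convert a higher-is-better robustness score to an attack success rate.

    Assumption: score = 1 - attack success rate.
    """
    return Measurement(vendor, surface, attacker, round(1.0 - score, 6))

def comparable(a: Measurement, b: Measurement) -> bool:
    """Two figures belong side by side only if surface and attacker type match."""
    return a.surface == b.surface and a.attacker == b.attacker

gpt = from_robustness("OpenAI", "connectors", "known", 0.963)
claude = Measurement("Anthropic", "browser", "adaptive", 0.315)

print(f"{gpt.attack_success_rate:.1%}")  # 3.7%
print(comparable(gpt, claude))           # False
```

Converting 0.963 gives 3.7%, but that figure still describes known attacks on connectors, so it cannot be ranked against a browser rate measured under an adaptive attacker.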

Google and Meta never put a number on a card

Google’s Gemini 3 card files prompt injection under mitigations, and the launch materials describe stronger resistance with no number attached. The Frontier Safety Framework report covers red teaming, but it works within its own capability domains, and prompt injection is not one of them. Neither the model card nor the framework page gives a per-surface number that a buyer can use to weigh risk.

Meta ships open weights without a closed model card. Its prompt injection defense sits in a separate stack, Purple Llama and LlamaFirewall. Run against the public AgentDojo benchmark and its 97 tasks, the PromptGuard 2 classifier and AlignmentCheck auditor cut attack success from 17.6% undefended to 1.75% combined. Those are real numbers. But they rate guardrails on a public benchmark, not a model on a deployment surface that a security team would recognize.

Cross-Vendor Prompt Injection Disclosure Grid

The following grid works for any frontier model a security team is evaluating. Each row marks where the four labs diverge, and each divergence is where the comparison breaks down. The Anthropic numbers come from the Opus 4.8 system card. For the other three, everything comes from each vendor’s published security documentation.

| Dimension | Anthropic, Opus 4.8 | OpenAI, GPT-5.5 | Google, Gemini 3.x | Meta, Llama stack |
| --- | --- | --- | --- | --- |
| Security document | System card, May 28, 2026, 244 pages | System card, April 23, 2026, updated April 24 | Model card and separate Frontier Safety Framework report | No closed model card. Open weights and the Purple Llama stack |
| Injection benchmark or dataset | ART (Gray Swan and UK AISI), Shade tool, plus in-browser evaluation, 129 environments | Internal connectors evaluation, known attacks | None for injection | AgentDojo, 97 tasks |
| Surfaces with injection evaluation | Four. Tool use, coding, computer use, browser | One. Connectors | None published for injection | One. AgentDojo agent tasks |
| Multi-attempt scaling shown | Yes. ART benchmark at 1, 10, 100 attempts. Coding and computer use at 1 and 200 | No. Single point | No | No |
| Headline metric and unit | Attack success rate. Browser, with thinking: 31.5% raw, 0.5% with safeguards | Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking | None published. Resistance increase claimed qualitatively | Attack success rate on AgentDojo. 17.6% baseline to 1.75% |
| Live external bounty | Yes. A week-long live injection bounty with external red teams | No injection bounty. Bio bounty only | None found | None found |
| Regression disclosed | Yes, explicitly, with numbers | Number dropped from 0.998 to 0.963, not framed as a regression | Increased resistance claimed, no figures | Not applicable |

Five steps security teams should take now

Anthropic tested four surfaces and printed every number. OpenAI tested one. Google printed no per-surface rate. Meta rated the guardrails, not the model. The four disclosures do not add up to a comparison. These five steps build one.

Inventory every agent you deploy or cover and tag it by the surface it touches: browser, code, connectors, or desktop. With safeguards on, Anthropic’s rate for Opus 4.8 runs 2.09% in coding and 0.5% in the browser. A blended number describes neither. Pull the vendor’s published rate for your specific surface. If a vendor has never published one, treat that surface as untested.

Send the Cross-Vendor grid to every vendor under evaluation. A connector score of 0.963 and a browser rate of 31.5% were never on the same scale. Request the attack success rate for each surface, raw and with safeguards, along with the named attacker methodology. Empty cells are surfaces without first-party evidence.

Confirm in writing which number your integration inherits. Anthropic’s 0.5% comes with the full safeguard stack in Claude in Chrome and Cowork. On the API, the model ships without it. Do not accept a product number for an API deployment.

Add two questions to the RFP. Did the vendor test with an adaptive attacker that rewrote payloads against the model, and did anyone outside the company try to break it? Anthropic ran Gray Swan’s adaptive Shade tool and a week-long bounty. OpenAI tested known attacks on one surface. Adversaries do not limit themselves to known payloads.

Run your own injection test before shipping any agent. Vendor numbers come from vendor environments with vendor system prompts. Your stack has its own prompts, permissions, and data access. Set a pass threshold. Anything above it does not go live.
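The last step can be sketched as a simple release gate. This is an illustration under stated assumptions, not a standard: the function names, the surface labels, the sample outcomes, and the 1% threshold are all hypothetical, and each team would set its own threshold and feed in results from its own red-team runs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def rates_by_surface(outcomes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Attack success rate per surface from (surface, hijacked) test outcomes."""
    attempts: Dict[str, int] = defaultdict(int)
    hijacks: Dict[str, int] = defaultdict(int)
    for surface, hijacked in outcomes:
        attempts[surface] += 1
        hijacks[surface] += int(hijacked)
    return {s: hijacks[s] / attempts[s] for s in attempts}

def may_go_live(rates: Dict[str, float], surface: str, threshold: float) -> bool:
    """Gate a deployment: untested surfaces fail closed, tested ones must not exceed the threshold."""
    if surface not in rates:
        return False  # no measurement for this surface, so treat it as untested
    return rates[surface] <= threshold

# Hypothetical in-house results: 100 injection attempts per tested surface.
outcomes = (
    [("browser", True)] * 3 + [("browser", False)] * 97
    + [("code", False)] * 100
)
rates = rates_by_surface(outcomes)
THRESHOLD = 0.01  # hypothetical pass threshold: at most 1% of attempts may succeed

for surface in ("browser", "code", "desktop"):
    print(surface, may_go_live(rates, surface, THRESHOLD))
```

The gate fails closed on the "desktop" surface because nothing was measured there, which mirrors the first step: a surface with no number is an untested surface.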

Bottom line. There is no standard for this yet. The vendor’s number tells you what they chose to measure. Your own red team tells you what you are exposed to.


