

OpenAI launched Codex Security on March 6, entering the application security market that Anthropic cracked open 14 days earlier with Claude Code Security. Both scanners use LLM reasoning instead of pattern matching. Both have demonstrated that traditional static application security testing (SAST) tools are structurally blind to entire classes of vulnerabilities. The enterprise security stack is caught in the middle.
Anthropic and OpenAI independently released inference-based vulnerability scanners within two weeks of each other, and both found classes of bugs that the SAST model was never designed to detect. With a combined private-market valuation of more than $1.1 trillion, competitive pressure between the two labs means detection quality will improve faster than any traditional vendor can match.
Neither Claude Code Security nor Codex Security replaces your existing stack, but both are permanently changing the procurement math. Both are currently free for enterprise customers. A head-to-head comparison and the seven actions below are what you need before the board asks which scanner you tested and why.
Anthropic published its zero-day research on February 5, alongside the release of Claude Opus 4.6. The model has uncovered more than 500 previously unknown high-severity vulnerabilities in open-source production codebases that had survived decades of peer review and millions of hours of fuzzing, Anthropic said.
In the CGIF library, Claude discovered a stack buffer overflow by reasoning about the LZW compression algorithm, a flaw that coverage-guided fuzzing cannot catch even with 100% code coverage. Anthropic shipped Claude Code Security as a limited research preview on February 20, with free early access for Enterprise and Team customers and for open-source maintainers. Gabby Curtis, head of communications at Anthropic, told VentureBeat in an exclusive interview that Anthropic built Claude Code Security to make its defensive capabilities more widely available.
OpenAI's numbers come from a different architecture and a wider scan surface. Codex Security evolved from Aardvark, an internal tool powered by GPT-5 that entered private beta in 2025. During the Codex Security beta, OpenAI's agent scanned more than 1.2 million commits across external repositories, surfacing 792 critical findings and 10,561 high-severity findings. OpenAI reported vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, resulting in 14 assigned CVEs. According to OpenAI, Codex Security's false-positive rate fell by more than 50% across repositories during the beta, and overreported severity fell by more than 90%.
Checkmarx Zero researchers demonstrated that moderately complex vulnerabilities sometimes escape detection by Claude Code Security, and that developers can trick the agent into ignoring sensitive code. In a scan of a full production-grade codebase, Checkmarx Zero found that Claude identified eight vulnerabilities, but only two were true positives. If moderately complex obfuscation defeats the scanner, the detection ceiling is lower than the headline numbers suggest. Neither Anthropic nor OpenAI has submitted its discovery claims to an independent third-party audit. Security leaders should treat the reported numbers as indicative, not verified.
Merritt Baer, CSO at Encrypt AI and former deputy CISO at AWS, told VentureBeat that the competitive scanner race is squeezing the window for everyone. Baer advised security teams to prioritize patches based on exploitability in the context of their runtime, not just CVSS scores; to shorten the window between discovery, triage, and patching; and to maintain software bill of materials (SBOM) visibility so they immediately know where a vulnerable component is running.
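Baer's triage guidance can be sketched as a simple scoring pass that ranks findings by runtime exploitability rather than raw CVSS. The fields and weights below are illustrative assumptions, not a published standard or any vendor's schema.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    cvss: float          # base severity score
    deployed: bool       # is the vulnerable component actually running?
    internet_facing: bool
    exploit_known: bool  # public exploit code or active exploitation

def triage_priority(f: Finding) -> float:
    """Rank by exploitability in runtime context, not CVSS alone.
    Weights are illustrative, not a published standard."""
    if not f.deployed:
        return 0.0  # not running anywhere: backlog, not fire drill
    score = f.cvss
    if f.internet_facing:
        score *= 1.5
    if f.exploit_known:
        score *= 2.0
    return score

findings = [
    Finding("CVE-A", cvss=9.8, deployed=False, internet_facing=False, exploit_known=False),
    Finding("CVE-B", cvss=6.5, deployed=True, internet_facing=True, exploit_known=True),
]
ranked = sorted(findings, key=triage_priority, reverse=True)
```

In this sketch, a 6.5-CVSS bug that is deployed, internet-facing, and actively exploited outranks a 9.8-CVSS bug that is not running anywhere, which is exactly the inversion Baer is arguing for.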
The two approaches differ, as do the codebases they scan, but the result is the same: pattern-matching SAST has a detection ceiling, and LLM reasoning extends past it. When two competing labs ship this capability at the same time, the dual-use math becomes unsettling. Any financial institution or fintech running a commercial codebase should assume that if Claude Code Security and Codex Security can find these bugs, so can adversaries with API access.
Baer was blunt: open-source vulnerabilities exposed by reasoning models should be treated as something closer to zero-day-class discoveries, not backlog items. The window between discovery and exploitation has simply compressed, and most vulnerability management programs are still calibrated to CVSS alone.
Snyk, a developer security platform used by engineering teams to find and fix vulnerabilities in code and open-source dependencies, acknowledged the technical progress but argued that finding vulnerabilities was never the hard part. Fixing them at scale, across hundreds of repositories, without breaking anything: that is the bottleneck. Snyk pointed to Veracode's 2025 GenAI Code Security Report, which found that AI-generated code introduces 2.74 times more security vulnerabilities than human-written code. The same models that find hundreds of zero days also introduce new classes of vulnerabilities when writing code.
Cycode CTO Ronen Slavin wrote that Claude Code Security represents a real technical advance in static analysis, but AI models are probabilistic by nature. Slavin argued that security teams need consistent, repeatable, audit-grade results, and that scanning capability built into an IDE is useful but does not constitute infrastructure. Slavin's position: SAST is a broader discipline, and free scanning does not displace platforms that manage enterprise-wide governance, pipeline integrity, and runtime behavior.
“If codebase scanners from major AI labs are effectively free for enterprise customers, then static code scanning becomes a commodity overnight,” Baer told VentureBeat. Over the next 12 months, Baer expects budgets to shift toward three areas:
Runtime and exploitability layers, including runtime protection and attack path analysis.
AI governance and model security, including guardrails, prompt injection protection, and agent controls.
Remediation automation. “The net effect is that AppSec spend is likely not going down, but the center of gravity is shifting away from traditional SAST licenses and toward tools that shorten remediation cycles,” Baer said.
Run both scanners against a representative subset of your codebase and compare Claude Code Security and Codex Security findings with your existing SAST output. Start with a single representative repository, not your entire codebase; both tools are in research preview with access restrictions that make full-estate scanning premature. The delta is your blind-spot inventory.
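One way to turn the pilot into data is to normalize each tool's findings into comparable (file, weakness) pairs and diff the sets. A minimal sketch in Python, assuming a simplified JSON export format; the field names `file` and `cwe` are placeholders to adapt to whatever each preview actually emits:

```python
import json

def load_findings(path: str) -> set[tuple[str, str]]:
    """Normalize a scanner export to (file, cwe) pairs.
    The JSON layout here is an assumed, simplified schema."""
    with open(path) as fh:
        data = json.load(fh)
    return {(f["file"], f["cwe"]) for f in data["findings"]}

def delta_report(sast: set, claude: set, codex: set) -> dict:
    """Diff the three finding sets to surface coverage gaps."""
    llm_findings = claude | codex
    return {
        "missed_by_sast": llm_findings - sast,   # your blind-spot inventory
        "missed_by_llms": sast - llm_findings,   # patterns still worth keeping
        "tool_disagreement": claude ^ codex,     # where the two models diverge
    }
```

The `missed_by_sast` set is the blind-spot inventory; `tool_disagreement` is the delta between the two labs' models that makes running both worthwhile.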
Build the governance framework before the pilot, not after. Baer told VentureBeat to treat both tools as new data processors for your source-code crown jewels. Baer's governance model includes a formal data processing agreement with clear terms on training exclusions, data retention, and subprocessor usage; a segmented submission pipeline that transfers only the repos you intend to scan; and an internal classification policy that distinguishes code that may be scanned externally from code that cannot leave your boundary. In interviews with more than 40 CISOs, VentureBeat found that formal governance frameworks for reasoning-based scanning tools do not yet exist. Baer cited derived IP as a blind spot most teams haven't addressed: can model providers retain scan outputs or reasoning traces, and are those artifacts your intellectual property? Another gap is data residency for code, which historically has not been regulated like customer data but is increasingly subject to export controls and national security restrictions.
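The segmented submission pipeline Baer describes can be enforced as a default-deny gate in CI: only repositories explicitly classified as scannable are ever submitted to an external scanner. The repository names and labels below are hypothetical, purely for illustration.

```python
# Hypothetical classification policy: only repos explicitly marked
# "scannable" may be submitted to an external scanning service.
CLASSIFICATION = {
    "internal-tools": "scannable",
    "payments-core": "restricted",   # crown-jewel code: never leaves the boundary
    "public-sdk": "scannable",
}

def submission_allowed(repo: str) -> bool:
    """Default-deny gate: unclassified repos are treated as restricted."""
    return CLASSIFICATION.get(repo, "restricted") == "scannable"
```

Wiring a check like this into the pipeline that uploads code means a newly created, never-classified repository fails closed instead of silently leaving the boundary.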
Map what neither new scanner covers. Software composition analysis. Container scanning. Infrastructure-as-code. DAST. Runtime detection and response. Claude Code Security and Codex Security work at the code level; your existing stack handles everything else. The pricing power of that stack, however, has changed.
Quantify your dual-use exposure. Every zero day Anthropic and OpenAI found lives in open-source projects that enterprise applications depend on. Both labs disclose and patch responsibly, but the window between their discovery and your adoption of those patches is where attackers operate. AI security startup AISLE independently discovered all 12 zero-day vulnerabilities fixed in OpenSSL's January 2026 security patch, including a potentially remotely exploitable stack buffer overflow (CVE-2025-15467). Fuzzers ran against OpenSSL for years and missed every single one. Assume adversaries are running the same models against the same codebases.
Prepare a board comparison before the board asks. Claude Code Security interprets code contextually, follows data flows, and uses multi-step self-verification. Codex Security builds a project-specific threat model before scanning and validates findings in sandboxed environments. Each tool is in research preview and requires human approval before any patch is applied. The board needs a side-by-side analysis, not a single vendor's pitch. On the question of why your existing tooling missed what Anthropic found, Baer suggested a framing that works at the board level. Pattern-matching SAST, Baer told VentureBeat, solved a different generation of problems: it was designed to detect known anti-patterns, a capability that still matters and still reduces risk. But reasoning models can evaluate the multithreaded logic, state transitions, and developer intent where many modern bugs live. Baer's board-ready summary: “We got the right tools for the threats of the last decade; the technology just evolved.”
Follow the competitive cycle. Both companies are heading toward IPOs, and enterprise security wins feed the growth story. When one scanner misses a blind spot, it lands on the other lab's feature roadmap within weeks. Both labs ship model updates monthly, a cadence no traditional vendor's release schedule will match. Running both is the right move, Baer said: “Different models think differently, and the delta between them can reveal bugs that neither tool can consistently catch on its own. In the short term, using both isn't redundant. It's protection through different systems of thought.”
Set a 30-day pilot window. Before February 20, this test did not exist. Run Claude Code Security and Codex Security against the same codebase and let the delta drive the procurement conversation with empirical data instead of vendor marketing. Thirty days gives you that data.
Fourteen days separated Anthropic and OpenAI. The gap before the next releases will be shorter. Attackers are watching the same calendar.