The Anthropic model won on every measure, then the government pulled it



TL;DR

Fable 5 outperformed GPT 5.5 on every major benchmark but was withdrawn on US government orders after three days, leaving GPT 5.5 as the best model you can actually use.

Anthropic’s Fable 5 spent three days as the most capable AI model ever released to the public. It topped the Chatbot Arena leaderboard, beat OpenAI’s GPT 5.5 by a double-digit margin on coding benchmarks, and gave paying subscribers access to Mythos-level reasoning for the first time. Then, on June 12, the US government ordered Anthropic to shut it down.

The result is a strange moment in artificial intelligence: the model that clearly outperforms everything else on the market is one you can’t use. OpenAI’s GPT 5.5, launched at the end of April under the internal code name “Spud,” is now the most powerful model available to developers and consumers, not because it has improved, but because its only real competitor has been removed.

The benchmark gap between the two is not close. On SWE-Bench Pro, which measures a model’s ability to solve real-world software engineering problems in open source codebases, Fable 5 scored 80.3%, compared to GPT 5.5’s 58.6%, a gap of about 22 points. On SWE-Bench Verified, a curated subset of the same benchmark, Fable 5 achieved 95.0%.

Other coding benchmarks tell a similar story. Fable 5 leads Code Arena on Elo, scoring 1,665 to GPT 5.5’s 1,501. On FrontierCode Diamond, a benchmark designed to test the most difficult programming tasks, Fable 5 scores 29.3% to GPT 5.5’s 5.7%. On the broader Chatbot Arena leaderboard, Fable 5 ranks first and GPT 5.5 fourth.

GPT 5.5 has one area of relative strength. On Terminal-Bench 2.0, which evaluates interactive terminal-based coding tasks rather than problem solving at the codebase level, GPT 5.5 scored 82.7%, compared to Fable 5’s roughly 88.0%. There the gap narrows to about five points, and the benchmark tests a different skill: executing and debugging commands in real time rather than reading and patching large repositories.

Pricing also favors OpenAI. GPT 5.5 costs $5 per million input tokens and $30 per million output tokens, against $10 and $50 respectively for Fable 5. That is half the price on input and 60% of the price on output. For developers running high-volume applications where the performance difference matters less than price, GPT 5.5 would be the more practical choice even if both models were available.
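To make the pricing gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a workload. The `PRICES` table layout, the `workload_cost` function name, and the token volumes are illustrative assumptions, not figures or APIs from either vendor.

```python
# Per-million-token list prices in USD, as quoted in the article.
PRICES = {
    "GPT 5.5": {"input": 5.0, "output": 30.0},
    "Fable 5": {"input": 10.0, "output": 50.0},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly volume: 200M input tokens and 40M output tokens.
    for model in PRICES:
        cost = workload_cost(model, 200_000_000, 40_000_000)
        print(f"{model}: ${cost:,.2f}")
```

At that hypothetical volume the sketch gives $2,200 for GPT 5.5 and $4,000 for Fable 5, a saving of 45% rather than a flat half, because the discount on output tokens is smaller than the discount on input tokens.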

Fable 5 was released to the general public on June 9 as Anthropic’s first Mythos-class model. It offered a one-million-token context window and up to 128,000 output tokens. Anthropic made it available at no additional cost to Pro, Max, Team and Enterprise subscribers until June 22.

The shutdown came through an export control directive issued on June 12. The government cited a jailbreak vulnerability as the reason for pulling both Fable 5 and the wider Mythos 5 model family. Anthropic disputed the seriousness of the finding, saying the vulnerabilities identified were minor, publicly known, and already reachable through GPT 5.5 without any workarounds. Separately, reports suggest Amazon CEO Andy Jassy played a role in the government’s review.

The practical result is that developers and researchers who were evaluating Fable 5 for production use have had to revert to GPT 5.5 or Anthropic’s earlier Opus models. The downgrade matters most for coding-heavy workflows. A gap of about 22 points on SWE-Bench Pro is the difference between a model that solves roughly four out of five real-world software problems and one that solves roughly three out of five.
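The four-in-five versus three-in-five framing comes from rounding the SWE-Bench Pro scores quoted earlier. A short Python sketch of that arithmetic follows; the `SCORES` table, the `expected_solved` helper, and the batch size of 1,000 tasks are hypothetical choices made for illustration only.

```python
# SWE-Bench Pro scores in percent, as quoted in the article.
SCORES = {"Fable 5": 80.3, "GPT 5.5": 58.6}


def expected_solved(score_pct: float, tasks: int) -> int:
    """Expected number of solved tasks, rounded to the nearest whole task."""
    return round(score_pct / 100 * tasks)


if __name__ == "__main__":
    gap = SCORES["Fable 5"] - SCORES["GPT 5.5"]
    print(f"gap: {gap:.1f} points")
    # Assumed batch of 1,000 tasks, chosen only to make the rates tangible.
    for model, score in SCORES.items():
        solved = expected_solved(score, 1000)
        print(f"{model}: {solved} solved, {1000 - solved} unsolved")
```

Seen from the failure side, the same numbers leave GPT 5.5 with about 414 unsolved tasks per 1,000 against about 197 for Fable 5, roughly twice as many.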

Whether Fable 5 returns depends on Anthropic’s negotiations with the government over its export control classification. The company has publicly argued that the directive was disproportionate and that the weaknesses cited did not justify pulling the model altogether. Until the dispute is resolved, GPT 5.5 holds the top spot by default, not on merit.


