Anthropic’s Claude Opus 4.8 is here with 3X cheaper speed mod and near Mythos level alignment



Anthropic today Claude released Opus 4.8an upgrade to the flagship model, which ships at the same price as its predecessor, and is dramatically cheaper "fast mode" a new feature that allows the model to create hundreds of parallel subagents for tier- and codebase-scale work.

The model is immediately available on Anthropic’s platforms – claude.ai, Claude Code, API and Cowork – at unchanged prices: $5 per million input tokens and $25 per million output tokens. Developers can call it claude-opus-4-8.

The headline efficiency story is fast mode. Anthropic has reduced the cost of running Opus 4.8 in fast mode – where the model produces tokens at about 2.5x normal speed – from $30/$150 for Opus 4.7 to $10 per million input tokens and $50 per million output tokens.

This is a 3x reduction from the fast-mode cost of previous models and results in high performance for latency-sensitive production workloads.

Quick mode is immediately available through Claude Code /fast order; API access is subject to a waiting list claude.com/fast-mode.

In normal mode, Claude Opus 4.8 remains the most expensive of the leading edge models, but still comes in below its main competitor, OpenAI’s GPT-5.5.

Frontier AI Model API Evaluation Snapshot

Model

Introduction

Exit

Total Cost

Source

MiMo-V2.5 Flash

$0.10

$0.30

$0.40

Xiaomi MiMo

MiniMax M2.7

$0.30

$1.20

$1.50

MiniMax

Gemini 3.1 Flash-Lite

$0.25

$1.50

$1.75

Google

MiMo V2.5

$0.40

$2.00

$2.40

Xiaomi MiMo

Kimi-K2.6

$0.95

$4.00

$4.95

Moonshot/Like

GLM-5

$1.00

$3.20

$4.20

Z.ai

Grok 4.3 (bottom context)

$1.25

$2.50

$3.75

xAI

DeepSeek V4 Pro

$1.74

$3.48

$5.22

DeepSeek

GLM-5.1

$1.40

$4.40

$5.80

Z.ai

Claude Haiku 4.5

$1.00

$5.00

$6.00

anthropic

Grok 4.3 (high context)

$2.50

$5.00

$7.50

xAI

Qwen3.7-Max

$2.50

$7.50

$10.00

Alibaba cloud

Gemini 3.5 Flash

$1.50

$9.00

$10.50

Google

Gemini 3.1 Pro Preview (≤200K)

$2.00

$12.00

$14.00

Google

GPT-5.4

$2.50

$15.00

$17.50

OpenAI

Gemini 3.1 Pro Preview (>200K)

$4.00

$18.00

$22.00

Google

Closing the case 4.7

$5.00

$25.00

$30.00

anthropic

Closing the case 4.8

$5.00

$25.00

$30.00

anthropic

GPT-5.5

$5.00

$30.00

$35.00

OpenAI

More modest gains than 4.7, but Mythos-class capabilities are coming

According to the evaluations, Opus 4.8 is not a leap, but a step. It scores 88.6% on SWE-bench Verified (vs. 87.6% for Opus 4.7), 69.2% (vs. 64.3%) on the tougher SWE-bench Pro, and 74.6% (vs. 66.1%) on Terminal-Bench 2.1. Anthropic itself characterizes the model "a modest but noticeable improvement over its predecessor."

It routinely outperforms GPT-5.5 on at least 12 criteria, including most knowledge work, coding (problem level), use of agent tools, and long context criteria. GPT-5.5 excels in terminal/CLI workflows and is about web browsing and graduate-level science.

The larger signal sits on Anthropic’s internal capability ladder: Opus 4.8 sits between Opus 4.7 and the more capable Claude Mythos Preview, and is currently limited to a small number of organizations under Project Glasswing for cybersecurity work.

Anthropic says he is waiting to bring "Mythos-class models to all our customers in the coming weeks" when additional cyber security measures are implemented.

Several venture partners reported financial gains. Databricks has announced that it has unlocked Opus 4.8 "a step change in agent reasoning" Inside the Genie data agent, at "61% cheaper token cost than Opus 4.7" Thanks to multimodal efficiency in PDFs and charts.

Hebbia cited better quote accuracy and token efficiency in dense financial documents. This was reported by Devin-maker Cognition "directly translates into faster capability acquisition for engineers" and fixed comments and tool calling issues from 4.7 in Opus 4.8. One PC-using vendor reported 84% on Online-Mind2Web, a jump over both Opus 4.7 and GPT-5.5.

Dynamic workflows: hundreds of parallel subagents

Along with the model, Anthropic launched a research preview of dynamic workflows in Claude Code – a feature designed for tasks that are too large for a single context window. Claude schedules the job, creates hundreds of parallel subagents, then checks his results before reporting back. Anthropic example: codebase scale migration "across hundreds of thousands of lines of code from start to merge, the existing test suite as its stick."

Dynamic workflows are available in Claude Code’s Enterprise, Team, and Max plans.

Two small supplements round out the release:

  1. Effort control at claude.ai and Claude Cowork: A new selector allows users to dial in how much Claude thinks about each answer – higher effort spends more tokens for better answers, lower effort gives faster answers and burns speed limits more slowly. Available on all plans.

  2. System entries in the messages array in the API: Developers can now update Claude’s instructions in the middle of a task—adjusting permissions, token budgets, or environment context while the agent is running—without violating the operational cache.

Honesty and one "assessment awareness" warning

As a headline feature, Anthropic leads the way with integrity. The company’s compatibility team reports that Opus 4.8 "allows defects in code that are about four times less likely than its predecessor to go unnoticed," and misconduct rates now "It’s significantly lower than Opus 4.7 and similar to our best-fit Claude Mythos Preview."

Indeed, a bar chart released by Anthropic shows how close Opus 4.8 still is to the selectively released Mythos (a lower score is better), at around 1.9, down from 2.5 for Opus 4.7 and effectively tied with the more capable, limited Mythos Preview. The score is based on approximately 2,600 simulated research sessions for each model.

The 244 page system card Publicly released by Anthropic, it also provides more detail on certain categories of incompatibility – whether a model produces potentially harmful content around "military grade weapons," "Harmful sexual content", "unauthorized cyber attack"and "undermine liberal democracy," and yet, in all, Opus 4.8 scores significantly better than 4.7 or Sonnet 4.6, and is pretty close to Mythos.

Anthropic flags consider a find "most interesting" from training: Opus 4.8 shows an increasing tendency to think openly about how to evaluate its results, including environments where it is not reported to be evaluated. In other words: the model probably knows it’s being graded, and gives the answer it thinks will get a good grade on the test, not necessarily what it would if it thought it wasn’t graded.

Anthropic says this doesn’t translate into worse observable behavior—Opus 4.8 shows fewer deceptive task-success claims than previous models—but calls it "a worrying trend that could make training more difficult in the future." In about 5% of the training episodes, the initial interpretation task also found nonverbal reasoning related to the grader.

Anthropic put the model through a week of live bug bounty for quick injection — a first — and the Opus 4.8 ended up standing between the Opus 4.7 and the Sonnet 4.6. "all comparable frontier models" It has been tested with security measures in place that reduce the success rates of a browser-based attack to close to zero.

What’s next?

Anthropic teased two trajectories. Near term: cheaper models that provide "Many of the same capabilities as Opus." Long-term: Mythos-class models, the company said, have higher intelligence than Opus, but require stronger cybersecurity before general release.

Right now, Opus 4.8 is positioned as the new enterprise and development workhorse — slightly smarter than 4.7, dramatically cheaper to run faster, and noticeably more honest about what it doesn’t know.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *