Cursor’s new coding model, Composer 2, is here: It outperforms Claude Opus 4.6, but still lags behind GPT-5.4.



Cursor, the San Francisco AI coding platform from startup Anysphere, which is valued at $29.3 billion, has launched Composer 2. The agentic AI coding environment now has a new built-in coding model with dramatically improved benchmarks over its previous in-house model.

It has also launched Composer 2 Fast, and made that higher-priced but faster option the default experience for users.

Here is the cost breakdown:

  • Composer 2 Standard: $0.50/$2.50 for 1 million input/output tokens

  • Composer 2 Fast: $1.50/$7.50 for 1 million input/output tokens

That is a steep price cut from Cursor’s previous in-house model, Composer 1.5, released in February, which cost $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 is roughly 86% cheaper on both counts.

Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.
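Those percentages follow directly from the per-million-token rates above; a quick, illustrative check (not an official Cursor figure):

```python
# Illustrative check of the quoted savings, using the per-million-token
# prices listed above. Values are (input, output) in USD per 1M tokens.
PRICES = {
    "Composer 1.5":    (3.50, 17.50),
    "Composer 2":      (0.50,  2.50),
    "Composer 2 Fast": (1.50,  7.50),
}

base_in, base_out = PRICES["Composer 1.5"]
for model in ("Composer 2", "Composer 2 Fast"):
    p_in, p_out = PRICES[model]
    print(f"{model}: input {1 - p_in / base_in:.0%} cheaper, "
          f"output {1 - p_out / base_out:.0%} cheaper")
# Composer 2: input 86% cheaper, output 86% cheaper
# Composer 2 Fast: input 57% cheaper, output 57% cheaper
```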

There are also discounts for “cache reads,” i.e., resending the model tokens it has already processed: $0.20 per million tokens for Composer 2, $0.35 per million for Composer 2 Fast, and $0.35 per million for Composer 1.5.
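To see how the cache-read rate changes a bill, here is a minimal sketch of a per-request cost estimate at Composer 2’s rates; the token counts are made-up examples, and actual Cursor billing may differ:

```python
# Hypothetical per-request cost estimate at Composer 2 rates (USD per 1M tokens).
# Token counts below are invented for illustration; real billing may differ.
def request_cost(fresh_in, cached_in, out,
                 p_in=0.50, p_cache=0.20, p_out=2.50):
    return (fresh_in * p_in + cached_in * p_cache + out * p_out) / 1_000_000

# e.g. 20k fresh input tokens, 80k cache-read tokens, 5k output tokens:
print(f"${request_cost(20_000, 80_000, 5_000):.4f}")  # $0.0385
```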

It’s also important to note that this appears to be a Cursor-only release rather than a broadly available stand-alone model. The company’s announcement and model documentation describe Composer 2 as available in Cursor, tailored to Cursor’s agent workflow, and integrated with the product’s tool stack.

The provided materials do not indicate separate availability through external model platforms or as a general-purpose API outside of the Cursor environment.

Cursor is promising not just better completions, but long-horizon coding

The deeper technical claim in this release isn’t just that Composer 2 scores higher than Composer 1.5. Cursor says the model is better suited to long-horizon agentic coding.

On its blog, Cursor says the quality gains came first from sustained pretraining, which gave it a stronger base for scaling reinforcement learning. From there, the company says, it trained Composer 2 on long-horizon coding tasks, so the model can handle problems that require hundreds of steps.

This framing matters because it addresses one of the biggest unsolved problems in AI coding. Many models are good at generating isolated pieces of code. Fewer stay reliable across the longer workflow of reading a repository, deciding what to change, editing multiple files, running commands, interpreting failures, and continuing toward the goal.
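To make that workflow concrete, here is a minimal, hypothetical sketch of a long-horizon agent loop of the kind described; the tool set, the `model.next_action` interface, and the step budget are illustrative assumptions, not Cursor’s implementation:

```python
# Schematic long-horizon agent loop: plan a step, execute a tool, feed the
# result (including failures) back, and keep going toward the goal.
# Everything here is an illustrative assumption, not Cursor's actual design.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (action, result) pairs so far

def run_agent(goal, model, tools, max_steps=300):
    """model.next_action picks the next tool call; tools maps names to callables."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):                    # "hundreds of steps" on hard tasks
        action = model.next_action(state.goal, state.history)
        if action.name == "done":                 # model decides the goal is met
            break
        result = tools[action.name](**action.args)   # read, edit, run tests, ...
        state.history.append((action, result))       # failures inform the next step
    return state
```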

Cursor’s documentation confirms that this is the use case it is focused on. It describes Composer 2 as an agent model with a 200,000-token context window, configured for tool usage, file edits, and terminal operations inside Cursor.

It also mentions training techniques such as self-generalization for long-horizon tasks. For developers who already use Cursor as their primary environment, this tight integration may matter more than any general leaderboard claim.

The benchmark gains are significant, even if GPT-5.4 still leads on one key chart

Cursor’s published results show a clear improvement over previous Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.

That compares with 44.2, 47.9, and 65.9 for Composer 1.5, and 38.0, 40.0, and 56.9 for Composer 1.

The release is also more measured than some model launches, because Cursor does not claim across-the-board leadership.

On Terminal-Bench 2.0, which measures how well an AI agent completes tasks in command-line, terminal-style interfaces, GPT-5.4 still leads with 75.1, followed by Composer 2 at 61.7, Opus 4.6 at 58.0, Opus 4.5 at 52.1, and Composer 1.5 at 47.9.

That restraint makes Cursor’s pitch more pragmatic and arguably more useful to buyers. The company isn’t saying Composer 2 is the best model at everything. It is saying the model has moved to a much more competitive quality tier while offering more attractive economics and stronger integration with a product developers already use.

Cursor has also included a performance-versus-price chart based on its CursorBench benchmark suite, which seems designed to make a Pareto-style argument for Composer 2.

In that chart, Composer 2 sits at a stronger performance point than Composer 1.5 and compares favorably with the higher-priced GPT-5.4 and Opus 4.6 configurations Cursor plotted. The company’s message is not just that Composer 2 outperforms its predecessor, but that it offers a more cost-effective trade-off for everyday coding work inside Cursor.
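Using only the figures quoted in this article, the Pareto logic can be checked mechanically: one model dominates another if it scores at least as well while costing no more on both input and output. A minimal sketch (GPT-5.4 and Opus 4.6 prices are not quoted here, so they are omitted rather than guessed):

```python
# Pareto-dominance check with only the numbers quoted in this article:
# (CursorBench score, $/1M input tokens, $/1M output tokens).
models = {
    "Composer 2":   (61.3, 0.50, 2.50),
    "Composer 1.5": (44.2, 3.50, 17.50),
}

def dominates(a, b):
    """True if a is at least as good on every axis and differs somewhere."""
    score_a, in_a, out_a = a
    score_b, in_b, out_b = b
    return score_a >= score_b and in_a <= in_b and out_a <= out_b and a != b

print(dominates(models["Composer 2"], models["Composer 1.5"]))  # True
```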

Why is the “locked to Cursor” point important to buyers?

For readers deciding whether to use Composer 2, benchmark performance may not even be the most important question. The bigger point is that Cursor has built a model optimized for its own product experience.

This can be a strength. According to the documentation, Composer 2 can access Cursor’s agent toolkit, including semantic code search, file and directory search, file reading, file edits, shell commands, browser management, and web access.
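For illustration, such a toolkit might be declared roughly as below. The tool names mirror the documentation’s list, but the schema itself is a hypothetical sketch, not Cursor’s actual API:

```python
# Hypothetical tool manifest mirroring the capabilities listed above.
# The schema is an illustrative assumption, not Cursor's actual API.
AGENT_TOOLS = [
    {"name": "semantic_search", "args": {"query": "str"},   "desc": "semantic code search"},
    {"name": "find_files",      "args": {"pattern": "str"}, "desc": "file and directory search"},
    {"name": "read_file",       "args": {"path": "str"},    "desc": "read a file"},
    {"name": "edit_file",       "args": {"path": "str", "patch": "str"}, "desc": "apply an edit"},
    {"name": "run_shell",       "args": {"command": "str"}, "desc": "execute a shell command"},
    {"name": "browser",         "args": {"action": "str"},  "desc": "drive a browser session"},
    {"name": "web_fetch",       "args": {"url": "str"},     "desc": "fetch a web page"},
]
```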

That kind of integration can matter more than raw model quality if the goal is to complete real software tasks rather than to produce impressive one-shot answers.

But it also narrows the target audience. Teams looking for a model they can deploy broadly across multiple external tools and platforms should recognize that Cursor presents Composer 2 as a model for Cursor users, not as a general-purpose, stand-alone foundation model.

Bigger picture: Cursor is making an operational argument

The significance of Composer 2 isn’t that Cursor suddenly took first place in every coding benchmark. It didn’t. The more important point is that Cursor is making an operational case: its model is getting better, its pricing is low enough to encourage wider adoption, and its faster tier is responsive enough that the company is comfortable making it the default despite the higher price.

That combination may resonate with engineering teams that care less about abstract model prestige and more about whether an assistant stays useful through long coding sessions without running up costs.

Cursor’s wider pricing structure helps frame the competitive pressure around this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20/month, Pro+ at $60/month, and Ultra at $200/month for individual users, with higher tiers offering more usage across OpenAI, Anthropic, and Google models.

On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit trails, and granular admin controls. In other words, Cursor is not charging merely for access to a coding model. It charges for a managed application layer that sits on top of multiple model providers while adding features, governance, and workflow tools.

That model is under increasing pressure as first-party AI companies push deeper into coding. OpenAI and Anthropic no longer sell models only through third-party products; they also ship their own coding interfaces, agents, and evaluation frameworks, such as Codex and Claude Code, raising the question of how much room is left for an intermediary platform.

Commentary on X, while anecdotal and unrepresentative of the broader market, increasingly describes a shift from Cursor to Anthropic’s Claude Code, particularly among power users drawn to terminal-first workflows, longer-running agent behavior, and lower perceived overhead.

Some of these posts describe frustration with Cursor’s pricing, context loss, or editor-centric experience, while praising Claude Code as a more direct and more fully agentic way of working. Even read cautiously, this kind of social chatter points to the strategic challenge Cursor faces: it needs to prove that its integrated platform, team controls, and now its own in-house models add enough value to justify its place among the increasingly capable coding products shipped by the model makers themselves.

This makes Composer 2 strategically important to Cursor.

By offering an in-house model far cheaper than Composer 1.5, fitting it into Cursor’s own tool stack, and making the faster variant the default, the company is trying to show that it provides more than a wrapper around external systems.

The risk is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they want a separate AI coding platform at all, or whether the model makers’ own tools are enough.


