The AI industry is all in "agent cycle," a paradigm in which AI models do more than generate text—they now actively plan, execute, and course correct complex tasks in days, not seconds.

So it’s perhaps no surprise to see the famous Qwen AI research team at Chinese e-commerce giant Alibaba release a model capable of performing autonomous agent AI within days: it came in the form of the Qwen3.7-Max. the company reports in a blog post achieved "~35 hours of continuous autonomous execution" — albeit in a proprietary format rather than open source, as with previous Qwen Team releases.

This is to be expected – many analysts and industry experts were afraid in this area Following the departure of several key Gwen Team leaders earlier this year. But it makes financial sense for Alibaba, at least in the short term: AI models are expensive to train, especially powerful ones like Qwen3.7-Max, and giving them away essentially for free as open-source models doesn’t help cover any costs immediately.

In this sense, Alibaba is simply aligning its efforts with American AI giants such as OpenAI and Google, offering its latest and greatest models only through paid API and subscription or paid web plan packages, and slightly less powerful models through open source.

Still, the arrival of Qwen3.7-Max offers businesses and individual users more options and more competition for American AI labs—rarely a bad thing for consumers at all budget levels. However, the fact that the model can only be accessed from China-based endpoints means that it may be limited in its appeal to American and European enterprises looking to maximize compliance and security posture when performing government contracts, or even simply trying to comply with all relevant state, local, and national data sovereignty regulations.

Marathon AI era

To understand why Qwen3.7-Max is different from previous models, you need to look at how it is taught and how it works in practice.

Language models typically degrade when forced to maintain a single train of thought over thousands of conversational turns; they forget instructions, hallucinate variables, or simply get stuck in logical loops. Qwen3.7-Max is specially designed "versatile agent stock" is able "long-horizon reasoning" to overcome this exact bottleneck.

The clearest demonstration of this capability is the autonomous engineering task detailed by the Gwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU – a hardware architecture that the model never encountered during training. His task was to optimize the core of attention.

For 35 straight hours, Qwen3.7-Max ran completely autonomously. It performed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation errors, and iteratively improved the code to achieve a 10.0x geometric average speedup.

In comparison, the Chinese like rival models z.ai’s GLM-5.1 and Like Moonshot’s K2.6 limited to 7.3x and 5.0x acceleration, respectively, they often voluntarily terminate their sessions when they fail to make progress. However, both are open source.

This resilience is achieved through what Alibaba calls "environmental scale". Just as early LLMs grew smarter by accepting more diverse texts, Qwen3.7-Max was trained in large, scalable dynamic agent environments.

It is capable of simulating the one-year life cycle of a startup "YC-Bench" navigating through hundreds of decision-making steps that include evaluation, personnel management, and contract review. In this simulation, the model managed to generate a virtual revenue of $2.08 million, nearly doubling the performance of the previous generation Qwen3.6-Plus.

In addition, the model has a built-in reward-hacking self-control that autonomously detects when it tries to cheat the training environment and adds heuristic rules to correct its behavior.

A brain for any scaffolding

Product-wise, Qwen3.7-Max is designed to be a cognitive engine for modern software development and enterprise automation.

The model offers a huge context window of 1 million characters and a maximum output limit of 64K, which provides great overhead for processing large codebases or long technical documents.

It is one of its most attractive features "generalization of cross-harnesses". Rather than being coded to work best in a proprietary interface, Qwen3.7-Max is designed to act as a drop-in intelligence layer for various agent frameworks. This natively supports the Anthropic API protocol, enables developers plug it directly into existing tools like Claude Code or OpenClaw.

Benchmark data provided by Alibaba shows that this holistic approach has paid huge dividends.

In the Apex Math Reasoning benchmarkGwen3.7-Max scored 44.5 points, beating Claude Opus-4.6 Max’s 34.5 points. and DeepSeek V4-Pro Max 38.3. It is also placed Dominant scores on Humanity’s Last Exam (41.4) and real coding agent MCP-Atlas (76.4).

This becomes a financial help for the end users. Through open source Model Context Protocol (MCP) integrations, the model can function as an autonomous office assistant that can read university formatting specifications and automatically reformat a messy Word document via command-line tools without human intervention.

Managing this level of intelligence comes at a distinct cost. Developers accessing the API through Alibaba Cloud Model Studio will pay $2.50 for 1 million access tokens and $7.50 for 1 million exit tokens. The platform also has open cache creation and read pricing, as well as a $10 per 1,000 call fee for integrated web searches, although the code translator tools remain free for a limited time.

Qwen3.7-Max occupies a strategic middle ground in the current API economy. While it commands a significant premium over aggressively priced domestic rivals like the DeepSeek V4 Pro ($5.22) and Z.ai’s GLM-5.1 ($5.80), it falls sharply short of the Western frontier giants it routinely matches against benchmarks.

For context, running heavy agent workflows through OpenAI’s GPT-5.4 or Anthropic’s Claude Opus 4.7 will run developers $17.50 and $30.00 per million tokens, respectively. Check out the VentureBeat pricing chart below:

VentureBeat Frontier AI Model API Evaluation Snapshot

Model	Introduction	Exit	Total Cost	Source
MiMo-V2.5 Flash	$0.10	$0.30	$0.40	Xiaomi MiMo
MiniMax M2.7	$0.30	$1.20	$1.50	MiniMax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
MiMo V2.5	$0.40	$2.00	$2.40	Xiaomi MiMo
Kimi-K2.6	$0.95	$4.00	$4.95	Moonshot/Like
GLM-5	$1.00	$3.20	$4.20	Z.ai
Grok 4.3 (bottom context)	$1.25	$2.50	$3.75	xAI
DeepSeek V4 Pro	$1.74	$3.48	$5.22	DeepSeek
GLM-5.1	$1.40	$4.40	$5.80	Z.ai
Claude Haiku 4.5	$1.00	$5.00	$6.00	anthropic
Grok 4.3 (high context)	$2.50	$5.00	$7.50	xAI
Qwen3.7-Max	$2.50	$7.50	$10.00	Alibaba cloud
Gemini 3.5 Flash	$1.50	$9.00	$10.50	Google
Gemini 3.1 Pro Preview (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.4	$2.50	$15.00	$17.50	OpenAI
Gemini 3.1 Pro Preview (>200K)	$4.00	$18.00	$22.00	Google
Closing the case 4.7	$5.00	$25.00	$30.00	anthropic
GPT-5.5	$5.00	$30.00	$35.00	OpenAI

By positioning the Qwen3.7-Max below Google’s Gemini 3.5 Flash ($10.50) but well above its budget-level models, Alibaba is indicating that this is not a commodity release; it’s an advanced reasoning engine designed to move enterprise workloads away from Silicon Valley’s most expensive offerings.

Licensing remains proprietary

For all its technical brilliance, the most controversial aspect of Qwen3.7-Max is how it is distributed. Gwen counts the release as a "ownership model". This is API only.

Historically, Alibaba’s Gwen has been an open source hero and local LLM communities. Previous iterations like Qwen 2.5 and Qwen 3.6 have made their weights public. Open weights allow developers, researchers, and enterprises to download the model, run it on their hardware, and fine-tune it for highly specific or data-sensitive use cases without sending proprietary information to a third-party server.

By locking Qwen3.7-Max behind an API, Alibaba becomes the standard commercial playbook used by OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise users, this means that using Qwen3.7-Max requires relying on Alibaba Cloud for data streams and completely relying on an internet connection to manage agent workflows. For the open source community, this means losing access to what is currently one of the most capable models on the planet.

Community reactions are divided between fear and disappointment

Reaction from the developer community has been swift, characterized by a mixture of deep respect for the engineering achievement and frustration with the licensing model.

Outstanding TO commenter Sudo (@sudoingX) It captured the prevailing mood on X (formerly Twitter). "qwen is unreal" they wrote "they are just 3.7 max. dropped, and in most of the benchmarks they tested, the opus 4.6 max.".

Technical indicators, especially the endurance of the model, have amazed many in this field. "apex math number, opus 34.5 vs 44.5, that’s not a small gap," sudo su noted. "Going 35 hours on a kernel optimization task with over 1000 tool calls is the part I keep re-reading. this is an event of the agent period that actually happened, not a slide".

Alibaba’s speed of iteration is also noteworthy. The jump from last month’s Qwen 3.6 to 3.7-Max highlights the relentless pace of development. As Sudo Su observed, "no one else acts like that".

However, with the transition to a closed ecosystem, the definition is seriously rejected. The loss of model weights is seen as a blow to the local AI movement, which relies on state-of-the-art open models to push the boundaries of what can be done on consumer hardware or private enterprise clusters.

"if anything, please open source that too," Sudo su pleaded in his post. "3.6 densely made the entire native llm ecosystem better. max level api will only close a door we keep open. finally give us the weights".

Qwen3.7-Max proves that the autonomous agent era is no longer a theoretical projection; this is the current reality where people are able to perform complex engineering tasks while they sleep. The only question now is whether this new frontier of artificial intelligence will be a democratic resource you can download to your laptop, or strictly an intelligence software rented from the cloud. For now, with Qwen3.7-Max, it’s definitely the latter.

Source link

Alibaba’s custom Qwen3.7-Max can run autonomously for 35 hours and supports external attachments such as Anthropic’s Claude Code.

Marathon AI era

VentureBeat Frontier AI Model API Evaluation Snapshot

Licensing remains proprietary

Community reactions are divided between fear and disappointment

Leave a ReplyCancel Reply

Google Pixel Buds app update brings new gradient icon

A new report shows that annual app subscribers rarely come back after canceling

The Galaxy Z Fold 8 Wide dummy reveals an incredibly thin yet compact device

Marathon AI era

VentureBeat Frontier AI Model API Evaluation Snapshot

Licensing remains proprietary

Community reactions are divided between fear and disappointment

Leave a ReplyCancel Reply

Trending now

Google Pixel Buds app update brings new gradient icon

A new report shows that annual app subscribers rarely come back after canceling

The Galaxy Z Fold 8 Wide dummy reveals an incredibly thin yet compact device