
Alibaba Qwen3.7-Plus was released this weekthe latest AI large language model (LLM) in the globally loved and ever-expanding Gwen family, with more multimodal capabilities and a 60% lower cost. ago, the text-only Qwen3.7-Max model was released a few weeks ago.
However, like its immediate predecessor, Qwen3.7-Plus is only available under a "closed" commercial license through custom application programming interfaces (APIs) and Gwen Chat.
This marks a major departure from Gwen’s strategy to date, which has largely focused on releasing powerful, state-of-the-art open source models. Enterprises and users who rely on open source Gwen models—among them, US giants like Airbnb — will certainly be disappointing to see Alibaba shut down for its new releases.
However, the model is worth a look for its low cost and high performance in multimodal tasks such as creating enterprise-grade visuals or analyzing video, images, and screenshots that Qwen3.7-Max can’t (this is for text only). It’s among the cheaper AI-powered models currently available, and it’s priced slightly higher than its Chinese rival’s newer models. Limited time discount pricing on the MiniMax-M3.
VentureBeat Frontier AI Model API Evaluation Snapshot
|
Model |
Introduction |
Exit |
Total Cost |
Source |
|
MiMo-V2.5 Flash |
$0.10 |
$0.30 |
$0.40 |
|
|
deepseek-v4-flash |
$0.14 |
$0.28 |
$0.42 |
|
|
deepseek-v4-pro |
$0.435 |
$0.87 |
$1,305 |
|
|
MiniMax-M3 |
$0.30 |
$1.20 |
$1.50 |
|
|
Qwen3.7-Plus |
$0.40 |
$1.60 |
$2.00 |
|
|
Gemini 3.1 Flash-Lite |
$0.25 |
$1.50 |
$1.75 |
|
|
MiMo V2.5 |
$0.40 |
$2.00 |
$2.40 |
|
|
Grok 4.3 lower context |
$1.25 |
$2.50 |
$3.75 |
|
|
GLM-5 |
$1.00 |
$3.20 |
$4.20 |
|
|
Kimi-K2.6 |
$0.95 |
$4.00 |
$4.95 |
|
|
GLM-5.1 |
$1.40 |
$4.40 |
$5.80 |
|
|
Grok 4.3 high context |
$2.50 |
$5.00 |
$7.50 |
|
|
Qwen3.7-Max |
$2.50 |
$7.50 |
$10.00 |
|
|
Gemini 3.5 Flash |
$1.50 |
$9.00 |
$10.50 |
|
|
Gemini 3.1 Pro Preview ≤200K |
$2.00 |
$12.00 |
$14.00 |
|
|
GPT-5.4 |
$2.50 |
$15.00 |
$17.50 |
|
|
Gemini 3.1 Pro Preview >200K |
$4.00 |
$18.00 |
$22.00 |
|
|
Closing the case 4.8 |
$5.00 |
$25.00 |
$30.00 |
|
|
GPT-5.5 |
$5.00 |
$30.00 |
$35.00 |
Maintaining continuity during complex tool execution loops
A major bottleneck for technical decision makers using autonomous agents has rarely been initial model exploration. Instead it is decay of the state— tendency of the agent framework to lose analytical trajectory over multi-step, long-horizon tasks.
Qwen3.7-Plus addresses this architectural weakness through a combined approach to context management and justification state preservation.
The model ships with a 1 million token context window and allocates up to 256K tokens specifically for internal thought chain processing. To contextualize this capability, imagine an automated cloud migration agent: it can ingest an entire codebase, map dependencies, and spend thousands of tokens silently evaluating outliers before executing a single-line bash script.
Essentially, the API exposes a parameter named ‘.preserve_thinking.’ Across Alibaba’s ecosystem, this capability serves as a standardized architectural bridge rather than a tiered advantage. Alibaba introduced this feature during the previous Qwen 3.6 generation and integrated it into both open weight. Qwen3.6-27B and special Max models.
Basically the setting works at the API and template level to keep internal <think> blocks in continuous talk queues.
This structural persistence solves a critical bottleneck for developers designing long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from losing its context or needlessly recomputing its cached history mid-operation.
When the model performs complex, multi-step agent encoding tasks, this storage allows the system to retain its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.
Alibaba is not alone in recognizing this technical imperative, as the core concept now dictates the architecture of nearly all major AI labs.
Anthropic implements this exact ability under an alias "Expanded Thinking" for its advanced models, including Latest Claude Opus 4.8. This framework requires developers to return unmodified reasoning blocks directly to the API in subsequent queues to maintain an unbroken chain of reasoning.
OpenAI solves the same problem through an encrypted justification backhaul mechanism for models like GPT-5.5. In the OpenAI ecosystem, developers must return specific reasoning elements generated alongside previous function calls, ensuring that the model explicitly remembers the rationale behind the execution of the tool.
Finally, preserve_thinking it simply represents Alibaba’s terminology, which is quickly becoming the undisputed table stake for modern multi-shift reasoning.
Benchmarks show a competitive yet state-of-the-art model
By raw capacity measures, this deep-thinking architecture translates into structural gains on multimodal and agency criteria. However, it is still inferior to many leading and previous generations of US proprietary models such as Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4.
Active Terminal Bench 2.0-Terminusobtained Qwen3.7-Plus, which measures the ability of this model to run real terminal-level code safely and iteratively. 70.3DeepSeek-V4-Pro outperforms Max (67.9) and Gemini-3.1 Pro (63.5).
in computer vision criteria that require a localized interface concept such as ScreenSpot Promodel shot 79.0GPT-5.4 (high) is significantly ahead of older industry benchmarks such as 67.4 and Claude-Opus-4.6 with 49.5. Agent Evaluation Metrics (Selected Quantities)
Why should businesses consider Qwen3.7-Plus?
When analyzing Qwen3.7-Plus for an enterprise architect, the main question is clear: What does it replace in our current tech stack?
The model is designed to directly replace first-order boundary models (such as GPT-5 level or Clod-Max level models) within high-frequency developer workflows, robotic process automation (RPA), and data engineering pipelines.
Instead of deploying an expensive, general-purpose flagship model to manage repetitive system operations, technical teams can redirect these tasks to Qwen3.7-Plus. It interprets the visual interface, executes commands, and generates code simultaneously.
Alibaba has structured its API delivery to align with existing open source and proprietary enterprise frameworks. The endpoints are fully OpenAI compliant, meaning that changing existing dependencies requires minimal infrastructure tuning. For groups using autonomous terminal frameworks, integration is supported natively in many environments.
Engineers can run Qwen3.7-Plus directly through local terminal installations by changing the target environment.
From a pure cost perspective, running an agent framework that constantly references massive code repositories or visual build histories can quickly become cost prohibitive.
Alibaba addresses this by exposing granular caching price points.
Standard input processing is $0.40 per million tokens, but if the agent is reading from an explicitly created cache (such as a massive base store that remains static for hundreds of automated cycles or a set of standard corporate UIs), the cost for subsequent reads drops dramatically to $0.04 per 1 million tokens.
This level makes high-frequency, multi-cycle agent iterations economically practical at enterprise scale.
No open source license or open weights raise the question of compatibility for enterprises
When evaluating any model in the Gwen ecosystem, a key concern for legal and security teams is the licensing framework and operational boundary of the data pipeline.
While previous iterations of the Qwen family gained significant enterprise traction due to Apache 2.0 or full open source weighting under customized open-use licenses, Qwen3.7-Plus is delivered as a strictly managed, commercial cloud API via Alibaba Cloud Model Studio. This distinction has particular implications for enterprise risk management:
-
No Local Weight Deployment: Organizations cannot download, sandbox, or deploy Qwen3.7-Plus weights locally in fully air-gapped internal data centers. All data validation, visual processing, and execution calls must go through Alibaba Cloud’s international endpoints (such as the Singapore example highlighted in the developer documentation).
-
Compliance and Sovereignty: Because the model requires cloud-based inference, companies operating under strict sovereign data boundaries (such as healthcare facilities or defense contractors subject to local HIPAA/GDPR restrictions) must clearly evaluate whether external API routing is consistent with their specific data residency obligations.
-
Reducing managed risks: Conversely, a managed API structure removes the internal infrastructure burden of provisioning, optimizing, and maintaining multi-GPU clusters (such as dedicated Nvidia H100 arrays) to simply host the internal agent network.
Still, Qwen3.7-Plus offers high intelligence among low-cost methods
Developer communities and early adoption of tech venture capital highlight the changing economics of agent deployment.
Recognized industry voice and Web3 venture capitalist @Boxmining emphasizing the strategic cost advantage:
"The fact that the Qwen 3.7 Plus is 40% cheaper than the Max changes the conversation. If the output is close enough for most coding and more powerful for visual workflows, do you really need a Mac every day, or just for heavy terminal work?"
This perspective is in line with the current trend of optimizing enterprise operating budgets: moving from raw, unrestricted calculations to automating targeted tasks. At the same time, specialized researchers deep in the ecosystem point out that this is not just an incremental optimization of text generation.
Duncie Lu, A research intern at Alibaba Qwen noted:
"It shows clear gains over Qwen3.6-Plus in computing capabilities, with stronger generalization from common desktop tasks to professional workflows such as data engineering and scientific research."
Finally, for enterprise buyers deciding on their next infrastructure roadmap, Qwen3.7-Plus provides a practical alternative. If your organization’s primary goal is for independent, visually capable autonomous software loops to interact directly with developer environments and cloud consoles—without breaking your budget to deliver results—the model provides a compelling reason to shift implementation away from more expensive frontier alternatives.





