
Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate release of GLM-5.2, a 753 billion parameter open-weight large language model (LLM) designed specifically to dominate "long-horizon" autonomous coding and engineering tasks.
Available immediately on Hugging Face, through the Z.ai API, and in more than 20 third-party coding environments, the model has a highly stable 1 million token context window, along with subscription tiers starting at just $12.60 per month.
In great news for cost- and security-conscious businesses, Z.ai has released GLM-5.2’s weights under an unrestricted MIT open source license, which allows enterprises to freely download the model from Hugging Face, customize or fine-tune it to their liking, and potentially run it on local or virtual machines with only compute and electricity costs.
It’s an increasingly attractive option for businesses as the most advanced American proprietary models face an uncertain and potentially stalled regulatory future. The Trump Administration’s export control directive last week banned foreign nationals from using Anthropic’s new Claude Fable 5 model (the company responded by taking the model completely offline for all users).
For enterprise technical decision makers, Z.ai’s GLM-5.2 provides a highly capable way to deploy frontier-level AI locally, completely bypassing geofencing and commercial restrictions.
IndexShare reuses one indexer for every four sparse attention layers, reducing computational needs
Under the hood, GLM-5.2 has 753 billion parameters and features a core architecture optimization called "IndexShare".
In standard large language models, recomputing attention over long documents is computationally prohibitive. IndexShare addresses this by reusing the same indexer across every four sparse attention layers.
At the maximum context length of 1 million tokens, this single innovation reduces per-token FLOPs by a factor of 2.9.
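The sharing pattern can be illustrated with a toy sketch. This is not Z.ai's implementation: the function names (`select_top_k`, `run_layers`), the scoring rule, and the layer count are hypothetical. It only counts how often an indexer would run when one token selection is reused by each group of four sparse attention layers. It does not reproduce the reported 2.9x FLOPs figure, which depends on details the article does not give.

```python
# Toy illustration of sharing one indexer across groups of sparse attention layers.
# All names and numbers are hypothetical; this is not Z.ai's code.

SHARE_GROUP = 4  # per the article: one indexer for every four sparse attention layers


def select_top_k(scores, k):
    """Stand-in 'indexer': return the positions of the k highest-scoring tokens."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_layers(num_layers, scores, k, share_group=SHARE_GROUP):
    """Return (token indices used by each layer, number of indexer calls)."""
    indexer_calls = 0
    per_layer = []
    cached = None
    for layer in range(num_layers):
        if layer % share_group == 0:  # first layer of each group runs the indexer
            cached = select_top_k(scores, k)
            indexer_calls += 1
        per_layer.append(cached)  # the other layers in the group reuse the selection
    return per_layer, indexer_calls


# Simplification: fixed relevance scores; a real model would score per layer input.
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]
_, shared_calls = run_layers(num_layers=8, scores=scores, k=3)
_, unshared_calls = run_layers(num_layers=8, scores=scores, k=3, share_group=1)
print(shared_calls, unshared_calls)  # 2 8
```

In this toy setup, eight layers need two indexer passes instead of eight; the attention computation itself is left out entirely.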
The model also features an improved Multi-Token Prediction (MTP) layer for speculative decoding, which increases the accepted token length by up to 20% during inference.
In addition, Z.ai has implemented flexible, selectable "modes of thought". Users can switch the model’s reasoning effort between "Max," designed to push the boundaries of logical problem solving, and "High," which strikes a careful balance between high-level performance and latency-sensitive token efficiency.
State-of-the-art benchmarks for an open model, overlapping with proprietary leaders in some categories
In industry-standard third-party benchmark tests, GLM-5.2 outperforms even leading open-weight flagship models such as DeepSeek V4, and scores close to or above closed-weight competitors such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.8.
The model particularly shines in agentic tool use and long-horizon software engineering tasks:
- SWE-bench Pro: GLM-5.2 scored 62.1, soundly beating GPT-5.5 (58.6) and its predecessor, GLM-5.1 (58.4).
- FrontierSWE (Dominance): On this benchmark, designed to test long-horizon task performance, GLM-5.2 scored 74.4%, beating GPT-5.5 (72.6%) and nearly matching Claude Opus 4.8 (75.1%).
- MCP-Atlas: In this tool-use evaluation, GLM-5.2 outperformed GPT-5.5 (75.3) with a score of 77.0 and landed just shy of Claude Opus 4.8 (77.8).
- Humanity’s Last Exam (with tools): When equipped with tools, GLM-5.2 reached 54.7, ahead of GPT-5.5 (52.2) and close behind Claude Opus 4.8 (57.9).
- PostTrainBench & SWE-Marathon: On extended, multi-hour engineering workloads, GLM-5.2 consistently outperformed GPT-5.5, scoring 34.3% vs. 25.0% on PostTrainBench and 13.0% vs. 12.0% on SWE-Marathon.
While GLM-5.2 slightly trails Claude Opus 4.8 and GPT-5.5 in raw Terminal-Bench 2.1 scores (81.0 vs. 85.0 and 84.0, respectively), it comfortably outpaces Google’s Gemini 3.1 Pro (74.0).
Beyond traditional coding metrics, GLM-5.2 took an impressive first place on Design Arena, a crowdsourced design-task benchmark, beating even the aforementioned state-of-the-art Claude Fable 5 with an Elo score of 1360.
In addition, the effect of Z.ai’s newly selectable "modes of thought" is evident in the data: at the "Max" effort level, GLM-5.2 pushes peak intelligence but uses about 85k output tokens per task. Switching to the "High" effort setting sacrifices only a few points of performance while effectively halving the required token output, providing an important optimization lever for latency-sensitive applications.
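As a rough worked example of that trade-off, the sketch below estimates output-token cost per task at each effort level. It assumes the $4.40 per million output token API price quoted later in this article, takes "about 85k" as exactly 85,000 tokens, and treats "effectively halving" as exactly half. The function name is illustrative only.

```python
from decimal import Decimal

OUTPUT_PRICE_PER_MILLION = Decimal("4.40")  # USD, GLM-5.2 API output price from the article
MAX_TOKENS_PER_TASK = 85_000  # "about 85k" per the article, taken as exact here
HIGH_TOKENS_PER_TASK = MAX_TOKENS_PER_TASK // 2  # assumption: "halving" taken literally


def output_cost(tokens):
    """Output-token cost in USD for a single task."""
    return Decimal(tokens) * OUTPUT_PRICE_PER_MILLION / Decimal(1_000_000)


max_cost = output_cost(MAX_TOKENS_PER_TASK)
high_cost = output_cost(HIGH_TOKENS_PER_TASK)
print(f"Max:  ${max_cost:.3f} per task")   # Max:  $0.374 per task
print(f"High: ${high_cost:.3f} per task")  # High: $0.187 per task
```

Input tokens are ignored here, so real per-task costs would be higher at both settings.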
Available via Coding Plans and API
To run the model, Z.ai offers the GLM Coding Plan, which targets developer workflows rather than simple chat interfaces.
The plan offers out-of-the-box support for third-party U.S. and global agentic coding tools, including Claude Code, OpenClaw, Cline, Kilo Code, Crush, and Factory, among others. Coding Plan pricing tiers (billed annually) are highly competitive:
- Lite: $12.60 per month ($151.20 per year starting in year 2), focused on light iteration on small repositories.
- Pro: $50.40 per month for daily development on medium-sized repositories, offering 5x the usage allowance of the Lite plan.
- Max: $112.00 per month for heavy workloads, offering 20x Lite usage during peak hours and dedicated resources.
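The annual figure quoted for the Lite tier is simply twelve times its monthly rate. A quick check follows, under the assumption that the other two tiers are annualized the same way (the article gives an annual figure only for Lite):

```python
from decimal import Decimal

# Monthly prices in USD as listed above.
MONTHLY = {
    "Lite": Decimal("12.60"),
    "Pro": Decimal("50.40"),
    "Max": Decimal("112.00"),
}

# Assumption: annual cost = 12 x monthly rate; only the Lite figure is confirmed by the article.
annual = {plan: price * 12 for plan, price in MONTHLY.items()}

for plan, total in annual.items():
    print(f"{plan}: ${total} per year")
# Lite: $151.20 per year
# Pro: $604.80 per year
# Max: $1344.00 per year
```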
For enterprise developers integrating the raw model into their applications, Z.ai’s API pricing significantly undercuts its Western competitors while matching the exact pricing of the previous GLM-5.1 generation.
GLM-5.2 API access is priced at $1.40 per million input tokens and $4.40 per million output tokens, making it a mid-priced model on a global scale, but approx…
VentureBeat Frontier AI Model API Evaluation Snapshot
Sorted by total cost (input + output) from least to most expensive. Prices shown are the standard paid rates per 1 million tokens.
| Model | Input | Output | Total Cost |
|---|---|---|---|
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 |
| deepseek-v4-flash | $0.14 | $0.28 | $0.42 |
| deepseek-v4-pro | $0.435 | $0.87 | $1.305 |
| MiniMax-M3 | $0.30 | $1.20 | $1.50 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 |
| Qwen3.7-Plus | $0.40 | $1.60 | $2.00 |
| MiMo V2.5 | $0.40 | $2.00 | $2.40 |
| Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 |
| MiMo-V2.5 Pro (≤256K) | $1.00 | $3.00 | $4.00 |
| Kimi-K2.6 | $0.95 | $4.00 | $4.95 |
| GLM-5.2 | $1.40 | $4.40 | $5.80 |
| Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 |
| MiMo-V2.5 Pro (>256K) | $2.00 | $6.00 | $8.00 |
| Qwen3.7-Max | $2.50 | $7.50 | $10.00 |
| Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 |
| Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.4 | $2.50 | $15.00 | $17.50 |
| Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 |
| Claude Opus 4.8 | $5.00 | $25.00 | $30.00 |
| GPT-5.5 | $5.00 | $30.00 | $35.00 |
| Claude Fable 5 / Claude Myth 5 | $10.00 | $50.00 | $60.00 |
To further optimize costs for long-context workloads, Z.ai is offering free cached input storage for a limited time, as well as a cached input rate of only $0.26 per million tokens.
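To see how the cached rate changes the bill for a long-context request, here is a minimal sketch. The prices come from this article; the request size, the cached share of the input, and the function name are hypothetical, and actual billing rules (for example, how cache hits are counted) may differ.

```python
from decimal import Decimal

# USD per 1 million tokens, as quoted in the article for GLM-5.2.
INPUT_PRICE = Decimal("1.40")
CACHED_INPUT_PRICE = Decimal("0.26")
OUTPUT_PRICE = Decimal("4.40")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of one request; cached_tokens is the cached part of the input."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / MILLION


# Hypothetical request: 800k input tokens and 20k output tokens.
cold = request_cost(800_000, 20_000)                        # nothing cached
warm = request_cost(800_000, 20_000, cached_tokens=700_000)  # 700k tokens served from cache
print(f"cold: ${cold:.3f}, warm: ${warm:.3f}")  # cold: $1.208, warm: $0.410
```

Under these assumed numbers, a mostly cached prompt costs about a third as much as an uncached one.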
The stark pricing contrast between open-weight innovators and proprietary Western labs has not gone unnoticed by the developer community.
On X, prolific AI observer Lisan al Ghaib (@scaling01) claimed that "frontier labs are completely cheating you with their API prices".
The post noted that while open models such as GLM-5.2, with 744 billion parameters, charge $4.40 per million output tokens and DeepSeek-V4-Pro (1.6 trillion parameters) only $0.87, proprietary models command huge premiums: Anthropic’s Sonnet and Opus charge $4.60 and $25.00, respectively, while OpenAI’s GPT-5.5 costs $30.00 per million output tokens.
Emphasizing that open model developers operate profitably without relying on the latest "trendy blackwell chips," the commenter suggested that the leading proprietary labs are "probably in the 90%+ margin at this point".
The beauty of the unmodified MIT License for enterprise use
The most disruptive aspect of the GLM-5.2 release is its license. Z.ai is releasing the model’s weights under the MIT open source license as part of what it calls a "Clear Open" system.
The company’s technical documentation explicitly states that this license guarantees "there are no regional restrictions" and enables "unlimited technical access".
For enterprise technology leaders, an MIT license means the software can be used, modified, and commercialized without paying royalties or being subject to the restrictive "fair use" policies common to dual-use licenses.
This enables engineering teams to deploy frontier-level AI on their own sovereign infrastructure and completely eliminates vendor lock-in.
Warm reception among AI developers and tool makers
Developer response to the release was immediate and overwhelmingly positive.
The team behind Kilo Code confirmed day-one integration, posting on X: "GLM-5.2 works with Kilo Code on day one. 1M context window and Maximum effort mode are both live. Point your configuration at it and go!"
The open source coding environment Cline echoed this sentiment on X, noting the economic advantage: "GLM-5.2 is the first open weight model to pass 80% on Terminal-Bench, beating all other open models available. It also outperforms Gemini, making it a frontier-level model for a fraction of the cost. Open weights are back. This model is a game changer. Now available in Cline!"
Likewise, rival open-source desktop coding agent Owner of AI posted on X that it had tested the model’s new capabilities in complex agentic workflows: "set a real long-horizon task: Research 30 companies across 6 sectors of the AI infrastructure stack, structure it in JSON, then produce an interactive HTML report… here goes 5.2: -> plans…".





