
Canadian AI lab Cohere has made waves lately by announcing a merger with German AI startup Aleph Alpha, but now there’s more for enterprise builders around the world: today, the firm co-founded by former Google employee and "Attention Is All You Need" co-author Aidan Gomez open-sourced Command A+, a highly optimized, 218 billion-parameter language model designed specifically for complex reasoning, multimodal document processing, and agentic workflows.
The most important aspect of the release is not only the model’s capabilities; it is its accessibility.
By releasing the model’s weights on the popular AI code-sharing repository Hugging Face under a highly permissive Apache 2.0 open source license (a first for the company, according to a post on X by Gomez, now CEO of Cohere), Cohere is making a calculated bet on "sovereign AI": the thesis that enterprises, governments, and developers should be able to control, manage, and adapt frontier-level AI within their own secure environments without sacrificing performance.
Sparse architecture with extreme quantization
At the architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only sparse Mixture-of-Experts (MoE) Transformer.
Although the model has a relatively modest 218 billion total parameters, even fewer, only 25 billion, are active during any given generation step. Compared with US giants such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which third-party observers estimate at trillions of parameters, it has a lighter footprint and requires fewer computational resources for inference (serving the model to end users or agents in a production environment).
This sparse architecture is key to the model’s efficiency. Simply put, the MoE model routes each incoming request only to the specific "expert" neural networks best suited to handle it, leaving the rest of the model inactive.
This is a familiar arrangement used by most leading LLMs today. It lets a model retain the large knowledge base and nuanced reasoning capabilities of a giant while running at the speed of a smaller model, with reduced computational and energy requirements, because only a fraction of the parameters are activated at any time.
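To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a generic illustration of sparse routing, not Cohere’s implementation: the number of experts, the value of `k`, the router scores, and the toy scalar "experts" are all assumptions chosen for readability.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Toy experts: each is a scalar function (hypothetical, for illustration only).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 1.5]  # assumed router scores for one token

gates = route_top_k(router_logits, k=2)                # experts 1 and 3 win
output = moe_forward(3.0, experts, router_logits, k=2)  # experts 0 and 2 never run
```

Only two of the four toy experts are evaluated for this input, which is the same reason a sparse model with 218 billion total parameters can touch only about 25 billion of them per generation step.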
But where Cohere goes the extra mile with Command A+ is its heavy focus on hardware efficiency through quantization, a process that shrinks the model’s memory footprint by reducing the numerical precision of its parameters.
Command A+ is available in 16-bit (BF16), 8-bit (FP8) and highly compressed 4-bit (W4A4) formats.
W4A4 quantization is the technical centerpiece of this release. Typically, reasoning models suffer a heavy "quantization tax," where compressing the model leads to visible regressions in solving complex problems.
Cohere reduced this by quantizing only the MoE experts to 4 bits while keeping the critical attention paths at full precision, complemented by a technique called Quantization-Aware Distillation.
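The selective part of that approach can be sketched in a few lines of Python. This is an assumption-laden illustration: it uses simple symmetric integer rounding to 4 bits, hypothetical tensor names, and toy weight values. The article does not describe Cohere’s actual 4-bit number format, and the sketch does not implement Quantization-Aware Distillation.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization: map floats to integer codes in [-8, 7]."""
    peak = max(abs(v) for v in values)
    scale = peak / 7 if peak > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from 4-bit codes."""
    return [c * scale for c in codes]

def compress(weights):
    """Quantize expert tensors only; leave every other tensor untouched."""
    packed = {}
    for name, values in weights.items():
        if ".experts." in name:          # hypothetical naming convention
            packed[name] = quantize_int4(values)
        else:
            packed[name] = values        # attention path stays at full precision
    return packed

# Toy weights with made-up names and values.
weights = {
    "layer0.attention.q_proj": [0.12, -0.50, 0.33],
    "layer0.experts.0.up_proj": [0.70, -0.35, 0.10, 0.00],
}
packed = compress(weights)
codes, scale = packed["layer0.experts.0.up_proj"]
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in
                zip(weights["layer0.experts.0.up_proj"], restored))
```

The rounding error on each expert weight is bounded by half the scale, while the attention tensor passes through unchanged. Keeping that error from degrading reasoning quality is the job the article attributes to distillation.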
The result is a nearly lossless compression that allows this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
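A rough back-of-the-envelope calculation shows why the 4-bit format matters for those hardware targets. The sketch below counts weight storage only and treats every parameter as if it were stored at the stated width, so it is a lower bound for the W4A4 variant (the attention paths stay at higher precision, and the KV cache and activations need additional memory). The GPU capacities are assumptions taken from public spec sheets, not figures from Cohere.

```python
TOTAL_PARAMS = 218e9  # total parameter count reported for Command A+

# Assumed per-GPU memory in GB (not from the article).
H100_GB = 80
B200_GB = 180

def weight_memory_gb(num_params, bits_per_param):
    """Weight storage in gigabytes (1 GB = 1e9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9

footprints = {
    "BF16": weight_memory_gb(TOTAL_PARAMS, 16),
    "FP8": weight_memory_gb(TOTAL_PARAMS, 8),
    "W4A4": weight_memory_gb(TOTAL_PARAMS, 4),
}

# Which formats fit, by weights alone, on each hardware target?
fits_two_h100 = {name: gb <= 2 * H100_GB for name, gb in footprints.items()}
fits_one_b200 = {name: gb <= B200_GB for name, gb in footprints.items()}
```

Under these assumptions, only the 4-bit weights (about 109 GB) fit within two H100s or one B200; the 8-bit weights (about 218 GB) and 16-bit weights (about 436 GB) do not.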
The speed gains are equally notable. According to performance data released by the company, W4A4 quantization at low concurrency reaches 375 tokens per second (TPS), which translates to an up to 63% increase in output speed and an up to 17% reduction in latency compared with the previous Command A.
Additionally, Cohere has overhauled the model’s tokenizer. Tokenizers break text into the fragments, called tokens, that AI models process. The new tokenizer is highly optimized for global enterprise use and features native support for 48 languages.
More importantly, it dramatically improves tokenization efficiency for non-European languages, reducing the number of tokens required to generate responses in Arabic by 20%, Japanese by 18%, and Korean by 16%. This directly lowers inference costs for global, multilingual, or non-English deployments, since usage is billed on a per-token basis.
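Because billing scales linearly with token count, the savings are easy to estimate. The following sketch applies the reported reductions to a hypothetical workload; the monthly token volume and the price per million tokens are invented for illustration and are not Cohere’s pricing.

```python
# Token reductions reported for the new tokenizer.
REDUCTIONS = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

# Hypothetical workload and price, chosen only to make the arithmetic visible.
MONTHLY_OUTPUT_TOKENS = 10_000_000
PRICE_PER_MILLION_USD = 10.0

def output_cost(tokens, price_per_million):
    """Cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

def cost_after_reduction(old_tokens, reduction, price_per_million):
    """Cost of the same text once it needs `reduction` fewer tokens."""
    return output_cost(old_tokens * (1 - reduction), price_per_million)

baseline = output_cost(MONTHLY_OUTPUT_TOKENS, PRICE_PER_MILLION_USD)
costs = {
    language: cost_after_reduction(MONTHLY_OUTPUT_TOKENS, cut, PRICE_PER_MILLION_USD)
    for language, cut in REDUCTIONS.items()
}
```

With these made-up numbers, a $100 monthly bill for Arabic output would drop to about $80; the percentage saving matches the token reduction regardless of the price assumed.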
Strong performance in agentic workflows, math, and specialized domains
While raw speed and size dictate deployment, a model’s usefulness is determined by its downstream capabilities. Command A+ is purpose-built for "agentic" tasks: workflows in which an AI works autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information over multiple steps.
Compared to the previous generation, the benchmark jumps are drastic.
On 𝜏²-Bench Telecom, which tests complex reasoning, the model improved from 37% to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it went from 3% to 25%. In advanced math, its AIME 25 score rose from 57% to 90%.
Command A+ punches above its weight (25B active parameters) in pure reasoning and math, competing directly with larger models like DeepSeek V4 Pro on math performance. However, for deep agentic coding and on overall large-scale intelligence indexes, it currently lags behind the latest generations of Chinese open source competitors: DeepSeek, Z.ai (GLM), and MiniMax.
That said, comparing them directly ignores Cohere’s core value proposition: hardware efficiency.
Beyond benchmarks, Command A+ provides deep integrations for enterprise trust and verification. The model supports conversational tool use through standard chat templates, allowing developers to seamlessly connect it to internal APIs, search engines, or SQL databases.
Crucially, Command A+ has a native citation generation feature. When the model receives data from an external tool, it doesn’t just synthesize an answer; it generates explicit "grounding spans." Using custom tags embedded in the output, the model directly links each factual claim it makes to the specific source document or database row the information came from.
For highly regulated industries like finance, healthcare, or law, this traceability is the difference between an interesting prototype and a production-ready application. If a user requests a daily sales report, the model will extract the total sales amount and explicitly cite the database query result that provides this figure, minimizing the risk of undetected hallucinations.
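On the application side, consuming such output mostly means parsing the tags and checking that every cited source is one the system actually supplied. The sketch below assumes a hypothetical `<cite source="...">` tag and made-up source IDs, since the article does not specify the tag syntax Command A+ emits.

```python
import re

# Hypothetical tag format: <cite source="SOURCE_ID">claim text</cite>
CITE_PATTERN = re.compile(r'<cite source="([^"]+)">(.*?)</cite>', re.DOTALL)

def extract_citations(text):
    """Return (claim, source_id) pairs found in the tagged model output."""
    return [(m.group(2), m.group(1)) for m in CITE_PATTERN.finditer(text)]

def strip_tags(text):
    """Produce the plain-text answer shown to the end user."""
    return CITE_PATTERN.sub(r"\2", text)

def unverified(citations, known_sources):
    """List claims whose cited source was never supplied to the model."""
    return [claim for claim, source in citations if source not in known_sources]

# Example model output for a daily sales report (invented for illustration).
sample = ('Total sales yesterday were <cite source="sales_db:query_1">$18,400</cite> '
          'across <cite source="sales_db:query_1">212 orders</cite>.')

citations = extract_citations(sample)
plain_answer = strip_tags(sample)
suspect_claims = unverified(citations, known_sources={"sales_db:query_1"})
```

An audit layer like `unverified` is what turns grounding spans into a compliance tool: any claim pointing at an unknown source can be flagged before the report reaches a user.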
In addition, Command A+ is fully multimodal, capable of processing both text and images natively within its massive 128K input context window, making it well suited to complex document processing such as analyzing scanned invoices, charts, or technical manuals.
The first fully Apache 2.0 licensed Cohere AI model
In the current AI landscape, "open source" has become a loaded term. Many leading AI companies release their model weights under restrictive commercial licenses or acceptable use policies that expressly prohibit large enterprises from using the models for commercial purposes or prohibit using the models to develop competing AI systems.
Indeed, Cohere’s previous models, including Command R and Command R+, were released under a CC-BY-NC 4.0 (Creative Commons NonCommercial) license. While their model weights were open for researchers and developers to download, run, and evaluate, commercial use was strictly prohibited without obtaining a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many enterprises use to access models from OpenAI, Anthropic, Google, and other leading AI labs.
Cohere changed its approach by releasing Command A+ under the Apache 2.0 license. This is a critical distinction for the developer community. Apache 2.0 is a true, OSI-approved open source license. It allows anyone, from independent developers to Fortune 500 corporations, to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.
As Gomez wrote on X, the decision was supported by Cohere co-founder Nick Frosst, who posted a two-minute review calling it "the best model we’ve ever released."
For an enterprise, this license means complete vendor independence. A company can download the Command A+ weights, fine-tune them on highly classified internal data, and host them on its own private servers or air-gapped networks. It is not dependent on Cohere’s infrastructure, price changes, or API uptime. This is the ultimate realization of sovereign AI.
The release gained immediate traction in the AI developer ecosystem, driven by day-one integration with major open source inference frameworks such as Hugging Face and vLLM.
What’s next?
The release of Command A+ marks the maturation of the open source AI ecosystem. By combining frontier-level reasoning, powerful agentic tool use, and multimodal capabilities with an architecture specifically designed for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.
The demand for massive, centralized computing clusters has long been a bottleneck for companies prioritizing data privacy and cost control. By democratizing access to a model of this caliber under a true open source license, Cohere has provided the enterprise market with exactly what it wants: the power of the cloud that can run safely in a server room down the hall.





