New proprietary MiniMax M2.7 AI model “self-evolves,” using reinforcement learning to perform 30-50% of its own research workflow



Over the past few years, Chinese AI startup MiniMax has released frontier large language models (LLMs) under open source licenses, and before that, high-quality AI video generation models (Hailuo).

Today’s release of MiniMax M2.7 — a new proprietary LLM with support for third-party plugins and tools like Claude Code, Kilo Code, and OpenClaw, designed to better power AI agents — marks yet another milestone: instead of relying solely on human-driven fine-tuning, MiniMax used M2.7 to rebuild, track, and optimize its own learning capabilities.

This move toward recursive self-improvement heralds a shift in the industry: a future where the models we use are as much the architects of progress as they are the products of human research. M2.7 is a text-only model, providing intelligence comparable to other leading systems while maintaining significantly higher cost efficiency.

However, the fact that M2.7 is proprietary shows once again that Chinese AI startups — for much of the last year, standard bearers of frontier open source AI, attractive to enterprises globally for its low (or no) cost and customizability — are changing strategy, following U.S. leaders like OpenAI, Google, and Anthropic, which have favored private frontier models for years.

MiniMax has become the second Chinese startup to launch a proprietary advanced LLM in recent months, after z.ai with GLM-5 Turbo, amid rumors that Alibaba’s Qwen team has also moved toward proprietary development following the departure of senior management and other researchers.

Technical achievement: A period of self-evolution

The defining characteristic of MiniMax M2.7 is its role in its own creation. According to the company’s documentation, previous versions of the model were used to create a research agent harness that could manage data pipelines, training environments, and evaluation infrastructure.

By autonomously handling log reading, debugging, and metric analysis, M2.7 performed 30% to 50% of its own development workflow.

This went beyond automating rote tasks: the model improved its own coding performance by analyzing failure trajectories and staging code changes across 100 or more rounds of iterative loops.
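MiniMax has not published its harness, but the loop it describes — evaluate, analyze failures, patch, repeat — can be sketched in a few lines. Everything below (`Trajectory`, `evaluate`, `research_loop`, the failure names and scores) is a hypothetical stand-in for illustration only:

```python
# Toy sketch of an "analyze failure trajectories, then patch" loop.
# All names and numbers are invented stand-ins, not MiniMax's code.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    score: float
    failures: list = field(default_factory=list)

def evaluate(patch_level: int) -> Trajectory:
    # Stand-in for running the training/eval pipeline; the score rises
    # as more failure modes are patched away.
    failures = ["timeout", "oom", "flaky_metric"][patch_level:]
    return Trajectory(score=0.5 + 0.1 * patch_level, failures=failures)

def research_loop(max_rounds: int = 100, target: float = 0.75) -> Trajectory:
    patch_level = 0
    traj = evaluate(patch_level)
    for _ in range(max_rounds):
        if traj.score >= target or not traj.failures:
            break
        # "Analyze failure trajectories" -> address the next failure mode.
        patch_level += 1
        traj = evaluate(patch_level)
    return traj

result = research_loop()
print(round(result.score, 2), result.failures)  # 0.8 []
```

The real harness would replace `evaluate` with actual training runs and log analysis; the point is the shape of the loop, in which the model itself decides which failure to patch next.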

"We intentionally trained the model to better plan and clarify requirements with the user," MiniMax Engineering President Skyler Miao explained on the social network X. "The next step is a more sophisticated user simulator to take this even further."

This capability extends to complex environments like MLE Bench Lite, a series of machine learning competitions designed to test autonomous research skills.

In these tests, M2.7 won medals in 66.6 percent of competitions, a level of performance close to the current state of the art set by Google’s new Gemini 3.1 and Anthropic’s Claude Opus 4.6.

According to MiniMax, the goal is to transition to full autonomy in model training and inference architecture without human intervention.

Performance evolution: MiniMax M2.7 vs. M2.5

Compared to its predecessor M2.5, released in February 2026, M2.7 shows significant gains in high-stakes software engineering and professional office tasks.

While M2.5 was noted for mastering polyglot coding, M2.7 is designed for real-world engineering: tasks that require causal reasoning in live production systems.

Key performance indicators include:

  • Software engineering: The M2.7 scored 56.22 percent on the SWE-Pro benchmark, matching the highest levels of global competitors such as GPT-5.3-Codex.

  • Professional office tasks: M2.7 achieved a GDPval-AA Elo score of 1,495, which the company claims is the highest among open source and accessible models.

  • Hallucination resistance: The model scores +1 on the AA-Omniscience Index, a big jump from M2.5’s -40.

  • Hallucination rate: M2.7 achieves a 34 percent hallucination rate, lower than Claude Sonnet 4.6’s 46 percent and Gemini 3.1 Pro Preview’s 50 percent.

  • Systems understanding: In Terminal Bench 2, the model scored 57.0 percent, demonstrating a deep understanding of complex operational logic rather than simple code generation.

  • Skill Match: In the MM Claw evaluation, which tested 40 complex skills with over 2,000 tokens each, M2.7 maintained a 97 percent match rate, a significant improvement over the baseline M2.5.

  • Intelligence parity: The inference capabilities of the model are equivalent to GLM-5, but it uses 20 percent fewer output tokens to achieve similar results.

The model’s evolution is further reflected in its score of 50 points on the AI Index, an 8-point improvement over its predecessor in just one month; it also ranks 8th globally in overall intelligence across benchmark tasks in various domains.

Not all independent, third-party benchmarks show an improvement for M2.7 over M2.5: on BridgeBench, a set of tasks developed by BridgeMind, an agentic AI coding startup, to test models’ performance at "vibe coding," or converting natural language into working code, M2.5 took 12th place while M2.7 took 19th.

Access, pricing and integration

MiniMax M2.7 is a proprietary model available through the MiniMax API and the MiniMax Agent creation platform. While the core model weights for M2.7 remain closed, the company continues to contribute to the ecosystem through its open source agent project, Open Room.

Through direct API integration and third-party provider OpenRouter, MiniMax M2.7 maintains a leading price point of $0.30 per 1 million input tokens and $1.20 per 1 million output tokens, unchanged from M2.5 pricing.
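At these rates, per-request costs are easy to estimate. A minimal sketch using the listed prices (the token counts in the example are made-up figures, not MiniMax data):

```python
# Back-of-the-envelope cost estimator for the listed M2.7 API rates:
# $0.30 per 1M input tokens, $1.20 per 1M output tokens.
INPUT_PER_M = 0.30   # USD per 1 million input tokens
OUTPUT_PER_M = 1.20  # USD per 1 million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g. a long agent run with 200k input tokens and 50k output tokens:
print(round(estimate_cost(200_000, 50_000), 4))  # 0.12
```

Even a token-heavy agent session at these rates stays in the cents range, which is the basis for the cost-efficiency claims elsewhere in the article.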

To support different usage scales and methods, MiniMax offers a structured Token Plan with different subscription levels. These plans allow users to access text, speech, video, image and music models under a single unit quota.

To further increase adoption, MiniMax launched the Refer & Earn referral program, which offers a 10 percent discount to new invitees and a 10 percent discount voucher to the referrer.

Monthly standard Token Plan prices: Standard monthly tiers are designed for users ranging from entry-level developers to heavy casual users.

  • Starter: $10 per month for 1,500 requests per 5 hours.

  • Plus: $20 per month for 4,500 requests per 5 hours.

  • Max: $50 per month for 15,000 requests per 5 hours.

Monthly high-speed Token Plan prices: The following tiers serve production-scale workloads that require the M2.7 high-speed variant:

  • Plus High-Speed: $40 per month for 4,500 requests per 5 hours.

  • Max High-Speed: $80 per month for 15,000 requests per 5 hours.

  • Ultra High-Speed: $150 per month for 30,000 requests per 5 hours.

Annual Token Plan prices: Annual subscriptions provide significant discounts for long-term commitment:

  • Standard Startup: $100 per year (saves $20).

  • Standard Plus: $200 per year (saves $40).

  • Standard Max: $500 per year (saves $100).

  • High Speed Plus: $400 per year (saves $80).

  • High Speed Max: $800 per year (saves $160).

  • High Speed Ultra: $1,500 per year (saves $300).

A request in these plans is roughly equivalent to one call to MiniMax M2.7, although other models in the bundle, such as video or high-definition speech requests, consume the quota at a higher rate.

Official tool integrations

To ensure smooth adoption, MiniMax has released official documentation for integrating M2.7 into more than 11 major developer tools and agent plugins.

This includes widely used platforms such as Claude Code, Cursor, Trae and Zed. Other officially supported tools include OpenCode, Kilo Code, Cline, Roo Code, Droid, Grok CLI, and Codex CLI.

In addition, the model supports the Model Context Protocol, allowing it to natively use tools such as Web Search and Image Understanding for multimodal reasoning. Developers using the Anthropic SDK can easily integrate M2.7 by changing the ANTHROPIC_BASE_URL to point to the MiniMax endpoint.
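The base-URL swap can be illustrated with a standard-library sketch that builds (but does not send) an Anthropic-style Messages API request. The endpoint URL and model id below are placeholders, not confirmed MiniMax values; consult MiniMax’s official integration docs for the real ones:

```python
# Builds an Anthropic-compatible request redirected to a MiniMax endpoint.
# The base URL and model id are PLACEHOLDERS for illustration only.
import json
import os
import urllib.request

base_url = os.getenv("ANTHROPIC_BASE_URL",
                     "https://api.minimax.example/anthropic")  # placeholder
payload = {
    "model": "MiniMax-M2.7",  # placeholder model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Summarize this log file."}],
}
req = urllib.request.Request(
    url=f"{base_url}/v1/messages",
    data=json.dumps(payload).encode(),
    headers={"x-api-key": os.getenv("MINIMAX_API_KEY", ""),
             "content-type": "application/json"},
)
print(req.full_url)
```

Tools built on the Anthropic SDK pick up `ANTHROPIC_BASE_URL` from the environment, which is why the same one-variable change works across Claude Code and similar clients.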

When using MiniMax as a provider in tools such as OpenClaw, image understanding capabilities are automatically configured through the model’s VLM API endpoint without requiring additional setup from the user.

With its deep integration base and pioneering approach to recursive evolution, the MiniMax M2.7 represents an important step towards an AI-native future where models are as involved in their progress as the humans guiding them.

Strategic implications for enterprise decision makers

Technical decision makers should interpret the M2.7 release as evidence that agent AI has moved from theoretical prototyping to production-ready utility.

The model’s ability to reduce recovery time to less than three minutes for live production incidents by autonomously linking monitoring metrics to code repositories offers a paradigm shift for SRE and DevOps teams.

Businesses currently facing pressure to embrace AI-driven efficiency must decide whether to settle for AI as a capable assistant or whether they are ready to assemble teams of autonomous agents capable of completing entire projects.

From a financial perspective, M2.7 represents a significant improvement in cost efficiency for frontier-level reasoning. The analysis shows that M2.7 costs less than a third of GLM-5 at equivalent intelligence levels.

For example, running a standard intelligence index evaluation costs $176 with M2.7, versus $547 for GLM-5 and $371 for Kimi K2.5. This aggressive pricing strategy places M2.7 on the Pareto frontier of the intelligence-versus-cost chart, offering enterprise-grade reasoning at a fraction of the market rate.
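Taken at face value, the cited figures bear out the "less than a third" claim; a quick arithmetic check (numbers from the article):

```python
# Sanity-checking the cited evaluation costs (USD per benchmark run).
m27_cost, glm5_cost, kimi_cost = 176, 547, 371

print(round(m27_cost / glm5_cost, 3))  # 0.322, i.e. under one third of GLM-5
print(round(m27_cost / kimi_cost, 3))  # 0.474 of Kimi K2.5's cost
```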

The current market is saturated with high-performance models, some of which still hold slight edges in overall reasoning scores. But M2.7’s specific optimization for office suite fidelity in Excel, PowerPoint, and Word, and its high performance on the GDPval-AA benchmark, make it a prime candidate for organizations focused on professional document workflows and financial modeling.

Decision makers must weigh the benefits of a general-purpose frontier model against a specialized engine like the M2.7 built to interact with complex internal scaffolding and toolkits.

Finally, the fact that it’s offered by a Chinese company (headquartered in Shanghai) and subject to the laws of that country in addition to the user’s country, and yet not available for offline or local use, could make it a tough sell for businesses operating in the US and the West — especially in highly regulated or government-facing industries.

Nevertheless, the shift towards self-evolving models means that the ROI of an AI investment will increasingly depend on the recursive payoff of the system itself.

Organizations that adopt self-improvement models can find themselves on a faster iteration curve than those that rely on static, human-only improvement. With MiniMax’s aggressive integration into the modern developer stack, the barrier to testing these autonomous workflows has been lowered significantly, putting pressure on competitors to introduce similar native agent capabilities.


