
5 Surprising Truths About MiniMax M2.5
Created: 3 March 2026
Author: Luchian Ștefănescu
1. Introduction: The End of the "Premium AI" Tax
For the past eighteen months, the AI industry has functioned under a "premium tax" regime. Operating autonomous agents with the reasoning depth of Claude 4.6 or the architectural logic of GPT-5.2 has remained a high-cost endeavor, often prohibitively expensive for high-volume enterprise automation. We accepted the idea that frontier intelligence must be scarce and costly.
That paradigm just shifted. Shanghai-based MiniMax has suddenly moved the goalposts with the release of the M2.5 series. By matching or exceeding the performance of Western flagship models at a fraction of the cost—and doing so on a hyper-accelerated release cycle of just 3.5 months—MiniMax is signaling a pivot from experimental "Pro" models to a new era: the realization of high-utility, high-volume production power where intelligence is no longer a luxury good.
2. "Intelligence Too Cheap to Meter": The $1/Hour Revolution
The most disruptive element of the M2.5 release is not just its logic, but its underlying economics. MiniMax is positioning M2.5 as the first frontier model where the "meter" effectively disappears.
This price disruption is driven by a sophisticated sparse Mixture-of-Experts (MoE) architecture, consisting of 230 billion total parameters but only 10 billion active parameters per token. This allows for frontier-level reasoning with the inference profile of a much smaller model. Supporting this is "Forge," MiniMax’s proprietary agent-native reinforcement learning framework. By decoupling agent logic from the inference engine and utilizing a tree-structured merging strategy, MiniMax achieved a 40x training speedup, essentially collapsing the cost of development and operation.
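The sparse-MoE economics described above can be illustrated with a toy top-k routing layer: each token activates only a few experts, so most parameters never run during inference. The dimensions, gating scheme, and expert count below are illustrative stand-ins, not M2.5's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse Mixture-of-Experts layer: each token is routed to the
# top-k experts by gate score, so only a fraction of parameters is
# active per token. Sizes here are illustrative, not M2.5's.
n_experts, d_model, top_k = 8, 16, 2

gate_w = rng.standard_normal((d_model, n_experts))
experts = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) embedding of a single token."""
    scores = x @ gate_w                # one gate logit per expert
    top = np.argsort(scores)[-top_k:]  # indices of the k highest-scoring experts
    probs = np.exp(scores[top])
    probs /= probs.sum()               # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the unselected
    # experts never execute, which is where the inference savings live.
    return sum(p * (experts[i] @ x) for p, i in zip(probs, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)

# In spirit, this mirrors the 10B-active-of-230B-total ratio:
active_fraction = top_k / n_experts
print(out.shape, active_fraction)  # (16,) 0.25
```

The design point is that the gate adds negligible compute, while the per-token cost scales with active parameters rather than total parameters.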
"M2.5 is the first frontier model where users do not need to worry about cost, delivering on the promise of intelligence too cheap to meter. It costs just $1 to run the model continuously for an hour at a rate of 100 tokens per second."
The M2.5-Lightning variant offers a stable 100 tokens per second (TPS) for just $0.30 per million input tokens and $2.40 per million output tokens. For lower-priority tasks, the Standard variant (50 TPS) halves those rates to $0.15 and $1.20 per million input and output tokens, meaning an hour of continuous intelligence costs only $0.30.
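The headline figures can be sanity-checked with simple arithmetic. The sketch below counts output-token spend only; input-token charges and batching effects are ignored, which presumably explains why the quoted figures are rounded up slightly.

```python
def hourly_cost(tokens_per_second: float, price_per_m_output: float) -> float:
    """Output-token cost of one hour of continuous generation, in USD."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_m_output

# Lightning: 100 TPS at $2.40 per million output tokens
lightning = hourly_cost(100, 2.40)  # 360,000 tokens/hour -> $0.864
# Standard: 50 TPS at $1.20 per million output tokens
standard = hourly_cost(50, 1.20)    # 180,000 tokens/hour -> $0.216
print(f"Lightning: ${lightning:.2f}/h, Standard: ${standard:.2f}/h")
```

So the "$1 per hour" and "$0.30 per hour" claims hold with headroom to spare, even before any discounting.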
3. The Rise of the "Architect": Planning Before Coding
MiniMax M2.5 represents a paradigm shift in how LLMs approach technical tasks. It is no longer a "script-kiddie" generating snippets; it acts as a senior software architect. During training across 200,000+ real-world environments, a proactive "Spec-writing" tendency emerged. Before generating a single line of code, the model actively defines project structure, UI design, and features.
This architectural drive results in high-tier productivity metrics:
- SWE-Bench Verified: 80.2%, a statistical tie with Claude 4.6 (80.8%).
- Generalization: On specific agentic harnesses such as Droid (79.7 vs. 78.9) and OpenCode (76.1 vs. 75.9), M2.5 actually outperformed Claude 4.6.
- Velocity: The model is 37% faster at complex task completion than its predecessor, M2.1, completing SWE-Bench evaluations in 22.8 minutes, identical to the speed of Claude Opus 4.6.
4. The King of "Multi-Turn" Interactions
The primary friction in autonomous agents is the "search loop"—the tendency for models to wander through multiple tool calls before finding a solution. M2.5 has effectively solved this through RL optimization that incentivizes efficient reasoning trajectories.
On the Berkeley Function Calling Leaderboard (BFCL), which measures multi-turn tool orchestration, M2.5 scored 76.8, significantly outpacing Claude 4.6 (63.3) and Gemini 3 Pro (61.0). This represents a 20% reduction in search rounds, meaning the model finds the correct path with fewer wasted tokens and lower latency.
Core Agentic Benchmarks:
- BFCL Multi-Turn: 76.8% (industry leader)
- BrowseComp (w/ context): 76.3%
- Multi-SWE-Bench: 51.3% (leads Claude Opus 4.6 at 50.3%)
5. Native Fluency in the "Office Universe"
While Western labs focus on general-purpose chat, MiniMax has optimized M2.5 for the "Office Universe." By collaborating with finance, legal, and social science experts to build industry-specific "tacit knowledge" into the pipeline, the model handles professional deliverables natively.
"By working closely with finance, legal, and social science experts and incorporating their industry knowledge into the learning pipeline, M2.5 has achieved significant improvements in high-value business scenarios, such as financial modeling in Word, PowerPoint, and Excel."
This focus on deliverable-ready outputs is reflected in its GDPval-MM win rate of 59.0% against other leading models and its superior performance on MEWC (Excel esports) problems. It treats a spreadsheet or a slide deck not as a text task, but as a complex environment to be navigated and mastered.
6. The Catch: The "Omniscience" Trade-Off
In the pursuit of agentic autonomy, MiniMax appears to have made a calculated trade-off. While M2.5 is a superior execution engine, its factual grounding has regressed. According to Artificial Analysis, the model's hallucination rate rose to 88% (up from 67% in M2.1), even as its accuracy slightly improved to 25%.
This highlights a fundamental tension in agentic RL: by incentivizing "drive"—the ability to relentlessly pursue a task completion goal—the model becomes more prone to over-confidently asserting facts it doesn't actually know. In a strategic context, M2.5 is an elite execution engine, but it is not an oracle. For high-stakes information retrieval, robust verification layers remain a non-negotiable requirement.
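One pragmatic shape for such a verification layer is to gate the agent's factual claims behind an independent check before they reach the user. The sketch below is a deliberately minimal illustration under stated assumptions: the lookup table stands in for a retrieval system or second-model verifier, and all names are hypothetical.

```python
# Trusted store standing in for retrieval or a verifier model.
TRUSTED_FACTS = {"swe_bench_verified": "80.2%"}

def verify_claim(key: str, claimed_value: str) -> bool:
    """Accept a claim only when an independent source confirms it."""
    return TRUSTED_FACTS.get(key) == claimed_value

def guarded_answer(key: str, claimed_value: str) -> str:
    """Pass verified claims through; flag everything else for review."""
    if verify_claim(key, claimed_value):
        return claimed_value
    return "UNVERIFIED: route to human review or retrieval check"

print(guarded_answer("swe_bench_verified", "80.2%"))
print(guarded_answer("swe_bench_verified", "95%"))
```

The important property is fail-closed behavior: when the checker cannot confirm a claim, the pipeline surfaces uncertainty instead of the model's confident guess.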
7. Conclusion: A New Baseline for the Agent Economy
The release of M2, M2.1, and M2.5 in a mere 3.5 months signals that the "Western Moat" is no longer protected by iteration velocity. MiniMax has demonstrated that by using agent-native RL frameworks like Forge and sparse MoE architectures, a company can deliver frontier-level intelligence at 1/10th the price of current market leaders.
As the marginal cost of intelligence crashes toward zero, the competitive advantage shifts from the model itself to the orchestration of those models into the economy. If $1 per hour can buy the equivalent of a senior software architect or a financial analyst, the "Premium AI Tax" hasn't just been lowered—it has been abolished.
Can Western labs maintain a premium moat if the cost of frontier intelligence has already reached the floor? Link: MiniMax Official Website