OrcaRouter Adds Support for Next-Gen AI Model 'MiniMax M3' API: 15.6x Faster Long-Context Processing and 1M Token Support Accelerate Enterprise AI Adoption

June 2, 2026

FlashLabs Inc. launched support for MiniMax's next-gen AI model 'MiniMax M3' on the LLM routing gateway 'OrcaRouter' on June 1, 2026. Built on MiniMax Sparse Attention (MSA) technology, the model handles contexts of up to 1M tokens and decodes 15.6x faster than the MiniMax M2.7, supporting advanced enterprise AI use cases.

Source: PR Times

FlashLabs Inc. (HQ: Chiyoda-ku, Tokyo; CEO: Yoichi Hosoi) announced that MiniMax's next-generation AI model, 'MiniMax M3,' is available on the LLM routing gateway 'OrcaRouter,' provided by partner Continuum AI, as of June 1, 2026. MiniMax M3 uses the proprietary 'MiniMax Sparse Attention (MSA)' technology to offer a context window of up to 1 million tokens, with a guaranteed minimum of 512K. Decoding is 15.6x faster than on the previous MiniMax M2.7, which significantly improves performance in agent workflows and coding assistance.

Background and Objectives
As enterprise AI adoption grows, there is a surge in demand for tasks requiring ultra-long context, such as large-scale document processing, full code-base analysis, and long-running agent execution. Previously, context window limitations forced document splitting, leading to slower processing and increased costs.
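To make the cost of splitting concrete, the following minimal sketch counts how many separate requests a document would need under a given context window. The 1,000,000 and 512,000 token figures come from this announcement; the 128,000-token window, the 8,000-token output reserve, and the function name are assumptions chosen only for illustration.

```python
import math

# Figures stated in the announcement.
M3_MAX_CONTEXT = 1_000_000
M3_GUARANTEED_CONTEXT = 512_000

# Hypothetical values, chosen only for illustration.
SMALLER_CONTEXT = 128_000
OUTPUT_RESERVE = 8_000


def requests_needed(document_tokens: int, context_window: int,
                    output_reserve: int = OUTPUT_RESERVE) -> int:
    """Return how many requests are needed to pass the whole document once."""
    usable = context_window - output_reserve
    if usable <= 0:
        raise ValueError("output_reserve must be smaller than context_window")
    # Even an empty document costs one request.
    return max(1, math.ceil(document_tokens / usable))


if __name__ == "__main__":
    document_tokens = 800_000  # hypothetical document size
    for label, window in [
        ("hypothetical 128K window", SMALLER_CONTEXT),
        ("M3 guaranteed 512K window", M3_GUARANTEED_CONTEXT),
        ("M3 maximum 1M window", M3_MAX_CONTEXT),
    ]:
        print(label, requests_needed(document_tokens, window))
```

Under these assumptions, an 800,000-token document needs seven requests with the 128K window, two with the guaranteed 512K window, and one with the full 1M window.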

In the enterprise sector, there is an increasing need for 1 million-token scale context processing for tasks like legal document analysis, large-scale code refactoring, and cross-document information extraction. High-speed processing of ultra-long context is essential for AI agents to maintain long-term reasoning processes.

FlashLabs provides access to over 200 AI models via OrcaRouter. The addition of the MiniMax M3 API enables faster and more cost-effective solutions for enterprise use cases requiring ultra-long context processing.

Overview of MiniMax M3
Price: Available via OrcaRouter (0% token markup). Please check the official OrcaRouter website for detailed pricing.
Key Features: Ultra-long context processing (up to 1M tokens), Sparse Attention technology (MSA), advanced coding performance, agent workflow optimization, and native multimodal support.

Value to Enterprises
1. Efficiency in Large-Scale Document Processing: Process hundreds of pages at once without splitting.
2. Full Code-Base Analysis and Refactoring: Load tens of thousands of lines of code at once for dependency analysis and bug detection.
3. Long-Running AI Agents: Maintain up to 1 million tokens of context across hours of agent execution.

Technical Innovation: Sparse Attention Technology
The core innovation of MiniMax M3 is the 'MiniMax Sparse Attention (MSA)' technology. By significantly reducing computational complexity, it achieves 9.7x faster prefill and 15.6x faster decoding than the MiniMax M2.7, while cutting inference costs to approximately 1/20 of the previous level.
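The announcement does not describe how MSA works internally. As a generic illustration of why sparse attention reduces computation, the sketch below counts query-key score computations for dense causal attention versus a sliding-window pattern, one common sparse scheme used here only as a stand-in. The window size and function names are assumptions, and the resulting ratio is not one of MiniMax's published figures.

```python
def dense_causal_pairs(n: int) -> int:
    """Score computations when each token attends to itself and all earlier tokens."""
    return n * (n + 1) // 2


def sliding_window_pairs(n: int, window: int) -> int:
    """Score computations when each token attends to at most `window`
    of the most recent tokens, itself included."""
    if window <= 0:
        raise ValueError("window must be positive")
    w = min(window, n)
    # The first w tokens see 1, 2, ..., w keys; the remaining n - w tokens see w each.
    return w * (w + 1) // 2 + (n - w) * w


if __name__ == "__main__":
    n = 1_000_000   # sequence length matching the 1M-token context
    window = 4_096  # hypothetical window size
    dense = dense_causal_pairs(n)
    sparse = sliding_window_pairs(n, window)
    print(f"dense: {dense:,}  sliding window: {sparse:,}  ratio: {dense / sparse:.1f}x")
```

Dense attention grows quadratically with sequence length, while the windowed count grows linearly once the sequence exceeds the window, which is why the gap widens as contexts get longer.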

Synergy with OrcaRouter
OrcaRouter automatically routes prompts to the optimal AI model based on difficulty. The addition of MiniMax M3 allows for optimization across routine tasks, ultra-long context processing, and complex reasoning, potentially reducing LLM expenditure by approximately 40%.
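OrcaRouter's actual routing logic is not described in this announcement. The sketch below shows only the general idea of difficulty-based routing with a long-context branch; the thresholds, the tier names, and the `difficulty` score are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, not OrcaRouter settings.
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens
HARD_TASK_THRESHOLD = 0.7         # difficulty score in [0, 1]


@dataclass(frozen=True)
class Route:
    tier: str    # hypothetical tier name, not a real model identifier
    reason: str


def choose_route(prompt_tokens: int, difficulty: float) -> Route:
    """Pick a model tier from prompt size and an externally supplied difficulty score."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    # Prompt size is checked first: a prompt that does not fit cannot be served at all.
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return Route("long-context", "prompt exceeds the long-context threshold")
    if difficulty >= HARD_TASK_THRESHOLD:
        return Route("reasoning", "difficulty at or above the hard-task threshold")
    return Route("economy", "routine task")


if __name__ == "__main__":
    print(choose_route(600_000, 0.2))
    print(choose_route(3_000, 0.9))
    print(choose_route(3_000, 0.1))
```

In this sketch a model such as MiniMax M3 would sit behind the long-context tier, while cheaper models absorb routine traffic, which is the mechanism by which routing can lower overall spend.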

Guardrails and Security
OrcaRouter integrates 8 guardrail functions, including PII Shield, Secrets & API Keys protection, Prompt Injection defense, and Brand Safety, to strengthen control in enterprise production environments.
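As a rough illustration of what a PII or secrets guardrail does before a prompt reaches a model, the sketch below redacts email addresses and key-like strings with regular expressions. It is not OrcaRouter's implementation: the patterns, the `sk-` key prefix, and the placeholder labels are assumptions, and production guardrails typically cover far more cases.

```python
import re

# Hypothetical patterns for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def redact(prompt: str) -> str:
    """Replace email addresses and key-like strings with placeholder labels."""
    prompt = EMAIL_PATTERN.sub("[EMAIL]", prompt)
    return SECRET_PATTERN.sub("[SECRET]", prompt)


if __name__ == "__main__":
    raw = "Contact a.b@example.com with key sk-ABCDEFGHIJKLMNOP1234"
    print(redact(raw))
```

Running the redaction at the gateway, rather than in each application, means every routed model sees the same sanitized prompt.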

Future Outlook
FlashLabs plans to continue providing the latest AI models rapidly and enhancing features required in the enterprise sector, such as ultra-long context processing and multimodal support.

FAQ

What is the benefit for global enterprises using OrcaRouter?

It provides a unified gateway to access over 200 AI models with built-in security guardrails and cost-efficient routing.
