OrcaRouter Enters Japan: Optimizing 200+ LLMs with 'Adaptive Routing' to Reduce AI Inference Costs by up to 70%

May 21, 2026

FlashLabs has announced an exclusive distribution partnership with Continuum AI to launch 'OrcaRouter', an adaptive inference gateway that integrates and optimizes over 200 LLMs under a single API.

New Product | NQ 46/100 | Source: PR Times

FlashLabs Inc. (Headquarters: Chiyoda-ku, Tokyo; CEO: Yoichi Hosoi), a developer of next-generation AI infrastructure, has announced an exclusive distribution partnership with Continuum AI (Headquarters: USA), a research institute specializing in AI infrastructure. The two companies are launching 'OrcaRouter' in the Japanese market. OrcaRouter is an adaptive inference gateway that gives users access to over 200 LLMs through a single API. According to the announcement, migration takes just five minutes, and production AI costs fall by up to 70% while flagship-model quality is maintained.

## Background and Challenges
FlashLabs founder Mr. Shi points out that 'Enterprises running AI in production today are almost certainly paying double what they should.' Many existing AI gateways merely act as 'pipes': they forward calls to user-selected models without considering prompt complexity, which leads to excessive costs. Japanese enterprises also face specific hurdles: complex procurement caused by contracts with multiple LLM providers, exchange rate risk from dollar-denominated billing, and a lack of cost optimization tools.

## Technical Innovation of OrcaRouter
OrcaRouter centralizes API calls to over 200 LLMs from 15+ companies, including OpenAI, Anthropic, Google, xAI, Meta, Mistral, DeepSeek, Alibaba, Moonshot, and ByteDance. Its core 'Adaptive Routing' engine functions via the following mechanisms:
1. Pre-assessment by lightweight classification models: Within milliseconds, it predicts and selects the most cost-effective LLM that can process a request to a specified quality standard.
2. Continuous learning system: Quality signals such as performance scores and user feedback are incorporated into routing policies weekly, allowing the system to improve automatically.
3. Real-time monitoring of market changes: It constantly tracks provider prices, latency, error rates, and new model releases to switch routing destinations instantly when a cheaper alternative is available.
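
The first mechanism above can be illustrated with a toy sketch. This is not OrcaRouter's implementation: the model names, prices, quality scores, and the keyword heuristic standing in for the lightweight classifier are all hypothetical, chosen only to show the idea of picking the cheapest model that clears an estimated quality bar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model label
    cost_per_1k_tokens: float  # illustrative price, not real pricing
    quality: float             # illustrative quality score in [0, 1]


# Hypothetical catalog; a real gateway would track live provider data.
CATALOG = [
    ModelOption("small-model", 0.10, 0.60),
    ModelOption("mid-model", 0.50, 0.80),
    ModelOption("flagship-model", 3.00, 0.95),
]


def estimate_required_quality(prompt: str) -> float:
    """Stand-in for the lightweight classifier: a crude keyword/length heuristic."""
    hard_markers = ("prove", "analyze", "refactor", "step by step")
    text = prompt.lower()
    word_count = len(text.split())
    if any(marker in text for marker in hard_markers) or word_count > 200:
        return 0.90
    if word_count > 30:
        return 0.75
    return 0.50


def route(prompt: str, catalog=CATALOG) -> ModelOption:
    """Return the cheapest model whose quality meets the estimated requirement."""
    required = estimate_required_quality(prompt)
    eligible = [m for m in catalog if m.quality >= required]
    if not eligible:
        # Nothing clears the bar: fall back to the highest-quality model.
        return max(catalog, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```

In this sketch a short factual question goes to `small-model`, while a prompt containing "prove" goes to `flagship-model`. Because `route` re-reads the catalog on every call, updating a price or quality score in the catalog changes later routing decisions, which is a simplified analogue of the market monitoring described in the third mechanism.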

## Benefits and Features
Compared to fixed-model operations, OrcaRouter has shown 47%–71% savings in inference spending without measurable quality degradation. It employs a transparent pricing structure with zero markup fees. Migration requires only updating the Base URL and API key, ensuring existing OpenAI SDK-based code runs without modification. For the Japanese market, it provides essential enterprise features including JPY billing, a fully localized management console and support, and domestic data routing.
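
As a minimal sketch of the migration described above, assuming the official OpenAI Python SDK (v1 or later): only the client's `base_url` and `api_key` change. The URL below is a placeholder, since the article does not give the real endpoint, and the environment variable names are likewise hypothetical.

```python
import os

# Hypothetical placeholder: the article does not state the real gateway URL.
DEFAULT_BASE_URL = "https://orcarouter.example/v1"


def gateway_client_kwargs(env=os.environ):
    """Build the two settings the article says must change: base URL and API key."""
    return {
        "base_url": env.get("ORCAROUTER_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env["ORCAROUTER_API_KEY"],  # hypothetical variable name
    }


def make_client():
    """Create an OpenAI SDK client that points at the gateway instead of OpenAI."""
    # Imported lazily so the config helper above works without the SDK installed.
    from openai import OpenAI

    return OpenAI(**gateway_client_kwargs())
```

Existing calls such as `client.chat.completions.create(...)` would then stay as they are, which is the sense in which the article says OpenAI SDK-based code runs without modification.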

FlashLabs founder Mr. Shi emphasizes, 'Adaptive routing is the only way to solve the cost problem. We are seeing cumulative savings of 60–70% in real Japanese enterprise workloads.' Continuum AI stated, 'We are delivering the predictability and transparency that Japanese customers demand through our partnership with FlashLabs.' The service is available starting today.

## FAQ

Why does OrcaRouter reduce costs?

For each request, a lightweight router model assesses prompt complexity and dynamically routes it to the most cost-effective model that can answer accurately.

Can I use the existing OpenAI SDK?

Yes. Only the Base URL and API key need to be updated; no other code changes are required.

Does it have features suited for Japanese companies?

Yes, including JPY billing, Japanese support, and domestic data routing to meet enterprise requirements.
