FlashLabs Inc. (Headquarters: Chiyoda-ku, Tokyo; Representative Director: Yoichi Hosoi; hereinafter 'FlashLabs') announced that its AI routing gateway 'OrcaRouter,' exclusively distributed in Japan, ranked 2nd on the public leaderboard of 'RouterArena,' an open platform for evaluating LLM routers, as of the paper submission date (May 20, 2026). As of the submission on May 20, 2026, OrcaRouter recorded an accuracy of 75.54% and an Arena score of 72.08.
Background
As AI adoption becomes mainstream, companies are forced to select the optimal model from over 200 LLM models. However, using a single high-performance model for all processes means continuously paying high costs even for routine tasks, causing AI costs to skyrocket. On the other hand, manual model selection leads to rule obsolescence with each new model release, leaving maintenance burdens on development teams.
To solve this problem, OrcaRouter was developed as an adaptive inference gateway combining 'per-prompt difficulty assessment' and 'automatic routing to the optimal model.' This achievement of 2nd place on RouterArena objectively proves the effectiveness of this technical approach.
Research Paper Overview
Release Date: Wednesday, May 20, 2026
Paper Title: OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline–Online Learning (arXiv:2605.30736)
OrcaRouter-Adaptive Results:
Ranking: 2nd (as of submission on May 20, 2026)
Accuracy: 75.54% at a cost of $1.00 USD per 1,000 queries
Arena Score (integrating accuracy and cost): 72.08
Comparison Targets: GPT-5, Microsoft Azure routers, and other major LLM routers
The Arena score is a comprehensive evaluation metric considering both accuracy and monetary cost, proving that OrcaRouter achieves extremely low-cost operation while maintaining high accuracy.
*RouterArena is the first open platform for comprehensive evaluation and comparison of LLM routers. It is a continuously evaluated live leaderboard where new routers are added and updated periodically; rankings may fluctuate over time. Please check the official leaderboard for the latest rankings.
Technical Features of OrcaRouter
Key Features:
1. Embedding-Enhanced LinUCB Bandit Algorithm:
OrcaRouter formulates LLM routing as a contextual bandit problem and employs an Embedding-enhanced LinUCB (Linear Upper Confidence Bound) algorithm. This approach represents the affinity between prompts and LLMs in a shared embedding space, enabling adaptive model selection combining offline and online learning.
2. Hybrid Offline-Online Learning:
By integrating offline human evaluation data with online bandit feedback, it provides highly accurate routing from the initial stage while continuously learning from real traffic. This allows adaptation to real-world environments while avoiding overfitting.
3. Smart Warm-up Function:
Even for new models or unknown prompt patterns, the smart warm-up function optimizes the balance between exploration and exploitation. It gradually optimizes while adapting to real traffic and maintaining quality.
4. Integration of 200+ Models into One Endpoint:
API calls to over 200 LLMs from more than 15 companies, including OpenAI, Anthropic, Google, xAI, Meta, Mistral, DeepSeek, Alibaba, Moonshot, and ByteDance, are integrated into a single endpoint, a single API key, and a single invoice.
Supported Environments/URLs:
OrcaRouter Official Website
OrcaRouter Official Documentation
Example Available Models
Anthropic Claude Opus 4.8 API
OpenAI GPT 5.5 API
Gemini 3.5 Flash
MiniMax M3
DeepSeek V4 Pro API
Qwen3.7 Max
Value for Enterprises
1. Achieving Both High Accuracy and Low Cost:
As demonstrated by the 2nd place on RouterArena, OrcaRouter achieves extremely low-cost routing while maintaining an accuracy of 75.54%. Compared to GPT-5 and Azure routers, it offers equivalent or superior performance at a significantly lower cost.
2. Continuous Adaptation to Real Traffic:
The Embedding-enhanced LinUCB bandit algorithm learns from actual traffic patterns and continuously optimizes while avoiding overfitting. It automatically adapts to new model releases and changes in business patterns.
3. Improved Developer Experience:
With an OpenAI-compatible API, it can be introduced by changing just one line of existing code. Simply rewriting the Base URL and API key allows existing OpenAI SDK code to work without modification. No redesign, procurement cycles, or rewrites are necessary.
Technical Background: LLM Routing as a Contextual Bandit Problem
Traditional LLM routing relied on predefined rules or static classifiers. However, this method could not adapt to new model releases or changes in business patterns, increasing maintenance burdens.
OrcaRouter solved this problem by formulating LLM routing as a 'contextual bandit problem.' A contextual bandit is a type of reinforcement learning problem where the optimal model (arm) is selected for each prompt (context). The LinUCB (Linear Upper Confidence Bound) algorithm balances exploration and exploitation to select the optimal model in real-time.
Furthermore, by leveraging embedding representations of prompts and LLMs, OrcaRouter efficiently learns from similar prompts and achieves highly accurate routing even for unknown prompts.
Future Developments
FlashLabs will accelerate the deployment of OrcaRouter in the Japanese market, supporting companies in solving the challenge of 'reducing costs while maintaining quality' in their AI utilization.
Through continuous evaluation on RouterArena, it will maintain its technical superiority while pursuing continuous improvement using feedback from real traffic.
Representative Comment
Yoichi Hosoi, Representative Director of FlashLabs Inc.
'We view OrcaRouter's achievement of 2nd place on RouterArena as a testament to our technical approach.'
FACT BOX
- Source: PR TIMES
- Category: Survey
- Organizations: OpenAI / Anthropic / Google
- Products / services: OrcaRouter