FlashLabs Inc. (Headquarters: Chiyoda City, Tokyo; CEO: Koichi Hosoi; hereinafter 'FlashLabs'), an applied AI research institute, announces the launch in the Japanese market of 'Model Fusion,' a new feature for its AI inference gateway 'OrcaRouter.' Model Fusion runs multiple large language models (LLMs) in parallel and integrates their outputs into a single response.
Background and Objective: Moving Beyond Single-Model Dependency
The current generative AI market relies heavily on a small number of high-performance 'frontier models' for inference quality. However, every model has its own strengths and weaknesses, and the most advanced models come with sharply rising API costs and higher latency.
To overcome these 'limitations of single models,' FlashLabs is introducing 'Model Fusion' to the Japanese market. The technology runs multiple LLMs in parallel, then evaluates and integrates their responses in real time. This allows combinations of affordable models to achieve inference quality comparable to, or even surpassing, that of ultra-high-performance models such as 'Fable 5,' at a far lower cost.
Overview of 'Model Fusion'
What is Model Fusion? For a single prompt, multiple different LLMs (e.g., Claude, GPT, Gemini, Llama) are executed simultaneously. An arbiter (the 'judge') evaluates the generated responses and either selects the best one or synthesizes insights from several answers into a single superior response.
Key Features:
Parallel Fan-Out + Arbiter: The same prompt is sent in parallel to multiple models, and the arbiter returns the optimal solution.
Five Arbiter Strategies: best_of_n / synthesize / majority / first / tests_pass (see table below)
Selective Fan-Out (Difficulty Gate): The panel is activated only for prompts involving code, tool usage, or high difficulty (difficulty level 0.3 or above); routine tasks are routed to cheaper single models. No panel cost is incurred for light inputs like 'hi'.
Undiluted Consensus: Instead of averaging or diluting multiple responses, the strongest single response is returned verbatim.
Custom-Built via Routing DSL: Without being bound to presets, users can build and own custom panels using approximately 12 lines of YAML.
Core Technology 'Routing DSL': Developers can freely define and customize this fusion logic in just a few lines of YAML using the newly developed 'Routing DSL.'
OrcaRouter Fusion: https://www.orcarouter.ai/ja/models/orcarouter/fusion
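The fan-out, difficulty gate, and arbiter steps described above can be illustrated with a minimal Python sketch. This is not OrcaRouter's implementation or API: the function names (`estimate_difficulty`, `fuse`, `make_model`), the toy difficulty heuristic, and the stub models are all assumptions made for illustration. Only the 0.3 threshold and the strategy names `majority` and `best_of_n` come from the feature list.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

# Hypothetical stand-in for a model call: takes a prompt, returns a response.
ModelFn = Callable[[str], Awaitable[str]]

DIFFICULTY_THRESHOLD = 0.3  # gate value quoted in the feature list


def estimate_difficulty(prompt: str) -> float:
    """Toy heuristic (an assumption): long or code-like prompts score higher."""
    score = min(len(prompt) / 400, 0.5)
    if any(token in prompt for token in ("def ", "class ", "SELECT ")):
        score += 0.5
    return min(score, 1.0)


def majority(responses: list[str]) -> str:
    """Return the most common response verbatim; ties go to the earliest."""
    return Counter(responses).most_common(1)[0][0]


def best_of_n(responses: list[str], score: Callable[[str], float]) -> str:
    """Return the single highest-scoring response verbatim."""
    return max(responses, key=score)


async def fuse(
    prompt: str,
    panel: list[ModelFn],
    cheap: ModelFn,
    arbiter: Callable[[list[str]], str],
) -> str:
    # Difficulty gate: light prompts skip the panel and go to one cheap model.
    if estimate_difficulty(prompt) < DIFFICULTY_THRESHOLD:
        return await cheap(prompt)
    # Parallel fan-out: every panel member receives the same prompt.
    responses = await asyncio.gather(*(model(prompt) for model in panel))
    # The arbiter picks one response from the panel's answers.
    return arbiter(list(responses))


def make_model(answer: str) -> ModelFn:
    """Build a stub model that always returns the same answer."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(0)  # placeholder for network latency
        return answer
    return call


panel = [make_model("4"), make_model("5"), make_model("4")]
cheap = make_model("hello")

light = asyncio.run(fuse("hi", panel, cheap, majority))
hard = asyncio.run(
    fuse("def add(a, b): return a + b  # what is add(2, 2)?", panel, cheap, majority)
)
```

In this sketch the prompt 'hi' falls below the gate and is answered by the cheap stub alone, while the code prompt is sent to all three panel stubs and the majority answer is returned unchanged.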
Available Model Examples
OrcaRouter Fable 5 Fusion API: (Model details here)
Anthropic Claude Opus 4.8 API
OpenAI GPT 5.5 API
Gemini 3.5 Flash API
MiniMax M3 API
DeepSeek V4 Pro API
Qwen3.7 Max API
Z.AI GLM5.2 API
Documentation / Details:
Routing DSL
Technical Explanation Blog
Value for Enterprises
1. Breaking Performance Limits Through 'Intelligence Synthesis'
By enabling multiple models to function in a 'consensus-based' manner, inference accuracy unattainable by a single model can be achieved. This is particularly effective for tasks requiring fact-checking, complex reasoning, and advanced programming, delivering results that surpass standalone frontier models.
2. Overwhelming Cost-Effectiveness
Instead of calling a single expensive top-tier model, running multiple cheaper and faster models in 'Fusion' mode makes it possible to match or even exceed that model's quality while reducing inference costs by up to 70%.
3. Ensuring Reliability and Redundancy
If a specific AI provider experiences an outage, other models within the Fusion configuration automatically take over. This ensures continuous, stable, high-quality AI output without disrupting business operations.
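The redundancy behavior described in point 3 can be sketched as follows. This is an illustrative assumption about how a parallel panel might tolerate a provider outage, not OrcaRouter's actual mechanism; `healthy`, `down`, and `fan_out_with_failover` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

ModelFn = Callable[[str], Awaitable[str]]


async def healthy(prompt: str) -> str:
    """Stub provider that responds normally."""
    return f"answer to: {prompt}"


async def down(prompt: str) -> str:
    """Stub provider that simulates an outage."""
    raise ConnectionError("provider unavailable")


async def fan_out_with_failover(prompt: str, panel: list[ModelFn]) -> list[str]:
    # return_exceptions=True keeps one failing provider from cancelling the rest.
    results = await asyncio.gather(
        *(model(prompt) for model in panel), return_exceptions=True
    )
    # Keep only the responses from providers that answered successfully.
    survivors = [r for r in results if not isinstance(r, BaseException)]
    if not survivors:
        raise RuntimeError("all panel members failed")
    return survivors


survivors = asyncio.run(fan_out_with_failover("ping", [down, healthy, healthy]))
```

Because the failed call is filtered out rather than propagated, an arbiter would still receive the two remaining responses; an error is raised only if every panel member fails.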
Executive Comment
Koichi Hosoi, CEO, FlashLabs Inc.
"The future of AI utilization will shift from the era of 'selection,' in which the question is which model to use, to the era of 'composition,' in which the focus is on how to combine multiple intelligences. Model Fusion, which we are launching today, is precisely the core technology for this new era. By fusing machine speed with multiple intelligences, we aim to create a society where Japanese enterprises can freely harness world-class intelligence without being hindered by cost barriers."
About OrcaRouter
OrcaRouter is a next-generation AI inference gateway developed by the U.S.-based AI research institution Continuum AI and distributed exclusively in Japan by FlashLabs Inc. It integrates over 200 LLMs behind a single endpoint and a single API key, automatically routing each prompt to the optimal model based on difficulty level. The newly launched Model Fusion is a feature that enables parallel consensus among multiple models on this platform. No markup is charged on tokens, and integration requires changing only one line: the base URL. Guardrails, tracing, monitoring, and evaluation functions are also provided within the same gateway.
OrcaRouter Official Website
About FlashLabs Inc.
FlashLabs is an applied AI research institute aiming to automate, and ultimately autonomize, sales and customer experience. Through its 'Human-AI Hybrid' approach, it delivers results that surpass traditional methods for enterprises.
Company Name: FlashLabs Inc.
Headquarters: Chiyoda City, Tokyo
CEO: Koichi Hosoi
FlashLabs Inc. Official Website
About Continuum AI
Continuum AI is a U.S.-based AI company that develops OrcaRouter. It provides an efficient AI utilization platform across multiple LLM providers through adaptive routing technology.
Continuum AI Official Website
Inquiries
Marketing Department, FlashLabs Inc.
Contact: Koki Kobayashi
Email: koki.kobayashi@myflashcloud.com
*OrcaRouter is a trademark of Continuum AI.
*Fable 5, Claude, GPT, Gemini, and Llama are trademarks or registered trademarks of their respective companies.
FACT BOX
- Source: PR TIMES
- Category: New Product
- Organizations: Continuum AI / Anthropic / OpenAI
- Products / services: OrcaRouter / Model Fusion