What are the features of OrcaRouter?

Integrates over 200 LLMs, automatically routes prompts to optimal models. Integration requires only one line change.

How does Model Fusion reduce costs?

Runs multiple cheaper models in parallel instead of one expensive model, achieving up to 70% cost reduction.

Which models are supported?

Major models like Claude, GPT, Gemini, Llama, Qwen, and GLM are available via OrcaRouter.

A domain-specific language using YAML to define custom model fusion logic with flexible customization.

FlashLabs Launches 'Model Fusion' in Japan via OrcaRouter — Achieving Fable5-Level Inference Performance through Parallel Execution of Multiple Models

Q: What is Model Fusion?

A technology that runs multiple LLMs in parallel and integrates their outputs for high-precision, low-cost AI inference.

Published Jun 18, 2026 2:52 AM ・ Updated Jun 19, 2026 6:05 AM ・ 12 min read ・ Source: PR TIMES

FlashLabs has launched 'Model Fusion,' a new feature for its AI inference gateway 'OrcaRouter,' enabling parallel execution and intelligence integration of multiple large language models (LLMs). This allows Japanese enterprises to achieve Fable5-level inference performance with up to 70% cost reduction by combining affordable models.

FlashLabs Inc. (Headquarters: Chiyoda City, Tokyo; CEO: Koichi Hosoi; hereinafter 'FlashLabs'), an applied AI research institute, announces the launch of 'Model Fusion,' a new feature for its AI inference gateway 'OrcaRouter,' which enables parallel execution and integration of intelligence from multiple large language models (LLMs) in the Japanese market.

Background and Objective: Moving Beyond Single-Model Dependency

The current generative AI market tends to rely heavily on specific high-performance 'frontier models' for inference quality. However, every single model has its strengths and weaknesses, and the most advanced models come with skyrocketing API costs and increased latency (delay).

To overcome these 'limitations of single models,' FlashLabs is introducing 'Model Fusion' to the Japanese market—a technology that runs multiple LLMs in parallel and evaluates and integrates their responses in real time. This makes it possible to achieve inference quality comparable to or even surpassing ultra-high-performance models like 'Fable 5' using combinations of affordable models, all with overwhelming cost efficiency.

Overview of 'Model Fusion'

What is Model Fusion?: For a single prompt, multiple different LLMs (e.g., Claude, GPT, Gemini, Llama, etc.) are executed simultaneously. A 'judge (Arbiter)' evaluates the multiple generated responses, selecting the best one or synthesizing insights from multiple answers into a single superior response.

Key Features:

Parallel Fan-Out + Arbiter: The same prompt is sent in parallel to multiple models, and the arbiter returns the optimal solution.

Five Arbiter Strategies: best_of_n / synthesize / majority / first / tests_pass (see table below)

Selective Fan-Out (Difficulty Gate): The panel is activated only for prompts involving code, tool usage, or high difficulty (difficulty level 0.3 or above); routine tasks are routed to cheaper single models. No panel cost is incurred for light inputs like 'hi'.

Undiluted Consensus: Instead of averaging or diluting multiple responses, the strongest single response is returned verbatim.

Custom-Built via Routing DSL: Without being bound to presets, users can build and own custom panels using approximately 12 lines of YAML.

Core Technology 'Routing DSL': These complex fusion logics can be freely defined and customized by developers using the newly developed 'Routing DSL' in YAML format with just a few lines of code.

OrcaRouter Fusion: https://www.orcarouter.ai/ja/models/orcarouter/fusion

Available Model Examples

OrcaRouter Fable 5 Fusion API: (Model details here)

Anthropic Claude Opus 4.8 API

OpenAI GPT 5.5 API

Gemini 3.5 FlashAPI

MiniMax M3 API

DeepSeek V4 Pro API

Qwen3.7 Max API

Z.AI GLM5.2 API

Documentation / Details:

Routing DSL

Technical Explanation Blog

Value for Enterprises

1. Breaking Performance Limits Through 'Intelligence Synthesis'

By enabling multiple models to function in a 'consensus-based' manner, inference accuracy unattainable by a single model can be achieved. This is particularly effective for tasks requiring fact-checking, complex reasoning, and advanced programming, delivering results that surpass standalone frontier models.

2. Overwhelming Cost-Effectiveness

Instead of calling a single expensive top-tier model, using multiple cheaper and faster models in 'Fusion' mode allows maintaining or even exceeding equivalent quality while reducing inference costs by up to 70%.

3. Ensuring Reliability and Redundancy

If a specific AI provider experiences an outage, other models within the Fusion configuration automatically take over. This ensures continuous, stable, high-quality AI output without disrupting business operations.

Executive Comment

Koichi Hosoi, CEO, FlashLabs Inc.

"The future of AI utilization will shift from the era of 'selection'—choosing which model to use—to the era of 'composition,' where the focus is on how to combine multiple intelligences. Model Fusion, which we are launching today, is precisely the core technology for this new era. By fusing machine speed with multiple intelligences, we aim to create a society where Japanese enterprises can freely harness world-class intelligence without being hindered by cost barriers."

About OrcaRouter

OrcaRouter is a next-generation AI inference gateway developed by the U.S.-based AI research institution Continuum AI and exclusively distributed in Japan by FlashLabs Inc. It integrates over 200 LLMs into a single endpoint and a single API key, automatically routing each prompt to the optimal model based on difficulty level. The newly launched Model Fusion is a feature that enables parallel consensus of multiple models on this platform. There is zero token markup fee, and integration requires only changing one line of the Base URL. Guardrails, tracing, monitoring, and evaluation functions are also provided within the same gateway.

OrcaRouter Official Website

About FlashLabs Inc.

FlashLabs is an applied AI research institute aiming to automate, and ultimately autonomize, sales and customer experience. Through its 'Human-AI Hybrid' approach, it delivers results that surpass traditional methods for enterprises.

Company Name: FlashLabs Inc.

Headquarters: Chiyoda City, Tokyo

CEO: Koichi Hosoi

FlashLabs Inc. Official Website

About Continuum AI

Continuum AI is a U.S.-based AI company that develops OrcaRouter. It provides an efficient AI utilization platform across multiple LLM providers through adaptive routing technology.

Continuum AI Official Website

Inquiries

Marketing Department, FlashLabs Inc.

Contact: Koki Kobayashi

Email: koki.kobayashi@myflashcloud.com

*OrcaRouter is a trademark of Continuum AI.

*Fable 5, Claude, GPT, Gemini, and Llama are trademarks or registered trademarks of their respective companies.

FACT BOX

Source: PR TIMES
Category: New Product
Organizations: Continuum AI / Anthropic / OpenAI
Products / services: OrcaRouter / Model Fusion

FlashLabs Launches 'Model Fusion' in Japan via OrcaRouter — Achieving Fable5-Level Inference Performance through Parallel Execution of Multiple Models

⚡ Key Points

FACT BOX

Editorial & Verification Standards

FAQ

Cite this article — HOW TO CITE

AI CRAWLER ACTIVITY