AI Inference Gateway 'OrcaRouter' Integrated with High-Speed LLM Framework 'SGLang' — Unified Access to 200+ Models and Cost Optimization Achieved

June 18, 2026

Key facts

AI Inference Gateway 'OrcaRouter' Integrated with High-Speed LLM Framework 'SGLang' — Unified Access to 200+ Models and Cost Optimization Achieved
FlashLabs announces the integration of its AI inference gateway 'OrcaRouter' with the high-speed LLM framework 'SGLang'. Developers using SGLang can now access over 200 AI models through a single endpoint and achieve up to 40% cost reduction without compromising quality.
Source: PR Times
Date: June 18, 2026

Direct answer

FlashLabs announces the integration of its AI inference gateway 'OrcaRouter' with the high-speed LLM framework 'SGLang'. Developers using SGLang can now access over 200 AI models through a single endpoint and achieve up to 40% cost reduction without compromising quality.

Citation: AI Inference Gateway 'OrcaRouter' Integrated with High-Speed LLM Framework 'SGLang' — Unified Access to 200+ Models and Cost Optimization Achieved (June 18, 2026), PR Times
Source: PR Times
Date: June 18, 2026

FlashLabs announces the integration of its AI inference gateway 'OrcaRouter' with the high-speed LLM framework 'SGLang'. Developers using SGLang can now access over 200 AI models through a single endpoint and achieve up to 40% cost reduction without compromising quality.

新製品出典：PR Times

📋 Article Processing Timeline

📰 Published: June 18, 2026 at 04:00
🔍 Collected: June 17, 2026 at 19:18
🤖 AI Analyzed: June 19, 2026 at 06:53 (35h 35m after Collected)

FlashLabs Inc. (Headquarters: Chiyoda City, Tokyo; CEO: Koichi Hosoi; hereinafter 'FlashLabs') announces that OrcaRouter, an AI inference gateway developed by U.S.-based Continuum AI and exclusively distributed in Japan by FlashLabs, has now been integrated with SGLang, the high-speed LLM serving framework led by LMSYS Org. This integration enables developers using SGLang to seamlessly access over 200 cutting-edge AI models without significant code changes, while leveraging adaptive routing to achieve up to 40% cost reduction while maintaining quality.

Background and Objectives

By 2026, enterprise AI adoption is evolving from 'using a single model' to 'advanced agent workflows combining multiple models'. This shift brings new challenges: improving inference speed and optimizing the rapidly growing LLM usage costs.

SGLang, developed by LMSYS Org, is a next-generation runtime that delivers up to 5x faster inference speeds compared to traditional frameworks, earning strong support from AI engineers worldwide. Meanwhile, OrcaRouter is an LLM gateway that balances cost and quality by analyzing prompt complexity and automatically routing requests to the most suitable model.

This integration combines SGLang’s exceptional performance with OrcaRouter’s flexible model management and cost optimization capabilities, delivering an enterprise-grade infrastructure for AI application development that excels in speed, quality, and cost efficiency.

Integration Overview

Key Features:

Unified Access to 200+ Models: Connect to major models from OpenAI, Anthropic, Google, DeepSeek, and others via a single endpoint within the SGLang interface.

Adaptive Auto-Routing: Analyzes prompt complexity in milliseconds. Automatically routes routine tasks to low-cost open models and complex reasoning tasks to frontier models.

Agent Firewall & Guardrails: Transparently applies PII masking and prompt injection protection within SGLang workflows.

Unified Billing: Consolidate payments through OrcaRouter even when using multiple providers. Zero markup on token costs.

Supported Model Examples:

OrcaRouter Fable 5 Fusion API (Model details here)

Anthropic Claude Opus 4.8 API

OpenAI GPT 5.5 API

Gemini 3.5 FlashAPI

MiniMax M3 API

DeepSeek V4 Pro API

Qwen3.7 Max API

Z.AI GLM5.2 API

Value for Enterprises

1. Dramatic Improvement in Development Speed

Maintain SGLang’s high-speed runtime while instantly prototyping and deploying the latest models without worrying about differences in API specifications across models.

2. Up to 40% Reduction in LLM Spending

Instead of sending all requests to the highest-performing model, OrcaRouter automatically selects the 'optimal model', optimizing costs without sacrificing quality.

3. Enterprise-Grade Reliability

The 'mid-stream failover' feature seamlessly switches to alternative models during provider outages without interrupting the stream, ensuring 24/7 stable operations.

Future Developments

FlashLabs will enhance Japanese enterprises’ adoption of OrcaRouter by providing Japanese-language documentation, integration guides for SGLang environments, and dedicated enterprise environments with SLA support. We will continue to support the optimization of production AI by combining self-hosted infrastructure with AI gateways.

Executive Comment

Koichi Hosoi, CEO, FlashLabs Inc.

'SGLang is a game-changer in AI inference speed. By combining it with OrcaRouter’s intelligent routing, Japanese enterprises can now access world-class AI intelligence in the most efficient, cost-effective, and secure manner. We remain committed to eliminating infrastructure complexity and enabling developers to focus entirely on creating business logic.'

About OrcaRouter

OrcaRouter is a next-generation AI inference gateway developed by U.S. AI research company Continuum AI and exclusively distributed in Japan by FlashLabs. It integrates over 200 LLMs into a single endpoint, automatically routing each prompt to the optimal model based on complexity analysis. With zero token markup fees and integration starting from just one line of code, it also provides guardrails, monitoring, and evaluation features within the same gateway.

OrcaRouter Official Website

About FlashLabs Inc.

FlashLabs is an applied AI research lab aiming to automate and ultimately autonomize sales and customer experience. Through our 'Human-AI Hybrid' approach—merging machine speed and precision with human strategic insight—we deliver results that surpass traditional methods for enterprises.

Company Name: FlashLabs Inc.

Headquarters: Chiyoda City, Tokyo

Representative: CEO Koichi Hosoi

Business: Development and sales of AI solutions; provision of the AI gateway 'OrcaRouter'

FlashLabs Official Website

About Continuum AI

Continuum AI is a U.S. AI company that developed OrcaRouter. It provides an efficient AI utilization platform across multiple LLM providers through adaptive routing technology.

Continuum AI Official Website

Inquiries

Marketing Department, FlashLabs Inc.

Contact: Koki Kobayashi

Email: koki.kobayashi@myflashcloud.com

FAQ

Which companies is OrcaRouter suitable for?

Ideal for enterprises using multiple LLMs or prioritizing cost, reliability, and governance. Especially effective in finance, manufacturing, and customer support.

How much effort is required for integration?

For SGLang users, integration requires only one line of code change. No major modifications to existing systems are needed.

Are security measures sufficient?

Yes. Built-in guardrails include PII masking, prompt injection detection, and content filtering for enterprise-grade security.

What is OrcaRouter's pricing model?

Zero markup on token costs. You pay only the provider's actual rates, with unified billing through OrcaRouter.

Is Japanese language support available?

Yes. Full Japanese documentation, support, and NLP guardrails ensure smooth adoption for Japanese enterprises.

Back to Newsroom (97)