FlashLabs Inc. (Headquarters: Chiyoda-ku, Tokyo; CEO: Yoichi Hosoi), operator of the LLM automatic routing service 'OrcaRouter,' announced today, June 17, 2026, that it has started API provision for 'GLM-5.2,' the latest model developed by Z.ai (formerly Zhipu AI).

GLM-5.2 is an open-weight model with a practical 1 million token context window, enabling advanced agentic coding capabilities. It has been immediately added to OrcaRouter's lineup of over 200 models, allowing for automatic routing to the optimal model based on prompt difficulty.

Background

AI usage fees are a 'new cost of goods sold' that continually increases with product growth. While relying on high-performance frontier models for all processing ensures stable quality, it means paying a high price even for routine tasks like extraction, classification, and formatting.

Meanwhile, the evolution of open models is accelerating. In June 2026, Z.ai released 'GLM-5.2,' which achieved performance comparable to frontier models in practical areas like agent-based coding and tool use, but as an open-weight model.

This change presents an opportunity to rethink the cost structure of AI utilization. By routing even difficult tasks to appropriate open models, quality can be maintained while lowering costs. OrcaRouter is designed not to 'throw everything at a high-performance model,' but to select the optimal model for each prompt.

This launch enables developers and companies in Japan to use the latest high-performance open models in their production environments quickly and reliably.

About Z.ai GLM-5.2

Developed by Z.ai (Zhipu AI), GLM-5.2 is a flagship model specialized for coding and long-term agent tasks. It is the third update in the GLM-5 series in just four months, following GLM-5 in February and GLM-5.1 in April.

Pricing (OrcaRouter public price, zero markup): - Input: $1.40 / 1M tokens - Output: $4.40 / 1M tokens *For reference: Claude Opus 4.8 is $5 for input and $25 for output.

Key Specifications: - Practical 1M context (up to 1,048,576 tokens) - Process large codebases or long documents at once. - MIT-licensed open weights - Allows for commercial use, modification, and self-hosting. - Strong in agent-based coding/tool use (see benchmarks below). - Adjustable 'thinking effort' - Balance quality and cost. - Approx. 750B parameter MoE architecture (approx. 40B active).

Key Benchmarks: - SWE-bench Pro (Software Development): 62.1 — Surpassing GPT-5.5 (58.6). - MCP-Atlas (Agent-based tool use): ~77 — Nearing Claude Opus 4.8 (77.8). - KingBench 3 (Independent coding evaluation): 3rd place — Ranking high as an open-weight model among frontier models. - AIME 2026 (Math): 99.2 / GPQA-Diamond (Science): 91.2 (values from HuggingFace model card).

Value for Businesses

1. Consistent Processing of Large Codebases with Ultra-Long Context GLM-5.2's practical 1M token context window allows for analyzing and refactoring entire large legacy codebases in a single session. Complex tasks like code reviews, dependency analysis, and migrations can be performed without context splitting.

2. Frontier Quality at Open Model Cost Achieved a score of 81.43 on KingBench 3, close to Claude Opus 4.8. OrcaRouter's prompt analysis feature enables an optimal balance of quality and cost by automatically selecting GLM-5.2 for difficult coding tasks and even cheaper open models for routine processing.

3. Enabling Autonomous Coding by AI Agents Integration with OpenCode supports workflows where AI agents autonomously generate, edit, and test code. With up to 128K token output, large-scale code generation is possible in a single response.

OrcaRouter's Features

OrcaRouter is not just a model proxy but an LLM routing service equipped with contextual bandit technology to determine prompt difficulty on a per-request basis.

Key Features: - Prompt Analysis — Judges the difficulty of each prompt in milliseconds to route it to the optimal model. - 0% Commission — Token charges are the same as the provider's public price. - Learning Routing — Continuously improves routing accuracy based on request results using LinUCB contextual bandits. - Per-Request Visibility — Records decision, model, provider, and price for every request. - 200+ Models — Access a wide variety of models through a single endpoint. - Mid-stream Failover — Seamlessly recovers from provider outages mid-stream. - 8 Guardrails — PII masking, prompt injection prevention, etc. - 1-Line Integration — Simply change the base_url of the OpenAI SDK.

Future Developments

OrcaRouter will continue to swiftly add optimal models, from frontier to open. We will also enhance routing strategies and evaluation functions for advanced LLM routing in enterprise production AI workloads.

A Comment from the CEO

Yoichi Hosoi, CEO of FlashLabs Inc. 'OrcaRouter's vision is to optimize AI costs while protecting quality. We believe the approach of choosing the best model for each prompt—rather than just replacing with a cheaper one—is essential for production AI operations in Japanese companies. The more high-performance open models there are, the more valuable OrcaRouter becomes.'

FACT BOX

  • Source: PR TIMES
  • Category: New Product
  • Organizations: Z.ai / Zhipu AI / Anthropic
  • Products / services: OrcaRouter / Z.ai GLM-5.2