Agata Inc. Accelerates 'RAG+AI Chatbot' Response Speed by 1.9x — Publishes Measured Performance Data Under Worst-Case Conditions with 60 Simultaneous Users
Key facts
- Agata Inc., headquartered in Tonami, Gunma Prefecture, has announced a 1.9x acceleration in response speed for its internal document AI search service 'RAG+AI Chatbot'. The company has also published real-world performance metrics under extreme conditions—60 simultaneous users—with full transparency on measurement methods. The service maintains all infrastructure in-house, ensuring data sovereignty and uninterrupted supply.
- Source: PR Times
- Date: June 17, 2026
The response performance that was communicated as a design estimate when the service launched (announced May 2026) has now been validated through real-world measurements, and further speed improvements have been achieved. Pricing remains unchanged.
■ Why Publish 'Measured Values Under the Most Stringent Conditions'?
AI service response speeds are typically advertised under ideal conditions, leaving enterprises uncertain about actual performance during peak usage—a key concern for adopters.
Agata deliberately measures performance under an 'artificial worst-case scenario' where all users send requests simultaneously and continuously, publishing these figures along with measurement conditions and methods. Real-world usage cannot exceed this theoretical maximum load. Thus, these values represent a 'real-world performance floor'—close to a minimum guaranteed capability.
■ Measured Results (60 Users, Maximum Load, In-House Testing)
Simultaneous Users | Response Start (Median) | Response Start (Average) | Response Start (95th Percentile)
30 users | 386 ms | 460 ms | 827 ms
60 users | 560 ms | 1,015 ms | 3,314 ms
Measurement Conditions: Closed-loop method in which all clients send queries simultaneously and continuously. Each condition was measured for 60 seconds and repeated multiple times; median values are reported. The error rate was zero across all tests.
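The closed-loop method described above can be illustrated with a minimal Python sketch. It is not Agata's measurement tool: `stub_query`, `run_closed_loop`, and `summarize` are hypothetical names, the request is simulated with a short sleep, and the nearest-rank definition of the 95th percentile is an assumption, since the release does not say which percentile method was used.

```python
import asyncio
import statistics
import time


async def stub_query() -> None:
    """Hypothetical stand-in for one chatbot request; returns at 'response start'."""
    await asyncio.sleep(0.01)


async def client(deadline: float, latencies_ms: list) -> None:
    # Closed loop: each client sends its next query only after the previous one responds.
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        await stub_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)


async def run_closed_loop(num_clients: int, duration_s: float) -> list:
    """Run num_clients concurrent closed-loop clients for duration_s seconds."""
    latencies_ms: list = []
    deadline = time.perf_counter() + duration_s
    await asyncio.gather(*(client(deadline, latencies_ms) for _ in range(num_clients)))
    return latencies_ms


def summarize(latencies_ms: list) -> dict:
    """Median, mean, and nearest-rank 95th percentile of response-start times."""
    ordered = sorted(latencies_ms)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return {
        "median_ms": statistics.median(ordered),
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
    }


if __name__ == "__main__":
    # The release measured 60 seconds per condition; a short run is used here.
    samples = asyncio.run(run_closed_loop(num_clients=60, duration_s=1.0))
    print(summarize(samples))
```

Because every client always has a request in flight, the number of concurrent requests stays at the client count for the whole run, which is why this style of test approximates a worst-case load.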
During low-usage periods or single-user access, the median response start time is around 0.15 seconds, which feels nearly instantaneous.
Answer quality was evaluated against question guidelines, achieving a 96% valid response rate (24 out of 25 questions answered appropriately, 95.7% for guideline-compliant questions), exceeding the 92% target.
Specifications of measurement tools and raw data are available as technical documentation for prospective clients.
Answer Display Speed (Measured)
This refers to how quickly the answer text appears after response initiation. If faster than human silent reading speed (~15–20 characters per second), users perceive it as 'no waiting'.
Agata's measurements show answers displayed at ~60–101 characters per second (4–6x silent reading speed), exceeding 500 characters per second during single-user use. Even under heavy concurrent usage, display speed surpasses human reading pace, ensuring smooth, uninterrupted text flow.
■ Speed Enhancement Details
This performance boost results from two independent improvements:
① Re-ranking Speedup (~2x)
FP8 quantization (8-bit floating-point computation) was applied to the re-ranking step, which reorders search results precisely by relevance to the query. This achieved a ~2x speed improvement (63 ms → 29 ms). In-house accuracy validation confirmed that answer quality was equivalent before and after quantization.
② Inference Engine Upgrade (~1.9x generation speed, 17% faster response start)
The latest-generation inference engine was adopted, and AI model computation was optimized for the GPU generation used in the data center. This reduced time-to-first-token (TTFT) by ~17% (178 ms → 148 ms in single-user mode) and increased text generation speed by ~1.9x (280 → 524 characters per second).
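The rounded figures quoted for the two improvements can be checked from the before-and-after values given above. The following is a small arithmetic sketch; the helper names `speedup` and `reduction_pct` are illustrative and not part of any Agata tooling.

```python
def speedup(before: float, after: float) -> float:
    """Speedup factor for a 'lower is better' metric such as latency."""
    return before / after


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return (before - after) / before * 100.0


# Re-ranking latency: 63 ms -> 29 ms
rerank_speedup = speedup(63, 29)

# Time-to-first-token in single-user mode: 178 ms -> 148 ms
ttft_reduction = reduction_pct(178, 148)

# Generation speed is 'higher is better', so the ratio is after / before:
# 280 -> 524 characters per second
generation_speedup = 524 / 280

print(f"re-ranking speedup:  {rerank_speedup:.2f}x")
print(f"TTFT reduction:      {ttft_reduction:.1f}%")
print(f"generation speedup:  {generation_speedup:.2f}x")
```

The results (about 2.17x, 16.9%, and 1.87x) are consistent with the rounded "~2x", "~17%", and "~1.9x" stated in the release.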
As Agata operates all layers—AI models, search engine, database, and servers—internally, such deep-level optimizations can be continuously implemented and passed to customers at no additional cost.
Technical background on these improvements (inference optimization, speculative decoding stabilization, etc.) is detailed in a developer-focused article published the same day. → https://zenn.dev/articles/0a3af1960fba0d/edit
■ 'The AI You Rely On Might Suddenly Disappear' — The Importance of Sustainable Supply
Cloud-based external AI services carry inherent risk: they may become inaccessible overnight due to provider policy changes, regulations, or contract revisions—factors beyond user control. Relying on such AI for core operations introduces continuity risk beyond technical performance.
Agata's RAG+AI Chatbot is structurally resilient to such risks.
It uses open-weight AI models under Apache License 2.0. Model weights are stored and operated within Agata's private renewable-energy-powered data center in Tonami.
With no dependency on specific vendor servers or APIs, remote termination is fundamentally impossible. Once deployed, the AI cannot be 'cut off' due to external circumstances.
Beyond data sovereignty (no data leaves premises), the service ensures supply continuity.
■ AI Running on Solar-Powered, On-Premise Data Center — Zero Data Export
The service operates within Agata's self-owned data center, leveraging renewable energy through the Ministry of the Environment's 'Zero Emissions and Resilience Enhancement Initiative'. On-site solar power and large-scale batteries maximize renewable electricity usage. A closed-loop design on in-house GPU servers ensures customer data never reaches external cloud platforms.
This creates a sustainable AI infrastructure that balances data security with reduced environmental impact.
■ Continuous Improvement in Accuracy and Speed
Agata operates all layers—AI models, search engine, database, servers—in-house. This vertical integration allows continuous, autonomous optimization of search accuracy, answer quality, and response speed, without waiting for third-party component updates.
Answer Accuracy: Ongoing tuning of document preprocessing, search, and re-ranking stages to further improve valid response rates.
Response and Token Generation Speed: Continued optimization via quantization and inference path refinement, extending beyond current improvements.
These enhancements will be passed to existing customers at no extra cost whenever possible.
Performance is not a 'fixed value at deployment' but improves over time—this is the service vision.
■ Pricing and Terms (Unchanged — Performance Gains Included at No Extra Cost)
Base Fee: ¥40,000/month (excl. tax), zero setup fee (2-year contract)
Includes RAG processing for up to 100MB of document data
1 account = 1 concurrent user (internal sharing allowed; multiple accounts allowed)
No limits on token count (text volume, response length)
FAQ
Does Agata's AI really not send data externally?
Yes. Models and data operate entirely within Agata's private data center in Tonami, with zero external transmission.
What is the response speed under 60 simultaneous users?
Median response start is 560ms, with 95% of requests starting within 3.3 seconds.
Can this AI suddenly become unavailable like cloud services?
No. The models and infrastructure are operated in-house, so the service cannot be cut off by external factors.
Is there an upfront cost for deployment?
No. Zero setup fee with a 2-year contract, starting at ¥40,000/month (excl. tax).
How accurate are the answers?
96% of responses were valid under guideline-based evaluation, exceeding the 92% target.