AI inside Develops Full-Duplex Voice Interaction Model for Simultaneous Dialogue and Task Execution: Demonstrates 96% Reduction in Task Completion Time as GENIAC Project Outcome
AI inside has developed a Full-Duplex voice interaction model that simultaneously handles dialogue and task execution. This model allows for real-time conversational responses and achieves a 96% reduction in task completion time. The achievement is a result of the GENIAC project.
📋 Article Processing Timeline
- 📰 Published: April 8, 2026 at 20:00
AI inside Corporation (CEO: Taku Toguchi; headquarters: Minato-ku, Tokyo; hereinafter "AI inside") has developed a Full-Duplex voice interaction model that simultaneously processes human conversation and task execution.
This research and development was carried out under the theme "Research and Development of a Consistent Japanese Full-Duplex Speech Multimodal LLM," which was adopted for the GENIAC (Generative AI Accelerator Challenge) project. GENIAC is run by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO) and aims to strengthen generative AI development capabilities in Japan.
## Technical Features of the Full-Duplex Voice Interaction Model
### ① Simultaneous Processing of Dialogue and Task Execution—Full-Duplex Voice Interaction
This model supports Full-Duplex voice interaction, capable of capturing user intent mid-utterance and immediately starting response generation and task processing. While conventional voice AIs start processing after the utterance is complete, this model proceeds with processing during the utterance. This enables real-time conversational responses.
**Casual Conversation**
Instantly adapts what it says to the flow of the conversation.
**Work Consultation**
Generates non-verbal expressions such as laughter in real time, in addition to confirmation responses.
**Travel Consultation**
Maintains calm dialogue by naturally controlling the timing and intensity of interjections.
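The difference between waiting for the end of an utterance and acting mid-utterance, as described at the start of this section, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not AI inside's implementation: the keyword table `INTENT_KEYWORDS`, the `Word` type, and the functions `detect_intent`, `turn_based_start`, and `full_duplex_start` are all hypothetical names invented for illustration, and a real speech model would infer intent from audio rather than from keywords.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical keyword-to-intent table, used only for this illustration.
INTENT_KEYWORDS = {"book": "create_booking", "cancel": "cancel_booking"}


@dataclass
class Word:
    text: str
    end_time: float  # seconds from the start of the utterance


def detect_intent(words_so_far: list[str]) -> Optional[str]:
    """Return an intent as soon as any known keyword has been heard."""
    for w in words_so_far:
        if w in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[w]
    return None


def turn_based_start(utterance: list[Word]) -> tuple[Optional[str], float]:
    """Conventional flow: wait for the full utterance, then detect intent."""
    intent = detect_intent([w.text for w in utterance])
    return intent, utterance[-1].end_time


def full_duplex_start(utterance: Iterable[Word]) -> tuple[Optional[str], float]:
    """Incremental flow: re-check intent after every word and act at once."""
    heard: list[str] = []
    last_time = 0.0
    for word in utterance:
        heard.append(word.text)
        last_time = word.end_time
        intent = detect_intent(heard)
        if intent is not None:
            return intent, word.end_time  # task can start mid-utterance
    return None, last_time


utterance = [
    Word("please", 0.4),
    Word("book", 0.8),
    Word("a", 0.9),
    Word("room", 1.3),
    Word("for", 1.5),
    Word("tomorrow", 2.1),
]

print(turn_based_start(utterance))   # ('create_booking', 2.1)
print(full_duplex_start(utterance))  # ('create_booking', 0.8)
```

In this toy example the incremental flow can begin the task 0.8 seconds into the utterance, while the turn-based flow has to wait until the speaker finishes at 2.1 seconds. The timings are made up and say nothing about the model's measured performance.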
### ② Image Understanding for Recognizing Present Information
A single model now processes images, audio, and text together. In evaluations of Japanese-language image descriptions, it showed approximately 6.1 times higher description accuracy than Qwen3-8B-VL.