Building an AI Interviewer with Whisper × Claude API: The Technical Flow from Voice Input to Engineer-Specific 5-Axis Scoring

April 9, 2026

X-HACK Inc., provider of "Mentai," an AI interview practice web app for engineers changing jobs, has published the technical architecture behind Mentai as a practical example of generative AI in use. By combining accurate speech recognition from the OpenAI Whisper API with feedback generation from the Anthropic Claude API tailored to engineer interviews, Mentai provides an "AI interview coach" experience that goes beyond simply chatting with an AI.

Source: PR Times


X-HACK Inc. (Headquarters: Shinagawa-ku, Tokyo; Representative Director: Shinsuke Matsuda), provider of "Mentai," an AI interview practice web application for engineers changing jobs, has published the technical architecture behind Mentai as a practical example of generative AI in use, following the service's recent release.

By combining accurate speech recognition from the OpenAI Whisper API with feedback generation from the Anthropic Claude API tailored to engineer interviews, we deliver an "AI interview coach" experience that simply "chatting with AI" cannot provide.

Service URL: https://mentai.recruit-hub.ai/

■ Challenges that cannot be solved by simply asking ChatGPT to "practice interviews"

We are now in an era where you can easily experience a mock interview by speaking to ChatGPT's voice mode and asking it to "practice interviews." However, when trying to use it for engineer interview preparation, the following challenges remain:

Ambiguous evaluation criteria: Only generic comments like "That's a good answer" are returned, and points emphasized in engineer recruitment, such as technical explanation ability and specificity, are not evaluated.
No accumulation of practice records: No matter how many times you practice, no history remains, so you don't know if you are improving.
Speaking problems go unnoticed: Filler frequency ("um," "uh") and breakdowns in logic do not show up when answers are typed, so they cannot be visualized.

Mentai solves these challenges with a technical architecture that combines speech recognition AI and the Claude API.

■ Why we insist on "voice"

An interview is a place to "speak." Even if you type out an answer you have worked out in your head, you will not necessarily be able to say it the same way in a real interview. Logic breaks down when you speak aloud, time is hard to manage, and fillers increase. None of these issues can be noticed without actually speaking.

This is why Mentai insists on voice input. By having users speak into a microphone, we provide an AI interview practice environment that is as close to the real thing as possible.

■ Technical Flow: From Voice to AI Feedback

In Mentai, an answer passes through three processing steps between the moment the user speaks and the moment feedback is returned.

【Step 1: Voice Recording (Browser)】

Voice is recorded using the MediaRecorder API on the user's browser. No dedicated app installation is required; it works solely in the browser. During recording, a waveform animation is displayed to give users a "sense of speaking."
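As a rough illustration of the recording step, the sketch below picks a container format that the browser's MediaRecorder can produce. The candidate list and the helper name `pickRecordingMimeType` are assumptions made for this example, not details published by Mentai. The support check is passed in as a function so the helper can be exercised outside a browser.

```typescript
// Candidate containers in order of preference (an assumed ordering).
// Chromium and Firefox typically record WebM/Opus; Safari has typically recorded MP4.
const CANDIDATE_MIME_TYPES: string[] = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
];

// Returns the first candidate the browser reports as recordable, or undefined
// so that MediaRecorder can fall back to its own default format.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  for (const candidate of CANDIDATE_MIME_TYPES) {
    if (isSupported(candidate)) {
      return candidate;
    }
  }
  return undefined;
}

// Browser-side usage (not executed here; `upload` is a hypothetical helper):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
//   const chunks: Blob[] = [];
//   recorder.ondataavailable = (e) => chunks.push(e.data);
//   recorder.onstop = () => upload(new Blob(chunks, { type: recorder.mimeType }));
//   recorder.start();
```

Checking support up front matters because the recorded format determines what the server receives and forwards to speech recognition.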

【Step 2: Speech Recognition (OpenAI Whisper API)】

The recorded voice is sent to the server and transcribed by the OpenAI Whisper API.

Technical terms (e.g., "microservices," "CI/CD," "Scrum") frequently appear in Japanese interview answers. The Whisper API can recognize these technical terms, which are often misrecognized by general speech recognition, with high accuracy.

Early in development, "CI/CD" was transcribed phonetically in katakana (roughly "shi-ai-shi-dee"). We solved this by passing a list of technical terms in the Whisper API's prompt parameter. Because recognition accuracy directly affects the quality of the feedback, we chose the speech recognition engine with particular care.
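The glossary approach described above could look roughly like the following sketch. It builds the text fields of a transcription request, with the technical terms joined into the `prompt` value. The term list, the character budget, and the helper names are assumptions for illustration. `whisper-1`, `language`, and `prompt` are existing parameters of the OpenAI transcription endpoint.

```typescript
// Terms taken from the examples in this article; the list Mentai actually
// uses is not published.
const TECH_TERMS: string[] = ["CI/CD", "microservices", "Scrum"];

interface TranscriptionFields {
  model: string;
  language: string;
  prompt: string;
}

// Joins glossary terms into one prompt string, dropping blanks and duplicates
// and stopping before the character budget is exceeded. The budget is an
// arbitrary choice here; the API only considers a limited amount of prompt text.
function buildGlossaryPrompt(terms: string[], maxChars: number = 400): string {
  const unique = terms
    .map((t) => t.trim())
    .filter((t, i, all) => t.length > 0 && all.indexOf(t) === i);
  let prompt = "";
  for (const term of unique) {
    const next = prompt === "" ? term : prompt + ", " + term;
    if (next.length > maxChars) break;
    prompt = next;
  }
  return prompt;
}

// Text fields to send alongside the audio file in a multipart request.
function buildTranscriptionFields(terms: string[]): TranscriptionFields {
  return {
    model: "whisper-1",
    language: "ja", // answers are given in Japanese
    prompt: buildGlossaryPrompt(terms),
  };
}

const fields = buildTranscriptionFields(TECH_TERMS);
console.log(fields.prompt);
```

On the server, these fields would be sent together with the recorded audio as the `file` part of a multipart/form-data request.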

【Step 3: AI Feedback Generation (Anthropic Claude API)】

The transcribed text, along with the question and the user's profile information (job type, experience level, industry), is sent to the Anthropic Claude API.

The Claude API's role is not just "correction." It scores based on five evaluation axes specialized for engineer interviews (technical explanation ability, logical structure, specificity, expressiveness, fluency) and generates specific comments on strengths and areas for improvement.
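To make this step more concrete, here is a minimal sketch of a request body for the Anthropic Messages API, plus a validator for the scores that come back. The prompt wording, the 1 to 5 scale, the JSON reply shape, and all function names are assumptions for this example. The source does not disclose Mentai's actual prompt, scale, or model, so the model id is left to the caller.

```typescript
const AXES = [
  "technicalExplanation",
  "logicalStructure",
  "specificity",
  "expressiveness",
  "fluency",
] as const;
type Axis = (typeof AXES)[number];

interface AnswerContext {
  question: string;
  transcript: string;
  profile: { jobType: string; experienceLevel: string; industry: string };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Assembles the request body; the system prompt below is illustrative only.
function buildFeedbackRequest(ctx: AnswerContext, model: string): MessagesRequest {
  const system =
    "You are an interviewer for software engineering roles. " +
    "Score the answer from 1 to 5 on each axis: " + AXES.join(", ") + ". " +
    'Reply with JSON only: {"scores": {...}, "strengths": "...", "improvements": "..."}';
  const user = [
    "Job type: " + ctx.profile.jobType,
    "Experience level: " + ctx.profile.experienceLevel,
    "Industry: " + ctx.profile.industry,
    "Question: " + ctx.question,
    "Answer (transcribed): " + ctx.transcript,
  ].join("\n");
  return { model, max_tokens: 1024, system, messages: [{ role: "user", content: user }] };
}

// Checks that the model's JSON reply has an in-range number for every axis.
function parseScores(replyText: string): Record<Axis, number> {
  const parsed = JSON.parse(replyText) as { scores?: Record<string, unknown> };
  const scores = {} as Record<Axis, number>;
  for (const axis of AXES) {
    const value = parsed.scores ? parsed.scores[axis] : undefined;
    if (typeof value !== "number" || value < 1 || value > 5) {
      throw new Error("Missing or out-of-range score for " + axis);
    }
    scores[axis] = value;
  }
  return scores;
}
```

Validating the reply before storing it guards against the occasional malformed response, which matters when scores are accumulated as practice history.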

Furthermore, it automatically adjusts the weighting of each axis according to the question category (e.g., technical challenge, self-PR, motivation for applying), thereby providing accurate AI interview preparation feedback that matches the intent of the question.
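One simple way to realize category-dependent weighting is a weighted average over the five axis scores. The sketch below assumes this approach. The category names and every weight value are invented for illustration and do not come from Mentai.

```typescript
interface AxisScores {
  technicalExplanation: number;
  logicalStructure: number;
  specificity: number;
  expressiveness: number;
  fluency: number;
}

type Category = "technicalChallenge" | "selfPr" | "motivation";

// Illustrative weights only; each row sums to 1.
const WEIGHTS: Record<Category, AxisScores> = {
  technicalChallenge: {
    technicalExplanation: 0.4, logicalStructure: 0.2, specificity: 0.2,
    expressiveness: 0.1, fluency: 0.1,
  },
  selfPr: {
    technicalExplanation: 0.1, logicalStructure: 0.2, specificity: 0.3,
    expressiveness: 0.3, fluency: 0.1,
  },
  motivation: {
    technicalExplanation: 0.1, logicalStructure: 0.3, specificity: 0.2,
    expressiveness: 0.3, fluency: 0.1,
  },
};

// Weighted average of the five axis scores, rounded to two decimals.
function weightedScore(scores: AxisScores, category: Category): number {
  const w = WEIGHTS[category];
  const total =
    scores.technicalExplanation * w.technicalExplanation +
    scores.logicalStructure * w.logicalStructure +
    scores.specificity * w.specificity +
    scores.expressiveness * w.expressiveness +
    scores.fluency * w.fluency;
  return Math.round(total * 100) / 100;
}

// The same answer scores differently depending on the question category.
const techStrong: AxisScores = {
  technicalExplanation: 5, logicalStructure: 3, specificity: 3,
  expressiveness: 3, fluency: 3,
};
console.log(weightedScore(techStrong, "technicalChallenge")); // 3.8
console.log(weightedScore(techStrong, "selfPr")); // 3.2
```

With these example weights, an answer that is strong on technical explanation earns more on a technical-challenge question than on a self-PR question, which is the behavior the paragraph above describes.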

■ UX and Evaluation Design Pursued in Development

【UX Design that Transforms Waiting Time into "Interview Pauses"】

Mentai incorporates AI processing waiting time into the experience as "the time the interviewer is thinking."

In a real interview, the interviewer pauses briefly to think after hearing an answer. Mentai reproduces this pause by displaying a nodding animation of the AI interviewer "thinking." This turns the roughly 10 seconds of processing time, made up of speech recognition by the Whisper API (2-3 seconds on average) and feedback generation by the Claude API (5-8 seconds on average), into a natural part of the interview experience.
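One way to model this in the frontend is a small state machine in which both processing phases map to the same "thinking" display. The phase and event names below are hypothetical and only illustrate the idea.

```typescript
type Phase = "idle" | "recording" | "transcribing" | "evaluating" | "feedback";
type PipelineEvent = "start" | "stop" | "transcribed" | "evaluated" | "reset";

// Allowed transitions; any event not listed for a phase is ignored.
const TRANSITIONS: Record<Phase, Partial<Record<PipelineEvent, Phase>>> = {
  idle: { start: "recording" },
  recording: { stop: "transcribing" },
  transcribing: { transcribed: "evaluating" },
  evaluating: { evaluated: "feedback" },
  feedback: { reset: "idle" },
};

function nextPhase(phase: Phase, event: PipelineEvent): Phase {
  const next = TRANSITIONS[phase][event];
  return next !== undefined ? next : phase;
}

// The interviewer "thinks" through both speech recognition and feedback
// generation, so the user sees one continuous pause instead of two loaders.
function interviewerIsThinking(phase: Phase): boolean {
  return phase === "transcribing" || phase === "evaluating";
}

// Walk through one full answer cycle.
let phase: Phase = "idle";
const events: PipelineEvent[] = ["start", "stop", "transcribed", "evaluated"];
for (const event of events) {
  phase = nextPhase(phase, event);
  console.log(event, "->", phase, "thinking:", interviewerIsThinking(phase));
}
```

Treating the two backend calls as a single visible state keeps the animation uninterrupted across the whole wait.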

【5-Axis Evaluation Design Adapted to the Engineer Recruitment Scene】

Mentai's 5-axis evaluation (technical explanation ability, logical structure, specificity, expressiveness, fluency) was designed based on interviews with experienced engineer interviewers.

Initially, there were three axes (technical ability, logicality, expressiveness), but it was found that "specificity involving numerical values and team size" and "fluency, such as few fillers and good answer tempo," were key factors in success or failure, so it was expanded to five axes.

Furthermore, the evaluation weights are adjusted automatically according to the type of question, for example "questions that should emphasize technical explanation ability" versus "questions that require expressiveness," so that feedback is tailored to each question rather than uniform.
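The fluency axis described above refers to filler frequency. As a rough illustration, fillers could be counted directly in the transcript, as in the sketch below. The filler lists and function names are assumptions, and the approach presumes the transcript retains fillers, which speech recognition output does not always do. The source does not say how Mentai measures fluency.

```typescript
// Hypothetical filler list; longer entries come first so that "えーと" is not
// counted a second time as "えー".
const JAPANESE_FILLERS: string[] = ["えーっと", "えーと", "えっと", "あのー", "えー"];

function countFillers(transcript: string): number {
  let remaining = transcript;
  let count = 0;
  for (const filler of JAPANESE_FILLERS) {
    const parts = remaining.split(filler);
    count += parts.length - 1;
    // Replace matches with a space so shorter fillers cannot re-count them.
    remaining = parts.join(" ");
  }
  // English fillers as whole words only ("umbrella" must not match).
  const english = remaining.match(/\b(?:um+|uh+)\b/gi);
  return count + (english ? english.length : 0);
}

// Normalizes by answer length so long and short answers are comparable.
function fillersPerMinute(transcript: string, durationSeconds: number): number {
  if (durationSeconds <= 0) return 0;
  return (countFillers(transcript) * 60) / durationSeconds;
}

console.log(countFillers("えーと、前職ではえー、CI/CDを導入しました")); // 2
console.log(fillersPerMinute("えー、はい", 30)); // 2
```

A per-minute rate like this could feed into the fluency score or be shown to the user as-is.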

■ Technology Stack

Layer	Technology	Role
Frontend	Next.js (TypeScript)	UI, voice recording, feedback display
Backend	Ruby on Rails (API mode)	Business logic, API
Speech Recognition	OpenAI Whisper API	Voice to text conversion
AI Evaluation	Anthropic Claude API	5-axis scoring, feedback generation
Infrastructure	AWS (ECS Fargate / RDS / CloudFront)	Production environment
Authentication	Supabase Auth	User authentication

■ Future Technical Outlook

Currently, users practice one question at a time, but in the future, we are developing an "interview mode" that utilizes the streaming function of the Claude API to generate in-depth follow-up questions in real-time based on the answer content. We aim to reproduce the "tension of being pressed" felt in a real interview with AI.
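When streaming is enabled, the Claude API returns server-sent events, and generated text arrives in `content_block_delta` events carrying a `text_delta`. The sketch below shows one way such events could be assembled into a follow-up question. It is a simplified assumption about how this in-development mode might consume the stream, and the function name is hypothetical.

```typescript
// Extracts and concatenates text deltas from a block of server-sent events.
// Assumes each event's JSON payload sits on a single "data:" line and that
// the input contains only complete lines.
function collectTextDeltas(sseText: string): string {
  let text = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.indexOf("data:") !== 0) continue;
    const payload = JSON.parse(trimmed.slice("data:".length).trim());
    if (
      payload.type === "content_block_delta" &&
      payload.delta &&
      payload.delta.type === "text_delta"
    ) {
      text += payload.delta.text;
    }
  }
  return text;
}

// Example events in the shape the streaming API emits (content is made up).
const sampleStream = [
  "event: message_start",
  'data: {"type":"message_start","message":{}}',
  "",
  "event: content_block_delta",
  'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Why did you "}}',
  "",
  "event: content_block_delta",
  'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"choose Rails?"}}',
  "",
  "event: message_stop",
  'data: {"type":"message_stop"}',
].join("\n");

console.log(collectTextDeltas(sampleStream)); // Why did you choose Rails?
```

A real client would also need to buffer partial lines, because network chunks can end in the middle of an event.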

We have also begun offering a social media sharing feature that lets users post their practice results on X (Twitter). In keeping with engineers' fondness for showing off scores, 5-axis scorecards can be shared easily. We will cover this feature in detail in our next press release.

Service URL: https://mentai.recruit-hub.ai/

■ Company Overview

Company Name: X-HACK Inc.

Location: 2-5-2 Higashi-Gotanda, Shinagawa-ku, Tokyo, THE CASK GOTANDA 702

Representative: Representative Director Shinsuke Matsuda

Established: March 2018

Business Activities: Generative AI/LLM utilization support, design and development of AI-driven development platforms, IT system implementation support and contract development

Corporate Website: https://x-hack.jp

■ Contact for this matter

X-HACK Inc.

Contact: Toyoda

E-mail: support@mentai.recruit-hub.ai
