IntEx Lab Builds CEFR-J Level Estimation Model, Confirming Correlation Between Speaking Volume and English Proficiency Evaluation

IntEx Lab, operating within the non-profit HelloWorld, collaborated with Tokyo University of Foreign Studies to build a statistical CEFR-J level estimation model using data from an AI English learning tool. A large-scale survey using the model revealed a strong correlation between speaking volume and English proficiency, providing valuable insights for school education.
Survey | NQ 84/100 | Source: PR Times

Published: April 15, 2026 at 20:00

IntEx Lab (International Exchange Laboratory), which conducts research and study activities for the social implementation of diversity at the non-profit general incorporated association HelloWorld (Location: Okinawa City, Okinawa Prefecture; Representative Directors: Hikaru Nonaka, Keisuke Tomita), announced that it has built a statistical CEFR-J level estimation model based on free-speech audio, in cooperation with the Tono Laboratory of Tokyo University of Foreign Studies. Through a large-scale survey using this model, it also confirmed a correlation between the volume of speech (the amount spoken) and the English proficiency evaluation, demonstrating the validity of the model. The data used to build the model was provided by partner HelloWorld Inc. and consists of statistical data that does not identify individuals, acquired through the AI English learning tool 'WorldClassroom'.

* 'IntEx' is a collective term for International Exchange / International Experience. It is a concept that holistically captures activities in which people engage in international exchange for the purpose of multicultural understanding, whether offline or online. 'IntEx' is a registered trademark of HelloWorld Inc.

■ Background of This Research

To achieve effective English education, it is essential to observe how much school education has contributed to the development of students' abilities and to use those insights to improve classes. However, many evaluation tools provided by private businesses face barriers to introduction in schools, such as heavy cost burdens or the need to allocate limited class time to taking the tests.

CEFR-J is a new reference framework for English proficiency built for use in Japanese English education, based on the Common European Framework of Reference for Languages (CEFR). Determining a CEFR-J level requires judgment by an expert, which has been an obstacle to its adoption in public education.

IntEx Lab therefore set out to build this model in order to remove the obstacles of money, time, and human resources. Its aim is to give schools a mechanism for measuring English proficiency achievement by realizing a system that 'estimates the CEFR-J level of speaking ability for a large number of subjects while ensuring a certain degree of reliability.' In addition, a large-scale survey using this model was conducted to contribute to school education by indicating instructional content and curricula that are effective in improving students' English skills.

■ Overview and Results of the Research

In this research, ground-truth CEFR-J labels were assigned to 600 pieces of free-speech audio data acquired from 'WorldClassroom', with the cooperation of the Tono Laboratory at Tokyo University of Foreign Studies. A machine learning model was then built on this ground-truth data.

- Target period: started in April to June 2025 and ended in December 2025. Free-speech audio on the same theme was collected before and after this period.
- Using the model described above, the survey investigated changes in estimated CEFR-J levels before and after learning for 3,779 elementary, junior high, and high school students over the target period.
- Regardless of school type (elementary, junior high, high school), it became clear that the volume of speech within a speech contributes greatly to improvement in the estimated CEFR-J level. On the other hand, the abilities that tend to improve differ by school type (age group).

Trends by school type:

- Elementary school: Among the evaluation items used for CEFR-J estimation, improvement was seen in 'continuing to speak smoothly,' which contributed to an overall level increase.
- Junior high school: Differences in speaking volume translate directly into differences in estimated CEFR-J levels.
- High school: In addition to the same trend as junior high school, acquiring sentence complexity leads to a level improvement.

■ Implications for School Education

In this model, the features extracted from the audio data of English speeches are statistically linked to the estimated CEFR-J level. Analyzing the audio features of students whose estimated CEFR-J level (approximate English proficiency) improved therefore suggested which points to focus on in training to enhance comprehensive English skills.

While students are accumulating English knowledge, 'automation' training, meaning practice in using that knowledge extemporaneously, appears to be even more critical for enhancing future educational effectiveness. Furthermore, to increase speaking volume, the results suggest that it is effective to practice 'speaking fluently in sentences' in elementary school, 'speaking coherent content' in junior high school, and 'developing structural capability and speaking extemporaneously' in high school.
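
For readers who want to see what a before-and-after comparison of this kind can look like in practice, the following is a minimal Python sketch. It is not IntEx Lab's method, which the announcement does not describe in detail. The record fields (`words_before`, `words_after`, `level_before`, `level_after`), the use of word counts as a stand-in for speaking volume, the choice of Pearson correlation, and the sample values are all assumptions made for illustration. Only the ordering of CEFR-J sub-levels comes from the published framework.

```python
from math import sqrt

# CEFR-J sub-levels in ascending order, mapped to ordinal ranks.
CEFR_J_LEVELS = ["Pre-A1", "A1.1", "A1.2", "A1.3", "A2.1", "A2.2",
                 "B1.1", "B1.2", "B2.1", "B2.2", "C1", "C2"]
RANK = {level: i for i, level in enumerate(CEFR_J_LEVELS)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def summarize(records):
    """Compare speech volume and estimated level before and after learning.

    Each record is a dict with hypothetical keys: words_before, words_after
    (word counts standing in for speaking volume) and level_before,
    level_after (estimated CEFR-J labels).
    """
    word_gain = [r["words_after"] - r["words_before"] for r in records]
    level_gain = [RANK[r["level_after"]] - RANK[r["level_before"]]
                  for r in records]
    return {
        "mean_level_gain": sum(level_gain) / len(level_gain),
        "volume_level_correlation": pearson(word_gain, level_gain),
    }


# Made-up records for illustration only; not data from the survey.
SAMPLE = [
    {"words_before": 20, "words_after": 45,
     "level_before": "A1.1", "level_after": "A1.3"},
    {"words_before": 30, "words_after": 32,
     "level_before": "A1.2", "level_after": "A1.2"},
    {"words_before": 15, "words_after": 40,
     "level_before": "Pre-A1", "level_after": "A1.2"},
    {"words_before": 50, "words_after": 60,
     "level_before": "A2.1", "level_after": "A2.2"},
]

summary = summarize(SAMPLE)
print(summary)
```

Treating the sub-levels as evenly spaced ranks is a simplification. A rank-based correlation, or a breakdown by school type, may be more appropriate for real survey data.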