Qlean Dataset Launches 'Japanese Regional Dialect Dialogue Speech Dataset'
Visual Bank Inc. has released a new dialect speech dataset for AI training through its Qlean Dataset solution. The dataset contains 5 hours of natural conversations in the Osaka and Hiroshima dialects, is cleared for commercial use, and is aimed at improving automatic speech recognition (ASR), large language model (LLM), and text-to-speech (TTS) systems.
📋 Article Processing Timeline
- 📰 Published: May 19, 2026 at 20:00
- 🔍 Collected: May 19, 2026 at 11:31
- 🤖 AI Analyzed: May 27, 2026 at 11:28 (191h 56m after Collected)
Visual Bank Inc. (Minato-ku, Tokyo; CEO: Masayuki Nagai) has announced the release of the 'Japanese Regional Dialect Dialogue Speech Dataset' through its AI training data solution, 'Qlean Dataset,' operated by its subsidiary Amana Images Inc.
### About the Dialect Speech Dataset
This dataset is a speech corpus that captures regional speech patterns, accents, and vocabulary not covered by standard-Japanese corpora. It is intended for machine learning tasks such as testing how well ASR models generalize, improving dialect understanding in LLMs, and building region-specific TTS models. Custom recordings and additional dialects are available on request.
### Dataset Overview
The dataset consists of natural, spontaneous two-party conversations between Japanese men and women speaking the Osaka and Hiroshima dialects. Unlike scripted readings, the recordings preserve natural prosody, sentence-ending expressions, and vocabulary, so their acoustic characteristics are close to real-world speech. Speaker metadata includes gender labels, which supports per-attribute evaluation of acoustic models and adaptation experiments with multi-speaker models.
- **Data Type:** Audio (2-speaker dialogue format)
- **Subject Attributes:** Japanese speakers from various regions (with gender labels)
- **Total Duration:** 5 hours
- **Format:** mp3 / wav
- **Sampling Rate / Bit Depth:** 44.1 kHz or 48 kHz / 16-bit or 24-bit
- **Dialects:** Osaka dialect, Hiroshima dialect, etc.
- **Commercial Use:** Allowed
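As a quick sanity check before training, the listed sampling rates and bit depths can be verified against the WAV headers of delivered files. The following is a minimal sketch using only Python's standard `wave` module. The names `inspect_wav`, `matches_spec`, and `WavInfo` are illustrative assumptions and not part of any Qlean Dataset tooling, and the sketch covers wav files only, not mp3.

```python
import wave
from dataclasses import dataclass

# Values taken from the spec list above; the checker itself is a hypothetical helper.
ALLOWED_SAMPLE_RATES = {44_100, 48_000}
ALLOWED_BIT_DEPTHS = {16, 24}


@dataclass
class WavInfo:
    sample_rate: int
    bit_depth: int
    channels: int
    duration_sec: float


def inspect_wav(path: str) -> WavInfo:
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        return WavInfo(
            sample_rate=sample_rate,
            bit_depth=wf.getsampwidth() * 8,  # bytes per sample -> bits
            channels=wf.getnchannels(),
            duration_sec=wf.getnframes() / sample_rate,
        )


def matches_spec(info: WavInfo) -> bool:
    """True if the file uses one of the advertised sampling rates and bit depths."""
    return (
        info.sample_rate in ALLOWED_SAMPLE_RATES
        and info.bit_depth in ALLOWED_BIT_DEPTHS
    )
```

Summing `duration_sec` over all files gives a simple way to confirm the total amount of audio received. ASR models such as Whisper typically expect 16 kHz input, so resampling would normally be a separate preprocessing step.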
### Applications by Development Area
- **ASR Development:** Can be used for robustness benchmarking (measuring WER) and for dialect adaptation via LoRA or full fine-tuning, with models and toolkits such as Whisper and ESPnet.
- **LLM Development:** Useful for training dialect-to-standard style conversion models and evaluating context-dependent semantic interpretation tasks.
- **TTS Applications:** Suitable for fine-tuning models like VITS and StyleTTS to generate natural dialect speech for regional guide robots or dialogue agents.
- **Custom Requests:** Custom collection for specific regions, ages, or situations is available.
### Key Use Cases
1. **ASR Robustness Benchmarking:** Quantitative evaluation of recognition accuracy for dialect speech using WER/CER.
2. **Dialect Adaptation Fine-tuning:** Use for few-shot or LoRA fine-tuning to adapt models to specific regional speech.
3. **LLM Dialect Understanding:** Training and evaluation for sentiment analysis, dialect conversion, and discourse structure analysis.
4. **Region-specific TTS Construction:** Building speech generation engines with natural intonation for local services.
5. **Domain Adaptation for Contact Centers:** Developing custom language models for business environments where dialects are frequently used.
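The WER/CER benchmarking in use case 1 comes down to an edit-distance calculation between reference transcripts and model output. The sketch below is a minimal pure-Python illustration, not the vendor's evaluation procedure. The function names and the `(group, reference, hypothesis)` sample layout are assumptions made for this example, with `group` standing in for a label such as speaker gender or dialect.

```python
from collections import defaultdict


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference symbol
                curr[j - 1] + 1,         # insertion of a hypothesis symbol
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    ref_chars = ref.replace(" ", "")
    hyp_chars = hyp.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def cer_by_group(samples) -> dict:
    """Corpus-level CER per group for (group, reference, hypothesis) triples."""
    errors = defaultdict(int)
    lengths = defaultdict(int)
    for group, ref, hyp in samples:
        ref_chars = ref.replace(" ", "")
        hyp_chars = hyp.replace(" ", "")
        errors[group] += edit_distance(ref_chars, hyp_chars)
        lengths[group] += len(ref_chars)
    return {g: errors[g] / lengths[g] for g in errors if lengths[g] > 0}
```

CER is often the more practical metric for Japanese because the text has no whitespace word boundaries. WER can be computed with the same `edit_distance` function by passing lists of tokens from a morphological analyzer instead of character strings.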
### About Qlean Dataset
Qlean Dataset is a solution provided by Amana Images Inc. (a Visual Bank subsidiary) offering legally cleared, commercially available AI training data. It covers various formats including audio, image, video, 3D, and text, enabling AI developers to procure high-quality data without legal risks.
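### FAQ
- **Which dialects are included in Qlean Dataset's 'Japanese Regional Dialect Dialogue Speech Dataset'?** It currently contains the Osaka and Hiroshima dialects. Additional recordings of other regional dialects can be made in response to demand.
- **In what format is the dataset provided?** Audio is provided in mp3 and wav formats, with sampling rates of 44.1 kHz and 48 kHz and bit depths of 16 and 24 bits.
- **Is the audio read from a script?** No. The recordings are unscripted, natural conversational speech, so dialect-specific intonation and vocabulary appear in a form close to real-world use.
- **Is commercial use permitted?** Yes. The data is rights-cleared and can be used commercially.
- **How can it be used in LLM development?** Text containing dialect-specific sentence-ending expressions and particles can be used to train style conversion models and as evaluation data for semantic interpretation tasks.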