[Release] Commercially Usable Japanese Speaker Diarization Speech Dataset | High Precision, Large Scale, Samples Available

Nexdata has announced three of its latest 2026 Japanese speech datasets for commercial use, including 205 hours of speaker diarization data and 100 hours of NER-specific data, providing high-quality training resources for AI development.
New Product | NQ 46/100 | Source: PR Times

Published: April 1, 2026 at 22:10

The biggest challenge in Japanese AI model development is securing high-quality training data. To improve the accuracy of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) models in particular, simple read-speech data is not enough: the data must closely mirror real-world interactions or contain properly tagged named entities.

To address these challenges, we introduce three of our latest Japanese speech datasets, developed in-house for 2026. All of them feature high-precision annotation, and sample data is available for each. We hope this overview helps you select the right dataset for your AI development needs.

### Reproducing Real-World Dialogue: 205-Hour Japanese Speaker Diarization Natural Conversation Dataset
**Use Cases**: Speaker diarization models, voice assistants, contact center analytics, natural dialogue models.

**Features & Benefits**:
- **Versatility through Real-Device Recording**: Recorded on smartphones, so the audio carries acoustic characteristics close to actual user environments, including the effects of on-device noise cancellation and compression.
- **Speaker Diarization & Two-Way Dialogue Support**: The two speakers are recorded on separate tracks, so interruptions and overlapping speech are preserved along with the other information needed for dialogue system development.
- **Diverse Speaker Attributes**: 234 participants in total (102 male, 132 female), aged 18 to 60, which helps reduce demographic bias in the training data.
- **High-Precision Annotation**: Character-level transcription accuracy above 98%. Timestamps, speaker IDs, and gender information are included, which makes it easy to identify utterance segments.
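
As an illustration of how timestamped, speaker-labeled segments from two separate tracks might be used, the following minimal sketch finds the intervals where both speakers talk at once. The `Segment` structure, its field names, and the sample values are assumptions made for this example; they are not the dataset's actual annotation schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One annotated utterance (hypothetical structure, not the real schema)."""
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def find_overlaps(track_a, track_b):
    """Return (start, end) intervals in which both speakers are talking.

    Assumes the segments within a single track do not overlap each other,
    which holds when each track contains exactly one speaker.
    """
    a = sorted(track_a, key=lambda s: s.start)
    b = sorted(track_b, key=lambda s: s.start)
    overlaps = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i].start, b[j].start)
        end = min(a[i].end, b[j].end)
        if start < end:
            overlaps.append((start, end))
        # Advance whichever segment finishes first.
        if a[i].end <= b[j].end:
            i += 1
        else:
            j += 1
    return overlaps


def total_overlap_seconds(track_a, track_b):
    """Sum the duration of all overlapping speech between the two tracks."""
    return sum(end - start for start, end in find_overlaps(track_a, track_b))


# Made-up sample values, for illustration only.
speaker_a = [Segment("A", 0.0, 2.0), Segment("A", 5.0, 7.0)]
speaker_b = [Segment("B", 1.5, 3.0), Segment("B", 6.0, 6.5)]
print(find_overlaps(speaker_a, speaker_b))
```

A two-pointer sweep is used here because each track is already a sequence of non-overlapping segments, so the comparison runs in linear time once both tracks are sorted.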

### Specialized in Named Entity Recognition: 100-Hour Japanese Entity Reading Dataset
**Use Cases**: Voice input forms, NER (Named Entity Recognition), personal information extraction.

This dataset focuses on named entities (names, addresses, amounts, and so on), the category where speech recognition accuracy matters most. Although it consists of scripted read speech, it includes practical entity tags, which makes it suitable for training information extraction models.

**Features & Benefits**:
- **Rich Entity Tags**: Entities that matter in business settings, such as personal names, phone numbers, addresses, email addresses, product model numbers, and amounts, are tagged individually (e.g., [PHO], [LOC], [MONEY]).
- **Inclusion of Real-World Noise**: Recordings cover both completely silent environments and environments with "noise that does not affect recognition", which contributes to model robustness.
- **Smartphone Recording**: Recorded at 16 kHz, a setting intended for actual mobile device use, making the data well suited to mobile app development.
- **Structured Transcriptions**: More than plain transcriptions: each entity span is explicitly marked with its type, which significantly reduces post-processing costs.
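
To show why entity-marked transcriptions reduce post-processing, here is a minimal sketch that converts a tagged line into plain text plus character-offset spans, a form many NER training pipelines accept. The inline `[TAG]...[/TAG]` syntax, the `extract_entities` helper, and the sample sentence are assumptions made for this example; the source names tags such as [PHO], [LOC], and [MONEY] but does not specify the delivery format.

```python
import re

# Hypothetical inline markup: [TAG]entity text[/TAG].
# The dataset's real transcription format may differ.
TAG_PATTERN = re.compile(r"\[(?P<tag>[A-Z]+)\](?P<text>.*?)\[/(?P=tag)\]")


def extract_entities(line):
    """Strip entity markup and return (plain_text, entities).

    Each entity is a tuple (tag, text, start, end), where start and end
    are character offsets into the returned plain text.
    """
    plain_parts = []
    entities = []
    cursor = 0  # read position in the tagged line
    offset = 0  # length of the plain text built so far
    for match in TAG_PATTERN.finditer(line):
        before = line[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)

        text = match.group("text")
        entities.append((match.group("tag"), text, offset, offset + len(text)))
        plain_parts.append(text)
        offset += len(text)
        cursor = match.end()
    plain_parts.append(line[cursor:])
    return "".join(plain_parts), entities


# Made-up sample sentence, for illustration only.
tagged = "電話番号は[PHO]090-1234-5678[/PHO]です"
plain, found = extract_entities(tagged)
print(plain)
for tag, text, start, end in found:
    print(tag, text, start, end)
```

Because the spans index into the cleaned text, `plain[start:end]` always returns the entity string itself, which makes the result straightforward to validate before training.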