Code-switching Speech Recognition and Language Alignment

The AI Digest

Code-switching (CS), where speakers alternate between languages within a single utterance, poses a significant challenge for automatic speech recognition (ASR). This research introduces a language alignment loss that uses pseudo language labels to perform frame-level language identification. In parallel, it proposes generative error correction with large language models, guided by a novel linguistic hint, to handle the complex token alternatives in bilingual speech. Evaluated on the SEAME and ASRU datasets, the method improves CS-ASR performance over previous models.
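To make the idea of a frame-level language identification loss concrete, here is a minimal sketch in plain Python. It is not the paper's formulation. It assumes Mandarin-English speech, assigns each token a pseudo language label from its script, and spreads those labels uniformly over the frames as a crude stand-in for a learned alignment. The names `token_language`, `pseudo_frame_labels`, and `language_alignment_loss` are hypothetical.

```python
import math


def token_language(token):
    """Hypothetical rule: 0 if the token contains a CJK ideograph, else 1."""
    return 0 if any("\u4e00" <= ch <= "\u9fff" for ch in token) else 1


def pseudo_frame_labels(tokens, num_frames):
    """Spread token-level language labels uniformly over the frames.

    Assumption: uniform spreading replaces a real token-to-frame alignment.
    """
    labels = [token_language(t) for t in tokens]
    last = len(labels) - 1
    return [labels[min(i * len(labels) // num_frames, last)]
            for i in range(num_frames)]


def softmax(logits):
    """Numerically stable softmax over one frame's language logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def language_alignment_loss(frame_logits, frame_labels):
    """Mean frame-level cross-entropy against the pseudo language labels."""
    total = 0.0
    for logits, label in zip(frame_logits, frame_labels):
        total -= math.log(softmax(logits)[label])
    return total / len(frame_labels)


# Usage: four tokens spread over eight frames, with uninformative logits.
labels = pseudo_frame_labels(["我", "想", "buy", "coffee"], 8)
loss = language_alignment_loss([[0.0, 0.0]] * 8, labels)
```

In a real system a term like this would presumably be added to the ASR training objective with a weighting factor; that weight and the way the pseudo labels are actually obtained are not specified in this summary.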

Introduces a language alignment loss to mitigate language confusion
Prompts large language models with linguistic hints for better recognition
Shows substantial improvements in CS-ASR performance
Is especially effective when trained on bilingual data dominated by the primary language
Surpasses previous models while using fewer parameters
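As an illustration of how generative error correction with a linguistic hint might be set up, the sketch below assembles a prompt from an N-best list of ASR hypotheses. The prompt wording, the form of the hint, and the function name `build_gec_prompt` are assumptions made for illustration; the summary does not describe the paper's actual prompt, and no language model is called here.

```python
def build_gec_prompt(nbest, languages):
    """Build an error-correction prompt from N-best hypotheses.

    Assumption: the linguistic hint is a plain sentence naming the
    languages mixed in the utterance.
    """
    hint = (
        "Linguistic hint: the utterance mixes "
        + " and ".join(languages)
        + "; keep each word in the language it was spoken in."
    )
    numbered = [f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1)]
    return "\n".join(
        ["The following are candidate transcriptions of one utterance:"]
        + numbered
        + [hint, "Return the single most likely corrected transcription."]
    )


# Usage: two competing hypotheses for a Mandarin-English utterance.
prompt = build_gec_prompt(
    ["我 想 buy coffee", "我 想 by coffee"],
    ["Mandarin", "English"],
)
```

The resulting string would be sent to whichever language model is used for correction, and its reply would replace the top ASR hypothesis.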

This study advances multilingual speech recognition. Its methods could make it easier to build multilingual capabilities into ASR systems, broadening accessibility and improving the experience of users who speak more than one language.
