Code-switching (CS) poses significant challenges for automatic speech recognition (ASR) because speakers switch languages within a single utterance. This work proposes a language alignment loss that performs frame-level language identification using pseudo language labels. In addition, generative error correction with large language models (LLMs), guided by a novel linguistic hint, is proposed to handle the complex token alternatives that arise in bilingual speech. Experiments on the SEAME and ASRU datasets show substantial improvements from the proposed methods.
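To make frame-level language identification with pseudo labels more concrete, here is one plausible form of such a loss. It is a sketch under stated assumptions, not necessarily the formulation used in this work. It assumes that (i) each encoder frame has already been assigned a token, for example by a CTC forced alignment; (ii) pseudo language labels are derived from those tokens by a hypothetical rule (CJK characters map to Mandarin, other non-blank tokens to English, blanks to a separate class); and (iii) a language-classification head produces per-frame logits over these three classes. The loss is then the mean frame-level cross-entropy. The names `token_language`, `pseudo_language_labels`, and `language_alignment_loss` are illustrative and not taken from the source.

```python
import math
from typing import List, Sequence

# Hypothetical class indices for the frame-level language classifier.
BLANK, MANDARIN, ENGLISH = 0, 1, 2


def token_language(token: str) -> int:
    """Hypothetical rule mapping an aligned token to a pseudo language label."""
    if token == "<blank>":
        return BLANK
    # Assumption: CJK Unified Ideographs (U+4E00..U+9FFF) are treated as Mandarin.
    if any("\u4e00" <= ch <= "\u9fff" for ch in token):
        return MANDARIN
    # Assumption: every other non-blank token is treated as English.
    return ENGLISH


def pseudo_language_labels(frame_tokens: Sequence[str]) -> List[int]:
    """Convert a per-frame token alignment (assumed to be given) into language labels."""
    return [token_language(t) for t in frame_tokens]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def language_alignment_loss(
    frame_logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean frame-level cross-entropy between predicted and pseudo language labels."""
    if len(frame_logits) != len(labels):
        raise ValueError("need exactly one pseudo label per frame")
    if not labels:
        raise ValueError("cannot compute the loss over zero frames")
    total = 0.0
    for logits, label in zip(frame_logits, labels):
        # Negative log-probability of the pseudo label at this frame.
        total -= log_softmax(logits)[label]
    return total / len(labels)


if __name__ == "__main__":
    labels = pseudo_language_labels(["<blank>", "我", "我", "like", "<blank>"])
    print(labels)  # [0, 1, 1, 2, 0]
    # Uniform logits give a loss of log(3) per frame.
    uniform = [[0.0, 0.0, 0.0]] * len(labels)
    print(round(language_alignment_loss(uniform, labels), 4))  # 1.0986
```

In a full system, a term like this would presumably be combined with the main ASR objective through a tunable weight. Both that weighting and the way pseudo labels are obtained are assumptions of this sketch.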
This study contributes to multilingual speech recognition and suggests that these methods could facilitate the integration of multilingual capabilities into ASR systems, broadening accessibility and improving the user experience for linguistically diverse populations.