This research examines the integration of large language models (LLMs) with automatic speech recognition (ASR), focusing on how well the combination performs on open-source Chinese datasets. The study evaluates various configurations of speech encoders and LLMs and introduces a three-stage training approach to improve model performance.
Why this research matters: Combining LLMs with ASR marks a notable advance in speech recognition, particularly for languages such as Chinese. Beyond improving performance, the work supports reproducible research through the planned release of training scripts and models.