Ingegneria Sismica

LSTM Networks Optimize English Speech Recognition Accuracy

Author(s): Jingwen Liu¹, Yannan Li¹
¹School of Humanities and Education, Jinan Preschool Education College, Jinan 250000, Shandong, China
Liu, Jingwen, and Li, Yannan. “LSTM Networks Optimize English Speech Recognition Accuracy.” Ingegneria Sismica, Volume 43, Issue 1: 1-21, doi:10.65102/is2026491.

Abstract

To address insufficient modeling of contextual dependencies, high recognition error on long sentences, and limited decoding stability in continuous English speech recognition, this paper proposes a method for optimizing recognition accuracy based on an LSTM network. From an implementation perspective, the study builds an integrated recognition pipeline with four stages: speech preprocessing, feature representation, temporal sequence modeling, and sequence decoding. After pre-emphasis, framing and windowing, log-Mel spectrum extraction, and feature normalization, a bidirectional LSTM jointly models the preceding and following context of the speech sequence. CTC beam search is combined with language-model re-ranking to improve the consistency and readability of the output text. The number of hidden units, the number of network layers, the learning rate, the dropout rate, and the decoding parameters are tuned jointly to improve the model's adaptability to different speaking rates and sentence lengths. On the English speech test set, the optimized LSTM model achieves a word error rate of 5.7%, a character error rate of 3.1%, and a sentence recognition accuracy of 84.6%. Its overall performance is better than that of the DNN, RNN, and unoptimized LSTM models. The results indicate that the LSTM network is well suited to temporal modeling in English speech recognition and that, combined with appropriate parameter tuning and decoding strategy, it can improve the recognition accuracy and operational stability of the system.
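
The three metrics reported above (word error rate, character error rate, and sentence recognition accuracy) are conventionally computed from the edit distance between a reference transcript and the recognizer's output. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function names are hypothetical, and two choices are assumptions that the abstract does not specify: words are split on whitespace, and spaces are counted as characters in the character error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    # assumption: spaces count as characters
    return edit_distance(reference, hypothesis) / len(reference)


def sentence_accuracy(references, hypotheses):
    """Fraction of sentences whose hypothesis matches the reference exactly."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(r == h for r, h in zip(references, hypotheses))
    return matches / len(references)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` counts one substitution against six reference words. Because insertions count as errors, the word error rate can exceed 100% when the hypothesis is much longer than the reference.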

Keywords
LSTM network; English speech recognition; Temporal modeling; Recognition accuracy optimization
