To address insufficient modeling of contextual dependencies, high error rates on long sentences, and limited decoding stability in English continuous speech recognition, this paper proposes a method for optimizing recognition accuracy based on a long short-term memory (LSTM) network. From an implementation perspective, the study builds an integrated recognition pipeline of “speech preprocessing, feature representation, time-series modeling, sequence decoding”. After pre-emphasis, framing and windowing, log-Mel spectrum extraction, and feature normalization, a bidirectional LSTM jointly models the context of the speech sequence. Connectionist temporal classification (CTC) beam search is combined with language-model re-ranking to improve the consistency and readability of the output text. The number of hidden units, the number of network layers, the learning rate, the dropout rate, and the decoding parameters are tuned jointly to improve the model's adaptability to different speaking rates and sentence lengths. On the English speech test set, the optimized LSTM model achieves a word error rate of 5.7%, a character error rate of 3.1%, and a sentence recognition accuracy of 84.6%, outperforming the DNN, RNN, and unoptimized LSTM models. These results indicate that LSTM networks are well suited to modeling temporal structure in English speech recognition and that, combined with appropriate parameter tuning and decoding strategies, they can effectively improve the recognition accuracy and operational stability of the system.
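For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how word error rate, character error rate, and sentence recognition accuracy are commonly computed from reference and hypothesis transcripts. It is not the evaluation code used in this study. The function names (`edit_distance`, `error_rate`, `sentence_accuracy`) are hypothetical, and two conventions are assumed here: spaces count as characters in the character error rate, and a sentence counts as correct only when its word sequence matches the reference exactly.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level error rate: total edits divided by total reference length.

    unit="word" gives WER; unit="char" gives CER (assumption: spaces are counted).
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return total_edits / total_len


def sentence_accuracy(refs: List[str], hyps: List[str]) -> float:
    """Fraction of sentences whose word sequence matches the reference exactly."""
    correct = sum(r.split() == h.split() for r, h in zip(refs, hyps))
    return correct / len(refs)


if __name__ == "__main__":
    # Toy transcripts for illustration only; not data from the paper.
    refs = ["the cat sat on the mat", "speech recognition is hard"]
    hyps = ["the cat sat on a mat", "speech recognition is hard"]
    print(f"WER: {error_rate(refs, hyps, 'word'):.1%}")           # WER: 10.0%
    print(f"Sentence accuracy: {sentence_accuracy(refs, hyps):.1%}")  # 50.0%
```

Aggregating edits over the whole test set before dividing, rather than averaging per-sentence rates, keeps long sentences from being under-weighted, which matters when recognition errors on long sentences are a stated concern.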