In this paper, a speech sensor is first used to capture the original speech emotion signal in the cabin service dialogue scene. The captured speech is then preprocessed by pre-emphasis, framing, windowing, and endpoint detection, which effectively suppresses the noise in the original signal. After several speech emotion features common to the cabin service dialogue scene are compared, the Mel spectrogram and the Mel-frequency cepstral coefficients (MFCCs) are adopted for speech emotion feature extraction. To address the unsatisfactory performance of the RepVGG network, the convolutional block attention module (CBAM) is introduced to improve it, yielding a speech emotion recognition model based on the improved RepVGG network. A service quality detection platform is then constructed so that the speech emotion recognition technology is applied directly to the service process of the cabin service dialogue scene and the relationship between speech emotion recognition and service quality can be examined. The speech emotion signal processing capability, recognition capability, and classification capability each show a moderate positive correlation with platform service quality, with Pearson coefficients of 0.672, 0.467, and 0.487, respectively, which indicates that speech emotion recognition is correlated with service quality in the cabin service dialogue scene.
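The preprocessing chain named above (pre-emphasis, framing, windowing, endpoint detection) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the pre-emphasis coefficient of 0.97, the 400-sample frames with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), the Hamming window, and the simple energy-threshold endpoint detector are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop_len=160):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    # Number of frames needed to cover the signal, zero-padding the tail if required.
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / hop_len)))
    pad = (n_frames - 1) * hop_len + frame_len - len(signal)
    padded = np.pad(signal, (0, max(0, pad)))
    # Index matrix: row i selects samples [i*hop_len, i*hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return padded[idx] * np.hamming(frame_len)

def detect_speech_frames(frames, ratio=0.1):
    """Energy-based endpoint detection (assumed method): keep frames whose
    short-time energy exceeds a fraction of the maximum frame energy."""
    energy = np.sum(frames ** 2, axis=1)
    return energy > ratio * energy.max()

# Usage on a synthetic signal: 0.5 s of silence followed by 0.5 s of a 440 Hz tone.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
signal = np.concatenate([np.zeros(sr // 2), tone])

frames = frame_and_window(pre_emphasis(signal))
speech_mask = detect_speech_frames(frames)
speech_frames = frames[speech_mask]  # frames that would be passed on to feature extraction
```

In a full pipeline, the retained frames would feed the Mel spectrogram and MFCC extraction stage; the threshold ratio would normally be tuned on recordings from the target environment rather than fixed in advance.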