Generative adversarial networks (GANs) open a new research direction for cross-cultural English conversation scenarios and can support the development of foreign trade English teaching in colleges and universities. In this paper, we adopt an encoder-decoder structure as the cross-cultural English response generator, with both the encoder and the decoder composed of GRU units. A word vector approximation layer is obtained by multiplying the word probability distribution output by the response generator by the corresponding word vectors. A discriminator based on a convolutional neural network pushes the response generator to produce results closer to the real data. Together these components form the generative adversarial network-based conversation reply generation model (GAN-AEL), for which we also define the training loss function. With the support of the corresponding development tools, the GAN-AEL model is integrated into cross-cultural English conversation scenarios, and a supporting foreign trade English teaching system (FTES) is designed and analyzed. Before using the FTES, the students' English writing scores were in the range of 6~15; after using the FTES for one semester, the scores rose to 21~29, an improvement of 7~21. English reading, speaking, and listening scores followed the same pattern. These results confirm the practical effectiveness of the foreign trade English teaching system based on cross-cultural English conversation scenarios, which aims to boost the development of foreign trade English teaching in colleges and universities.
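The word vector approximation step described above can be illustrated with a minimal sketch. It assumes a hypothetical three-word vocabulary with two-dimensional word vectors and made-up generator logits; the function names `softmax` and `approximate_embedding` are illustrative and are not taken from the paper. The point of the step, as it is commonly understood, is that a probability-weighted sum of word vectors stays differentiable, whereas sampling a single word does not, so a discriminator's signal can reach the generator.

```python
import math

def softmax(logits):
    """Turn raw generator scores into a word probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def approximate_embedding(word_probs, embedding_table):
    """Probability-weighted sum of word vectors (one vector per vocabulary word)."""
    dim = len(embedding_table[0])
    return [
        sum(p * vec[d] for p, vec in zip(word_probs, embedding_table))
        for d in range(dim)
    ]

# Hypothetical toy data: 3 vocabulary words, 2-dimensional word vectors.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 0.5, -1.0]  # assumed decoder output for one time step

probs = softmax(logits)
approx_vec = approximate_embedding(probs, embedding_table)
```

When the distribution is one-hot, the result reduces to the ordinary word vector of the selected word, which is why the layer can be read as a soft version of an embedding lookup.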