Multi-modal English Translation Production Model Combining Cross-modal Alignment and Attention Mechanism

Wang, Shufang

doi:10.65102/is20261051

Research article

Ingegneria Sismica

Volume 43 Issue 3
Pages: 1-22

Author(s): Wang, Shufang¹

¹School of Foreign Languages, Zhengzhou Shengda University of Economics and Management, Zhengzhou 451191, Henan, China

Published: 10/06/2026

Cite

Wang, Shufang. “Multi-modal English Translation Production Model Combining Cross-modal Alignment and Attention Mechanism.” Ingegneria Sismica Volume 43 Issue 3: 1-22, doi:10.65102/is20261051.

https://doi.org/10.65102/is20261051

Abstract

Traditional English translation models suffer from insufficient scene constraints, limited ability to resolve ambiguity, and unstable coordination between image and text semantics. To address these problems, this paper constructs a multimodal English translation production model that combines cross-modal alignment with an attention mechanism. Taking text and images as collaborative input, the model forms an integrated “alignment-attention-generation” pipeline through multimodal input representation, shared semantic space mapping, bidirectional cross-modal alignment, and attention-driven decoding. Experimental results show that the proposed model reaches BLEU, METEOR, and ROUGE-L scores of 37.4, 32.5, and 41.3 on the test set, which are 5.6, 4.4, and 5.9 percentage points higher, respectively, than those of the baseline Transformer model. Accuracy on image-text consistency, ambiguity resolution, and entity alignment reaches 85.9%, 84.2%, and 85.1%, respectively. These results indicate that cross-modal alignment effectively reduces the representation gap between textual and visual semantics, and that the attention mechanism strengthens the dynamic selection of key context during translation generation, thereby improving the accuracy, stability, and application adaptability of multimodal English translation production.

Keywords
Multimodal English translation; Cross-modal alignment; Attention mechanism; Translation production model
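The abstract names bidirectional cross-modal alignment in a shared semantic space and attention-driven decoding but does not give their formulas. The sketch below is only an illustration of what such components commonly look like, not the author's implementation. It assumes a symmetric contrastive (InfoNCE-style) loss for the alignment step and plain scaled dot-product attention for the decoding step. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def bidirectional_alignment_loss(text_emb, image_emb, temperature=0.07):
    """Assumed alignment objective: symmetric contrastive loss.

    Row i of text_emb is taken to be paired with row i of image_emb.
    The loss is averaged over the text-to-image and image-to-text directions.
    """
    sim = l2_normalize(text_emb) @ l2_normalize(image_emb).T / temperature
    idx = np.arange(sim.shape[0])
    loss_t2i = -log_softmax(sim, axis=1)[idx, idx].mean()  # text -> image
    loss_i2t = -log_softmax(sim, axis=0)[idx, idx].mean()  # image -> text
    return 0.5 * (loss_t2i + loss_i2t)


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(log_softmax(scores, axis=1))
    return weights @ values, weights


# Toy data: 4 text/image pairs in an 8-dimensional shared space (hypothetical).
text = np.eye(4, 8)
image_paired = np.eye(4, 8)                      # perfectly aligned pairs
image_shuffled = np.roll(image_paired, 1, axis=0)  # every pair mismatched

loss_aligned = bidirectional_alignment_loss(text, image_paired)
loss_shuffled = bidirectional_alignment_loss(text, image_shuffled)

# Three decoder states (hypothetical) attending over the four image vectors.
decoder_states = np.ones((3, 8))
context, weights = cross_attention(decoder_states, image_paired, image_paired)
```

In this toy setup the loss is near zero for correctly paired embeddings and large for the shuffled ones, which is the behavior an alignment objective of this kind is meant to reward. The attention weights for each decoder state sum to one over the image vectors.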


Open Access