Knowledge Graph Construction Method under Multimodal Information Fusion

Zhang, Shenghe; Xie, Yongxu

doi:10.65102/is20261265

Research article

Ingegneria Sismica

Volume 43 Issue 3
Pages: 1-15

Author(s): Zhang, Shenghe¹; Xie, Yongxu²

¹School of Data Science, City University of Macau, Macau 999078, China

²School of Chemical Engineering and Technology, Tianjin University, Tianjin 300354, China

Published: 10/06/2026

Cite

Zhang, Shenghe, and Xie, Yongxu. “Knowledge Graph Construction Method under Multimodal Information Fusion.” Ingegneria Sismica, Volume 43, Issue 3: 1-15, doi:10.65102/is20261265.

https://doi.org/10.65102/is20261265

Abstract

This article proposes a method for constructing knowledge graphs through multimodal information fusion. Feature guidance and a multimodal cross-attention mechanism enrich the semantic information of the text and improve the accuracy of entity recognition and relation extraction. The model adopts a multi-level visual cue mechanism and aligns the feature distributions of the different modalities, which bridges the semantic gap between text and images and allows entities to be matched accurately to their associated objects in images. For training, the Adam optimizer is combined with a linear learning-rate scheduler, with separate learning rates for the language, common-sense, visual, and EICF encoders, and an extensive hyperparameter search is carried out to ensure a fair comparison. Experiments on public datasets such as Amazon and YouTube and on self-built datasets show that the proposed model significantly outperforms baseline models such as Seq2Seq, NFM, CKE, KGCN, and MMGCN on evaluation metrics such as AUC, AP, and F1. These results confirm the effectiveness and superiority of the proposed model for multimodal information fusion and knowledge graph construction.
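The training setup described in the abstract (one learning rate per encoder, all following a linear schedule) can be illustrated with a minimal, framework-free sketch. The base learning rates, the group names, and the optional warmup phase below are assumptions made for illustration; the abstract does not report the actual values or whether warmup is used.

```python
# Hypothetical base learning rates per encoder group.
# The abstract names the four encoders but not their rates.
BASE_LR = {
    "language": 2e-5,
    "common_sense": 1e-4,
    "visual": 1e-4,
    "eicf": 1e-3,
}


def linear_factor(step, total_steps, warmup_steps=0):
    """Scale factor for a linear schedule.

    Rises linearly from 0 to 1 over `warmup_steps` (assumed, optional),
    then decays linearly to 0 at `total_steps`.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


def learning_rates(step, total_steps, warmup_steps=0):
    """Learning rate of every encoder group at a given training step."""
    factor = linear_factor(step, total_steps, warmup_steps)
    return {name: lr * factor for name, lr in BASE_LR.items()}
```

All groups share the same schedule shape and differ only in their base rate, which is how per-group learning rates are commonly configured when a pretrained encoder is fine-tuned alongside newly initialized modules.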

Keywords
Multimodal; Information fusion; Knowledge graph; Entity recognition; Relation extraction; Adam optimizer