Ingegneria Sismica

A Multimodal Data Fusion Framework for Knowledge Graph Construction and Data Mining in Information Science

Author(s): Ming Li1, Yongjia Xu1
1Business School of Hohai University, Nanjing 211100, Jiangsu, China
Li, Ming, and Xu, Yongjia. “A Multimodal Data Fusion Framework for Knowledge Graph Construction and Data Mining in Information Science.” Ingegneria Sismica Volume 43 Issue 3: 1-18, doi:10.65102/is20261284.

Abstract

Information science objects are stored in scattered form across paper texts, citation paths, author institutions, source journals, topic tags and time periods. To address this problem, a multimodal knowledge graph empirical sample was built from open academic metadata, and its role in knowledge organization and data mining was tested. The sample covers records from information science and related directions from 2014 to 2024. After processing, 5200 papers, 13172 entities and 105148 relationships remain. They form seven kinds of nodes (papers, authors, institutions, sources, keywords, topics and years) and relationships such as authorship, institutional affiliation, source publication, keyword association, topic attribution, citation linkage and co-occurrence. Methodologically, five kinds of modal features were encoded: the titles, abstracts and keywords of the papers; the citation network; the author-institution network; the source-topic distribution; and the annual period. Five models (Text only, Citation GCN, Late Fusion, MKG-BERT and the Proposed MDF-KG) were compared on four tasks: entity alignment, relationship completion, cross-modal retrieval and topic mining. The results show that the average Macro-F1 of the Proposed MDF-KG across the four tasks is 0.8535, higher than the 0.8213 of MKG-BERT and the 0.7880 of Late Fusion. Its Mean Reciprocal Rank (MRR) reaches 0.912 in the relationship completion task, and its normalized discounted cumulative gain at 10 (nDCG@10) reaches 0.858 in cross-modal retrieval. The ablation study indicates that the text modality has the highest average contribution: performance decreases by 0.0553 when it is removed, compared with 0.0415 when the citation modality is removed and 0.0318 when the source-topic modality is removed. The robustness test shows that when fields are missing or the noise ratio reaches 0.5, the average Macro-F1 of the Proposed MDF-KG is still 0.8080.
The findings suggest that multimodal field organization can enhance the connectivity, ranking quality and error correction capability of information science knowledge graphs, but author aliases, topic granularity and weak citation context remain the main obstacles to further development.
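
For readers less familiar with the three metrics reported above, the following is a minimal Python sketch of how Macro-F1, MRR and nDCG@10 are conventionally computed. It is not the authors' evaluation code. The function names and the toy inputs are illustrative assumptions, and the exact variants used in the paper (for example, graded versus binary relevance, or tie handling) are not stated in the abstract.

```python
import math
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_reciprocal_rank(first_hit_ranks: Sequence[int]) -> float:
    """Each entry is the 1-based rank of the first correct answer for a query."""
    return sum(1.0 / r for r in first_hit_ranks) / len(first_hit_ranks)


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the paper.
    print(macro_f1(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
    print(mean_reciprocal_rank([1, 2, 4]))
    print(ndcg_at_k([0, 1], k=10))
```

Macro-F1 weights every class equally, so it is sensitive to performance on rare classes, while MRR and nDCG@10 reward placing correct items near the top of a ranked list.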

Keywords
Informatics; Knowledge graph; Multimodal data fusion; Data mining; Academic metadata
