Evaluation, Selection, and Deep Adaptation of General-Purpose Large Models for Power Industry Applications

Shi, Ke; Niu, Jing; Qiao, Weixiang; Jin, Jiwei; Zhang, Xing

doi:10.65102/is20261026

Research article

Ingegneria Sismica

Volume 43 Issue 2
Pages: 1-20

Author(s): Shi, Ke¹; Niu, Jing¹; Qiao, Weixiang¹; Jin, Jiwei²; Zhang, Xing²

¹Power Dispatching Control Center of Guizhou Power Grid Co., Ltd., Guizhou, China

²Power Dispatching Control Center of Zunyi Power Supply Bureau of Guizhou Power Grid Co., Ltd., Guizhou, China

Published: 30/04/2026

Cite

Shi, Ke, et al. “Evaluation, Selection, and Deep Adaptation of General-Purpose Large Models for Power Industry Applications.” Ingegneria Sismica, Volume 43, Issue 2: 1-20, doi:10.65102/is20261026.

https://doi.org/10.65102/is20261026

Abstract

This study builds a practical framework for evaluating, selecting, and adapting general-purpose large models in two power industry application scenarios: intelligent substation inspection and transmission corridor visualisation. The benchmark comprises 18,642 retained multimodal evidence records covering visible-light images, thermal frames, OCR strings, equipment metadata, corridor attributes, rule clauses, and historical ticket text. Six anonymised models were evaluated under fixed data splits, prompt templates, inference limits, and scoring scripts. The evaluation targeted power-service judgement: object localisation, risk inference, rule-based evidence, unsupported-alarm control, and robustness to field perturbations. After weighted-score screening of the candidates, the selected model was adapted with retrieval evidence, LoRA tuning, visual-grounding calibration, and a safety verifier. The adapted model, Power-GM, achieved the best overall scores: 89.0% for visual grounding, 86.0% for risk judgement, 87.0% for rule compliance, 88.0% for hallucination suppression, and 84.0% for robustness. On eight of the selected tasks, it surpassed the strongest open multimodal baseline by 9.9-20.3 percentage points and the closed multimodal baseline by 3.3-8.5 percentage points. The best response-surface region was LoRA rank 48 with retrieval top-k 6, where the Power-Biz score remained near 89.0% within the latency target. Ablation showed that retrieval improved rule compliance, LoRA strengthened task reasoning, grounding calibration reduced object-region mismatch, and the safety verifier reduced hallucinated risk statements. The conclusions are limited to the two tested scenes, fixed model labels, fixed task definitions, fixed scoring scripts, retained test records, and the evidence types in the benchmark corpus.
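
The weighted-score screening mentioned in the abstract could look roughly like the sketch below. It is an illustration only: the abstract does not give the weights or the per-candidate scores, so the equal weights, the function names, and the `candidate_x` entry are all assumptions. Only the five Power-GM values are taken from the abstract, and the result is not the paper's Power-Biz score.

```python
from typing import Dict, List, Tuple

# Hypothetical weights over the five judgement dimensions named in the abstract.
# The actual weights are not stated there; equal weighting is an assumption.
WEIGHTS: Dict[str, float] = {
    "visual_grounding": 0.2,
    "risk_judgement": 0.2,
    "rule_compliance": 0.2,
    "hallucination_suppression": 0.2,
    "robustness": 0.2,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weight-normalised mean of per-dimension scores (percent)."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


def rank_candidates(candidates: Dict[str, Dict[str, float]],
                    weights: Dict[str, float] = WEIGHTS) -> List[Tuple[str, float]]:
    """Sort candidate models by weighted score, best first."""
    scored = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = {
    # Scores reported in the abstract for the adapted model.
    "Power-GM": {
        "visual_grounding": 89.0,
        "risk_judgement": 86.0,
        "rule_compliance": 87.0,
        "hallucination_suppression": 88.0,
        "robustness": 84.0,
    },
    # Made-up placeholder values, for illustration only (not from the paper).
    "candidate_x": {
        "visual_grounding": 75.0,
        "risk_judgement": 70.0,
        "rule_compliance": 72.0,
        "hallucination_suppression": 74.0,
        "robustness": 69.0,
    },
}

ranking = rank_candidates(candidates)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

Under these assumed equal weights, the Power-GM entry is simply the arithmetic mean of its five reported scores. A different weighting, for example one that emphasises hallucination suppression for safety-critical alarms, would change the ranking margin without changing the code.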

Keywords
Large power grid model; Empirical verification; Model choice; Deep adaptation; Substation intelligent inspection; Transmission corridor visualisation.