Ingegneria Sismica

EdgeDistill: A Knowledge Distillation Approach for Deploying Large Language Models on Resource-Constrained Edge Devices in Industrial IoT

Author(s): Changan Chen1, Yan Ming1
1School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400064, China
Chen, Changan, and Ming, Yan. “EdgeDistill: A Knowledge Distillation Approach for Deploying Large Language Models on Resource-Constrained Edge Devices in Industrial IoT.” Ingegneria Sismica Volume 43 Issue 2: 1-17, doi:10.65102/is2026876.

Abstract

The deployment of large language models (LLMs) on resource-constrained edge devices in Industrial Internet of Things (IIoT) scenarios faces a fundamental mismatch between the enormous computational and memory demands of LLMs and the limited hardware capabilities of edge platforms. Existing model compression techniques, such as uniform quantization and unstructured pruning, often cause substantial performance degradation on domain-specific industrial tasks such as predictive maintenance, anomaly detection, and fault diagnosis. To overcome these limitations, this paper presents EdgeDistill, a task-adaptive knowledge distillation framework that efficiently transfers domain-specific knowledge from a large teacher LLM to a compact student model tailored for IIoT edge deployment. First, an Industrial Semantic Alignment Distillation (ISAD) module is proposed, which employs a dual-granularity alignment strategy that jointly distills token-level logit distributions and sentence-level semantic representations, ensuring that the student model faithfully retains both fine-grained industrial terminology and global contextual understanding. Second, a Frequency-Aware Layer Selection (FALS) mechanism is introduced, which dynamically identifies and prioritizes the most informative intermediate layers of the teacher model for knowledge transfer based on spectral analysis of feature activation patterns, maximizing distillation efficiency while reducing computational overhead. Third, a Hardware-Aware Adaptive Quantization-Distillation (HAQD) co-optimization module is designed, which performs mixed-precision quantization and knowledge distillation jointly within a unified training pipeline, so that the student model is compressed and knowledge-enhanced simultaneously while satisfying the memory and latency constraints of the target edge hardware. Finally, a Domain-Calibrated Evaluation Protocol (DCEP) is established, which introduces a comprehensive set of IIoT-specific metrics, including task accuracy, inference latency, energy consumption, and domain terminology fidelity, to holistically evaluate edge-deployed language models. Experimental results on three IIoT datasets show that EdgeDistill achieves 96.2% of the teacher model's performance while compressing the model by 51.1× (from 13.5 GB to 264 MB) and reducing inference latency by 24.7×, enabling real-time processing on edge devices such as the NVIDIA Jetson Nano and Raspberry Pi 4.
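The abstract does not give the loss formulation of the ISAD module, so the following PyTorch sketch only illustrates one plausible way the dual-granularity alignment could be realized: a temperature-scaled KL term over token-level logits combined with a cosine-similarity term over mean-pooled sentence representations. The function and parameter names (isad_distillation_loss, proj, temperature, alpha) are hypothetical, and the sketch assumes the teacher and student share a tokenizer and vocabulary; it is not the authors' implementation.

import torch
import torch.nn.functional as F

def isad_distillation_loss(student_logits, teacher_logits,
                           student_hidden, teacher_hidden,
                           attention_mask, proj,
                           temperature=2.0, alpha=0.5):
    """Illustrative dual-granularity distillation loss (hypothetical).

    Token level: KL divergence between temperature-softened teacher and
    student next-token distributions.
    Sentence level: cosine distance between mean-pooled hidden states,
    with a learned projection `proj` bridging the student/teacher
    hidden-size gap.
    """
    # --- token-level logit distillation ---
    t = temperature
    token_kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # --- sentence-level semantic alignment ---
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq, 1)
    s_sent = (student_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
    t_sent = (teacher_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
    s_sent = proj(s_sent)                                 # student dim -> teacher dim
    sent_loss = 1.0 - F.cosine_similarity(s_sent, t_sent, dim=-1).mean()

    # weighted combination of the two granularities
    return alpha * token_kl + (1.0 - alpha) * sent_loss

In practice `proj` would be a small trainable layer (e.g. torch.nn.Linear from the student hidden size to the teacher hidden size), and the weighting `alpha` would be tuned per task; both are assumptions made here for illustration.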

Keywords
Large Language Models; Knowledge Distillation; Edge Computing; Industrial IoT; Model Compression; Resource-Constrained Deployment
