To address the weak distinguishability of thermal defect features and the large number of model parameters in infrared image diagnosis of power equipment, this study constructs an infrared image dataset of substation equipment and proposes a thermal defect diagnosis method based on deep residual networks. First, convolutional kernel decomposition is used to simplify the basic structure of the network, significantly reducing the number of model parameters. Second, a multi-scale convolutional feature fusion strategy integrates semantic features from both the shallow and deep layers of the network, improving the diagnostic accuracy for thermal defect states. Finally, a Bayesian optimization algorithm with coupled constraints adaptively adjusts hyperparameters such as the number of convolutional kernels and the network depth, yielding a lightweight recognition model. Experiments show that the model can reach a thermal defect recognition accuracy of 93.12% on images with simple backgrounds, and that the optimized diagnosis model for substation equipment effectively classifies seven types of thermal defects. The method provides a reliable technical route for the intelligent diagnosis of power equipment.
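
To illustrate why convolutional kernel decomposition reduces the parameter count, the following minimal sketch compares the weights of a standard k x k convolution with those of a factorized pair. It assumes one common form of decomposition (a k x 1 convolution followed by a 1 x k convolution); the abstract does not state which decomposition the study uses, and the channel counts and function names here are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weight count of one convolution layer, ignoring bias terms."""
    return kernel_h * kernel_w * in_channels * out_channels


def factorized_params(k, in_channels, out_channels):
    """Weight count when a k x k convolution is replaced by a k x 1
    convolution followed by a 1 x k convolution.

    Assumption: the intermediate feature map has out_channels channels.
    """
    first = conv_params(k, 1, in_channels, out_channels)
    second = conv_params(1, k, out_channels, out_channels)
    return first + second


# Hypothetical layer: 3 x 3 kernel, 64 input and 64 output channels.
full = conv_params(3, 3, 64, 64)
split = factorized_params(3, 64, 64)
ratio = split / full

print(full, split, round(ratio, 3))  # 36864 24576 0.667
```

Under these assumptions the factorized layer keeps two thirds of the original weights; the saving grows with kernel size, since the cost scales with 2k rather than k squared.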