The communication environment of the power Internet of Things (IoT) is characterized by massive numbers of heterogeneous nodes, strong spatio-temporal correlations in traffic behavior, and constrained edge resources, and traditional intrusion detection methods struggle to cope with its complex and dynamic security threats. This paper proposes an edge-cloud collaborative intrusion detection system. Convolutional neural networks (CNNs) are first trained offline in the cloud to generate diverse baseline detection models. Pearson correlation coefficients are then used for feature selection, and a spatio-temporal graph convolutional network (ST-GCN) is used to mine the spatial-topological dependencies and temporal dynamics inherent in power communication traffic. On the test set, ST-GCN achieves an accuracy of 98.62%, outperforming CNN (96.39%) and GCN (97.31%). For distributed denial-of-service (DDoS) attack detection specifically, ST-GCN achieves an accuracy of 98.63% with a false positive rate of 0.01% and an average detection latency of 22.12 ms, outperforming multiple traditional machine learning baselines. Under simulated harsh communication conditions with packet loss rates of up to 5%, ST-GCN maintains an accuracy of 90.29%, demonstrating robustness to degraded links. Ablation experiments further confirm that the feature selection and spatio-temporal modeling modules each contribute significantly to detection performance. The proposed ST-GCN-based edge-cloud collaborative intrusion detection system thus provides efficient, accurate, and reliable proactive security protection for the power IoT.
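
The abstract does not say how the Pearson coefficients are applied, so the following is only a minimal sketch of one common reading: rank each traffic feature by the absolute value of its Pearson correlation with the attack label and keep those above a cutoff. The feature names (`pkt_rate`, `mean_pkt_size`, `const_flag`), the toy values, and the `threshold` parameter are all hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear information; treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    features: Dict[str, List[float]],
    labels: List[float],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Keep features whose |r| with the label is at least `threshold`."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, labels)) >= threshold
    ]


# Toy, made-up traffic records: three normal flows followed by three attack flows.
features = {
    "pkt_rate": [10, 12, 11, 95, 102, 99],           # rises sharply under attack
    "mean_pkt_size": [500, 480, 510, 490, 505, 495],  # same mean in both classes
    "const_flag": [1, 1, 1, 1, 1, 1],                 # constant, uninformative
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_features(features, labels, threshold=0.5)
print(selected)  # ['pkt_rate']
```

Pearson correlation captures only linear dependence, so a filter of this kind can discard features that matter through nonlinear or graph-structured interactions. The paper's actual selection criterion and cutoff may differ from this sketch.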