With the wide application of Internet technology in the smart grid, delay jitter in user-side data seriously degrades the accuracy of power supply decisions. This paper establishes a delay analysis model based on a tandem queuing system. By observing the queue at the arrival instants of service packets and analyzing the packets one by one, it derives the probability generating function of the delay jitter for real-time service packets. On this basis, a power supply decision monitoring system based on IoT communication is constructed; it collects real-time power supply data over 4G/fiber-optic links and a medium-voltage (MV) carrier network, and uses a reinforcement learning algorithm for optimal scheduling and decision making. Experimental results show that, compared with optimal power flow and a genetic algorithm, the proposed reinforcement learning-based algorithm converges faster, is more stable, and is more economical, reducing total cost by more than 2000 yuan. These results verify the practicality of the proposed algorithm in coping with communication delay and user-side data fluctuation.
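As an informal illustration of the quantity being modeled, the sketch below simulates packets passing through a tandem (series) queue and computes delay jitter as the difference between the end-to-end delays of consecutive packets. It is a Monte Carlo sketch, not the analytic probability-generating-function derivation of the paper. The Poisson arrivals, exponential service times, FIFO single-server stages, and the names `simulate_tandem_delays` and `jitter` are all assumptions made here for illustration.

```python
import random


def simulate_tandem_delays(n_packets, arrival_rate, service_rates, seed=0):
    """End-to-end delays of packets through FIFO single-server stages in series.

    Assumptions (not from the paper): Poisson arrivals with rate `arrival_rate`
    and exponential service with rate `service_rates[k]` at stage k.
    """
    rng = random.Random(seed)
    arrival_time = 0.0
    # Time at which each stage finished serving the previous packet.
    last_departure = [0.0] * len(service_rates)
    delays = []
    for _ in range(n_packets):
        arrival_time += rng.expovariate(arrival_rate)
        ready = arrival_time  # moment the packet reaches the next stage
        for k, mu in enumerate(service_rates):
            # Service starts once the packet has arrived and the server is free.
            start = max(ready, last_departure[k])
            ready = start + rng.expovariate(mu)
            last_departure[k] = ready
        delays.append(ready - arrival_time)
    return delays


def jitter(delays):
    """Delay jitter: difference between delays of consecutive packets."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


if __name__ == "__main__":
    d = simulate_tandem_delays(10_000, arrival_rate=1.0, service_rates=[2.0, 3.0])
    j = jitter(d)
    mean_abs_jitter = sum(abs(x) for x in j) / len(j)
    print(f"mean delay = {sum(d) / len(d):.3f}, mean |jitter| = {mean_abs_jitter:.3f}")
```

A histogram of the values returned by `jitter` gives an empirical counterpart to the jitter distribution that a probability generating function characterizes analytically, under the assumptions stated above.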