The electrocatalytic CO2 reduction reaction is an important technical route toward carbon neutrality, but it faces core challenges: high sensitivity to operating conditions, a strongly competing hydrogen evolution reaction, and difficulty in precisely regulating product selectivity. Traditional catalyst design and static condition optimization cannot adapt to dynamic operating conditions, so new intelligent regulation strategies are urgently needed. This paper proposes a deep reinforcement learning framework that dynamically optimizes reaction conditions and regulates product selectivity in electrocatalytic reduction. The process is modeled as a Markov Decision Process (MDP). A digital twin simulation environment is constructed that integrates density functional theory (DFT), microkinetic models, and experimental data. The state space comprises real-time potential, current density, and intermediate coverage, and the continuous action space covers adjustments to the potential step size, electrolyte concentration, and flow rate. A multi-objective weighted reward function accounting for Faradaic efficiency, energy conversion efficiency, and long-term stability is proposed, and the Proximal Policy Optimization (PPO) algorithm is adopted for online dynamic regulation. In simulation training and real flow electrolyzer experiments, the reinforcement learning strategy enables the Faradaic efficiency of CO to reach , the Faradaic efficiency of product to reach 68.7%, and the energy conversion efficiency to reach 56.3%, which are 27.1, 27.2, and 17.6 percentage points higher, respectively, than under traditional fixed conditions; long-term operation stability exceeds 120 h.
Furthermore, fast dynamic switching between CO and products is achieved (response time < 5 min, selectivity stabilized above 85%), and the microscopic mechanism of regulation is revealed by analyzing the dynamic evolution of intermediate coverage. Multi-objective Pareto frontier analysis verifies the framework's flexibility in the efficiency-selectivity trade-off. This work overcomes the limitations of traditional static optimization, provides a new method and paradigm for intelligent, real-time regulation of the electrocatalytic reduction reaction, and has theoretical significance and engineering value for the efficient resource utilization of CO2 in the context of carbon neutrality.
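As a rough illustration of the MDP ingredients named above, the following minimal Python sketch defines a state, a bounded continuous action, and a weighted multi-objective reward. It is not the paper's implementation: the class and function names, the units, the action bounds, and the weights (0.5, 0.3, 0.2) are all hypothetical, and each objective is assumed to be normalised to [0, 1] before weighting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Observation fields named in the abstract; units are assumptions."""
    potential_v: float       # real-time electrode potential, V
    current_density: float   # current density, mA/cm^2
    coverage: float          # intermediate surface coverage, 0..1


@dataclass(frozen=True)
class Action:
    """Continuous adjustments named in the abstract; units are assumptions."""
    d_potential_v: float       # potential step, V
    d_concentration_m: float   # electrolyte concentration change, mol/L
    d_flow_ml_min: float       # flow rate change, mL/min


def clip(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def bound_action(a: Action) -> Action:
    """Clip a raw policy output to illustrative (hypothetical) safe ranges."""
    return Action(
        clip(a.d_potential_v, -0.05, 0.05),
        clip(a.d_concentration_m, -0.1, 0.1),
        clip(a.d_flow_ml_min, -5.0, 5.0),
    )


def reward(fe: float, ee: float, stability: float,
           w_fe: float = 0.5, w_ee: float = 0.3, w_stab: float = 0.2) -> float:
    """Weighted sum of Faradaic efficiency, energy efficiency, and a
    stability score, each assumed to lie in [0, 1]."""
    return w_fe * fe + w_ee * ee + w_stab * stability


if __name__ == "__main__":
    s = State(potential_v=-0.9, current_density=200.0, coverage=0.3)
    a = bound_action(Action(0.2, -0.5, 1.0))
    print(s, a, reward(fe=0.687, ee=0.563, stability=1.0))
```

Sweeping the three weights and recording the resulting efficiency and selectivity pairs is one simple way to trace an approximate Pareto frontier of the kind the abstract mentions. A PPO agent would consume `State` as its observation and emit the three `Action` components.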