To address coarse state identification, static teaching interventions, and underused feedback in the cultivation of students' artistic literacy at vocational colleges, this paper constructs a Markov decision process (MDP) model for continuous teaching scenarios. Drawing on classroom behavior, work performance, process revisions, and reflection texts, the study establishes a mapping among the student state representation, the teaching action set, and a long-term reward function, and optimizes the intervention path through policy learning. Experimental results show that the proposed model outperforms the empirical-rule, SVM, and MLP groups in average cumulative return, policy stability, and action coverage. After 12 weeks of teaching, students' comprehensive artistic literacy score rose to 76.4 points, 14.9 points above the initial level. The method provides a computable, iterative teaching decision path for cultivating artistic literacy.
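
For orientation, the decision problem summarized above can be written in standard MDP notation. The symbols here are assumptions chosen for illustration and are not taken from the paper: $\mathcal{S}$ is the set of student states, $\mathcal{A}$ the set of teaching actions, $P$ the state transition probability, $R$ the reward function, and $\gamma \in [0, 1)$ a discount factor.

\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_0 \right]
\]

Under this generic formulation, the long-term reward corresponds to the discounted return $G_t$, and optimizing the intervention path corresponds to finding a policy $\pi^{*}$ that maximizes the expected return. The paper's own state features, action set, and reward design may differ from this sketch.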