Adaptive learning systems often optimize for correctness and completion while neglecting changes in learners' self-efficacy. This study proposes a reinforcement learning (RL) framework that jointly models learner state, task difficulty, feedback type, and support intensity, and aligns them through a multi-objective reward that balances self-efficacy, achievement, and persistence. In an 8-week Python mini-course with 228 undergraduate students and 21,864 interaction records, the RL group showed stronger outcomes: a self-efficacy gain of 0.49, a post-test accuracy of 87.6%, a completion rate of 94.2%, and a dropout rate 11.3% lower. The subgroup with low baseline self-efficacy showed an effect size of d = 0.74. Further analysis suggests that sustained challenge, motivational feedback, and rule-based control can help strengthen success experiences, self-confidence, and learning stability. These results support a more interpretable and extensible approach to intelligent educational intervention.
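As a rough illustration of what a multi-objective reward of this kind could look like, the sketch below combines the three objectives named above into a single weighted sum. This is a minimal sketch under stated assumptions, not the study's actual reward: the names `Action`, `Outcome`, and `reward`, the value scales, and the weights `(0.4, 0.4, 0.2)` are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One intervention choice. The three fields mirror the action
    dimensions named in the abstract; the scales are hypothetical."""
    task_difficulty: int    # assumed scale, e.g. 0 (easy) to 2 (hard)
    feedback_type: str      # assumed labels, e.g. "corrective", "motivational"
    support_intensity: int  # assumed scale, e.g. 0 (none) to 2 (full hints)


@dataclass(frozen=True)
class Outcome:
    """Signals observed after the learner attempts a task (assumed ranges)."""
    self_efficacy_gain: float  # change in a self-efficacy score
    achievement: float         # fraction correct, in [0, 1]
    persistence: float         # e.g. 1.0 if the learner continued, else 0.0


def reward(outcome: Outcome, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted-sum scalarization of the three objectives.

    The weights are illustrative placeholders, not values from the study.
    """
    w_se, w_ach, w_per = weights
    return (
        w_se * outcome.self_efficacy_gain
        + w_ach * outcome.achievement
        + w_per * outcome.persistence
    )
```

A weighted sum is only one way to balance several objectives. It keeps each term's contribution easy to inspect, which fits the emphasis on interpretability, but the relative weights would need to be tuned or justified empirically.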