As demand grows for operation on complex terrain, efficient perception, smooth mode switching, and stable control of wheel-legged robots in unstructured environments have become a research focus. This paper addresses multimodal motion switching and cooperative control of wheel-legged robots on complex terrain. It builds a system modeling and kinematic analysis framework, and designs a multimodal environment perception and feature fusion method that combines stereo vision, IMU, encoder, wheel-speed sensor, and foot contact information. A motion mode switching mechanism is proposed, based on a switching demand function, a dual-threshold hysteresis decision, and smooth trajectory transition, together with a wheel-leg cooperative control strategy subject to whole-body stability constraints. Experimental results show that the proposed method reaches a terrain recognition accuracy of 97.4% with an inference time of 18.7 ms. Across four types of complex terrain, the overall traversal success rate reaches 94.6% and the average traversal time is 12.8 s. Under disturbance, the maximum attitude deviation is kept within 6.4°, the recovery time is shortened to 1.8 s, and the task completion rate reaches 95.4%. These results indicate that the proposed method effectively improves the continuous traversal capability and operational robustness of wheel-legged robots on complex terrain, and is of practical value for advancing intelligent autonomous mobile equipment in complex environments.
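As a minimal illustration of the dual-threshold hysteresis idea named above, the following sketch switches between a wheeled and a legged mode only when a scalar switching demand crosses an upper or a lower threshold, so that small fluctuations between the two thresholds do not cause repeated switching. The threshold values, the demand samples, the class and function names, and the quintic blend used for the smooth transition are all assumptions made for illustration; they are not taken from the method described in this paper.

```python
from dataclasses import dataclass

WHEEL = "wheel"
LEG = "leg"


@dataclass
class HysteresisSwitch:
    """Two-mode switch with separate enter/exit thresholds (hypothetical values)."""
    upper: float = 0.7   # assumed: demand at or above this enters legged mode
    lower: float = 0.3   # assumed: demand at or below this returns to wheeled mode
    mode: str = WHEEL

    def update(self, demand: float) -> str:
        # Switch only when the demand crosses the threshold that is
        # relevant to the current mode; otherwise hold the current mode.
        if self.mode == WHEEL and demand >= self.upper:
            self.mode = LEG
        elif self.mode == LEG and demand <= self.lower:
            self.mode = WHEEL
        return self.mode


def quintic_blend(t: float, duration: float) -> float:
    """Blend weight from 0 to 1 with zero velocity and acceleration at both ends.

    Assumed here as one possible smooth transition profile.
    """
    s = min(max(t / duration, 0.0), 1.0)  # normalized, clamped time
    return 10 * s**3 - 15 * s**4 + 6 * s**5


# Hypothetical demand sequence: rises past the upper threshold, hovers
# between the thresholds, then falls below the lower threshold.
switch = HysteresisSwitch()
demands = [0.2, 0.5, 0.75, 0.5, 0.4, 0.25, 0.5]
modes = [switch.update(d) for d in demands]
print(modes)
```

In this sketch the demand values 0.5 and 0.4 that follow the switch to legged mode do not trigger a return to wheeled mode, because they remain above the lower threshold. During a transition, a weight such as `quintic_blend(t, duration)` could interpolate between the trajectories of the outgoing and incoming modes.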