With the development of computer vision and multimodal perception, dynamic posture assessment of athletes has become an important research topic in intelligent sports analysis. To build a posture assessment system, this paper designs a deep visual detection module that locates athletes and extracts skeletal keypoints from competition videos. Inertial measurement, plantar pressure, and joint angle signals are then synchronized with a time alignment strategy and fused by a gated attention fusion network. The system is trained on 38,640 synchronized samples collected from 86 athletes during sprint, basketball, football, and combat training. Experimental results show that the visual branch of FusionNet remains stable in complex scenes, with an AP of 97.2% in clear scenes and 86.4% in crowded scenes. Compared with single-modal recognition methods, the fusion model achieves better classification performance, with a recall of 94.87%, a precision of 95.31%, and an F1 score of 95.09%. The overall posture assessment accuracy reaches 96.18%, and the average inference latency is 31.6 ms, which supports real-time athlete assessment at deployment.
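The fusion step can be illustrated with a minimal sketch. The abstract does not specify the network's internals, so everything below is an assumption. Each modality (visual keypoints, inertial, plantar pressure, joint angle) is assumed to be encoded already into a fixed-length feature vector. A hypothetical linear scorer `attn_w` produces softmax attention weights across modalities, and a hypothetical scalar gate `gate_w` scales each modality's contribution. In a trained network these weights would be learned; here they are passed in directly.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_attention_fusion(
    features: Dict[str, Vector], attn_w: Vector, gate_w: Vector
) -> Vector:
    """Hypothetical gated attention fusion (illustrative only, not FusionNet's
    actual design): fused = sum_m alpha_m * g_m * f_m, where alpha is a softmax
    over modalities and g_m = sigmoid(gate_w . f_m) is a per-modality gate."""
    if not features:
        raise ValueError("at least one modality is required")
    dim = len(attn_w)
    if len(gate_w) != dim or any(len(f) != dim for f in features.values()):
        raise ValueError("all feature vectors and weights must share one dimension")

    names = list(features)
    # Attention over modalities from an assumed linear scoring function.
    alphas = softmax([dot(attn_w, features[n]) for n in names])

    fused = [0.0] * dim
    for alpha, name in zip(alphas, names):
        f = features[name]
        # Scalar gate in (0, 1) that can suppress an unreliable modality.
        gate = sigmoid(dot(gate_w, f))
        for i in range(dim):
            fused[i] += alpha * gate * f[i]
    return fused
```

With all-zero weights, each of four modalities receives attention 0.25 and a gate of 0.5, so the output is one eighth of the summed feature vectors. Training would adjust these weights so that less reliable modalities contribute less to the fused representation.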