In this paper, MEMS-based micro-inertial sensors are used to capture the finger movements of a piano player. The features extracted from these signals are fed to an improved multi-scale deep learning network, which offers better fine-grained image description ability and recognizes the player's fingering. The perception and recognition of piano-playing gestures are then realized with a wearable piano-playing glove. In addition, the key-touch speed is analyzed to study its influence on sound quality. The experimental results show that our algorithm copes with the low regularity, high variability, and sudden changes of hand movement patterns in piano playing, and achieves a recognition accuracy above 99%. The results also indicate that speed and strength are the main control parameters: they directly affect the timbre produced by a key-tapping technique, and every tapping technique amounts to a subtle variation of these two factors.
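
As an illustration of the feature-extraction step mentioned above, the following minimal sketch splits a single-axis inertial signal into overlapping windows and computes simple per-window statistics. The window length, step size, and choice of statistics (mean, standard deviation, peak-to-peak range) are assumptions made for illustration only; the function names `sliding_windows`, `window_features`, and `extract_features` are hypothetical and do not describe the feature set or network input actually used in this work.

```python
import math
from typing import List, Sequence


def sliding_windows(samples: Sequence[float], size: int, step: int) -> List[Sequence[float]]:
    """Split a 1-D signal into fixed-length, possibly overlapping windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Only full windows are kept; a trailing partial window is dropped.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def window_features(window: Sequence[float]) -> List[float]:
    """Return [mean, standard deviation, peak-to-peak range] for one window."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n  # population variance
    return [mean, math.sqrt(variance), max(window) - min(window)]


def extract_features(samples: Sequence[float], size: int = 4, step: int = 2) -> List[List[float]]:
    """Compute one feature vector per window (sizes here are illustrative assumptions)."""
    return [window_features(w) for w in sliding_windows(samples, size, step)]


if __name__ == "__main__":
    # Synthetic single-axis acceleration trace: rest, then a step change.
    signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    for vector in extract_features(signal):
        print(vector)
```

In such a scheme, the window that straddles the step change has a nonzero standard deviation and range, which is the kind of cue that can separate a sudden finger movement from a steady hand position before the vectors are passed to a classifier.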