As music visualization has developed, it has become more widely accepted, and audiences have gradually formed a different concept of musical art. Image processing technology is an important part of music visualization design. In audio multi-feature extraction, preprocessing is the first step of audio signal analysis: after preprocessing, the sound signal is converted into a data format that algorithms and computers can process. The specific steps are MIDI file note extraction, theme extraction, segmentation, and segment feature extraction. For the overall design of music visualization, this paper proposes two design schemes for different music forms. One targets different musical characteristics, and the other uses those characteristics singly or in combination to enrich the visual effect. Different features of the main and secondary melodies (pitch, beat, tempo, time fractal dimension, etc.) produce different visual effects. The classification accuracy for metal, classical, and folk music was 93%, 92.7%, and 93.3%, respectively. The visualization method proposed in this paper is more targeted and achieves a better implementation effect.
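To make the preprocessing steps concrete, the following is a minimal Python sketch of theme extraction, segmentation, and segment feature extraction on note events. It is an illustration under stated assumptions, not the method used in this paper: the `Note` structure, the "highest pitch per onset" theme heuristic, the fixed four-beat segment length, and the three features shown are all hypothetical choices, and the note list stands in for notes that would be parsed from a MIDI file.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Note:
    onset: float     # start time in beats
    duration: float  # length in beats
    pitch: int       # MIDI note number, 0-127


def extract_theme(notes):
    """Assumed heuristic: keep only the highest-pitched note at each onset."""
    top = {}
    for n in notes:
        if n.onset not in top or n.pitch > top[n.onset].pitch:
            top[n.onset] = n
    return [top[t] for t in sorted(top)]


def split_segments(notes, beats_per_segment=4.0):
    """Group notes into fixed-length windows by onset time (assumed scheme)."""
    segments = {}
    for n in notes:
        segments.setdefault(int(n.onset // beats_per_segment), []).append(n)
    return [segments[k] for k in sorted(segments)]


def segment_features(segment, beats_per_segment=4.0):
    """Compute a few simple, illustrative features for one segment."""
    pitches = [n.pitch for n in segment]
    return {
        "mean_pitch": mean(pitches),
        "pitch_range": max(pitches) - min(pitches),
        "note_density": len(segment) / beats_per_segment,  # notes per beat
    }


# Hypothetical note events, standing in for notes read from a MIDI file.
notes = [
    Note(0.0, 1.0, 60), Note(0.0, 1.0, 48),
    Note(1.0, 1.0, 64), Note(2.0, 2.0, 67),
    Note(4.0, 1.0, 72), Note(5.0, 1.0, 71), Note(5.0, 1.0, 55),
]

theme = extract_theme(notes)
features = [segment_features(s) for s in split_segments(theme)]
```

Each entry in `features` is a per-segment feature dictionary that a classifier or a visual mapping could consume; richer features such as beat, tempo, or time fractal dimension would be added at the same stage.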