Anxiety disorders are highly prevalent mental illnesses. Their diagnosis currently relies mainly on symptom-based criteria, which can lead to missed diagnoses and misdiagnoses, and reliable imaging biomarkers and efficient recognition methods are lacking. To address this problem, this paper proposes a multi-head attention CNN-Transformer model of brain images for anxiety disorder recognition. First, patients with anxiety disorders (generalized anxiety disorder, GAD, and panic disorder, PD) and healthy controls were recruited, and 3.0 T magnetic resonance imaging data were acquired. After preprocessing, three multimodal features were extracted and used to construct feature maps: amplitude of low-frequency fluctuation (ALFF), gray matter volume (GMV), and fractional anisotropy (FA). Second, a simplified multi-head attention CNN-Transformer model was designed that removes the decoder and optimizes only the encoder, and it was trained with the Adam optimizer and dropout regularization to improve performance. Finally, the model was validated through baseline comparison experiments and ablation experiments. The experimental results show that the proposed model performs stably on three datasets, reaching accuracies of 86.0% and 86.9% on datasets 2 and 3, respectively, and significantly outperforming the baseline algorithms. In the ablation experiments, removing the CNN module or the multi-head attention module reduced accuracy by up to 13.3%, which indicates that each module is crucial to model performance. These results indicate that the proposed model can effectively capture anxiety disorder features in brain images, generalizes well, and can provide reliable technical support for the early diagnosis of anxiety disorders.
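As an illustration of the architecture described above, the following is a minimal PyTorch sketch of an encoder-only CNN plus multi-head attention classifier trained with Adam and dropout. It is not the authors' implementation. The class name `CNNAttentionEncoder`, the layer sizes, the 2D three-channel input (one channel each for ALFF, GMV, and FA), the three output classes, the learned positional embedding, the mean pooling, and the learning rate are all assumptions made for this example; the actual feature maps may be 3D volumes and the actual hyperparameters may differ.

```python
import torch
from torch import nn


class CNNAttentionEncoder(nn.Module):
    """Hypothetical encoder-only CNN + multi-head attention classifier."""

    def __init__(self, in_channels=3, embed_dim=64, num_heads=4,
                 num_layers=2, num_classes=3, dropout=0.1):
        super().__init__()
        # CNN stem: extracts local descriptors from the stacked feature maps.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),  # fixed 8x8 grid, i.e. 64 tokens
        )
        # Learned positional embedding for the 64 spatial tokens (assumption).
        self.pos_embed = nn.Parameter(torch.zeros(1, 64, embed_dim))
        # Transformer encoder only: no decoder stack is built.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Dropout-regularized linear classification head.
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, x):
        feats = self.cnn(x)                        # (B, D, 8, 8)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 64, D)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens.mean(dim=1))       # (B, num_classes)


def train_step(model, optimizer, images, labels):
    """Run one Adam update with cross-entropy loss; return (logits, loss)."""
    model.train()
    logits = model(images)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach(), loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = CNNAttentionEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Random stand-in data: 4 subjects, 3 feature channels, 32x32 maps.
    images = torch.randn(4, 3, 32, 32)
    labels = torch.randint(0, 3, (4,))
    logits, loss = train_step(model, optimizer, images, labels)
    print(logits.shape, loss)
```

In this sketch, an ablation of the kind reported above would correspond to replacing `self.cnn` with a plain patch projection or replacing `self.encoder` with `nn.Identity()` and retraining. That mapping is an interpretation for illustration, not a description of the experimental protocol.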