Gastrointestinal cancer is a highly prevalent malignancy worldwide, and early, accurate identification of precancerous lesions is key to improving patient survival. In this paper, we propose a gastrointestinal precancerous lesion recognition framework that integrates multi-task learning with an improved SSD, and that improves model performance through optimized data preprocessing and algorithmic innovation. The study is based on white-light gastroscopy images, from which a standardized dataset is constructed after confirmation by pathological biopsy. To address noise in the medical images, we adopt a denoising pipeline that combines YOLOv3 detection with median filtering and a Sym4 wavelet transform. At the algorithmic level, the improved SSD model introduces a semantic segmentation branch and a recursive feature pyramid network (RFPN). It strengthens the semantic representation of shallow layers by fusing features from the Conv4_3, Conv7, and Conv8_2 layers, and extracts multi-scale information with a modified receptive field block (RFB), which significantly improves detection accuracy for small lesions. Focal Loss is adopted to alleviate class imbalance, and weighted cross-entropy and IoU loss are integrated to achieve joint multi-task optimization. Experiments show that global fine-tuning significantly outperforms local fine-tuning. Taking the r150×3 network as an example, global fine-tuning achieves an accuracy of 98.52%, an F1 score of 94.73%, a specificity of 99.44%, and an AUC of 0.996. Compared with traditional models (VGG19, ResNet50, and Inception-V3), the improved SSD performs best in identifying all three categories of lesions, with a gastric cancer recognition accuracy of 98.87% and a misjudgment rate of only 1.69%. Recognition takes only 0.05 s per image, roughly a hundred times faster than manual diagnosis.
The model localizes protruding lesions best, reaching an accuracy of 90.29% when the overlap is ≥60%. Localization accuracy is lower for flat lesions, which indicates that texture feature extraction needs further optimization.
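
As an illustration only, the loss terms named in the abstract (Focal Loss, weighted cross-entropy, and IoU loss) and the ≥60% overlap criterion can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names, the per-sample formulation, the Focal Loss defaults (alpha = 0.25, gamma = 2.0, the values commonly used in the literature), the application of IoU to bounding boxes, and the combination weights `w_det`, `w_seg`, and `w_iou` are all hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction p = P(y = 1), with label y in {0, 1}.

    alpha and gamma are assumed defaults, not values reported in the paper.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def weighted_cross_entropy(probs, label, class_weights, eps=1e-7):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[label] * math.log(max(probs[label], eps))


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, true_box):
    """IoU loss as 1 - IoU (assumed here to act on boxes)."""
    return 1.0 - box_iou(pred_box, true_box)


def multitask_loss(det_p, det_y, seg_probs, seg_label, class_weights,
                   pred_box, true_box, w_det=1.0, w_seg=1.0, w_iou=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_det * focal_loss(det_p, det_y)
            + w_seg * weighted_cross_entropy(seg_probs, seg_label, class_weights)
            + w_iou * iou_loss(pred_box, true_box))


def is_localized(pred_box, true_box, threshold=0.6):
    """Count a localization as correct when the overlap reaches the threshold."""
    return box_iou(pred_box, true_box) >= threshold
```

With these definitions, a confident correct prediction contributes far less focal loss than a confident wrong one, which is how the term shifts training toward hard and minority-class examples. `is_localized` mirrors the ≥60% overlap criterion, under the assumption that "overlap" means box IoU.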