This study constructs a dataset of street spatial imagery elements as its research object. A visual hyperbolic spatial model based on deep learning and supervised learning is established, together with a ResNet model for classifying street architectural landscapes and a DeepLabV3+ model for analyzing street elements. The models perform feature extraction and semantic segmentation on street spatial imagery, improving classification accuracy while preserving detailed features, and they provide the data foundation for the subsequent calculation of Pearson correlation coefficients between street spatial imagery elements and visual perceptual experience. Based on the calculation results, a total of 19 street spatial imagery elements in 6 categories are adjusted and optimized. After optimization, the correlation of the elements in the positive perception dimension rises above 0.9, that in the neutral perception dimension rises above 0.8, and that in the negative perception dimension falls below 0.7, so the optimized spatial perceptual experience shifts in the positive direction.
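The Pearson correlation step mentioned above can be illustrated with a minimal sketch. It assumes each street image has been reduced to one numeric value per element (for example, the share of pixels a segmentation model assigns to that element) and one perception rating. The function name `pearson_r`, the variable names, and all numbers are hypothetical illustrations and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data (not from the study): share of pixels labelled as one
# element in five street images, and a positive-perception rating per image.
element_share = [0.05, 0.12, 0.20, 0.31, 0.40]
positive_score = [2.1, 2.8, 3.0, 3.9, 4.4]

r = pearson_r(element_share, positive_score)
print(f"r = {r:.3f}")
```

A coefficient close to 1 indicates that the element's share and the perception rating increase together, a value close to -1 indicates that one rises as the other falls, and a value near 0 indicates little linear relationship.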