This study proposes a multidimensional perception-shaping framework for urban architectural landscapes that integrates computer vision, spatial analysis, and color theory, with the aim of providing decision support for landscape protection and sustainable development. First, building on Mask R-CNN, the k-means algorithm is used to cluster the anchor boxes and so improve detection accuracy. The FCN semantic segmentation branch is then refined to improve the accuracy of urban building masks and to recognize different buildings accurately. Next, spatial analysis techniques such as kernel density estimation and space syntax are used to reveal the spatial characteristics of urban architectural styles. Using a color partitioning method, the color levels of Beijing's urban buildings are divided into regions, and the resulting color-level data are mined and analyzed. In a comparison with current mainstream deep learning networks, the improved Mask R-CNN building recognition model achieves a mean average precision (mAP) of 91.36%. The spatial analysis shows that the Second Ring is the core area of the urban architectural landscape and the Third Ring is the area of highest road-network integration; together they form the skeleton of landscape perception. The color analysis shows that the dominant colors of roofs and walls are concentrated in the yellow-red hue range. These results offer a sustainable approach to the conservation and renewal of urban buildings and to their coordination with the historical context.
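The anchor-clustering step can be illustrated with a minimal sketch. It assumes that annotated building boxes are reduced to (width, height) pairs and that boxes are assigned to the cluster with the highest IoU, a common choice for anchor clustering; the abstract does not state which distance measure the study uses. The function names `iou_wh` and `kmeans_anchors` and all parameters are hypothetical and serve only as illustration.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between (N, 2) boxes and (K, 2) centroids given as (width, height).

    Boxes are treated as if aligned at a shared corner, so only their
    shapes matter, not their positions in the image.
    """
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_cents = centroids[:, 0] * centroids[:, 1]
    union = area_boxes[:, None] + area_cents[None, :] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor shapes.

    Assumption: assignment uses highest IoU (equivalently, lowest 1 - IoU);
    the source abstract does not specify the metric.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct ground-truth boxes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        # Recompute each centroid as the mean shape of its members;
        # an empty cluster keeps its previous centroid.
        updated = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Return anchors sorted by area, smallest first.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]
```

Under these assumptions, the returned shapes would replace the default anchor scales and aspect ratios of the region proposal network, so that the proposals start closer to the typical footprints of the buildings in the training set.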