This paper proposes a deep learning framework for extracting features of city blocks and guiding the planning of their architectural form. Street view images, remote sensing windows, building contours, and facade labels from 126 sampled blocks are registered to construct a dataset of 12,840 street view images, 126 groups of remote sensing windows, 38 building form indicators, and 6 types of style labels. The model uses a dual-branch visual encoder and a morphological representation module to extract facade texture, roof contour, height rhythm, and interface continuity information, and then maps the resulting style representation to planning control parameters. Experimental results show that, in the landscape feature extraction task, the model reaches an accuracy of 94.1% and a Macro-F1 of 0.912. In the architectural form planning guidance experiment, the proposed method reaches a guidance consistency of 91.6%, a boundary satisfaction of 93.3%, and a block coordination of 90.9%, with a form deviation of 18.7%. These results indicate that the framework can transform the visual features of blocks into a computable basis for planning and can provide stable technical support for planning and design.