Landscape image refers to the integrated manifestation of public cognition and affect formed through the memory and imagination of landscape spaces. Existing research has primarily emphasized the material dimension of urban spaces, has paid insufficient attention to their emotional dimension, and has not thoroughly examined the internal mechanism by which landscape images are constructed. Drawing on the cognition–emotion theoretical framework, this study develops a multidimensional construct of landscape image. By combining questionnaire surveys with cognitive mapping, we reveal the characteristics and spatial distribution patterns of landscape images in Harbin’s historic districts, and further identify the hierarchical mechanism through which cognition and emotion jointly shape landscape image formation. The results show that landscape images in Harbin’s historic districts are characterized by a strong presence of folk culture, distinctive architectural forms, high commercial vitality, relatively open boundaries, strong emotional attachment, and a high level of place identity. Spatially, landscape images exhibit a “dense in the west, sparse in the east” pattern, with clusters concentrated in the Nan’er and Nansan subdistricts. Cognitive image exerts a stronger influence on overall landscape image than emotional image does. Stronger cognitive perceptions enhance emotional responses, and emotional image functions as the key mediator through which cognitive image is transformed into landscape image.