Image content recognition, a core task in computer vision, is the process of identifying the objects and scenes in an image by learning visual representations. This paper proposes a hybrid deep learning model for image content recognition that combines a fixed ResNet50 backbone with multi-task learning. Through shared representation learning, the model jointly classifies images and regresses features. Experiments are conducted on the Office-Home dataset, which comprises 65 object categories across several visual domains. The proposed framework achieves a classification accuracy of about 74% and an F1 score of about 0.73, indicating effective feature learning and domain adaptation. Ablation experiments confirm that data augmentation, backbone depth, and training stabilisation strategies (such as early stopping and learning rate schedules) each contribute to performance. Although the regression branch is less semantically interpretable because it uses one-hot targets, the results show that the proposed framework provides an effective basis for tasks that require understanding image content. Based on these results, we analyse the strengths and weaknesses of multi-task learning in real-world computer vision applications.