This study proposes a multimodal artificial intelligence framework for semantic annotation and knowledge graph construction for the withered tree, bamboo, and rock motif in Chinese literati painting. A motif-oriented dataset, WTBR-LP, is built from open museum images, metadata records, titles, inscriptions, descriptive texts, and expert annotations. The annotation schema has three layers: visual objects, brushwork form, and cultural meaning. The proposed model combines open-vocabulary visual grounding, region segmentation, vision-language representation, metadata embedding, and graph-neighborhood constraints to produce region-level multi-label annotations. The annotation results are then converted into WTBR-KG, a provenance-aware knowledge graph that links artworks, people, visual motifs, brushwork characteristics, inscriptions, collection provenance, and cultural concepts. In a preliminary experimental setting, the proposed method achieves a Macro-F1 of 0.864, an mAP@0.5 of 0.803, a Hits@1 of 0.912, and a triple accuracy of 0.923. The AI-assisted workflow reduces expert review time from 8.70 minutes to 2.05 minutes per artwork, a reduction of about 76%. These results indicate that multimodal fusion and graph constraints can improve the accuracy, interpretability, and traceability of semantic annotation for literati painting.
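As a rough illustration of how a knowledge graph such as WTBR-KG might keep track of where each assertion comes from, the following minimal Python sketch stores every triple together with its source and a confidence score. It is not the authors' implementation: the `Triple` class, the `build_graph` and `sources_for` helpers, the identifiers, the source labels, and the 0.5 confidence threshold are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One graph assertion plus the evidence it came from (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    source: str        # e.g. model output, museum metadata, expert annotation
    confidence: float  # 1.0 for curated sources, model score otherwise


def build_graph(triples, min_confidence=0.5):
    """Index triples by subject, dropping low-confidence assertions."""
    graph = defaultdict(list)
    for t in triples:
        if t.confidence >= min_confidence:
            graph[t.subject].append(t)
    return dict(graph)


def sources_for(graph, subject, predicate, obj):
    """Return the sorted list of sources that support a given assertion."""
    return sorted(
        {
            t.source
            for t in graph.get(subject, [])
            if t.predicate == predicate and t.obj == obj
        }
    )


# Illustrative data only; none of these identifiers come from the dataset.
triples = [
    Triple("artwork:001", "depicts", "motif:bamboo", "model:region_annotation", 0.93),
    Triple("artwork:001", "depicts", "motif:bamboo", "museum:metadata", 1.0),
    Triple("artwork:001", "depicts", "motif:rock", "model:region_annotation", 0.41),
    Triple("artwork:001", "has_inscription", "inscription:001", "expert:annotation", 1.0),
]
graph = build_graph(triples, min_confidence=0.5)
print(sources_for(graph, "artwork:001", "depicts", "motif:bamboo"))
```

Keeping the source on each triple, rather than merging duplicate assertions, is one simple way to make every edge traceable to the evidence behind it; an expert reviewing the graph can then see whether a claim rests on model output, museum metadata, or both.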