This study constructs a model for mining differences across target-language groups that integrates principal component analysis, XGBoost cluster identification, and multi-output regression. Based on a six-dimensional feature encoding, the model performs structural characterization, category discrimination, and effect estimation for English, Vietnamese, and Thai learner groups, with national identity and international outlook as joint outputs. Comparative results show that the Vietnamese group’s overall mean scores for national identity and international outlook (4.97 and 4.86, respectively) were higher than those of the English group (4.91 and 4.69) and the Thai group (4.90 and 4.71). Significant differences were found in the cultural identity dimension (F=5.696, p=.004) and in the dimension of attention to international/regional issues (F=3.778, p=.024).
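The F statistics reported above compare one dimension across the three learner groups. The text does not name the test, so the following is only a minimal sketch that assumes a standard one-way ANOVA. The function name `one_way_anova_f` and the scores in `toy_groups` are hypothetical illustrations, not the study's code or data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists.

    Assumes a one-way ANOVA; the source text does not state which test was used.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy scores for three groups (NOT the study's data).
toy_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

With three groups, the between-group degrees of freedom are always 2. The within-group degrees of freedom depend on the total sample size, which this passage does not report. The p value would then be read from the F distribution with those degrees of freedom.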