Identifying hypertension risk is of considerable practical importance for occupational health monitoring and early cardiovascular warning. Using the NHANES database, this paper selects samples of textile workers and builds a computational analysis framework in R to study the association between blood metabolites and hypertension. In the data-processing stage, occupational-information matching, missing-value imputation, outlier correction, variable standardization, hypertension label coding, and class-imbalance correction are performed. In the feature-identification stage, a three-stage “correlation constraint, sparse projection, stability screening” procedure compresses and recombines the high-dimensional metabolite features. In the classification stage, a discriminative model with two hidden layers is constructed, and logistic regression, random forest, and gradient boosting models serve as baselines. A total of 1234 samples were included, of which 456 cases (36.9%) belonged to the hypertension group. After feature optimization, the best model achieved an accuracy of 86.2%, an AUC of 91.4%, and an F1 score of 83.6%, outperforming the unoptimized model. These results indicate that the proposed method improves hypertension classification while maintaining computational efficiency, and that it offers a reusable modeling workflow for identifying occupational health risks among textile workers. It can also provide computational support for stratified screening of occupational samples and for risk early warning.
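The abstract lists the preprocessing steps but not the specific methods used in the R pipeline. The following is a minimal Python sketch of what such steps commonly look like. It rests on several assumptions. Missing values are filled with the column median. Outliers are winsorized at 1.5 × IQR. Columns are z-score standardized. Classes are balanced by random oversampling of the minority class. Labels come from a hypothetical 140/90 mmHg or medication rule. All function names are hypothetical and do not reproduce the authors' code.

```python
import random
import statistics


def impute_median(column):
    """Replace None entries with the median of the observed values (assumed method)."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]


def clip_iqr(column, k=1.5):
    """Winsorize values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier correction)."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in column]


def zscore(column):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sd for v in column]


def hypertension_label(sbp, dbp, on_medication):
    """Hypothetical labeling rule: SBP >= 140, DBP >= 90, or antihypertensive use -> 1."""
    return int(sbp >= 140 or dbp >= 90 or bool(on_medication))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes have equal size.

    Assumes both classes are present; rng.choice fails on an empty minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def preprocess(columns):
    """Impute, clip, then standardize each metabolite column (dict: name -> list)."""
    return {name: zscore(clip_iqr(impute_median(col))) for name, col in columns.items()}
```

The order used here (impute, then clip, then standardize) keeps a single extreme value from distorting the mean and standard deviation used for scaling. The paper may order or implement these steps differently. Oversampling should be applied only to training folds so that duplicated rows do not leak into evaluation.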
Summary: Using the NHANES database, a group of textile workers was selected to build an R-based procedure for identifying hypertension from blood metabolites. The procedure comprised missing-value imputation, label coding, class balancing, feature selection, and classification modeling. The analysis included 1234 cases; the hypertensive group accounted for 36.9% and the non-hypertensive group for 63.1%. The best model achieved an accuracy of 86.2%, an AUC of 91.4%, and an F1 score of 83.6%, outperforming the unoptimized model. The method shows good classification ability and supports the identification of hypertension risk among textile workers.
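The reported accuracy, AUC, and F1 score follow standard definitions, although the paper does not specify its decision threshold or computation routine. The sketch below computes them for a binary hypertension label in plain Python. It assumes a 0.5 threshold on predicted probabilities. AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve, and ties count as one half. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 1 = hypertension, 0 = not."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties = 0.5).

    O(n_pos * n_neg) pairwise comparison; adequate for a cohort of ~1234 samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Threshold the scores (assumed cut-off 0.5) and report the three metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
        "f1": f1_score(y_true, y_pred),
    }
```

Because AUC is threshold-free while accuracy and F1 depend on the cut-off, a model can rank cases well (high AUC) yet show lower F1 if the threshold is poorly matched to the 36.9% class prevalence.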