Energy consumption sequences in the tobacco industry are highly volatile and multi-modal, which limits the accuracy of existing prediction models. To address this, this study proposes an integrated prediction model based on improved hierarchical sampling and an extreme random forest. In the data processing stage, we developed an improved hierarchical sampling method that combines multi-dimensional stratification with key-event weighting to raise sample quality. In the model building stage, the traditional random forest undergoes a two-step progressive optimization: weighted feature selection is added first, and a split-point randomization mechanism is then introduced, turning the model into an extremely randomized forest. In validating the sampling method, improved hierarchical sampling covered 95.5% of important events with only 218 samples, whereas simple random sampling covered just 56.1%. Under the production day, heating season operating condition, the extreme random forest model predicted about 5850 kW·h at the energy consumption trough around day 10, close to the actual 5900 kW·h. The AAE of the final energy consumption prediction model is 6.1%. The proposed model captures the energy consumption dynamics of complex tobacco plant operations and provides technical support for companies moving from experience-based scheduling to data-driven, precise energy management decisions.
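The sampling idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "hierarchical sampling" means stratified sampling over combined categorical dimensions, that key events are marked by a boolean flag, and that coverage is the share of key-event records that end up in the sample. The field names `day_type`, `season`, and `key_event`, the function names, and the weight value are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, n_samples,
                      event_key="key_event", event_weight=5.0, seed=0):
    """Draw roughly n_samples records with proportional allocation over
    combined strata, favoring key-event records inside each stratum.

    All parameter names and the default weight are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Multi-dimensional combination stratification: one stratum per
    # combination of the chosen dimensions, e.g. (day_type, season).
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    sample = []
    for members in strata.values():
        # Proportional quota, at least one record per non-empty stratum.
        quota = max(1, round(n_samples * len(members) / len(records)))
        quota = min(quota, len(members))

        # Weighted sampling without replacement (Efraimidis-Spirakis):
        # each record gets the key u ** (1 / w); the largest keys are kept.
        def sort_key(rec):
            weight = event_weight if rec[event_key] else 1.0
            return rng.random() ** (1.0 / weight)

        ranked = sorted(members, key=sort_key, reverse=True)
        sample.extend(ranked[:quota])
    return sample


def event_coverage(records, sample, event_key="key_event"):
    """Share of key-event records in `records` that also appear in `sample`."""
    total = sum(1 for rec in records if rec[event_key])
    hit = sum(1 for rec in sample if rec[event_key])
    return hit / total if total else 1.0


if __name__ == "__main__":
    # Synthetic records for illustration only; not data from the study.
    demo = [
        {"day_type": d, "season": s, "key_event": i % 10 == 0}
        for d in ("production", "non-production")
        for s in ("heating", "non-heating")
        for i in range(50)
    ]
    picked = stratified_sample(demo, ("day_type", "season"), n_samples=40)
    print(len(picked), event_coverage(demo, picked))
```

Because the quota of each stratum is rounded and floored at one record, the total sample size can differ slightly from `n_samples`. Raising `event_weight` makes key-event records more likely to be selected within each stratum, which is the mechanism this sketch uses to improve event coverage relative to uniform random sampling.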