This paper presents an efficient image generation model built on a hybrid LG-MLP (multilayer perceptron) architecture. First, we propose a hybrid generator that combines convolutional neural networks with multilayer perceptrons to achieve high-quality image generation within the WGAN-GP framework. Second, we design the LG-MLFormer network, which couples a local-global MLP encoder with a cross-domain memory-enhanced decoder to strengthen visual feature fusion and language generation. Experiments on the MS COCO dataset show that the proposed model significantly outperforms mainstream baselines in image realism, as measured by FID (lower is better). Under HED edge conditions, the model achieves FID scores of 6.12 and 6.32 in the consistent and inconsistent scenarios, respectively, compared with 7.03 and 8.41 for ControlNet. Under MiDaS depth-map conditions, FID decreases further to 4.46 and 4.54, indicating strong cross-condition generalization. The design group that used the LG-MLP method achieved average scores of 4.73, 4.87, and 4.75 for creativity, final product quality, and expert/user satisfaction, respectively, significantly surpassing the control group. These results indicate that the proposed generative model combines strong technical performance with high application potential and practical value in real-world creative design.