To address the diversity of clothing patterns and the complexity of their design, and to better capture their creative expression, this paper uses the CLIP text encoder to preprocess textual labels. Guided by the text requirements, a latent diffusion model is used to generate both simple pattern elements and complex clothing patterns. Specifically, the Stable Diffusion model is adopted, and a diffusion inversion method is used to alleviate the limitations of text in describing shape. Based on shape grammar theory, the clothing pattern samples generated via diffusion inversion are then transformed to realize pattern redesign. Using public datasets, the method is compared with several pattern generation models on FID, IS, and other metrics. In addition, an amateur group and an expert group are invited to score the samples generated via diffusion inversion on four dimensions: image quality, overall aesthetics, how well the pattern interprets the textual content, and overall coordination between the pattern and the text. On the MSCOCO dataset, the FID score of the diffusion inversion-based method for generating garment image samples is 41.52%, 44.97%, 49.42%, and 52.87% lower than those of the VLMGAN, DM-GAN, SSA-GAN, and AttnGAN models, respectively. On the CUB-200 dataset, the proposed method improves the IS score by 30.83% over the SSA-GAN model. Taken together, the objective metrics and the subjective evaluations indicate that the clothing pattern samples generated via diffusion inversion fit the text requirements, have good image quality, and are generally well received.