With breakthrough progress in large language models (LLMs) for natural language processing, LLM-based agent technology is gradually moving from theoretical research to practical application. This study first clarifies the representation of complex task decomposition, takes reinforcement learning and multi-agent reinforcement learning as its theoretical basis, and explains the principles of the LLaMA large language model. It then introduces the PPO algorithm for policy optimization: the model obtains reward signals by interacting with the environment, and the advantage function and clipping strategy are used to keep training stable and convergent, enabling the large language model to be applied to task decomposition. Finally, experiments are conducted in home and warehouse task scenarios. The results show that fine-tuning with the PPO algorithm significantly improves model performance: the average reward during task decomposition rises from 0.141 to 0.780, and the efficiency and stability of task decomposition exceed those of the original model. In the A/B test, LLaMA-PPO achieves a 3.46% improvement, indicating that the algorithm improves both training speed and final performance while automating task decomposition efficiently.
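To make the role of the advantage function and the clipping strategy concrete, the following is a minimal sketch of the standard PPO clipped surrogate objective. It assumes the usual PPO-Clip formulation and is not the study's own implementation; the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logps / old_logps: log-probabilities of the sampled actions under
    the current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data in a single update, which is the stabilizing effect the clipping strategy is meant to provide.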