Ingegneria Sismica

Counting Optimization of a Spatio-Temporally Decoupled YOLOv8 Model in Scenes with Dense Pods

Author(s): Han Yang1
1School of Liberal Arts and Sciences, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
Yang, Han. “Counting Optimization of a Spatio-Temporally Decoupled YOLOv8 Model in Scenes with Dense Pods.” Ingegneria Sismica Volume 43 Issue 3: 1-19, doi:10.65102/is20261241.

Abstract

To improve counting accuracy in dense soybean pod scenes affected by small-object occlusion, overlap, and repeated responses, this study proposes an improved YOLOv8 model with spatiotemporal feature decoupling, in contrast to detection-then-tracking counting methods. In this paper, “spatiotemporal decoupling” is defined as encoding soybean pod boundaries, contour gradients, and neighborhood occlusion relationships within the detection network as spatial structural representations, while encoding cross-frame center displacement, scale fluctuation, and short-term visibility variation as temporal association representations. Before the detection head, gated fusion is used to calibrate candidate box confidence and constrain counting bias. Unlike post-processing methods such as DeepSORT and ByteTrack, which rely on detection results for trajectory association, the temporal branch in the proposed method participates directly in candidate generation, candidate filtering, and quantity regression, so that dense target responses can be corrected before NMS over-suppression and short-term missed detections occur. To address the susceptibility of conventional YOLOv8 to single-frame texture interference, weakened boundaries of slender pods, and candidate drift in highly overlapping regions, the model constructs a spatial structural branch and a temporal association branch, and further introduces a P2 fine-grained fidelity branch, multi-scale semantic fusion, candidate target filtering constraints, and bias correction for repeated and missed counts. On this basis, the model is jointly optimized with a localization loss, a quantity regression loss, and a temporal consistency loss.
Experimental results show that the improved model achieves MAE/RMSE/F1 values of 4.2/6.8/0.91, 3.1/5.0/0.94, and 6.4/8.9/0.88 on the self-built soybean field dataset, the PlantCrop subset, and the occlusion-enhanced synthetic sequence, respectively, significantly reducing counting errors compared with the YOLOv8n baseline. The model runs at 51.7 FPS with a single-frame inference time of 19.3 ms on an NVIDIA RTX 4090 platform, meeting the real-time requirements of field counting.
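
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MAE, RMSE, and F1 are conventionally computed for a counting task. It uses the standard textbook definitions and is not taken from the paper's code; the function names (`counting_errors`, `f1_score`) and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def counting_errors(predicted, actual):
    """Return (MAE, RMSE) for paired per-image counts.

    `predicted` and `actual` are equal-length sequences of pod counts.
    Hypothetical helper for illustration only.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


def f1_score(tp, fp, fn):
    """Return F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0  # no detections and no ground truth: define F1 as 0
    return 2 * tp / denom


if __name__ == "__main__":
    # Made-up counts for three images, not data from the study.
    mae, rmse = counting_errors([52, 47, 61], [50, 50, 60])
    print(f"MAE={mae:.2f} RMSE={rmse:.2f} F1={f1_score(90, 10, 10):.2f}")
```

Because RMSE squares each error before averaging, it penalizes occasional large miscounts more heavily than MAE, which is why the two values are reported together.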

Keywords
pod counting; YOLOv8; spatio-temporal feature decoupling; small object detection; counting optimization
