Deep Learning Ensemble for Automated Pallet Counting: A Multi-Target Regression Approach with Explainable AI

Pallet counting in warehouses demands accuracy and speed under heavy occlusion and variable lighting.

We present a multi-target regression approach that predicts CHEP and EPAL pallet counts in a single forward pass and derives the total pallet count as:

\[\hat{y}_\mathrm{total}=\hat{y}_\mathrm{CHEP}+\hat{y}_\mathrm{EPAL}\]

We fine-tune three CNNs:

  • EfficientNet-B3
  • ResNet-50
  • ConvNeXt-Tiny

with the AdamW optimizer and a one-cycle learning-rate schedule on $224{\times}224$ inputs.
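To illustrate what a one-cycle schedule looks like, the sketch below computes a learning rate that warms up to a peak and then anneals to a very small value. It is only an illustration: the function name `one_cycle_lr`, the cosine shape, and all hyperparameter values (`max_lr`, `pct_start`, `div_factor`, `final_div_factor`) are assumptions, not the settings used in this work.

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a 0-based `step` of a cosine one-cycle schedule.

    All default values are illustrative assumptions. Assumes total_steps is
    large enough that the warm-up phase spans at least one step.
    """
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # learning rate at the last step
    warmup_end = pct_start * total_steps - 1  # step at which the peak is reached
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    return cos_anneal(max_lr, min_lr,
                      (step - warmup_end) / (last_step - warmup_end))

# Learning rates over a hypothetical 100-step run.
schedule = [one_cycle_lr(s, 100) for s in range(100)]
```

With these assumed values, the rate starts at `max_lr / 25`, peaks at `max_lr` after roughly 30% of the steps, and decays toward zero by the final step.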

Augmentations include resized crops, flips, affine transforms, color jitter, blur, JPEG noise, and coarse dropout, followed by ImageNet normalization. We compute the SmoothL1 loss on the total pallet count, obtained by summing the predicted CHEP and EPAL counts.
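The loss on the summed prediction can be sketched for a single image as follows. The function names and the threshold `beta=1.0` are assumptions made for illustration; the threshold actually used is not stated here.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one scalar prediction.

    Quadratic for absolute errors below `beta`, linear above it.
    beta=1.0 is an assumed value.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def total_count_loss(pred_chep, pred_epal, true_total, beta=1.0):
    """Loss on the total count, where the total is the sum of both heads."""
    pred_total = pred_chep + pred_epal
    return smooth_l1(pred_total, true_total, beta)

# Small error (0.5 pallets) falls in the quadratic regime.
small = total_count_loss(3.0, 4.5, 7.0)
# Large error (2 pallets) falls in the linear regime.
large = total_count_loss(3.0, 6.0, 7.0)
```

The quadratic region keeps gradients small near the target, while the linear region limits the influence of images with large counting errors.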

Final predictions are the mean of the three models' outputs, leveraging architectural diversity. Grad-CAM heatmaps provide interpretability, highlighting edges, stack boundaries, and CHEP color cues.
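A minimal sketch of the averaging step, assuming each model returns a (CHEP, EPAL) pair for an image. The function name `ensemble_mean` and the example numbers are hypothetical.

```python
def ensemble_mean(per_model_preds):
    """Average (chep, epal) predictions across models for one image.

    per_model_preds: list of (chep, epal) tuples, one tuple per model.
    """
    n = len(per_model_preds)
    chep = sum(p[0] for p in per_model_preds) / n
    epal = sum(p[1] for p in per_model_preds) / n
    # The total is the sum of the averaged per-type counts.
    return {"chep": chep, "epal": epal, "total": chep + epal}

# Hypothetical outputs from three backbones for a single image.
preds = [(4.0, 6.0), (5.0, 7.0), (6.0, 8.0)]
result = ensemble_mean(preds)
```

Because averaging is linear, summing the averaged CHEP and EPAL counts gives the same total as averaging each model's own total.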

The dataset consists of 130 proprietary warehouse images, with an 80-20 train-validation split.
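If the split is exact, 130 images at 80-20 gives 104 training and 26 validation images. The sketch below shows one way such a split could be produced; the seeded random shuffle, the function name, and the image identifiers are assumptions, since the splitting procedure is not described here.

```python
import random

def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle items with a fixed seed and hold out a validation fraction."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

# Hypothetical identifiers standing in for the 130 warehouse images.
image_ids = [f"img_{i:03d}" for i in range(130)]
train_ids, val_ids = train_val_split(image_ids)
```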

The ensemble achieves MAE 1.151 and $R^2=0.875$ on the total pallet count, while individual backbones excel on the CHEP (MAE 0.721) and EPAL (MAE 1.009) counts.
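For reference, the two reported metrics can be computed as in the sketch below. The counts in the example are toy values chosen for illustration and are not drawn from the dataset.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy example: true and predicted total pallet counts for four images.
y_true = [10, 12, 14, 16]
y_pred = [11, 12, 13, 17]
toy_mae = mae(y_true, y_pred)
toy_r2 = r2_score(y_true, y_pred)
```

MAE is expressed in pallets, so an MAE of about 1 means predictions are off by roughly one pallet per image on average; $R^2$ measures how much of the variance in the true counts the predictions explain.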

Takeaway: Different backbones specialize on different pallet types; the ensemble captures complementary cues and improves overall accuracy while remaining efficient for near real-time use.

Simone De Giorgi

MSc in Economics at Bocconi University