POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

Authors: Ce Zheng, Xianpeng Liu, Guo-Jun Qi, Chen Chen

Transformer architectures have achieved SOTA performance on human mesh recovery (HMR) from monocular images. However, the performance gain has come at the cost of substantial memory and computational overhead. Real-world applications need a lightweight and efficient model that reconstructs accurate human meshes. In this paper, we propose a pure transformer architecture named POoling aTtention TransformER (POTTER) for the HMR task from single images. Observing that the conventional attention module is memory-intensive and computationally expensive, we propose an efficient pooling attention module, which significantly reduces the memory and computational cost without sacrificing performance. Furthermore, we design a new transformer architecture by integrating a High-Resolution (HR) stream for the HMR task. The high-resolution local and global features from the HR stream can be used to recover more accurate human meshes. Our POTTER outperforms the SOTA method METRO on the Human3.6M (PA-MPJPE metric) and 3DPW (all three metrics) datasets while requiring only 7% of the total parameters and 14% of the Multiply-Accumulate Operations. The project webpage is https://zczcwh.github.io/potter_page.
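The abstract does not spell out how the pooling attention module works, so the following is only a minimal NumPy sketch of one way pooling can stand in for the quadratic attention score matrix. It assumes a feature map of shape (D, h, w), average pooling along each spatial axis, and a softmax over the resulting map. The function names `pooled_attention_map`, `pooling_attention`, and `score_entries` are hypothetical and are not taken from the POTTER code base.

```python
import numpy as np

def pooled_attention_map(x):
    """x: (D, h, w) feature map. Returns per-channel weights of shape (D, h, w).

    Assumption: the weights come from two pooled strips rather than from
    an (h*w) x (h*w) token-to-token score matrix.
    """
    row_pool = x.mean(axis=2, keepdims=True)  # (D, h, 1): pool along the w axis
    col_pool = x.mean(axis=1, keepdims=True)  # (D, 1, w): pool along the h axis
    scores = row_pool @ col_pool              # (D, h, w): batched outer product
    flat = scores.reshape(x.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.reshape(x.shape)

def pooling_attention(x):
    """Reweight the input features element-wise with the pooled attention map."""
    return x * pooled_attention_map(x)

def score_entries(h, w):
    """Entries in a full token-to-token score matrix vs. the pooled map (per channel)."""
    n = h * w
    return n * n, n

x = np.random.default_rng(0).standard_normal((8, 14, 14))
y = pooling_attention(x)
full, pooled = score_entries(14, 14)
```

For a 14 x 14 grid of patches, a conventional attention layer scores every pair of the 196 tokens, while the pooled map in this sketch holds one weight per position per channel. That gap is the kind of saving the abstract refers to, although the actual module may differ in its details.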


