DDT: A Diffusion-Driven Transformer-based Framework for Human Mesh Recovery from a Video

Authors: Ce Zheng, Guo-Jun Qi, Chen Chen


Human mesh recovery (HMR) provides rich human body information for various real-world applications such as gaming, human-computer interaction, and virtual reality. Compared to single-image methods, video-based methods can exploit temporal information and incorporate human body motion priors to further improve performance. However, many-to-many approaches such as VIBE suffer from poor motion smoothness and temporal inconsistency, while many-to-one approaches such as TCMR and MPS-Net rely on future frames, which makes them non-causal and time-inefficient during inference. To address these challenges, a novel Diffusion-Driven Transformer-based framework (DDT) for video-based HMR is presented. DDT is designed to decode specific motion patterns from the input sequence, enhancing motion smoothness and temporal consistency. As a many-to-many approach, the decoder of DDT outputs the human mesh of all frames, making DDT more viable for real-world applications where time efficiency is crucial and a causal model is desired. Extensive experiments on widely used datasets (Human3.6M, MPI-INF-3DHP, and 3DPW) demonstrate the effectiveness and efficiency of DDT.
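
To make the causality point concrete, the following minimal sketch compares which frame indices a model would have to read to predict frame t. It is illustrative only: the helper names, the centered-window layout, and the window sizes are assumptions chosen for the example, and the code does not describe the architecture of DDT, TCMR, or MPS-Net.

```python
def centered_window(t, half_width):
    """Frames read to predict frame t when the window is centered on t.

    Hypothetical layout for a many-to-one model: it looks at
    `half_width` frames before and after the target frame.
    """
    return list(range(t - half_width, t + half_width + 1))


def causal_window(t, length):
    """Frames read to predict frame t when only past and current frames are used."""
    return list(range(t - length + 1, t + 1))


def needs_future(frames, t):
    """True if any frame in the window comes after the target frame t."""
    return any(f > t for f in frames)


if __name__ == "__main__":
    t = 10
    centered = centered_window(t, half_width=2)
    causal = causal_window(t, length=5)
    print(centered, needs_future(centered, t))
    print(causal, needs_future(causal, t))
```

A window that includes frames after t cannot be evaluated until those frames arrive, which is why reliance on future frames rules out causal, low-latency inference.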


