MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin


This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at .
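The benchmarking issue the abstract raises can be made concrete with a minimal sketch. The two functions below contrast the two split-construction strategies described: uniformly sampling a fraction of frames from every LiDAR sequence (which keeps scene diversity nearly constant across splits) versus sampling whole scene sequences (so smaller splits genuinely contain fewer distinct scenes). The function names and data layout (a dict mapping sequence name to its list of frames) are illustrative assumptions, not the paper's actual code.

```python
import random


def uniform_frame_split(sequences, fraction, seed=0):
    """Prior benchmarks: draw a fraction of frames from *every* sequence.

    Every split still touches all scenes, so data diversity is similar
    across splits regardless of the fraction. (Illustrative sketch.)
    """
    rng = random.Random(seed)
    split = []
    for frames in sequences.values():
        k = max(1, int(len(frames) * fraction))
        split.extend(rng.sample(frames, k))
    return split


def sequence_split(sequences, fraction, seed=0):
    """Proposed benchmark: sample whole scene sequences.

    A 10% split then contains only ~10% of the scenes, giving splits
    with genuinely different diversity. (Illustrative sketch.)
    """
    rng = random.Random(seed)
    names = sorted(sequences)
    k = max(1, int(len(names) * fraction))
    chosen = rng.sample(names, k)
    return [frame for name in chosen for frame in sequences[name]]
```

With ten 10-frame sequences and `fraction=0.1`, both splits contain ten frames, but `uniform_frame_split` covers all ten scenes while `sequence_split` covers only one, which is the diversity gap the proposed benchmark exploits.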


