Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, its instance segmentation performance is noticeably inferior to that of previous works. By diving into the details, we observe that instances in sparse point clouds are relatively small compared to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, a situation rare in the image domain. Considering that instances in 3D are characterized more by their positional information, we emphasize its role during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former.