NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

Authors: Joy Hsu, Jiayuan Mao, Jiajun Wu


Grounding object properties and relations in 3D scenes is a prerequisite for a wide range of artificial intelligence tasks, such as visually grounded dialogues and embodied manipulation. However, the variability of the 3D domain induces two fundamental challenges: 1) the expense of labeling and 2) the complexity of 3D grounded language. Hence, essential desiderata for models are to be data-efficient, to generalize to different data distributions and tasks with unseen semantic forms, and to ground complex language semantics (e.g., view-point anchoring and multi-object reference). To address these challenges, we propose NS3D, a neuro-symbolic framework for 3D grounding. NS3D translates language into programs with hierarchical structures by leveraging large language-to-code models. Different functional modules in the programs are implemented as neural networks. Notably, NS3D extends prior neuro-symbolic visual reasoning methods by introducing functional modules that effectively reason about high-arity relations (i.e., relations among more than two objects), key to disambiguating objects in complex 3D scenes. This modular and compositional architecture enables NS3D to achieve state-of-the-art results on the ReferIt3D view-dependence task, a 3D referring expression comprehension benchmark. Importantly, NS3D shows significantly improved performance in data-efficiency and generalization settings, and demonstrates zero-shot transfer to an unseen 3D question-answering task.
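To give a concrete sense of the idea described above, the following is a minimal, hypothetical sketch (not the authors' code) of executing a hierarchical program containing a 3-ary relation module over a toy scene. The paper's modules are neural networks; here, `filter_category` and `relate_between` are hand-written stand-ins with illustrative names, and the "between" relation is approximated geometrically.

```python
import numpy as np

# Toy scene: each object has a category label and a 2D position.
scene = [
    {"category": "chair", "pos": np.array([2.0, 0.0])},
    {"category": "chair", "pos": np.array([9.0, 9.0])},
    {"category": "table", "pos": np.array([0.0, 0.0])},
    {"category": "door",  "pos": np.array([4.0, 0.0])},
]

def filter_category(objects, category):
    """Stand-in for a neural filter module: per-object category scores."""
    return np.array([1.0 if o["category"] == category else 0.0 for o in objects])

def relate_between(objects, target_scores, anchor1_scores, anchor2_scores):
    """Stand-in for a 3-ary relation module: score each candidate target
    by proximity to the midpoint of the two highest-scoring anchors."""
    a1 = objects[int(np.argmax(anchor1_scores))]["pos"]
    a2 = objects[int(np.argmax(anchor2_scores))]["pos"]
    midpoint = (a1 + a2) / 2.0
    dists = np.array([np.linalg.norm(o["pos"] - midpoint) for o in objects])
    # Mask out non-targets; prefer objects near the anchors' midpoint.
    return target_scores * np.exp(-dists)

# Hierarchical program for "the chair between the table and the door",
# i.e., the kind of module composition a language-to-code model would emit.
scores = relate_between(
    scene,
    filter_category(scene, "chair"),
    filter_category(scene, "table"),
    filter_category(scene, "door"),
)
print(int(np.argmax(scores)))  # index of the grounded chair: 0
```

The point of the ternary signature is that a binary relation (target vs. one anchor) cannot disambiguate the two chairs here, whereas conditioning on both anchors at once does.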


