3D-Aware Object Goal Navigation
via Simultaneous Exploration and Identification

CVPR 2023


Jiazhao Zhang1,2*, Liu Dai3*, Fanpeng Meng4, Qingnan Fan5, Xuelin Chen5, Kai Xu6, He Wang1†

1 CFCS, Peking University   2 BAAI   3 CEIE, Tongji University   4 Huazhong University of Science and Technology  
5 Tencent AI Lab   6 National University of Defense Technology  

* equal contributions   † corresponding author



Teaser. We present a 3D-aware ObjectNav framework with simultaneous exploration and identification policies: A$\rightarrow$B, the agent is guided by an exploration policy to look for its target; B$\rightarrow$C, the agent consistently identifies the target object and finally calls STOP.


Abstract


Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Considering that this task happens in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav task based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously and utilize online-fused 3D points as observations.


Methods



At time step $t$, we take in a posed RGB-D image and run a point-based construction algorithm to online-fuse a 3D scene representation $M_{3D}^{(t)}$, along with a 2D map $M_{2D}^{(t)}$ obtained by projecting semantics. We then simultaneously leverage two policies, a corner-guided exploration policy $\pi_e$ and a category-aware identification policy $\pi_f$, to predict a discrete corner goal $g_e^{(t)}$ and a target goal $g_f^{(t)}$ (if one exists), respectively. Finally, the local planning module drives the agent to the target goal $g_f^{(t)}$ (top priority) or, otherwise, the corner goal $g_e^{(t)}$.
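For intuition, here is a minimal, self-contained sketch of this per-step control flow in Python. The policy and planner internals are stubbed out, and all names here (corner_policy, identify_policy, local_plan, navigation_step) are illustrative placeholders rather than the actual implementation; the 2D semantic map is modeled as a toy integer grid.

import numpy as np

def corner_policy(points_3d, sem_map_2d):
    """Stand-in for pi_e: pick a discrete corner goal on unexplored cells (stub)."""
    free = np.argwhere(sem_map_2d == 0)            # unexplored map cells
    return tuple(free[0]) if len(free) else (0, 0)

def identify_policy(points_3d, sem_map_2d, target_id):
    """Stand-in for pi_f: return a target goal if the target category is observed, else None."""
    hits = np.argwhere(sem_map_2d == target_id)
    return tuple(hits.mean(axis=0).astype(int)) if len(hits) else None

def local_plan(goal):
    """Stub local planner: would output low-level actions toward `goal`."""
    return f"move_towards{goal}"

def navigation_step(points_3d, sem_map_2d, target_id):
    g_e = corner_policy(points_3d, sem_map_2d)                # corner goal g_e^(t)
    g_f = identify_policy(points_3d, sem_map_2d, target_id)   # target goal g_f^(t), or None
    goal = g_f if g_f is not None else g_e                    # target goal has top priority
    return local_plan(goal)

# Toy usage: a 4x4 semantic map where label 3 marks the target category.
sem_map = np.zeros((4, 4), dtype=int)
sem_map[2, 3] = 3
print(navigation_step(points_3d=None, sem_map_2d=sem_map, target_id=3))

The key design point mirrored here is that both policies are queried at every step, so identification can interrupt exploration as soon as a confident target goal appears.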


Online points fusion*


Left: A robot takes multi-view observations during navigation. Right: The points $p$ are organized into dynamically allocated blocks $B$ with per-point octrees $O$, which can be used to query the neighboring points of any given point.
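As a rough illustration of this block-based organization, the sketch below hashes points into dynamically allocated blocks and answers a radius query by visiting only the blocks overlapping the query ball. For brevity, the per-point octrees $O$ are replaced with a brute-force scan inside each visited block, and the block size is an assumed value; this is not the actual fusion implementation.

import numpy as np
from collections import defaultdict

BLOCK_SIZE = 0.5  # meters per block edge (assumed for illustration)

class BlockMap:
    def __init__(self):
        self.blocks = defaultdict(list)   # block key -> list of 3D points

    def insert(self, points):
        """Fuse new points: blocks are allocated on demand (dynamic allocation)."""
        for p in points:
            key = tuple(np.floor(p / BLOCK_SIZE).astype(int))
            self.blocks[key].append(p)

    def query_neighbors(self, q, radius):
        """Return all stored points within `radius` of query point `q`."""
        lo = np.floor((q - radius) / BLOCK_SIZE).astype(int)
        hi = np.floor((q + radius) / BLOCK_SIZE).astype(int)
        out = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # Brute-force scan stands in for the per-point octree query.
                    for p in self.blocks.get((i, j, k), []):
                        if np.linalg.norm(p - q) <= radius:
                            out.append(p)
        return out

# Toy usage: fuse a small point cloud, then query a neighborhood.
m = BlockMap()
m.insert(np.random.rand(100, 3))
print(len(m.query_neighbors(np.array([0.5, 0.5, 0.5]), radius=0.2)))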




Visualization



More results can be found in our paper.


Video





Team


Jiazhao Zhang1,2*
Liu Dai3*
Fanpeng Meng4
Qingnan Fan5
Xuelin Chen5
Kai Xu6
He Wang1†

1 CFCS, Peking University   2 BAAI   3 CEIE, Tongji University   4 Huazhong University of Science and Technology   5 Tencent AI Lab   6 National University of Defense Technology  
* equal contributions   † corresponding author


Citation



@article{zhang20223d,
  title={3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification},
  author={Zhang, Jiazhao and Dai, Liu and Meng, Fanpeng and Fan, Qingnan and Chen, Xuelin and Xu, Kai and Wang, He},
  journal={arXiv preprint arXiv:2212.00338},
  year={2022}
}

Contact


If you have any questions, please feel free to contact Jiazhao Zhang at zhngjizh_at_gmail_dot_com, Liu Dai at dailiu_dot_cndl_at_gmail_dot_com, or He Wang at hewang_at_pku_dot_edu_dot_cn.