GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

CVPR 2024 (highlight)

1 Shanghai AI Laboratory   2 Fudan University   3 Hong Kong University of Science and Technology
4 Northwestern Polytechnical University   5 Institute of Artificial Intelligence, China Telecom Corp Ltd
* denotes equal contribution

Overview

In this paper, we introduce GS-SLAM, the first method to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system, achieving a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method uses a real-time differentiable splatting rendering pipeline that offers significant speedups in map optimization and RGB-D rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians and deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation from synthesizing a static object, as in existing methods, to reconstructing a whole scene. Moreover, in the pose tracking process, we design an effective coarse-to-fine technique that selects reliable 3D Gaussians to optimize the camera pose, reducing runtime and yielding robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets.
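To make the adaptive expansion strategy concrete, the snippet below sketches one plausible add/delete rule in PyTorch-style Python. It is a minimal sketch under our own assumptions: the function names, dictionary keys, and thresholds are illustrative and not the released GS-SLAM code. Gaussians are added where the current render is transparent or its depth disagrees with the sensor, and Gaussians whose opacity has collapsed are pruned as noise.

import torch

def backproject(depth, K, c2w):
    """Lift a depth map (H, W) to world-space points (H*W, 3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]  # camera -> world

def adaptive_expansion(gaussians, render_out, frame,
                       tau_depth=0.05, tau_alpha=0.5, tau_prune=0.01):
    """Illustrative (hypothetical) add/delete step for the Gaussian map.

    gaussians:  dict {"xyz": (N, 3), "opacity": (N, 1)}; a real map would
                also carry color/scale/rotation, handled the same way.
    render_out: rendered "depth" and accumulated "alpha", each (H, W)
    frame:      sensor "depth" (H, W), intrinsics "K", camera-to-world "c2w"
    """
    valid = frame["depth"] > 0
    # Add: pixels that are transparent or whose rendered depth disagrees
    # with the sensor mark newly observed (or badly mapped) geometry.
    grow = valid & ((render_out["alpha"] < tau_alpha) |
                    ((render_out["depth"] - frame["depth"]).abs() > tau_depth))
    new_xyz = backproject(frame["depth"], frame["K"], frame["c2w"])[grow.reshape(-1)]
    gaussians["xyz"] = torch.cat([gaussians["xyz"], new_xyz], dim=0)
    gaussians["opacity"] = torch.cat(
        [gaussians["opacity"], 0.5 * torch.ones(len(new_xyz), 1)], dim=0)
    # Delete: Gaussians optimized toward zero opacity contribute only noise.
    keep = gaussians["opacity"].squeeze(-1) > tau_prune
    return {k: v[keep] for k, v in gaussians.items()}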

Method Overview

Overview of the proposed method. We use 3D Gaussians to represent the scene and use the rendered RGB-D images for inverse camera tracking. GS-SLAM proposes a novel Gaussian expansion strategy that makes the 3D Gaussian representation feasible for reconstructing the whole scene. For camera tracking of every input frame, we derive an analytical formula for backward optimization with a re-rendering RGB-D loss, and further introduce an effective coarse-to-fine technique that minimizes the re-rendering loss to achieve efficient and accurate pose estimation. GS-SLAM achieves real-time tracking, mapping, and rendering performance on the GPU.
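The loop below is a minimal sketch of how such a coarse-to-fine re-rendering optimization could look. It assumes a differentiable splatting renderer render_fn(gaussians, pose, scale) returning rendered "rgb" (H, W, 3) and "depth" (H, W) at a given image scale; the signature, the pose parameterization (unit quaternion plus translation), and all thresholds are hypothetical, and the reliable-pixel mask in the fine stage stands in for the paper's selection of reliable Gaussians.

import torch
import torch.nn.functional as F

def track_frame(render_fn, gaussians, frame, pose_init,
                coarse_iters=10, fine_iters=20, lr=2e-3,
                lambda_d=0.5, tau_reliable=0.1):
    """Illustrative coarse-to-fine tracking via a re-rendering RGB-D loss.
    `pose` is a 7-vector (unit quaternion + translation); names and
    values here are assumptions, not the released GS-SLAM API.
    """
    pose = pose_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)

    def rgbd_loss(out, gt_rgb, gt_depth, mask=None):
        if mask is None:
            mask = gt_depth > 0  # ignore invalid sensor depth
        l_rgb = ((out["rgb"] - gt_rgb).abs().mean(-1) * mask).sum() / mask.sum()
        l_d = ((out["depth"] - gt_depth).abs() * mask).sum() / mask.sum()
        return l_rgb + lambda_d * l_d

    def step(loss):
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # keep the quaternion on the unit sphere
            pose[:4] /= pose[:4].norm()

    # Downsampled ground truth for the coarse stage; nearest interpolation
    # for depth avoids blending across object boundaries.
    rgb_low = F.interpolate(frame["rgb"].permute(2, 0, 1)[None],
                            scale_factor=0.25, mode="bilinear")[0].permute(1, 2, 0)
    depth_low = F.interpolate(frame["depth"][None, None],
                              scale_factor=0.25, mode="nearest")[0, 0]

    # Coarse stage: a cheap low-resolution render smooths the loss
    # landscape and tolerates a larger initial pose error.
    for _ in range(coarse_iters):
        out = render_fn(gaussians, pose, scale=0.25)
        step(rgbd_loss(out, rgb_low, depth_low))

    # Fine stage: full resolution, restricted to pixels whose rendered
    # depth agrees with the sensor (reliable regions of the map).
    for _ in range(fine_iters):
        out = render_fn(gaussians, pose, scale=1.0)
        reliable = (frame["depth"] > 0) & \
                   ((out["depth"].detach() - frame["depth"]).abs() < tau_reliable)
        step(rgbd_loss(out, frame["rgb"], frame["depth"], mask=reliable))

    return pose.detach()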

Tracking Results

Tracking results on the Replica dataset (left) and the TUM-RGBD dataset (right).

Render Comparisons

Rendering comparisons of GS-SLAM (Ours) against NICE-SLAM, Co-SLAM, Point-SLAM, and ESLAM.

Results on the Replica dataset. We outperform existing dense neural RGB-D methods on the commonly reported rendering metrics. Note that GS-SLAM achieves 386 FPS on average, benefiting from the efficient 3DGS scene representation.

BibTeX

@inproceedings{yan2023gs,
  author    = {Yan, Chi and Qu, Delin and Xu, Dan and Zhao, Bin and Wang, Zhigang and Wang, Dong and Li, Xuelong},
  title     = {GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting},
  booktitle = {CVPR},
  year      = {2024},
}