Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

High-resolution, wide-coverage, scene-level 3D Gaussian reconstruction in 1 second.
Accepted to ICCV 2025 (Highlight).

Ziwen Chen¹, Hao Tan², Kai Zhang², Sai Bi², Fujun Luan², Yicong Hong², Fuxin Li¹, Zexiang Xu³

¹Oregon State University
²Adobe Research ³Hillbot

Abstract

We propose Long-LRM, a feed-forward 3D Gaussian reconstruction model for instant, high-resolution, 360° wide-coverage, scene-level reconstruction. It takes 32 input images at a resolution of 960×540 and produces the Gaussian reconstruction in just 1 second on a single A100 GPU. To handle the 250K-token sequence produced by this large input, Long-LRM combines recent Mamba2 blocks with classical transformer blocks, enhanced by a lightweight token merging module and Gaussian pruning steps that balance quality against efficiency. We evaluate Long-LRM on the large-scale DL3DV benchmark and on Tanks&Temples, demonstrating reconstruction quality comparable to optimization-based methods while running 800× faster and accepting an input size at least 60× larger than that of previous feed-forward approaches. We conduct extensive ablation studies on our model design choices, covering both rendering quality and computational efficiency. We also explore Long-LRM's compatibility with other Gaussian variants such as 2D GS, which enhances its ability to reconstruct geometry.
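As a rough sanity check on the sequence length quoted above, a back-of-the-envelope computation reproduces the ~250K-token figure. The 8×8 patchify size used here is an illustrative assumption, not a number stated on this page; the view count and resolution come from the abstract:

```python
# Back-of-the-envelope token count for Long-LRM's input.
# NOTE: the 8x8 patch size is an assumption for illustration only;
# the 32 views at 960x540 come from the abstract.
num_views = 32
width, height = 960, 540
patch = 8  # assumed patchify size

tokens_per_view = (width * height) // (patch * patch)  # 8100 tokens per view
total_tokens = num_views * tokens_per_view

print(total_tokens)  # 259200, on the order of the quoted 250K tokens
```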

Qualitative comparison with optimization-based Gaussian methods

Architecture of Long-LRM

Long-LRM takes 32 input images, together with their Plücker ray embeddings, as model input and patchifies them into a token sequence. The tokens are processed by a series of Mamba2 and transformer blocks ({7M1T}×3: seven Mamba2 blocks followed by one transformer block, repeated three times). The fully processed tokens are decoded into per-pixel Gaussian parameters, followed by a Gaussian pruning step. The bottom section illustrates the resulting wide-coverage Gaussian reconstruction and photo-realistic novel view synthesis.
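For readers unfamiliar with Plücker ray embeddings: each pixel's camera ray with origin o and unit direction d can be encoded as the 6-vector (d, o × d), which identifies the ray independently of where o sits along it. A minimal pure-Python sketch (the function name and layout are illustrative, not taken from the paper's code):

```python
import math

def pluecker_embedding(origin, direction):
    """Encode a camera ray as a 6-channel Pluecker embedding (d, o x d).

    origin: 3D ray origin (e.g. the camera center); direction: 3D ray
    direction. The direction is normalized, and the moment m = o x d is
    invariant to sliding the origin along the ray.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    o = origin
    m = [o[1] * d[2] - o[2] * d[1],   # cross product o x d
         o[2] * d[0] - o[0] * d[2],
         o[0] * d[1] - o[1] * d[0]]
    return d + m

# A ray through the world origin has zero moment:
print(pluecker_embedding([0.0, 0.0, 0.0], [0.0, 0.0, 2.0]))
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

In the model, one such 6-channel embedding is computed per pixel and concatenated with the RGB channels before patchification, giving the network the camera pose of every input view in a per-pixel form.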

Zero-shot reconstruction examples on ScanNetv2

BibTeX

@inproceedings{ziwen2025llrm,
  title={Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats},
  author={Ziwen, Chen and Tan, Hao and Zhang, Kai and Bi, Sai and Luan, Fujun and Hong, Yicong and Fuxin, Li and Xu, Zexiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}