SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning

Ziwei Chen1*, Ziling Liu1*, Zitong Huang1*, Mingqi Gao1,2, Feng Zheng1,3†
1Southern University of Science and Technology 2University of Sheffield 3Spatialtemporal AI
*Equal Contribution   †Corresponding Author

Overview of SCORP. Left: Object segmentation & proxy synthesis. From multi-view inputs, we reconstruct an initial 3DGS scene, obtain text-prompted object masks, select informative views, and synthesize a proxy object for each target object with a generative model. Middle: Initial object registration. Each proxy is inserted into the scene via coarse alignment and matching-based pose adjustment using origin-proxy correspondences, yielding a scene with pre-registered targets. Right: Registration-constrained fine alignment. We perform scale-undistorted shape refinement followed by appearance refinement under pose constraints, producing fine-aligned objects that recover geometry and textures while preserving scene-level consistency. Bottom rows show the rendered scene after each stage.

Abstract

Objects in reconstructed scenes are often missing viewpoints, because camera paths typically prioritize the overall scene structure over individual objects. This makes it highly challenging to achieve high-fidelity object-level modeling while maintaining an accurate scene-level representation. Addressing this issue is critical for advancing downstream tasks that require high-fidelity object reconstruction. In this paper, we introduce Scene-Consistent Object Refinement via Proxy Generation and Tuning (SCORP), a novel 3D enhancement framework that leverages 3D generative priors to recover fine-grained object geometry and appearance under missing views. SCORP first generates a proxy for each degraded object with a 3D generation model and substitutes it for the degraded object. It then progressively refines geometry and texture by aligning each proxy to its degraded counterpart in 7-DoF pose and correcting spatial and appearance inconsistencies through registration-constrained enhancement. This two-stage proxy tuning recovers high-fidelity geometry and appearance of the original object in unseen views while maintaining consistency in spatial positioning, observed geometry, and appearance. Across challenging benchmarks, SCORP achieves consistent gains over recent state-of-the-art baselines on both novel view synthesis and geometry completion tasks. SCORP is available at https://github.com/PolySummit/SCORP.
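To make the 7-DoF pose concrete: it consists of one uniform scale, a 3-DoF rotation, and a 3-DoF translation. The sketch below shows one standard way to fit such a pose from matched 3D points, the closed-form Umeyama least-squares solution. It is an illustration only and is not taken from the SCORP code base; the function name `estimate_similarity` and the variables `proxy_pts` and `scene_pts` are hypothetical, and the correspondences are assumed to be already given and free of outliers.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares 7-DoF fit so that dst ~= s * R @ src + t.

    Closed-form Umeyama solution; an illustrative stand-in, not SCORP's
    actual registration routine.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against reflections so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Synthetic check: build a known 7-DoF transform and recover it.
rng = np.random.default_rng(0)
proxy_pts = rng.normal(size=(50, 3))  # hypothetical proxy keypoints

angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
s_true = 1.7
t_true = np.array([0.5, -1.0, 2.0])

# Hypothetical matched points on the degraded object in the scene.
scene_pts = s_true * proxy_pts @ R_true.T + t_true

s, R, t = estimate_similarity(proxy_pts, scene_pts)
```

On noise-free correspondences like these, the recovered `s`, `R`, and `t` match the ground-truth transform up to floating-point error. With real matches, a robust wrapper such as RANSAC would normally be added around the fit to reject bad correspondences.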

Qualitative comparison with other methods.

Interactive qualitative comparisons with other methods. From left to right, the columns show the target objects produced by 3D Gaussian Splatting (3DGS), 2D Gaussian Splatting (2DGS), DNGaussian, GenFusion, and SCORP, respectively. All results are taken from the medium-difficulty setting.

Quantitative comparison on appearance.

Quantitative comparison of rendering quality, averaged over all scenes reported in the paper. The best, second-best, and third-best results are highlighted.


Quantitative comparison on geometry.

Quantitative comparison of geometry quality. The reported values are averages over 50 constructed scenarios.


BibTeX

@misc{chen2025scorpsceneconsistentobjectrefinement,
      title={SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning}, 
      author={Ziwei Chen and Ziling Liu and Zitong Huang and Mingqi Gao and Feng Zheng},
      year={2025},
      eprint={2506.23835},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.23835}, 
}