Bulgaria's INSAIT Releases World’s Largest Open-Source 3D Dataset and Benchmark for Language-Aware AI Systems
INSAIT, part of Sofia University, together with leading international research institutions, said on Tuesday that it has released SceneSplat-49k, the largest open-source collection of high-quality, complex 3D scenes in Gaussian Splatting format, along with SceneSplat-Benchmark, a comprehensive evaluation benchmark for Language Gaussian Splatting.
This work is the result of a collaboration between INSAIT, the University of Amsterdam, ETH Zurich (Computer Vision Lab), Nanjing University of Aeronautics and Astronautics, Johns Hopkins University, the University of Pisa, and the University of Trento. The project represents an important step toward the next generation of 3D vision-language systems, with applications in robotics, virtual and augmented reality, and human-centered AI.
SceneSplat-49k comprises 48,856 reconstructed indoor and outdoor scenes, of which 12,061 scenes are enriched with language features. Producing the dataset required extensive human effort and 861 GPU-days of computation, an investment intended to ensure high realism and diversity of real-world environments.
Language Gaussian Splatting enables natural language interaction within immersive 3D environments, allowing models to reason about spatial relationships and semantic concepts directly in three dimensions. Until now, progress in this field has been constrained by the absence of large-scale, high-quality 3D datasets and standardized evaluation protocols.
To address this gap, SceneSplat-Benchmark introduces substantially more realistic and challenging evaluation settings. It covers 1,060 scenes and 325 semantic classes and evaluates models directly in 3D, rather than relying on 2D projections, enabling a more faithful assessment of 3D scene-level understanding.
/DS/