INSAIT Unveils World’s First Generative Model for Understanding Photorealistic 3D Content


The Institute of Computer Science, Artificial Intelligence and Technology (INSAIT) at Sofia University has introduced GaussianVLM, the world’s first generative model that combines computer vision and natural language processing to understand photorealistic 3D content, the university’s press center said on Friday.
Just one week after publication, the scientific paper describing the model ranks among the ten most-read papers worldwide, according to the Scholar Inbox ranking, reflecting significant international academic interest.
GaussianVLM enables robotic systems to analyze real three-dimensional scenes based on ordinary video footage captured with a consumer camera, without requiring specialized hardware.
The model can answer questions such as “What is on the table?” or “Are there enough seats for all the guests?”, demonstrating an understanding of the overall spatial and semantic structure of the environment, the university explained.
It is the first model to support questions without predefined linguistic constraints, and it can process large-scale 3D scenes effectively. A major innovation is the compression of visual information from more than 40,000 elements to just 132 tokens, which allows fast and efficient processing by large language models, the university added.
/PP/